  1. Sep 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In particular, theoretical analysis of the extant evidence and formulation of the hypothesis remains elusive in terms of the potential mechanisms of updating/maintaining balance in obesity.

      We thank the reviewer for their feedback regarding the theoretical analysis and hypothesis formulation in our manuscript. We have attempted to build our hypothesis based on established correlations between dopamine levels and working memory capabilities, as seen in various populations affected by dopamine-related changes (e.g. Parkinson’s disease (Fallon et al., 2017), older individuals (Podell et al., 2012), or more generally, in individuals with lower dopamine synthesis capacity (Colzato et al., 2013)). Our hypothesis, that individuals with higher BMI might show impaired updating, is an extrapolation from observed patterns in these conditions. We recognize that the evidence connecting obesity to similar neuropsychological profiles may seem preliminary. We have tried to elaborate more clearly on how we reached our hypotheses in the revised version of the introduction.

      “Based on the above considerations these inconsistencies may be due to prior studies not clearly differentiating between distractor-resistant maintenance and updating in the context of working memory. This distinction may be crucial, however, as indirect evidence hints at potential specific alteration in these two sub-processes in obesity. For instance, obesity has been associated with aberrant dopamine transmission, with there being an abundance of literature linking obesity to changes in D2 receptor availability in the striatum (see e.g. Horstmann et al., 2015). However, results are not consensual, with studies reporting decreased, increased, or unchanged D2 receptor availability in obesity (Ribeiro et al., 2023; Janssen & Horstmann, 2022; see Darcey et al. (2023) for a potential explanation). Additionally, there are reports of differences in dopamine transporter (DAT) availability in both obese humans (Chen et al., 2008; but also see Pak et al., 2023) and rodents (Narayanaswami et al., 2013; Jones et al., 2021; Hamamah et al., 2023). The observed changes in dopamine are often interpreted as being due to chronic dopaminergic overstimulation resulting from overeating (Volkow & Wise, 2005; Volkow et al., 2008) and altered reward sensitivity as a consequence thereof (Blum et al., 1996). Considering that working memory gating is highly dependent on dopamine signaling, such changes could theoretically alter the balance between maintenance and updating processes in obesity. Next to this, obesity has frequently been associated with functional and structural changes in WM gating-related brain areas, implying another pathway through which working memory gating might get affected. 
At the level of the prefrontal cortex (PFC), studies have reported reduced gray matter volume and compromised white matter microstructure in individuals with obesity (Debette et al., 2014; Kullmann et al., 2016; Morys et al., 2024; Lv et al., 2024), and functional changes become evident with frequent reports of decreased activity in the dorsolateral PFC during tasks requiring cognitive control (e.g., Morys et al., 2018; Xu et al., 2017). Notably, Han et al. (2022) observed significantly lower spontaneous dlPFC activity during rest, potentially indicating reduced baseline dlPFC activity in obesity. On the level of the striatum, gray matter volume seems to correlate positively with measures of obesity (Horstmann et al., 2011), and individuals with obesity show greater activation of the dorsal striatum in response to high-calorie food stimuli compared to normal-weight individuals, indicating a stronger dopamine-dependent reward response to food cues (Stice et al., 2008; Small et al., 2003). Additionally, changes in connectivity between and within the striatum and PFC in obesity, both structurally (Li et al., 2023) and functionally (Verdejo-Román et al., 2017a, 2017b; Contreras-Rodríguez et al., 2017) have been reported. Although these studies mostly investigate brain function in relation to food and reward processing, changes in these areas may also impair the ability to adequately engage in working memory gating processes, as activity in affective (reward) and cognitive fronto-striatal loops immensely overlap (Janssen et al., 2019). On the behavioral level, individuals with obesity consistently demonstrate impairments in food-specific (Janssen et al., 2017) but also non-food specific goal-directed behavioral control (Janssen et al., 2020) and reinforcement learning (Weydmann et al., 2023). 
It seems that difficulties with integrating negative feedback may be central to these alterations (Mathar et al., 2017; Kastner et al., 2017), which could explain a potential insensitivity to the negative consequences associated with (over) eating. Crucially, in humans, a substantial contribution to (reward) learning is mediated by working memory processes (Moustafa et al., 2008; Collins & Frank, 2012, 2018; Collins et al., 2014, 2017; Westbrook et al., 2024). The observed difficulties in reward learning in obesity may hence partly be rooted in a failure to update working memory with new reward information, suggesting cognitive issues that extend beyond mere difficulties in valuation processes. However, empirical support for this interpretation is currently lacking. A more nuanced understanding of the effects of obesity on working memory is crucial, however, as it could lead to more targeted intervention options.”

      The result that Taq1A and DARPP-32 moderated the interaction between WM condition and BMI requires intricate post hoc analysis to understand the bearings to update. The authors found that Taq1A or DARPP32 genotype moderated the negative association between BMI and WM exclusively in the update condition (significant two-way interaction effect), suggesting that the BMI-WM associations in other conditions were similar across genotypes. Importantly, visual inspection of the relationship between WM and BMI (Fig 4 & 5) suggests more prevalent positive effects of the putatively advantageous Taq1A-A1 and DARPP-32-AA genotypes to the overall negative relationship between WM and BMI in updating, but not in the other conditions. Given that an overall negative relationship was statistically supported across all conditions (model 1), a plausible interpretation would be that the updating condition stands out in terms of a positive moderation by putative advantageous genotypes, rather than compound negative consequences of BMI and genotype in updating. Critically, this interpretation stands in stark contrast with the interpretation put forth by the authors suggesting a specifically negative association between BMI and WM updating.

      We are grateful for the reviewer’s thorough review and insightful comments. We appreciate the attention to detail and the opportunity to improve our manuscript. We agree that further examination of the relationship between Taq1A, DARPP-32, and BMI, particularly in the update condition, is crucial for a comprehensive understanding of our results. In response to your feedback, we have conducted additional post hoc analyses, which indeed revealed the effects anticipated by the reviewer. Accordingly, we have revisited our discussion and conclusions to ensure that they accurately reflect the complexities of our findings, particularly regarding the positive moderation by putative advantageous genotypes in the update condition. Once again, we appreciate your thoughtful review and are grateful for the opportunity to strengthen the manuscript based on your feedback.

      In the results section we added: 

      “Further post hoc examination of the effects on updating revealed that the association between BMI and performance was significant for A1-carriers (95% CI: -0.488 to -0.190), with 33.9% lower probability to score correctly per unit change in BMI, but non-significant for non-A1-carriers (95% CI: -0.153 to 0.129; 1.22% lower probability). Interestingly, compared to all other conditions, in the update condition, the negative association between BMI and task performance was weakest for non-A1-carriers (estimate = -0.012, SE = 0.072), but strongest for A1-carriers (estimate = -0.339, SE = 0.076; see Figure 3 and Table S6), emphasizing that genotype impacts this condition the most. To further check if this difference in slope was statistically significant across conditions, we stratified the sample into Taq1A subgroups (A1+ vs. A1-) and assessed whether BMI affected task performance differently across conditions separately for each subgroup. This analysis revealed no significant difference in the relationship between BMI and task performance across conditions among A1+ individuals (pBMI*condition = 0.219). However, within the A1- subgroup, a significant interaction effect between BMI and condition emerged (pBMI*condition = 0.049). Collectively, these findings suggest that the absence of the A1-allele is linked to improved task performance, particularly in the context of updating, where it seems to mitigate the otherwise negative effects of BMI.”

      “Once more, further examination of the observed DARPP-32, BMI, and condition interaction showed that, in the update condition, the negative association between BMI and task performance was weakest and non-significant for A/A (estimate = -0.044, SE = 0.066; 95% CI: -0.174 to 0.086), but strongest and significant for G-carrying individuals (estimate = -0.324, SE = 0.079; 95% CI: -0.478 to -0.170). See Table S7 and Figure 5. Splitting the sample into DARPP subgroups (A/A vs. G-carrier) revealed that in both subgroups, there was a significant interaction effect of BMI and condition on task performance (pA/A = 0.034, pG-carrier = 0.003). In the case of DARPP, it hence appears that carrying the disadvantageous G-allele could exacerbate the negative effects of BMI, while the more advantageous allele (A/A) might mitigate them - once again particularly in the context of updating.”
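
      The estimates, standard errors, and 95% intervals quoted in the two passages above appear consistent with Wald-type intervals on the log-odds scale, i.e. estimate ± 1.96 × SE. The Python sketch below illustrates that relationship under this assumption; it is not the authors' analysis code, and the helper name `wald_ci` is hypothetical.

```python
def wald_ci(estimate, se, z=1.96):
    """Return the (lower, upper) Wald interval: estimate -/+ z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


# Estimates and standard errors as quoted in the text (log-odds scale).
reported = {
    "Taq1A A1-carriers": (-0.339, 0.076),
    "Taq1A non-A1-carriers": (-0.012, 0.072),
    "DARPP-32 A/A": (-0.044, 0.066),
    "DARPP-32 G-carriers": (-0.324, 0.079),
}

for label, (est, se) in reported.items():
    lower, upper = wald_ci(est, se)
    print(f"{label}: {lower:.3f} to {upper:.3f}")
```

      An interval that excludes zero corresponds to a slope that is significant at the 5% level. Small discrepancies in the third decimal place relative to the quoted intervals are to be expected, because the quoted estimates and standard errors are themselves rounded.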

      Following from this, we added the following text snippets to the discussion:

      “Noteworthy, our data revealed that differences in updating appeared to be driven by the non-risk allele groups. Despite increasing BMI, performance remained stable.” 

      “However, as BMI increases, the possession of a greater D2 receptor density seems to become advantageous, as evidenced by the lack of a negative correlation between BMI and updating performance in non-A1-carriers. We speculate that this phenomenon could potentially be attributed to the compensating effects of this genotype. While individuals with fewer D2 receptors (A1+) may have quicker saturation of receptors regardless of dopamine levels, in those with more D2 receptors (A1-) saturation may be slower. This could contribute to a more finely tuned balance between "go" and "no-go" signaling, despite potential alterations in dopamine tone in obesity (Horstmann et al., 2015; but also see Darcey et al., 2023 or Janssen & Horstmann, 2022). Clearly, the current data cannot provide empirical evidence for these speculations, and further discrete research is needed to establish firm conclusions.

      Regarding DARPP, we found that carrying the G-allele significantly exacerbated the negative effects of BMI, while the more advantageous allele (A/A) mitigated them, once again particularly in the context of updating.”

      “Collectively, our observations hint at the potential of advantageous genotypes to moderate the adverse impacts of high BMI on cognitive functions.” 

      In conclusion, in its current form the title of the present work is ambivalent in terms of 1) the use of the term "impaired" in the context of cognitively normal individuals, 2) a BMI group difference specifically in the updating condition, and 3) the dopaminergic mechanisms based on observational data.

      Given the results of the additional post hoc analyses, we agree with the reviewer and have refined the title of our work to be less misleading. The title now reads:     

      “Working Memory Gating in Obesity is Moderated by Striatal Dopaminergic Gene Variants” 

      Reviewer #1 (Recommendations for the Authors):

      Beyond the issues raised in the public review, I recommend the authors adjust the use of pathologizing terminology in the context of a clinically healthy population. In particular, terms like "dopaminergic abnormalities" and "working memory deficits/impairment" seem pathologizing in a healthy, non-morbidly obese cohort. To that end, despite a negative continuous association between BMI and WM, there are high and low-performing individuals in all BMI segments, and group differences (high vs low BMI; not reported) do not seem as dramatic as between healthy controls and say Parkinson's disease patients. Furthermore, owing to the observational design of the present study the authors should pay attention to the use of terms suggesting causal relationships, such as "influence" in the context of statistical associations. Also, sentences like "Our study is the first to show such selective effects" seem problematic not only in terms of claims of primacy, but also in terms of the selectivity of the effects (associations). See the public review for an alternative interpretation of selectivity to updating conditions.   

      Of minor importance are the occasional spelling errors, which should be carefully checked by the authors. Also, I would like the authors to double-check the model configurations reported in the main text and the supplementary material. According to the supplement, model 1 contains task condition by subject as a random effect (random slope model), whereas the main text states that this model configuration didn't converge and therefore only subject-specific intercepts are included. Hence, there seems to be discordance between the model descriptions in the main text and supplement. To that end, it would seem appropriate to briefly motivate the use of LME and the random effect for subject (within-subject correlation between conditions). Also, the origin of the odds ratios (OR) reported in the results section is not explicitly defined in the methods or results.

      We appreciate the reviewer's thoughtful recommendations and have taken several steps to address the concerns raised:

      (1) We have revised our manuscript to ensure that the language is less pathologizing and avoids suggesting causal relationships where only associations are indicated.  

      For example: 

      In the abstract, we replaced “abnormalities” with “alterations”:   

      “Dopaminergic alterations have emerged as a potential mediator. However, current models suggest these alterations should only shift the balance in working memory tasks, not produce overall deficits”

      In the introduction we replaced “impairments” with “alterations”:               

      “This distinction may be crucial, however, as indirect evidence hints at potential specific alteration in these two sub-processes in obesity.”

      Generally, we took care to replace terms like 'dopaminergic abnormalities' and 'working memory deficits/impairments' with more neutral descriptors suitable for a clinically healthy population in the whole manuscript. 

      (2) We have modified primacy statements to be more nuanced. In the discussion, for example, we now say “This finding is compelling as it demonstrates a rarely observed selective effect.” instead of “This finding is compelling as we are the first to show such selective effects.”

      (3) We have conducted an additional thorough review of our manuscript to correct any spelling errors.

      (4) Upon reevaluation, we corrected the inconsistencies with respect to the random structure of model 1. We therefore have revised the supplementary material to now accurately reflect that the model did not converge when including condition as a random factor, and thus, only subject-specific intercepts are included.

      (5) We have expanded our methods section to better explain the use of linear mixed effects models (LMEs) and the inclusion of random effects for subjects to account for within-subject correlation between conditions. We added the following text:

      “Given the within-subject design of our study, we used generalized linear mixed models (GLMM) […]” and

      “The random structure of the model was thus reduced to include the factor ‘subject’ only, thereby accounting for the repeated measures taken from each subject.”

      (6) We have clearly defined the derivation of the odds ratios reported in our results in the methods section of our manuscript. We added the following text to the methods section:

      “Reported odds ratios (OR) are obtained by exponentiating the log-odds coefficients returned by the summary() function.”
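
      The conversion described in this added sentence can be illustrated with a short Python sketch. This is an illustration only, not the authors' code (the quoted text refers to a summary() function in their own analysis environment); the function names are hypothetical, and the example coefficient is the A1-carrier slope quoted earlier in the response.

```python
import math


def odds_ratio(log_odds_coefficient):
    """Convert a logistic-model coefficient (log-odds) into an odds ratio."""
    return math.exp(log_odds_coefficient)


def percent_change_in_odds(log_odds_coefficient):
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio(log_odds_coefficient) - 1.0) * 100.0


# Example: slope of BMI for A1-carriers in the update condition, as quoted
# in the response (estimate = -0.339 on the log-odds scale).
coefficient = -0.339
print(f"OR = {odds_ratio(coefficient):.3f}")
print(f"Change in odds per unit = {percent_change_in_odds(coefficient):.1f}%")
```

      An odds ratio below 1 indicates lower odds of a correct response per unit increase in the predictor. Because it describes a change in odds, it translates into a change in probability only once a baseline probability is fixed.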

      Reviewer #2 (Public Review):

      The majority of participants seem to fall within the normal BMI range, whereas the interaction between BMI and genetic variations or amino acid ratio particularly surfaces at higher BMI. As genetic variations are usually associated with small effect sizes, the effective sample size, although large for a behavioral analysis only, might have been too small to detect meaningful effects of risk alleles of COMT and C957T.

      We thank the reviewer for the valuable feedback. We concur that the effective sample size may have posed a limitation in detecting meaningful effects of COMT and C957T, particularly given the skewness of our data towards participants within the normal BMI range. In response to the reviewer’s comments, we have refined the relevant paragraph in the limitations section of our manuscript, emphasizing the importance of recruiting a more balanced sample, including individuals with higher BMI, in future studies.

      “Furthermore, an additional limitation is that our data is slightly skewed towards participants within the normal BMI range. The effective sample size to detect meaningful genotype effects (e.g. for COMT or C957T) might thus have been too small, particularly at higher BMI levels. Future studies may address this limitation by recruiting a more balanced sample, including more individuals with higher BMI.”

      The relationships between genetic variations, BMI, and specific disturbances in dopamine signaling are complex, as compensating mechanisms might be at play to mitigate any detrimental effects. The results would therefore benefit from more direct measures or manipulations of dopaminergic processes.

      We thank the reviewer for this valuable input. We acknowledge the potential benefits of employing a more direct measure, or ideally, a dopaminergic manipulation, to establish a clearer causal link between dopamine processes and working memory gating in the context of obesity. In response to the reviewer's constructive feedback, we have addressed this limitation in the discussion section of our manuscript, emphasizing the need for further research in this area:

      “Additionally, the correlational nature of our findings highlights the need for more direct experimental manipulations of dopaminergic processes in obesity. Previous studies have established a causal link between dopamine and WM gating through drug manipulations (Fallon et al., 2017, 2019). Applying a similar approach to an obese sample could help establish a clearer causal link between dopamine activity and WM gating in the context of obesity.”

      The introduction could benefit from a more elaborate description of the predicted effects: into which direction (better or worse updating) would the authors predict each effect to go and why? This is clearly explained for COMT, but not for e.g. DARPP-32.

      We thank the reviewer for their valuable feedback. We appreciate the suggestion to provide a more detailed description of the predicted effects for each genetic marker in the introduction. We would like to note, however, that the analyses involving markers such as DARPP-32 were inherently exploratory in nature. Consequently, we intentionally refrained from formulating directed hypotheses, as our primary aim was to observe and report any emergent patterns.

      Reviewer #2 (Recommendations for the Authors):              

      To what extent are the polymorphisms or amino acid ratios associated with BMI? For example, when including C957T polymorphism in the analysis, the detrimental effect of BMI on working memory is no longer statistically significant. Could this be due to a relatively strong relationship between C957T polymorphism and BMI? Could the authors provide figures showing how BMI relates to the genetic polymorphisms and amino acid ratio?

      We appreciate the reviewer's insightful comment and have thoroughly investigated the potential relationship between the polymorphism and BMI. Our analysis did not reveal any direct association between C957T and BMI. We have included this analysis in our manuscript. The reviewer’s comment strengthened the comprehensiveness of our study.

      “Because the main effect of BMI dissipated when including C957T in the model, we ran an additional exploratory analysis to check whether this polymorphism directly related to BMI. Linear regression, predicting BMI by genotype, showed no association between the two (p = 0.2432), indicating that the BMI effect is probably not masked by the presence of the C957T polymorphism. See Table S8.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This valuable manuscript reports alterations in autophagy present in dopaminergic neurons differentiated from iPSCs in patients with WDR45 mutations. The authors identified compounds that improved the defects present in mutant cells by generating isogenic iPSC without the mutation and performing an automated drug screening. The methodological approaches are solid, but the claims still need to be completed: showing the effects of the identified compounds on iron-related alterations is crucial. The effects of these drugs in vivo would be a great addition to the study. 

      Thank you for this assessment. We agree that further hit validation would be a great addition to the study. At present, we provide this through RNAseq data but not at the protein level. Further validation using in vivo models would also be warranted but is beyond the scope of the current work.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      In the current study, Papandreou et al. developed an iPSC-based midbrain dopaminergic neuronal cell model of Beta-Propeller Protein-Associated Neurodegeneration (BPAN), which is caused by mutations in the WDR45 gene and is known to impair autophagy. They also noted defective autophagy and abnormal BPAN-related gene expression signatures. Further, they performed a drug screening and identified five cardiac glycosides. Treatment with these drugs effectively improved autophagy defects and restored gene expression.

      Strengths: 

      Seeing the autophagy defects and impaired expression of BPAN-related genes adds strength to this study. Importantly, this work shows the value of iPSC-based modeling in studying disease and finding therapeutic strategies for genetic disorders, including BPAN. 

      Weaknesses: 

      It is unclear whether these cells show iron metabolism defects and whether treatment with these drugs can ameliorate the iron metabolism phenotypes. 

      We are pleased that the reviewer feels the work is an important step in the field for BPAN. We also absolutely agree that secondary hit validation assays showing cardiac glycoside efficacy in restoring patient-related in vitro phenotypes would be very valuable.

      We set up assays to investigate iron metabolism phenotypes, including western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) at day 65 of differentiation, but found no significant difference when comparing patient lines to controls (data not shown).

      We also performed cell viability studies using the Alamar Blue assay on Day 11 ventral midbrain progenitors after 24-hour exposure to a) glucose starvation, b) media with no antioxidants (L-ascorbic acid and B-27 supplement), c) oxidative stressors MPP+ 1 mM and FeCl3 100 µM (MPP+ and FeCl3 as suggested by Seibler et al., Brain 2018, PMID: 30169597). We found no difference in cell viability between patients, age-matched controls and CRISPR lines (data not shown). Additionally, we examined lysosomal function in BPAN Day 11 progenitors (2 age-matched controls, 3 patient lines, 2 isogenic controls; again, using the autophagy flux treatments mentioned above) via LAMP1 high-content imaging immunofluorescence. We have seen no difference in LAMP1 puncta production between patient lines and controls and, therefore, have not included this data in our revision.

      Overall, we agree with the reviewer that more validation of the compound hits’ ability to restore robust BPAN-related in vitro and in vivo phenotypes (including studies of iron metabolism/homeostasis) will be needed in the future – this could be undertaken in more mature 2D culture systems, 3D organoid models and disease-relevant animal models.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors aim to demonstrate that cardiac glycosides restore autophagy flux in an iPSC-derived mDA neuronal model of WDR45 deficiency. They established a patient-derived induced pluripotent stem cell (iPSC)-based midbrain dopaminergic (mDA) neuronal model and performed a medium-throughput drug screen using high-content imaging-based IF analysis. Several compounds were identified to ameliorate disease-specific phenotypes in vitro.

      Strengths: 

      This manuscript engaged in an important topic and yielded some interesting data. 

      Weaknesses: 

      This manuscript failed to provide solid evidence to support the conclusion. 

      We are pleased that the reviewer assesses the work as conceptually important and interesting. We also agree that more work to understand the pathophysiology underpinning BPAN, and the mechanisms through which cardiac glycosides help restore affected intracellular pathways, is warranted. More validation of the compound hits’ ability to restore broader disease-specific in vitro and in vivo phenotypes is also needed in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, this is a nicely executed study. Here are my suggestions:

      (1) Showing the iron phenotypes in these cells and testing if treatment with these drugs rescues iron-related phenotypes will add significant value to this work. 

      We absolutely agree that secondary hit validation assays showing glycoside efficacy in restoring disease-related in vitro phenotypes are warranted. The main issue here is identifying how WDR45 deficiency leads to cellular dysfunction or dyshomeostasis and early death. Unfortunately, the mechanism by which this happens is not yet delineated, and more relevant future work is needed.

      In our lab, we set up such assays. Regarding iron metabolism-related phenotypes, we performed western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) but found no significant difference when comparing patient lines to controls (data not shown). We also performed cell viability studies using the Alamar Blue assay on Day 11 ventral midbrain progenitors after 24-hour exposure to a) glucose starvation, b) media with no antioxidants (L-ascorbic acid and B-27 supplement), c) oxidative stressors MPP+ 1 mM and FeCl3 100 µM (MPP+ and FeCl3, as suggested by the Seibler et al. paper, Brain 2018, PMID: 30169597). We found no difference in cell viability between patients, age-matched controls and CRISPR lines (data not shown). Additionally, we examined lysosomal function in BPAN Day 11 progenitors (2 age-matched controls, 3 patient lines, 2 isogenic controls; again, using the autophagy flux treatments mentioned above) via LAMP1 high-content imaging immunofluorescence. We have seen no difference in LAMP1 puncta production between patient lines and controls and, therefore, have not included this data in our revision.

      (2) Assessing the effects of these drugs in an in vivo model will strengthen this study. 

      This is a valid point, and we agree that further validation using in vivo models, such as the reported BPAN mouse models, would be warranted in the future.

      Reviewer #2 (Recommendations For The Authors): 

      While this manuscript engaged in an important topic and yielded exciting data, there are still some concerns for the authors to address. 

      (1) The biggest concern is that the characterization of autophagic flux solely with LC3 is not convincing enough. Although ATG2A and ATG2B are required for phagophore formation during autophagy, their interaction with WDR45 seems dispensable for phagophore formation for a mild autophagy defect observed in WDR45 knockout cell models and mouse models. All wdr45/- mice are born normally and survive the postnatal starvation period, unlike mice lacking essential ATG proteins, like ATG5, ATG7, and VMP1. The functional relevance of WDR45 and autophagy remains to be fully established. Overall, this manuscript failed to provide solid evidence to support the conclusion. 

      This is a valid point. We have looked at autophagy flux in fibroblasts and at the Day 11 ventral midbrain stage. For fibroblasts, 1 control line and 3 patient lines were used; for Day 11 progenitors, 2 control lines, 2 patient lines and 1 isogenic control were used. Cells from different lines were cultured on the same 96-well plates, at the same plating density, and treated concurrently to minimise fluctuations in flux due to unaccounted factors, e.g., confluence, incubator temperature etc. Treatments consisted of a) DMSO (basal condition), b) Bafilomycin A1 (flux inhibition via autophagosome/lysosome fusion blockage), c) Torin 1 (mTOR inhibitor, flux inducer) and d) combination of Bafilomycin A1 and Torin 1, for a total of 3 hours. In all these conditions, LC3 puncta production in BPAN lines was reduced when compared to controls. We believe that these results indicate defective autophagy flux in BPAN in different cell types.

      Moreover, we have demonstrated defects in autophagy-related gene (ATG) expression through RNA sequencing, which are restored after CRISPR/Cas9-mediated correction of the disease-causing mutation in a patient-derived line, and also after treatment with Torin 1 and digoxin. These results suggest a dysregulated ATG network in WDR45 deficiency.

      (2) WDR45 is linked to BPAN. Do the authors detect any iron accumulation in DA progenitors or mDA neurons? 

      Regarding iron metabolism-related phenotypes, we performed western blotting for Ferritin Heavy Chain 1, Transferrin and Ferroportin 1 (SLC40A1) but found no significant difference when comparing patient lines to controls (data not shown). We agree that more studies into the links between WDR45 deficiency, iron metabolism and neurodegeneration are needed. 

      (3) It is necessary to detect LC3 protein levels by western blot to distinguish LC3I and LC3II and gain a more accurate understanding for the process of LC3 - marked autophagosome. 

      Thank you for this valid point. 

      Due to the very dynamic nature of autophagy and the many factors influencing flux, we have not been able to meaningfully examine autophagy-related markers in an iPSC-derived system that is also inherently prone to variability. Therefore, LC3 and p62 values exhibited high variability, and hence we are unable to adequately interpret them (data not shown). Instead, in this manuscript we have focused on high-content assays with cells cultured and treated simultaneously at Day 11 of differentiation, which have shown autophagy flux defects.

      We have looked at autophagy flux in fibroblasts and at the Day 11 ventral midbrain stage. For fibroblasts, one control line and three patient lines were used; for Day 11 progenitors, two control lines, two patient lines and one isogenic control were used. Cells from different lines were cultured on the same 96-well plates, at the same plating density, and treated concurrently to minimise fluctuations in flux due to unaccounted factors, e.g., confluence or incubator temperature. Treatments consisted of a) DMSO (basal condition), b) Bafilomycin A1 (flux inhibition via blockage of autophagosome/lysosome fusion), c) Torin 1 (mTOR inhibitor, flux inducer) and d) a combination of Bafilomycin A1 and Torin 1, for a total of 3 hours. In all these conditions, LC3 puncta production in BPAN lines was reduced when compared to controls. We believe that these results indicate defective autophagy flux in BPAN in different cell types.

      (4)  Some methodological details need to be included - detailed descriptions of various quantifications for IF staining should be provided. For example, it is unclear how "% cells+ ve for marker combination" (Fig.1B) was quantified, and there are many unconventional units such as "% cells+ ve for marker combination "; please check and correct them. 

      Thank you for pointing this out. We have changed the legends in Figure 1B and Supplementary Figure 2C to ‘percentage of cells positive for marker combination’. Moreover, in our Methods section (Immunocytochemistry sub-section), we have updated the text as follows, to give more clarification on the process of marker quantification (Page 25, Paragraph 2): ‘For quantification, 4 random fields were imaged from each independent experiment. Subsequently, 1200 to 1800 randomly selected nuclei were quantified using ImageJ (National Institutes of Health). Manual counting for nuclear (DAPI) staining and co-staining with the marker of interest was performed, and percentages of cells expressing combinations of markers were calculated as needed.’
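      The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name and the counts are hypothetical, and pooling nuclei across the four fields before dividing is an assumption (per-field percentages could instead be averaged).

```python
def percent_positive(fields):
    """Percentage of DAPI-stained nuclei that also carry the marker combination.

    `fields` is a list of (nuclei_counted, marker_positive) tuples,
    one tuple per imaged field. Counts are pooled across fields
    (an assumption made for this sketch).
    """
    total_nuclei = sum(nuclei for nuclei, _ in fields)
    total_positive = sum(positive for _, positive in fields)
    if total_nuclei == 0:
        raise ValueError("no nuclei were counted")
    return 100.0 * total_positive / total_nuclei


# Hypothetical manual counts from four random fields of one experiment.
example_fields = [(400, 300), (350, 280), (420, 336), (330, 264)]
example_percentage = percent_positive(example_fields)
```

      With these made-up counts, 1,180 of 1,500 nuclei are marker-positive, so the function returns roughly 78.7.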

      (5) In Figure 3 and Figure 4, the quantifications for IF images were inconsistent with the shown IF image, for example, the representative IF image for detection of LC3 with Tor1 treatment. 

      Due to space restrictions, we have not included representative images from all patient lines, and every treatment condition depicted in the graphs. In Figure 3 (describing the set-up of the LC3 screening assay), only one control line and one patient line is shown in basal (DMSO-treated) conditions. In Supplementary Figure 4D, only one patient line and the corresponding isogenic control line are depicted after Torin 1 treatments.

      Quantification of the LC3 puncta in this assay (20 fields per well, each condition in technical duplicate, n=8 biological replicates) was automated, using ImageJ and R Studio, with subsequent statistical significance calculation in GraphPad Prism. Hence, the immunofluorescence images depict a reduction in LC3 puncta per nucleus in patient-derived lines versus controls, but not the exact difference obtained after automated image analysis. We have detailed this in the Methods section (High content imaging-based immunofluorescence subsection) of our manuscript (Page 26, Paragraph 2): ‘For all high content imaging-based experiments, the PerkinElmer Opera Phenix microscope was used for imaging. 20 fields were imaged per well, at 40 x magnification, Numerical Aperture 1.1, Binning 1. Image analysis was performed using ImageJ and R Studio (ref. 60). For the drug screen, puncta values were normalised according to positive and negative controls from each plate and Z-scores for each compound screened were generated. Statistical significances were calculated on GraphPad Prism V. 8.1.2 software (GraphPad Software, Inc.; https://www.graphpad.com/scientific-software/prism/).’
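      The per-plate normalisation and Z-score step quoted above could look like the sketch below. It is illustrative only: the response does not give the exact formula, so the rescaling (negative-control mean mapped to 0, positive-control mean mapped to 1), the use of the sample standard deviation, and all names and numbers are assumptions.

```python
import statistics


def normalise_to_controls(value, neg_mean, pos_mean):
    """Rescale a puncta value so that the negative-control mean maps to 0
    and the positive-control mean maps to 1 (assumed normalisation)."""
    span = pos_mean - neg_mean
    if span == 0:
        raise ValueError("positive and negative controls have the same mean")
    return (value - neg_mean) / span


def plate_z_scores(compound_values, neg_controls, pos_controls):
    """Return a Z-score per compound for a single plate.

    `compound_values` maps compound name -> mean LC3 puncta per nucleus.
    Z-scores are computed across the normalised compound values of the
    plate, using the sample standard deviation.
    """
    neg_mean = statistics.mean(neg_controls)
    pos_mean = statistics.mean(pos_controls)
    normalised = {
        name: normalise_to_controls(value, neg_mean, pos_mean)
        for name, value in compound_values.items()
    }
    values = list(normalised.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all compounds have the same normalised value")
    return {name: (value - mu) / sd for name, value in normalised.items()}


# Hypothetical plate: three compounds, two wells of each control.
z = plate_z_scores(
    {"compound_A": 2.0, "compound_B": 4.0, "compound_C": 6.0},
    neg_controls=[2.0, 2.0],
    pos_controls=[6.0, 6.0],
)
```

      Normalising within each plate before computing Z-scores keeps plate-to-plate differences in staining or imaging from being mistaken for compound effects.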

      (6)  In Figure 4C, LC3 should be co-stained with the DA progenitor marker to indicate the intracellular LC3 level within the progenitors. 

      Thank you for raising this point. The images from Figure 4C were obtained during the medium throughput drug screen, where the FOXA2 co-stain was not used. The FOXA2 stain was only used during the initial set-up of the LC3 screening assay, to confirm that the Day 11 cells had ventral midbrain identities. Indeed, most of the Day 11 cells used in the high content imaging-related experiments were FOXA2-positive, as shown in Figure 3 and Supplementary Figure 4.

      (7) Examining P62, one of the most important indicators for autophagic flux, should be parallel with LC3 detection. In Figure 5A, P62 accumulation seems not significant in patient 02 Day 11 ventral midbrain progenitors; how about that in Day 65? 

      The reviewer is raising a valid point. We have not examined p62 and LC3 staining in parallel in high content imaging-based experiments but agree that this would be good to examine in future studies. 

      Some other minor points 

      (8) It needs to give a more detailed description of the tested compounds you mentioned in the text. 

      Thank you for this point. We have elaborated on the contents of the Prestwick library used for the screening, as below (Page 9, Paragraph 3): ‘We then utilised this high-content imaging LC3 assay to identify novel compounds of potential therapeutic interest for BPAN by screening the Prestwick Chemical Library containing 1,280 compounds, of which more than 95% are FDA/EMA approved.’

      In the Methods Section, Page 25, Paragraph 5, we also detail the library as follows: ‘For drug screening, the Prestwick Chemical Library (1,280 compounds, 95% FDA/ EMA approved, 10 mM in DMSO, https://www.prestwickchemical.com/screening-libraries/prestwick-chemical-library/) was used; cells were treated with compounds for 24 hours at 10 μM final concentration.’

      (9) Please pay attention to the abbreviation; many gene names only have abbreviations without full names when they first appear in the context. 

      Thank you for this point. We have corrected this in various places throughout the manuscript and especially in the introduction section.

      (10) Almost all figures have the problem of insufficient image resolution, or the font of the indicated words needs to be bigger to be distinguished clearly, like in Fig.1B, 1C, 1E. 

      Thank you for this point, we have ensured that all figures have adequate image resolution as specified by the journal requirements. 

      (11) The sample size or biological repeated times should be given in figure legends. 

      Thank you for this point. We have now indicated numbers of biological replicates where appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Below, I will list the points that should be addressed by the authors:

      (1) Line 139: The authors conclude that the lack of a phenotype induced by knockdown of Polr1F is due to reduced baseline sleep because of the leakiness of the GeneSwitch system. However, it is not clear why the argument of the SybGS being leaky should not apply to all experiments done with this tool. The authors should comment on that aspect. Furthermore, this claim is testable since it should be detectable against genetic controls. An alternative explanation to the proposed scenario is that the Polr1F sleep phenotype observed in the constitutive knockdown experiment is based on developmental defects. The authors should provide additional evidence to explain the discrepancy.

      We appreciate the reviewer’s insightful feedback. We assume the reviewer is referring to Regnase-1 RNAi (and not Polr1F) as Regnase-1 RNAi flies exhibit reduced sleep before dusk, potentially hindering further detection of sleep reduction. The leaky sleep reduction was based upon comparison with genetic controls in that experiment. Nevertheless, to discern whether our observations stem from developmental effects, we conducted adult-specific knockdowns of both Polr1F and Regnase-1 using the TARGET system. We generated the R35B12-Gal4:TubGal80ts line and crossed it with the UAS-Polr1FRNAi and UAS-Regnase-1RNAi lines. We confirmed that Polr1F RNAi promotes sleep when knocked down in adults (Figure 3 – supplemental figure 1). Conversely, Regnase-1 showed no effect on sleep in the adult stage, which is consistent with our nSyb-GS experiments, and suggests, as noted by the reviewer, that the Regnase-1 RNAi sleep effect is likely developmental (Figure 3 – supplemental figure 3).

      (2) Line 170: Regnase1 knockdown affects all memory types, including short-term and long-term memory. The authors conclude that these genes are involved in consolidation. However, besides consolidation, it has been shown that α′β′ KCs are involved in short-term appetitive memory retrieval. Thus, an equally possible explanation is that the knockdown impairs the neuronal function per se, which would lead to a defect in all behaviors related to α′β′ KCs, rather than a specific role for consolidation. The authors have to provide additional evidence to substantiate their claim.

      The exact role of Regnase-1 in the α′β′ KCs remains unclear. We acknowledge the reviewer’s concern and have amended our conclusion to include the alternative explanation suggested by the reviewer.

      (3) Line 87-88: For the protocol used, it was reported that GFPnls cannot be used for FACS sorting. The authors might want to comment/clarify that aspect. https://star-protocols.cell.com/protocols/1669.

      For our RNA-seq experiments, we conducted single cell isolation by FACS sorting cells, instead of nuclei, labeled with GFP.nls. The protocol mentioned that GFP.nls is not effective for single nuclear RNA-seq as it is not specific for nuclei, but for our cell sorting purposes that did not matter.

      (4) Line 131: The authors should report the concentration of RU486.

      Sorry, this is now in methods.

      (5) Line 155: Is that really 42 hours? This might be a typo. If not, it would be good to justify the prolonged re-starvation period.

      Flies fed after training form sleep-dependent memories, but they did not show robust long-term memory after 30 h of re-starvation. As starvation is a requisite for appetitive memory retrieval (Krashes and Waddell 2008), the low memory scores after 30 h could be due to inadequate starvation. Therefore, we starved flies for 42 h, which is similar to the sleep-independent memory paradigm in which flies are starved for 18 h before training and then tested 24 h after training; this protocol resulted in robust long-term memory performance. These flies were fine and able to make choices in a T-maze after 42 h of starvation.

      (6) I will be listing mistakes/unclear points in the figures. However, all figures should be checked very carefully for clarity.

      Thanks for these valuable comments. We have gone over the figures carefully and fixed any issues we found.

      (7) Figure 1C: It is not entirely clear to me how this heatmap was created and what the values mean.

      The 59 differentially expressed genes (DEGs) were selected using DESeq2, as described in the Methods. For the heatmap, transcripts per million (TPM) values of these 59 DEGs were log-transformed, scaled row-wise and plotted with iDEP v0.95 (http://bioinformatics.sdstate.edu/idep95/).
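      As a rough illustration of what the heatmap values represent, the sketch below log-transforms TPM values and then scales each gene (row) to a Z-score across samples. It is not the iDEP implementation: the log base, the pseudocount of 1, the sample standard deviation and the example numbers are all assumptions made for this sketch.

```python
import math
import statistics


def log_scale_rows(tpm_rows, pseudocount=1.0):
    """Log-transform TPM values and scale each gene (row) across samples.

    Each row holds the TPM values of one gene in every sample. The result
    for a gene has mean 0 and sample standard deviation 1, so heatmap
    colours show relative expression within that gene, not absolute levels.
    """
    scaled = []
    for row in tpm_rows:
        logged = [math.log2(value + pseudocount) for value in row]
        mu = statistics.mean(logged)
        sd = statistics.stdev(logged)
        # A gene with identical values in all samples carries no contrast.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in logged])
    return scaled


# Hypothetical TPM values for two genes across three samples.
heatmap_values = log_scale_rows([[0.0, 1.0, 3.0], [7.0, 7.0, 7.0]])
```

      Under these assumptions, a positive value means the gene is expressed above its own average in that sample, which is why values are comparable along a row but not between rows.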

      (8) Figures 2A and 2B: The units might be missing. For Supplementary Figure 2, it is not clear what the different groups are without looking at the main figure.

      Fixed.

      (9) Figure 3: The panel arrangement is confusing. Furthermore, the "B)" is cut. The same issue is present in the Supplementary Figure.

      Sorry! We rearranged the panels, and fixed the issue in both figures.

      (10) Figure 5B: It is not clear what the scale bar means.

      Now indicated

      (11) Line 119: The citation "Marygold et al n.d."?

      Fixed

      (12) Line 620: I'm not sure that the rate and localization of nascent peptide synthesis are measured.

      Great point. We used the puromycin assay to estimate significant changes in translation. However, we did not measure the absolute translational rate or the localization of newly synthesized proteins. We rephrased this in the updated manuscript.

      (13) Line 627, the authors should give the NA of the objective, further the authors should double-check the information they provide on the resolution.

      Fixed, it was 20X.

      (14) Line 629 "Fuji" is unclear, it might refer to the Fiji software, and in that case, it should be listed in the used software. Further, the authors have to check on the information they provide on the intensity, e.g. is that GFP fluorescence?

      Yes, it was Fiji and GFP. The manuscript has been updated accordingly.

      (15) Line 634: It is stated that two concentrations of CX-5461 are used; however, as far as I can see, data are shown only for 0.2 mM.

      We apologize for the confusion. Data are indeed only shown for 0.2 mM. We also tested 0.4 mM and 0.6 mM under fed conditions once and 0.1 mM under starved conditions twice. Since none of the effects were significant, we only presented the complete 0.2 mM results in the supplementary figure.

      (16) Line 352 "Marygold et al nd" is probably a glitch in the citation?

      It’s a citation tool issue and has been fixed.

      (17) The authors use apostrophe rather than a prime in describing the α "prime" β "prime" KCs

      We have corrected this.

      Reviewer #2 (Recommendations For The Authors):

      The authors have generated an interesting study that promises to advance the understanding of how context-dependent changes in sleep and memory are executed at the molecular level. The manuscript is well-written and the statistical analyses appear robust. Major and minor comments are detailed below.

      Overall, I would suggest that the authors try to obtain additional evidence that Polr1F modulates sleep and test the effect of acute adult-stage knockdown of Polr1F and Regnase-1 specifically in ap α'β' MBNs rather than pan-neuronally.

      Major comments

      (1) In Figures 2 and 3 and associated supplemental figures, the authors first test for a role for Polr1F and Regnase-1 specifically in ap α'β' MBNs (Fig. 2), then test for an acute role for these proteins via pan-neuronal drug inducible expression (Fig. 3). Because the former manipulation is cell-specific and the latter is pan-neuronal, it is hard for the reader to draw conclusions pertaining to ap α'β' MBNs from the second dataset. Perhaps Regnase-1 indeed acutely regulates sleep in ap α'β' MBNs, but that effect is masked by counteracting roles in other neurons? Conversely, it remains possible that Polr1F and Regnase-1 act during development in ap α'β' MBNs to modulate sleep. Indeed, since silencing the output of ap α'β' MBNs using temperature-sensitive shibire does not alter baseline sleep (Chouhan et al., (2021) Nature), the notion that Regnase-1 could act acutely in ap α'β' MBNs to reduce baseline sleep is somewhat surprising.

      The authors could address this by using a method such as TARGET (temperature-sensitive GAL80) to acutely reduce Polr1F and Regnase-1 expression specifically in ap α'β' MBNs and test how this impacts sleep.

      Thanks for the very helpful suggestions. We have done the suggested experiments and discuss them above in response to Reviewer 1. They are included in the manuscript as Figure 3 – supplemental figure 1 and figure 3 – supplemental figure 3.

      (2) Figure 4 presents data examining whether Polr1F and Regnase-1 knockdown suppresses training-induced increases in sleep. For the untrained flies, based on the data in Fig. 2C, E I expected that Polr1F knockdown flies would exhibit more sleep than their respective controls (Fig. 4E), but this was not the case. These data suggest that more evidence may be warranted to strengthen the link between Polr1F (and potentially Regnase-1) knockdown and sleep. Could the authors use independent RNAi constructs or cell-specific CRISPR (all available from current stock centres) to validate their current results? Related to this, it would be useful to know whether the authors outcrossed any of their transgenic reagents into a defined genetic background.

      The untrained flies in figure 4E are not equivalent to flies tested for Polr1F effects on sleep in figure 2C. In Figure 4E, flies were starved for 18 h and then exposed to sucrose without an odor at ZT6. Following sucrose exposure, flies were moved to sucrose locomotor tubes, and sleep was assessed only in the ZT8-12 interval. Sleep was not significantly different between untrained R35B12>Polr1FRNAi and Polr1FRNAi/+ flies, and while it was higher in R35B12>Polr1FRNAi than in R35B12/+ untrained flies, the data overall indicate that Polr1F downregulation has no impact on sleep under these conditions and at this time. Similarly, in fully satiated settings (Figure 2C), we found no difference in sleep during the ZT8-12 period between R35B12>Polr1FRNAi flies and genetic controls. We did not outcross our transgenic lines but have now tested another available Polr1F RNAi (VDRC: v103392) (Figure 3 – supplemental figure 1). As shown in the figure, adult-specific knockdown of Polr1F by this RNAi line promoted sleep, as did the initial RNAi line.

      (3) Could the authors provide additional evidence that Polr1F knockdown in ap α'β' MBNs does not enhance sleep by reducing movement? A separate assay such as climbing would be beneficial. Alternatively, examining peak activity levels at dawn/dusk from the 12L: 12D DAM data.

      We checked the peak activity per minute per day for adult-specific knockdown of Polr1F and Regnase-1 (data shown in Figure 3 – supplemental figure 4). The results show that Polr1F knockdown in ap α'β' MBNs does not enhance sleep by reducing movement.

      (4) In terms of validating their proposed model, over-expressing of Polr1F during appetitive training might be predicted to suppress training-induced sleep increases and potentially long-term memory. Do the authors have any evidence for this?

      We were unable to find any Polr1F overexpression line. However, we obtained the Regnase-1 overexpression line from Dr. Ryuya Fukunaga’s lab and found that Regnase-1 OE does not affect sleep (Figure 4 – supplemental figure 1).

      Minor comments

      (1) Abstract: can the authors please define 'ap' as anterior posterior?

      Fixed.

      (2) Figure 2 Supplemental 1: can the authors please denote the genotypes each color refers to?

      Fixed.

      (3) In Figure 3 Supplemental 1, the authors state that acute Regnase-1 knockdown did not reduce sleep, but sleep during the night period does appear to be reduced (panel A). Was this quantified?

      We quantified this, and it was not significant.

      (4) Discussion, line 234: the heading of this section is 'Polr1F regulates ribosome RNA synthesis and memory' but the data presented in Figure 4 suggests that Polr1F does not affect memory. Can the authors clarify this?

      We made an adjustment to the title and acknowledge that at the present time we cannot say Polr1F affects memory.

      (5) Methods, Key Resource Table: can the authors please identify which fly lines were used for Polr1F and Regnase-1 knockdown experiments?

      Fixed. Fly line BDSC 64553 was used for Polr1F RNAi except in Figure 3 – supplemental figures 1 and 4, where VDRC 103392 was used. VDRC 27330 was used for Regnase-1 knockdown experiments.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 1B: This plot is currently labelled as PCA of DEGs, which I believe is inaccurate, as such a plot is a quality control that examines the overall clustering of samples by using all read counts (not just the DEGs). In addition, the color key value of this Figure 1B is not provided.

      Thank you for the insightful suggestion. The reviewer’s comment here that typically PCA plots are used for overall clustering of RNA-seq samples is indeed valid. We've acknowledged that our samples, due to their high similarity in cell populations and mild treatments, do not exhibit clear separation when we use all genes. However, we show a PathwayPCA plot of all DEGs. We aim to highlight that RNA processing pathways enriched among the DEGs account for much of the separation of the groups.

      (2) A reviewer token is not provided to examine the sequencing data set.

      The RNA-seq data has been submitted to the Sequence Read Archive (SRA) with NCBI BioProject accession number PRJNA1132369. The reviewer token is https://dataview.ncbi.nlm.nih.gov/object/PRJNA1132369?reviewer=cvqkddp8rjuebsjefk0f19556r.

      (3) In the discussion, the author pointed out that many of the 59 DEGs have implicated functions in RNA processing. To strengthen the statement, it would be beneficial to conduct the Gene Ontology analysis to test whether the DEGs are enriched for RNA processing-related GO terms.

      We have included the GO analysis results in Figure 1 and another GO analysis of all DEGs in Figure 1 – supplemental figure 1.

      (4) Figure 4E presents an intriguing finding because it shows that the untrained R35B12>Polr1FRNAi flies exhibit reduced sleep (instead of increased sleep) when compared to untrained Polr1/+ control flies.

      Please see our response above to Reviewer #2, question 2.

      (5) For the memory assay method, the identity of odor A and odor B is not provided.

      We used 4-methylcyclohexanol and 3-octanol; this information has been added into the methods section.

      (6) Female flies were used for the sleep assay. However, it is not clear whether only female flies were used for the memory assay.

      Mixed sexes were used for memory assays because a huge number of flies is needed for these experiments. We added this information to the methods.

      (7) It is important to provide olfactory acuity data on control and experimental animals to rule out that the learning/memory phenotype is caused by defects in sensing the odor used for training and testing.

      Since Polr1F RNAi flies perform well, odor acuity is not an issue. Regnase-1 RNAi affects both short-term and long-term memory, but this seems to be a developmental issue, so we did not perform the odor acuity experiments here.

      (8) Line 20: "ap alpha'/beta'" neurons should be spelled as "anterior posterior (ap) alpha'/beta' neurons", as this is the first time that this anatomical name appears in this manuscript.

      Fixed.

      (9) Figure 2C and 2D labelling: R35B12>control; UAS control should be changed to R35B12/+ control; UAS-RNAi/+ control.

      Fixed.

      (10) Line 155: it is unclear why the flies were re-starved for 42hr before testing. Is this a different protocol from the 30hr re-starvation that was used by Chouhan et al., 2021?

      We have explained the rationale above. The starvation period was increased to get better memory scores.

      (11) Line 160: it is stated that knocking down Polr1F did not affect memory, which is consistent with Polr1f levels typically decreasing during memory consolidation. Is there a reference demonstrating that Polr1f levels typically decrease during memory consolidation?

      This comes from our RNA-seq dataset (Figure 1C). The level of Polr1F decreased in fed trained flies compared with other control flies.

      (12)  Genotype labeling in Figure 4F is inconsistent with the rest of the manuscript.

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a very nice study of Belidae weevils using anchored phylogenomics that presents a new backbone for the family and explores, despite a limited taxon sampling, several evolutionary aspects of the group. The phylogeny is useful to understand the relationships between major lineages in this group and preliminary estimation of ancestral traits reveals interesting patterns linked to host-plant diet and geographic range evolution. I find that the methodology is appropriate, and all analytical steps are well presented. The paper is well-written and presents interesting aspects of Belidae systematics and evolution. The major weakness of the study is the very limited taxon sampling which has deep implications for the discussion of ancestral estimations.

      Thank you for these comments.

      The taxon sampling only appears limited if counting the number of species. However, 70 % of belid species diversity belongs to just two genera. Moreover, patterns of host plant and host organ usage and distribution are highly conserved within genera and even tribes. Therefore, generic-level sampling is a reasonable measure of completeness. Although 60 % of the generic diversity was sampled in our study, we acknowledge that our discussion of ancestral estimations would be stronger if at least one genus of Afrocorynina and the South American genus of Pachyurini could be included.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a combination of anchored hybrid enrichment and Sanger sequencing to construct a phylogenomic data set for the weevil family Belidae. Using evidence from fossils and previous studies they can estimate a phylogenetic tree with a range of dates for each node - a time tree. They use this to reconstruct the history of the belids' geographic distributions and associations with their host plants. They infer that the belids' association with conifers pre-dates the rise of the angiosperms. They offer an interpretation of belid history in terms of the breakup of Gondwanaland but acknowledge that they cannot rule out alternative interpretations that invoke dispersal.

      Strengths:

      The strength of any molecular-phylogenetic study hinges on four things: the extent of the sampling of taxa; the extent of the sampling of loci (DNA sequences) per genome; the quality of the analysis; and - most subjectively - the importance and interest of the evolutionary questions the study allows the authors to address. The first two of these, sampling of taxa and loci, impose a tradeoff: with finite resources, do you add more taxa or more loci? The authors follow a reasonable compromise here, obtaining a solid anchored-enrichment phylogenomic data set (423 genes, >97 kbp) for 33 taxa, but also doing additional analyses that included 13 additional taxa from which only Sanger sequencing data from 4 genes was available. The taxon sampling was pretty solid, including all 7 tribes and a majority of genera in the group. The analyses also seemed to be solid - exemplary, even, given the data available.

      This leaves the subjective question of how interesting the results are. The very scale of the task that faces systematists in general, and beetle systematists in particular, presents a daunting challenge to the reader's attention: there are so many taxa, and even a sophisticated reader may never have heard of any of them. Thus it's often the case that such studies are ignored by virtually everyone outside a tiny cadre of fellow specialists. The authors of the present study make an unusually strong case for the broader interest and importance of their investigation and its focal taxon, the belid weevils.

      The belids are of special interest because - in a world churning with change and upheaval, geologically and evolutionarily - relatively little seems to have been going on with them, at least with some of them, for the last hundred million years or so. The authors make a good case that the Araucaria-feeding belid lineages found in present-day Australasia and South America have been feeding on Araucaria continuously since the days when it was a dominant tree taxon nearly worldwide before it was largely replaced by angiosperms. Thus these lineages plausibly offer a modern glimpse of an ancient ecological community.

      Weaknesses:

      I didn't find the biogeographical analysis particularly compelling. The promise of vicariance biogeography for understanding Gondwanan taxa seems to have peaked about 3 or 4 decades ago, and since then almost every classic case has been falsified by improved phylogenetic and fossil evidence. I was hopeful, early in my reading of this article, that it would be a counterexample, showing that yes, vicariance really does explain the history of *something*. But the authors don't make a particularly strong claim for their preferred minimum-dispersal scenario; also they don't deal with the fact that the range of Araucaria was vastly greater in the past and included places like North America. Were there belids in what is now Arizona's petrified forest? It seems likely. Ignoring all of that is methodologically reasonable but doesn't yield anything particularly persuasive.

      Thank you for these comments.

      The criticism that the biogeographical analysis is “not very compelling” is true to a degree, but it is only a small part of the discussion and, as stated by the reviewer, cannot be made more “persuasive”, in part because of limitations in taxon sampling but also because of uncertainties of host associations (e.g. with ferns). We tried to draw persuasive conclusions while not being too speculative at the same time. Elaborating on our short section here would only make it much more speculative — and dispersal scenarios more so than vicariance ones (at least in Belinae).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments relative to this last point of a more general nature:

      - I think it would be informative in Figure 1 to present family names for the outgroups.

      Family names for outgroups have been added to Figure 1.

      - There is a summary of matrix composition in the results but I think a table would be better listing all necessary information for each dataset (number of taxa, number of taxa with only Sanger data, parsimony informative sites, GC content, missing data, etc...).

      We added Table S4 with detailed information about the matrices.

      - Perhaps I missed it, but I didn't find how fossil calibrations were implemented in BEAST (which prior distribution was chosen and with which parameters).

      We used uniform priors; this has been added to the Methods section.

      - I am worried that the taxon sampling (ca. 10% of the family) is too low to conduct meaningful ancestral estimations, without mentioning the moderately supported relationships among genera and large time credibility intervals. This should be better acknowledged in the paper and perhaps should weigh more into the discussion.

      Belidae in general are a rare group of weevils, and it has been a huge effort and a global collaboration to sample all tribes and over 60 % of the generic diversity in the present study. A high degree of conservation of host plant associations, host plant organ usage and distribution is observed within genera and even tribes. Therefore, we feel strongly that the resulting ancestral states are meaningful.

      Moreover, 70 % of the belid species diversity belongs to only two genera, Rhinotia and Proterhinus. Our species sampling is about 36 % if we disregard the 255 species of these two genera.

      We acknowledge that our results could be improved by sampling more genera of Afrocorynina and Pachyurini, but these taxa are very hard to collect. We have acknowledged the limitations of our taxon sampling, branch support and timetree credibility intervals in the discussion to minimize speculation in our conclusions.

      - It might be nice to have a more detailed discussion of flanking regions. In my experience and from the literature there seems to be increasing concern about the use of these regions in phylogenomic inferences for multiple solid reasons especially the more you go back in time (complex homology assessment, overall gappyness, difficulty to partition the data, etc...)

      We tested the impact of flanking regions on the results of our analyses and showed that these data did not have a detrimental impact. We added more details about this to the results section of the paper, including information about the cutoffs we used to trim the flanking regions.

      Reviewer #2 (Recommendations For The Authors):

      Line 42, change "recent temporal origins" to "recent origins".

      Modified in the text.

      Line 97-98, "phylogenetic hypotheses have been proposed for all genera" This is ambiguous. The syntax makes it sound like these were separate hypotheses for each genus - the relationships of the species within them, maybe. However, the context implies that the hypotheses relate to the relationships between the genera. Clarify. "A phylogenetic hypothesis is available for generic relationships in each subfamily. . . " or something.

      Modified in the text.

      Line 162, ". . . all three subtribes (Agnesiotinidi, Belini. . . " Something's wrong here. Change "subtribes" to "tribes"?

      Modified in the text.

      Line 219, the comma after "unequivocally" needs to be a semicolon.

      Modified in the text.

      Line 327 and elsewhere, the abbreviation "AHE" is used but never spelled out; spell out what it stands for at first use. Or why not spell it out every single time? You hardly ever use it and scientists' habit of using lots of obscure abbreviations is a bad one that's worth resisting, especially now that it no longer requires extra ink and paper to spell things out.

      Modified in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Minor

      (MN1) The segregants should be referred to as F2 segregants as they are derived from an F1 cross.

      We thank the reviewer for pointing out this important oversight. We indeed analyzed segregants of an F1 cross and have corrected this in the text.

      (MN2) The connections to eQTLs in other organisms should be addressed in the introduction and conclusion. For example, in humans, there has been little evidence for trans eQTLs in contrast to what has been found in yeast.

      We thank the reviewer for pointing this out and have improved our introduction and conclusion with such connections.

      (M3) The authors state that an advantage of scRNAseq over bulk is that it captures rare cell populations (line 79), but this advantage is not exploited in this study.

      While we did not explicitly demonstrate the effect of using scRNA-seq on capturing variation in rare cell populations, the referenced literature (21, 40) provides evidence that pooled scRNA-seq captures important expression heterogeneity (which implicitly contains potentially rare expression states). In our study, this is leveraged on F2 segregants to assess expression variation within the same lineage (genotype). This impacts the partitioning of expression variance from genotype.

      Thus, we mentioned this point to further support the choice of using scRNA-seq for this analysis and showed that even a few single cells enable the reconstruction of the genome and expression profile of rare cell types.

      (MN4) The authors use ~5% of the lineages from the original study. There is no rationale for why this is an appropriate sample size. Is there an argument for using more cells in eQTL mapping or conversely could the authors ask if fewer cells would provide similar conclusions by downsampling?

      Although scRNA-seq is highly scalable, it has limitations in terms of throughput. Indeed, a single 10x Genomics library generates data on the order of 10^4 well-covered cells. Given these limitations, our choice of ~5% of the lineages of the original study stems from the need to recover the same lineage multiple times within these 10^4 cells (in our study, each lineage is recovered on average 4 times).

      While it is possible to run multiple libraries and sequencing lanes, budget limitations prevent us from running more libraries, especially since we expect power to scale with the square root of the number of lineages (there are diminishing returns).
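
      The diminishing returns mentioned above can be illustrated with a minimal sketch. It assumes that power scales with the square root of the number of lineages, as stated in the response, and additionally assumes (this is not stated in the response) that the number of cells recovered per lineage is roughly Poisson distributed. The function names and the baseline of 1,000 lineages are hypothetical and chosen only for illustration.

```python
import math


def relative_power_gain(n_lineages: int, baseline: int) -> float:
    """Gain relative to a baseline panel size, assuming sqrt(n) scaling."""
    return math.sqrt(n_lineages / baseline)


def fraction_recovered(mean_cells_per_lineage: float) -> float:
    """Fraction of lineages seen at least once, under a Poisson assumption."""
    return 1.0 - math.exp(-mean_cells_per_lineage)


if __name__ == "__main__":
    baseline = 1000  # hypothetical baseline number of lineages
    for n in (1000, 2000, 4000, 8000):
        # Doubling the number of lineages multiplies the gain by sqrt(2) only.
        print(n, round(relative_power_gain(n, baseline), 2))
    # With an average of 4 cells per lineage, few lineages are missed entirely.
    print(round(fraction_recovered(4.0), 3))
```

      Under these assumptions, quadrupling the number of lineages (and hence roughly the number of libraries) only doubles the gain, which is the trade-off against cost described in the response.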

      (MN5) I do not agree that the use of UMIs overcomes the challenges of low sequencing depth. UMIs mitigate the possible technical artifacts due to massive PCR amplification.

      We thank the reviewer for this comment and will clarify this in the manuscript. Indeed, we intended to refer to the breadth of coverage (instead of the depth), which would usually manifest with massive PCR amplification of few transcripts.

      (MN6) There is an inadequate reference to prior work on scRNAseq in yeast that established the methods used by the authors and eQTL mapping in human cells using scRNAseq.

      We thank the reviewer for this and have added more context on benchmarks of scRNA-seq methods in yeast (Drop-seq, etc.) and on sc-eQTL mapping in humans. Additionally, we have cited Jariani et al. (2020) in eLife, where similar techniques were employed for scRNA-seq in yeast.

      (MN7) The use of empty quotes in Figure 4A is confusing and an alternative presentation method should be used.

      We will remove these empty quote characters and replace them with a more meaningful label such as “none”.

      (MN8) The authors speculate about the use of predicted fitness instead of observed fitness, but this is something they could explicitly address in their current study.

      We thank the reviewer for this comment but have decided not to perform a whole new bulk-segregant analysis experiment (X-QTL) to identify QTL that way. However, we do agree that we could in principle use the QTL that were identified in our previous study (Nguyen Ba et al, 2022). Despite this, we do not see the need for this because the predicted fitness is the overlap between genotype and phenotype (within the variance partitioning framework, it is the ‘narrow-sense heritability’ if one ignores epistasis). Thus, the use of predicted fitness when partitioning for expression variation would be constrained to that overlap (as opposed to the real observed fitness). This means that within the variance partitioning framework, the overlap of genotype, expression, and fitness is fully recapitulated by using predicted fitness instead (given that this predicted fitness is accurate to the narrow-sense heritability). In our previous study, we found that the QTL essentially predict all of the narrow-sense heritability. We believe it is therefore evident that the use of predicted fitness would be sufficient if and only if the expression variation independent of genotype is not associated with observed fitness.

      We note that our study cannot generalize whether the overlap between genotype and expression fully captures fitness variation explained by expression. Indeed, we believe this is not generalizable to many other contexts (for example, in development). Thus, at present, the use of predicted fitness remains a speculation.

      Major:

      (MJ1) There is insufficient information provided about the nature of data. At a minimum, the following information should be provided to enable assessment of the study: What is the total library size, how many genes are identified per cell, how many UMIs are found per cell, what is the doublet rate, and how are doublets identified (e.g. on the basis of heterozygous calls at polymorphic loci?), how many times is each genotype observed, and how many polymorphic sites are identified per cell that are the basis of genotype inferences?

      We understand that these metrics help the reader gauge the power of our approach, and we now report them in Table 1 of the manuscript.

      (MJ2) The prior study analyzed 18 different conditions, whereas this study only assays expression in a single condition. However, the power of the authors' approach is that its efficiency enables testing eQTLs in multiple conditions. The study would be greatly strengthened through analysis of at least one more condition, and ideally several more conditions. The previous fitness study would be a useful guide for choosing additional conditions as identifying those conditions that result in the greatest contrasts in fitness QTL would be best suited to testing the generalizations that can be drawn from the study.

      We agree that a major strength of our approach is that it rapidly allows eQTL mapping in several conditions. While the experiments presented here are likely less expensive than the classical eQTL mapping experiments, the cost of 10x genomics and sequencing is still an important consideration. The pleiotropy analysis of the prior study was substantially difficult to interpret and put in context, and thus we decided to focus on a proof of concept and leave room for a more thorough analysis of multiple environments for a future study. We acknowledge that this is a main weakness of our manuscript.

      (MJ3) Alternatively, the authors could demonstrate the power of their approach by applying it to a cross between two other yeast strains. As the cross between BY and RM has been exhaustively studied, applying this approach to a different cross would increase the likelihood of making novel biological discoveries.

      We thank the reviewer for this suggestion, and it is indeed something that our lab is considering. Currently, one of the main points of our manuscript still relies on growth measurements of segregants (the fitness), which we cannot obtain from segregants and scRNA-seq alone.

      Unfortunately, in this experimental design, it is difficult to obtain the fitness of cells and the genotype simultaneously because the barcode of the segregant is not expressed and not frequently read during genotyping. Thus, we still need to perform a whole QTL panel for a new cross without substantial re-engineering. 

      That being said, we are working on this but feel that including a new panel in this study is beyond the scope of our manuscript. 

      (MJ4) Figure 1 is misleading as A presents the original study from 2022 without important details such as how genotypes were identified. It is unclear what the barcode is in this study and how it is used in the analysis. Is the barcode for each lineage transcribed so that it is identified in the scRNA-seq data? Or, does the barcode in B refer to the cell index barcode? A clearer presentation and explanation of terms are needed to understand the method.

      Because F2 segregant lineage barcodes are not expressed, the barcode indicated in Figure 1B refers to cell barcodes from 10x Genomics. Our present study does not make use of the lineage barcode. We clarified this in the figure by stating that panel A refers to the original study from 2022 and by explicitly mentioning ‘cell barcodes’.

      (MJ5) The rationale for the analysis reported in Figure 2B is unclear. The fitness data are from the previous study and the goal is to estimate the heritability using the genotyping data from the scRNA-Seq data. What is the explanation for why the data don't agree for only one condition, i.e. 37C? And, what are we to understand from the overall result?

      The rationale of Figure 2A/B is to show that cell lineage genotyping with scRNA-seq yields results consistent with previous genotype-phenotype analyses of the same cross. While Figure 2A shows that the single-cell imputed genotypes resemble the reference panel (sequenced in the Nguyen Ba 2022 study), Figure 2B shows that the variance partitioning to associate genotype to phenotype can be performed using the single-cell genotypes themselves (bypassing the reference panel). We believe this is an interesting result given that the reads obtained by scRNA-seq are constrained to a subset of SNPs. However, we note that if the imputed single-cell genotypes perfectly matched the reference panel, it would not be surprising that one could do genotype-phenotype mapping from the single-cell genotypes.

      In Figure 2B, we tested whether the similarity of the single-cell imputed genotypes to the reference panel was enough to estimate heritabilities (another summary statistic). 

      In the remaining paragraphs of that result section, we further discuss that the single-cell lineage genotypes can be used for QTL mapping as well, recapitulating many of the QTL identified in the reference panel (provided that one controls for power). This result did not make it as a main Figure but is included in Figure S4.

      That being said, we decided to update the figure by comparing the estimates in subsamples of the batch 1 scRNA-seq data to subsamples of the batch 1 reference panel and subsamples of the full reference panel. Subsampling was performed to control for power in the variance partitioning. We also noticed that the fitness of several F2 segregants is missing for the phenotypes 33C, 35C and 37C in the original study, so we decided to exclude these environments.

      (MJ6) Figure 3 presents an analysis of variance partitioning as a Venn diagram. This summarized result is very hard to understand in the absence of any examples of what the underlying raw data look like. For example, what does trait variation look like if only genotype explains the variance or if only gene expression explains the variance? The presented highly summarized data is not intuitive and its presentation is poor - the result that is currently provided would be easier to read in a table format, but the reader needs more information to be able to interpret and understand the result.

      The Venn diagram is widely adopted in the context of variance partitioning (see Cohen, Jacob, and Patricia Cohen. 1975. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences.), but we realize that it has not often been used for displaying heritability estimates. To this end, we have added explanatory labels for the biological meaning of the areas or components of the diagram in the Figure and in the text.
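
      To make the areas of such a Venn diagram concrete, the following is a minimal sketch of two-predictor variance partitioning in the spirit of the Cohen and Cohen reference. It is not the model used in the manuscript: it reduces genotype and expression to a single score each, uses simulated data with arbitrary effect sizes, and all names (`partition_variance`, `simulate`) are hypothetical.

```python
import random


def corr(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partition_variance(g, e, y):
    """Split the variance of y into the regions of a two-set Venn diagram."""
    r_yg, r_ye, r_ge = corr(y, g), corr(y, e), corr(g, e)
    r2_g = r_yg ** 2  # variance explained by genotype alone
    r2_e = r_ye ** 2  # variance explained by expression alone
    # R^2 of the regression of y on both predictors (two-predictor formula).
    r2_full = (r2_g + r2_e - 2 * r_yg * r_ye * r_ge) / (1 - r_ge ** 2)
    return {
        "genotype_only": r2_full - r2_e,    # unique to genotype
        "expression_only": r2_full - r2_g,  # unique to expression
        "shared": r2_g + r2_e - r2_full,    # overlap of the two circles
        "unexplained": 1 - r2_full,         # outside both circles
    }


def simulate(n=2000, seed=0):
    """Toy data: expression is partly genetic, fitness depends on both."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]
    e = [0.6 * gi + rng.gauss(0, 1) for gi in g]
    y = [0.5 * gi + 0.4 * ei + rng.gauss(0, 1) for gi, ei in zip(g, e)]
    return g, e, y


if __name__ == "__main__":
    for name, value in partition_variance(*simulate()).items():
        print(f"{name}: {value:.3f}")
```

      In this toy setting, "genotype_only" is the part of the trait variance explained by genotype but not by expression, "expression_only" is the converse, and "shared" is the overlap that either predictor could account for.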

      (MJ7) I am concerned about the conclusions that can be drawn about expression heritability. The authors claim that expression heritability is correlated with expression levels. It seems likely that this reflects differing statistical power. How can this possibility be excluded?

      We thank the reviewer for highlighting this. We now explicitly acknowledge this potential confounding factor in the manuscript.

      (MJ8) Conversely, the authors claim that the genes with the lowest heritability are genes involved in the cell cycle. However, uniquely in scRNA-seq, cell cycle regulated genes appear to have the highest variance in the data as they are only expressed in a subset of cells. Without incorporating this fact one would erroneously conclude that the variation is not heritable. To test the heritability of cell cycle regulation genes the authors should partition the cells into each cell cycle stage based on expression.

      The reviewer is right to say that the low heritability of cell cycle control genes could be explained by the fact that these genes are only expressed in a subset of the dataset. Indeed, a high transcriptomic variance does not necessarily imply a low expression heritability: the cell cycle could be the residual of the expression heritability model, i.e. it explains expression variance with low association to genetic mutation.

      That being said, our result is consistent with results obtained from yeast bulk RNA-seq (Albert et al. 2018), in which cell cycle is averaged out. 

      In our study, we also average out the cell-cycle as we use the consensus expression and the consensus genome to estimate the heritability.

      (MJ9) I do not understand Figure S5 and how eQTL sites are assigned to these specific classes given that the authors say that causative variation cannot be resolved because of linkage disequilibrium.

      The rationale for Figure S5 is to show that the QTL model obtained from single-cell data is consistent with the reference panel QTL mapping experiment. Although there is uncertainty around the exact position of the QTL, we relied on the loci with the highest likelihood and showed that the datasets have consistent features. This is enabled by the fact that the QTL identified using the scRNA-seq genotypes are the ones with largest effect size in the reference panel, and are thus more likely to be mapped accurately.

      (MJ10) The paragraph starting at line 305 is very confusing. In particular, the authors state that they identify a hotspot of regulation at the mating type locus. It is not obvious why this would be the case. Moreover, they claim that they find evidence for both MATa and MATalpha gene expression. Information is not provided about how segregants were isolated, but assuming that the authors did not dissect 25,000 tetrads to obtain 100,000 segregants I would infer that random spore using SGA was used. In that case, all cells should be MATa. The authors should clarify and explain this observation.

      Although most of the cells have the MATa mating type (as selected by random spore using SGA), it is well known and discussed in the Nguyen Ba et al. paper that there are a few lineages with other mating types or diploids (they are leakers in the selection process).

      Indeed, we verified that we can detect a small number of MATalpha cells or diploids within this pool.

      (MJ11) Ultimately, it is not clear what new biological findings the authors have made. There are no novel findings with respect to causative variation underlying eQTLs and I would encourage the authors to make clearer statements in their abstract, introduction, and conclusion about the key discoveries. E.g. What are the "new associations between phenotypic and transcriptomic variations" mentioned in the abstract?

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Indeed, novel findings include new hotspots of expression regulation in the RM/BY genetic background; we also find that trans-regulation of expression has more impact than cis-regulation on fitness, and we evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      Reviewer #2:

      (MJ1) Most of the figures center on methods development and validation for the authors' single-cell RNA-seq in the yeast cross […] One potential novelty of the study is the methods per se: that is, showing that scRNA-seq works for concomitant genotyping and gene expression profiling in the natural variation context. The authors' rigor and effort notwithstanding: in my view, this can be described as modest in terms of principles. That is, the authors did a good job putting the scRNA-seq idea into practice, but their success is perhaps not surprising or highly relevant for work outside of yeast (as the discussion says).

      Although the scope of the method is limited, we think that it can apply to any large-scale dataset in which transcription variance and genetic diversity are not small. This can help reduce the lack of associations between trait heritability and expression regulation, which is frequent as these two parameters are often not measured within the same dataset.

      We can, however, think of some other settings where a similar experiment may be interesting. This includes, for example, pooling cells from different human individuals (with enough genetic diversity) and applying the same scRNA-seq method to back-identify the individuals and matching them to a particular phenotype. We believe our proof of concept is therefore an important contribution as these other experiments might have broad implications.

      (MJ2) The more substantive claim by the authors for the impact of the study is that they make new observations about the role of expression in phenotype (lines 333-335). The major display item of the manuscript on this theme is Figure 4A, reporting which loci that control growth phenotype (from an earlier paper) also control expression. This is solid but I regret to say that the results strike me as modest.

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Indeed, novel findings include new hotspots of expression regulation in the RM/BY genetic background; we also find that trans-regulation of expression has more impact than cis-regulation on fitness, and we evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      (MJ3) The discussion makes some perhaps fairly big claims that the work has helped "bridge understanding of how genetic variation influences transcriptomic variation" and ultimately cellular phenotype. But with the data as they stand, the authors have missed an opportunity to crystallize exactly how a given variant affects expression (perhaps in waves of regulators affecting targets that affect more regulators) and then phenotype, except for the speculations in the text on lines 305-319. The field started down this road years ago with Bayesian causality inference methods applied to eQTL and phenotype mapping (via e.g. the work of Eric Schadt). The authors could now try Mendelian randomization-type fine-grained detailed models for more firepower toward the same end, and/or experimental tests of the genotype-to-expression-to-phenotype relationship. I would see these directions, motivated by fundamental questions that are relevant to the field at large, as leading to a major advance for this very crowded field. As it stands, I felt their absence in this manuscript especially if the authors are selling principles about linking expression and phenotype as their take-home.

      We thank the reviewer for this suggestion and agree that the analysis of the genotype-to-expression-to-phenotype relationship would benefit from a more fine-grained model. While we are interested in exploring this, we decided to limit the scope of this manuscript to the proof of concept that scRNA-seq can help gain insights about the genotype-phenotype map at a broader scale.

      (MN1) I also wonder whether the co-mapping of expression and growth traits in Figure 4A would have been possible with e.g. the bulk RNA-seq from Albert et al., 2018, and I recommend that the authors repeat the Figure 4A-type analyses with the latter to justify their statement that their massive scRNA data set would actually be necessary for them to bear fruit (lines 386-388).

      By repeating our eQTL hotspot analysis with the Albert et al. (2018) data, we observed a non-significant association between eQTL hotspots and QTL (χ2 p = 0.50). That being said, there are some differences in the Albert et al. experiment that preclude us from conclusively saying whether the bulk RNA-seq experiments by Albert et al. would not bear fruit. Indeed, that experiment is only 4 times smaller in scale, so we would not expect dramatic differences. To highlight power differences, the Albert et al. paper identified about 6 eQTL per gene, while our study identified about 21, which is consistent with the power differences.

      This highlights that this scRNA-seq experiment is scalable, so the technique may be useful for further studies. In addition, this pooled scRNA-seq strategy enables analysis of the association of transcription with phenotype.
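
      For readers unfamiliar with this kind of test, the following is a minimal sketch of a Pearson χ2 test of association on a 2×2 table (locus is or is not an eQTL hotspot, versus locus is or is not a fitness QTL). The response does not specify how the reported test was set up, so the table layout, the absence of a continuity correction, and the function name `chi2_2x2` are all assumptions, and the counts in the example are invented purely for illustration.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Invented counts: rows = hotspot / not hotspot, columns = QTL / not QTL.
    stat, p = chi2_2x2(20, 5, 5, 20)
    print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

      A large p-value, such as the one reported for the bulk data, means that the table gives no evidence that hotspots and fitness QTL co-occur more often than expected by chance.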

      (MN2) I also read the discussion of the manuscript as bringing to the fore some of the challenges a reader has in judging the current state of the results to be of actionable impact. The discussion, and the manuscript, will be improved if the authors can put the work in context, posing concrete questions from the field and stating how they are addressed here and what's left to do.

      We agree with the reviewer and have summarized our answers to some of the questions in the field in the discussion section.

      All that being said, we acknowledge the limitations of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigated how root cap cell corpse removal affects the ability of microbes to colonize Arabidopsis thaliana plants. The findings demonstrate how programmed cell death and its control in root cap cells affect the establishment of symbiotic relationships between plants and fungi. Key details on molecular mechanisms and transcription factors involved are also given. The study suggests reevaluating microbiome assembly from the root tip, thus challenging traditional ideas about this process. While the work presents a key foundation, more research along the root axis is recommended to gain a better understanding of the spatial and temporal aspects of microbiome recruitment.

      We thank Reviewer #1 for their positive evaluation and critical feedback.

      Reviewer #2 (Public Review):

      Summary:

      The authors identify the root cap as an important key region for establishing microbial symbioses with roots. By highlighting for the first time the crucial importance of tight regulation of a specific form of programmed cell death of root cap cells and the clearance of their cell corpses, they start unraveling the molecular mechanisms and its regulation at the root cap (e.g. by identifying an important transcription factor) for the establishment of symbioses with fungi (and potentially also bacterial microbiomes).

      Strengths:

      It is often believed that the recruitment of plant microbiomes occurs from bulk soil to rhizosphere to endosphere. These authors demonstrate that we have to re-think microbiome assembly as a process starting and regulated at the root tip and proceeding along the root axis.

      Weaknesses:

      The study is a first crucial starting point to investigate the spatial recruitment of beneficial microorganisms along the root axis of plants. It identifies e.g. an important transcription factor for programmed cell death, but more detailed investigations along the root axis are now needed to better understand - spatially and temporally - the orchestration of microbiome recruitment.

      We appreciate Reviewer #2's insightful comments and agree that further investigations are needed in future studies to gain a deeper understanding of the intricate interplay between the spatial and temporal recruitment of the microbiome and developmental cell death.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Given that the smb-3 altered PCD phenotype has already been reported in several publications, the aim of using Evans blue staining to highlight LRC cell corpses along the root surface of smb-3 is not clear. Maybe S1 would be more informative as main figure.

      As an indicator of membrane integrity loss and cell death, Evans blue staining was used to characterize all dPCD mutants described in this study and their interactions with S. indica. To avoid redundancies with other publications, we restructured Figure 1, incorporating panel S1A to provide an introductory overview of the smb-3 phenotype. The former Figure 1B is now located in Figure S1.

      - It is not clear how the analysis of protein aggregates fits into the rationale, why analyze these formations? What role should they have in the process of PCD or interaction with microbes?

      The manuscript has been modified as follows to clarify the analysis of protein aggregates in the dPCD mutants: “The transcription factor SMB promotes the expression of various dPCD executor genes, including proteases that break down and clear cellular debris and protein aggregates following cell death induction. In the LRCs of smb-3 mutants, the absence of induction of these proteases potentially explains the accumulation of protein aggregates in uncleared dead LRC cells.”

      - Is the accumulation of misfolded and aggregated proteins also present during physiological PCD of LRC cells in the WT?

      The biochemical mechanisms underlying PCD can vary depending on the affected cell types and tissues. Within the root tip of Arabidopsis, two different modes of PCD have been described, differentiating between columella root cap cells and LRC cells. For clarification, the manuscript has been adjusted as follows: “Under physiological conditions in WT roots, we previously observed protein aggregate accumulation in sloughed columella cell packages, but not during dPCD of distal LRC clearance (Llamas et al., 2021). This aligns with the findings that dPCD of the columella is affected by the loss of autophagy, while dPCD of the LRC is not (Feng et al., 2022).”

      - I suggest being more careful when using the term "root cap" instead of "LRC" to reduce ambiguity (i.e. lines 56; 137), maybe you need to double-check the text.

      We agree with the reviewer that a clear distinction between “root cap” and “LRC” is very important. We have adjusted the manuscript to avoid any misunderstandings.

      - A technical question regarding qPCR sample preparation: doesn't washing the smb-3 roots cause a loss of LRC stretched cells and would it therefore lead to an alteration of the results?

      The mechanical washing of roots is essential to ensure a clear distinction between intraradical fungal growth and accommodation around roots. While we cannot exclude the possibility that mechanical washing removes LRC cells, intraradical quantification of fungal biomass aims to measure S. indica growth in the epidermal and cortical cell layers, underneath the uncleared LRC cells. Thus, we complemented this assay with extraradical colonization assays to quantify external fungal biomass with intact LRC cells.

      - It is not clear if S. indica promotes PCD in wt and/or in smb-3, could you comment on it?

      It remains an open question whether and to what extent S. indica promotes PCD, although there are strong indications that this fungus activates different cell death pathways at various developmental stages, including dAdo-mediated cell death. We posit that certain microbes have evolved to regulate and manipulate different dPCD processes to enhance colonization, implicating a complex crosstalk between various PCD pathways. We have adjusted the manuscript to underscore this perspective as follows: “Transcriptomic analysis of both established and predicted key dPCD marker genes revealed diverse patterns of upregulation and downregulation during S. indica colonization. These findings provide a valuable foundation for future studies investigating the dynamics of dPCD processes during beneficial symbiotic interactions and the potential manipulation of these processes by symbiotic partners.”

      - How analysis of BFN1 expression in whole root confirms its downregulation at the onset of cell death in S. indica-colonized plants. Moreover, is the transcriptional regulation of BFN1 important for PCD, or is the BFN1 protein level correlated with the establishment of cell death?

      BFN1 gene expression in Arabidopsis shows a transient decrease around 6–8 days after S. indica inoculation, coinciding with the proposed onset of S. indica-induced cell death. While we can only speculate on a potential correlation between BFN1 downregulation and the onset of S. indica-induced cell death, we have described other pathways through which S. indica induces cell death. For example, it produces small metabolites such as dAdo through the synergistic activity of two secreted fungal effector proteins (Dunken et al., 2023). This suggests that S. indica recruits different pathways to induce cell death, which may vary depending on the host plant and interact with each other, as shown for many other immunity-related cell death pathways that share some components.

      Regarding the second part of the question, BFN1 expression correlates positively with cells primed for dPCD (Olvera-Carrillo et al., 2015). BFN1 protein accumulates in the ER lumen and is released into the cytoplasm upon cell death induction to exert its DNase functions (Fendrych et al., 2014). Whether accumulation of BFN1 is a cause or a consequence of cell death remains to be validated.

      - Line 190: there is a typo "in the nucleus", this is superfluous given that the reporter is nuclear.

      The manuscript has been adjusted accordingly; see line L208. However, we consider the distinction important as we aim to emphasize the difference between the nuclear localization of the fluorescent signal in "healthy" cells and the dispersed fluorescent signal spreading in the cytoplasm of cells priming or undergoing dPCD.

      - Line 255: there is a typo, stem cells can not differentiate.

      The manuscript has been adjusted.

      - During root hair development some epidermal cells undergo PCD to allow the emergence of root hairs. Furthermore, during plant defense against pathogens, epidermal cells undergo cell death to prevent further colonization. Have these cell death events been reported to occur under physiological conditions during development?

      Plant defence responses in roots and the hypersensitive response (HR) still remain largely unexplored. The HR is a defence mechanism that consists of a localized and rapid cell death at the site of pathogen invasion. It is triggered by pathogenic effector proteins, usually recognized by intracellular immune receptors (NLRs), and accompanied by other features such as ROS signalling, Ca2+ bursts and cell wall modifications (Balint-Kurti, 2019). Notably, HR has been widely described in leaves, but no strong evidence has been shown for the occurrence of HR in plant roots (Hermanns et al., 2003, Radwan et al., 2005). Additionally, previous studies have not shown any transcriptional parallels between common dPCD marker genes and HR PCD in Arabidopsis (Olvera-Carrillo et al., 2015; Salguero-Linares et al., 2022).

      While S. indica is a beneficial root endophyte that does not induce classical hypersensitive response (HR) in host plants, the impact of dPCD on S. indica colonization should not be overlooked. S. indica promotes root hair formation in its hosts (Saleem et al., 2022), and in Arabidopsis, root hair cells naturally undergo cell death 2–3 weeks after emergence (Tan et al., 2016). This aspect could be particularly relevant for understanding the dynamics of S. indica colonization.

      - Showing the analysis of pBFN1 in smb-3 would help in validating the idea that the downregulation of BFN1 by S. indica is regulated independently of SMB.

      SMB is known to be a root cap specific transcription factor (Willemsen et al., 2008; Fendrych et al., 2014). The pBFN1:tdTOMATO reporter line shows that BFN1 expression occurs in many different tissues undergoing dPCD, above and below ground, where SMB is not expressed or present. Therefore, we can postulate that the downregulation of BFN1 by S. indica in the differentiation zone is regulated independently of SMB.

      - A question of great interest still remains open: is it the microbe that induces the regulation of BFN1 causing a delay in cell clearance and favoring the infection or is it the plant that reduces BFN1 to favor the interaction with the microbe? In other words, is the mechanism a response to stress or a consolidation of the interaction with the host?

      We agree with this reviewer that this question remains open. Whether active interference by fungal effector proteins, fungal-derived signaling molecules, or a systemic response of Arabidopsis roots underlies BFN1 downregulation during S. indica colonization remains to be investigated. Yet, it is noteworthy that the downregulation of BFN1 in Arabidopsis is not specific to S. indica but also occurs during interactions with other beneficial microbes such as S. vermifera and two bacterial synthetic communities. This suggests that it could be a broader plant response to microbial presence. However, at this stage, we can only speculate on these possibilities. We therefore changed some of the statements in the paper to moderate our conclusions: e.g. “Expression of plant nuclease BFN1, which is associated with senescence, is modulated to facilitate root accommodation of beneficial microbes” to leave open who exactly is controlling BFN1, the plant or the microbes.

      Reviewer #2 (Recommendations For The Authors):

      This is a straightforward study, well executed and well written. I have only a few specific comments, and some concern the statistics which is a bit more serious and where I would like to get answers first. Looking at the figures, I am sure that the authors can easily clarify the issues in the manuscript.

      We appreciate the positive feedback and included clarifications in the statistical section in the material and methods.

      Statistics:

      - The statistics are not detailed in Material and Methods, but are only briefly indicated in the headings of the figures. Include a statistics section in Material and Methods.

      We added an extra paragraph on statistical analysis to the Material and Methods section for clarification, which reads as follows: “All statistical analyses, except for the transcriptomic analysis, were performed using Prism8. Individual figures state the applied statistical methods, as well as p and F values. p-values and corresponding asterisks are defined as follows: p<0.05 *, p<0.01 **, p<0.001 ***.”
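
The asterisk convention quoted above can be written down as a small lookup. This is a minimal sketch for illustration only: the function name and the "ns" label for non-significant results are assumptions, and it is not part of the authors' Prism8 workflow.

```python
def significance_asterisks(p_value: float) -> str:
    """Map a p-value to the asterisk notation quoted in the response."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"  # the strongest category in the quoted definition
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # hypothetical label; the source defines no symbol here
```

Under this convention a value such as p = 0.000001 still maps to three asterisks, which matches the reviewer's request.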

      - Figure 1/ Figure S3, etc: First of all, a **** with p< 0.00001 does not exist! Significance in statistics just means that we assume that there is a difference with some kind of probability that has been defined as p<0.05 *, p<0.01**, p<0.001***, and NOT more! Even if p<0.000001, it is still p<0.001***. Stating the meaning of asterisks in a separate Statistics section in Materials and Methods would also avoid repetitive explanations (e.g. Figure 4, L68: 'Asterisk indicates significantly different...').

      We agree and have updated the manuscript accordingly. See comment above.  

      - Also, it is advisable to reduce the digits of the p-values to a meaningful length (e.g. Figure 2 L 36: (*P<0.0466) should be (F[1, ?] = ?; p<0.047). The * is not necessary in the text, as p<0.05 is already given. We do not obtain more information by a more exact p-value, because all we need to know is that p<0.05.

      We adjusted the p-values accordingly throughout the manuscript.

      - It is NOT sufficient to communicate just the p-value of a statistical analysis. What is always needed is the F-value (student test and ANOVA) with both nominator and denominator degrees of freedom (e.g. F[2, 10] =) AND the p-value.

      We included F-values throughout the manuscript in all main and supplemental figures to provide more clarity for the readers.

      - The reason becomes clear in Fig. 2D where the authors state that they used 3 biological replicates, each with 40 plants. I assume the statistics was wrongly based on calculating with 120 plants (F[1,120] =) as technical replicates instead of correctly the biological replicates (3 means of 40 technical replicates each, (F[1,3] =))?? If F-value and df had been given, errors like this would be immediately visible - for any reviewer/reader, but also to the authors. Please re-analyze the statistics correctly.

      To assess S. indica-induced growth promotion, we measured and compared the root length of Arabidopsis plants under S. indica colonization or mock conditions at three different time points. Each genotype and treatment combination involved measuring 50 plants, with each plant serving as an independent biological replicate inoculated with the same S. indica spore solution. For comprehensive statistical analysis, we conducted the experiment a total of 3 times, using fresh fungal inoculum each time, originally referred to as "three biological replicates." We maintain that including all plant measurements is essential for a thorough statistical analysis of our growth promotion experiment. However, in order to avoid confusion, we have updated the figure legend to clarify the experimental set-up as follows: “(D) Root length measurements of WT plants and smb-3 mutant plants, during S. indica colonization (seed inoculated) or mock treatment. 50 plants for each genotype and treatment combination were observed and individually measured over a time period of two weeks. WT roots show S. indica-induced growth promotion, while growth promotion of smb-3 mutants was delayed and only observed at later stages of colonization. This experiment was repeated two more independent times, each time with fresh fungal material. Statistical analysis was performed via one-way ANOVA and Tukey’s post hoc test (F [11, 1785] = 1149; p < 0.001). For visual representation of statistical relevance each time point was additionally evaluated via one-way ANOVA and Tukey’s post hoc test at 8dpi (F [3, 593] = 69.24; p < 0.001), 10dpi (F [3, 596] = 47.59; p < 0.001) and 14dpi (F [3, 596] = 154.3; p < 0.001).”

      - Figure 2, L 18; Figure 5, L 95, Figure S5 L53, etc: I am worried about executing a statistical test 'before normalization' - what does it mean?? WHY was a normalization necessary, WHAT EXACTLY was normalized and do we see normalized plots that do NOT correspond to the data on which the statistics was based? At least this implies 'before normalization'! Please explain, and/or re-analyze the statistics correctly.

      We agree that the phrasing “before normalization” may lead to confusion, as the normalization of data to the mean of the control group does not alter the statistical analysis. Normalization was performed to achieve a clearer visual representation. Additionally, Evans blue staining is quantified by measuring the mean grey value, which does not correspond to a specific unit. Normalizing the data allows for the representation of relative staining intensities. The manuscript has been adjusted accordingly throughout.
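
The statement that normalizing to the control mean leaves the statistics unchanged can be checked with a short sketch. The values below are made up and the helper names are hypothetical; the point is only that dividing every observation by one constant rescales the between-group and within-group variance by the same factor, so the F ratio stays the same.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def normalize_to_control(groups, control_index=0):
    """Divide every value by the mean of the control group."""
    control_mean = mean(groups[control_index])
    return [[v / control_mean for v in g] for g in groups]

# Made-up mean grey values, not data from the study
mock = [10.0, 12.0, 11.0, 13.0]
treated = [15.0, 17.0, 14.0, 18.0]

f_raw, df1, df2 = one_way_anova_f([mock, treated])
normalized = normalize_to_control([mock, treated])
f_norm, _, _ = one_way_anova_f(normalized)
```

After normalization the control values average to 1 but are not individually equal to 1, which is why the normalized control points still show spread in a plot.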

      - Statistics in Figure 1: L8/9: 'in reference to B' is unclear, I guess the mean of the control was used as a reference? This would also explain the variation in relative staining intensity (Figure 1C). if normalization was carried out (see above) all control (WT) values should be exactly 1, but they are not. I guess it was normalized to the mean of the control?

      “In reference to X” or “corresponding to X” typically means that Figure X shows an example image from the dataset on which the statistical quantification is based. We have updated the manuscript throughout the main and supplemental figure legends to use “refers to image shown in X” to avoid confusion.  

      Figure S4, L 42: '(corresponding to A)', see comment above.

      See comment above.

      Figure 5B, L 87: '(in reference to A)'; L93: (in reference to C), etc. - see above. Unclear how A was used as a reference. Was it the mean of A? BUT again only 3 biological replicates! So it has to be the mean of 3 reps that was used as control! OR can we at least say that the 10 measured roots were independent of each other (crucial (!) precondition for executing student's test or ANOVA? Then you would have at least 10 replicates (mean of 4 pictures taken per root for each).

      Quantification of Evans blue staining intensity involved taking 4 pictures along the main root axis of each plant. We re-evaluated the statistical analysis correctly with the averaged datapoints for each plant root. We adjusted main figures (Fig.1C and 5B) and supplementary figures (Fig. S1C and S4B) and changed the material and methods section of the manuscript as follows: “4 pictures were taken along the main root axis of each plant and averaged together, for an overview of cell death in the differentiation zone.”
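
The per-plant averaging described here can be sketched as follows. The plant labels and grey values are invented for illustration; the sketch only shows how collapsing the four pictures per root into one value changes the number of data points, and therefore the within-group degrees of freedom that a reader would expect to see reported.

```python
from statistics import mean

def average_per_plant(pictures_by_plant):
    """Collapse technical replicates (pictures) into one value per plant."""
    return {plant: mean(values) for plant, values in pictures_by_plant.items()}

def within_group_df(groups):
    """Within-group degrees of freedom for a one-way ANOVA."""
    return sum(len(g) for g in groups) - len(groups)

# Hypothetical mean grey values: 3 plants per treatment, 4 pictures per plant
mock = {"m1": [1.0, 1.2, 0.9, 1.1], "m2": [0.8, 1.0, 0.9, 0.9], "m3": [1.1, 1.3, 1.2, 1.0]}
colonized = {"c1": [1.6, 1.8, 1.7, 1.5], "c2": [1.4, 1.5, 1.6, 1.5], "c3": [1.9, 1.7, 1.8, 2.0]}

mock_means = average_per_plant(mock)
colonized_means = average_per_plant(colonized)

# Treating every picture as a replicate inflates the degrees of freedom
df_pictures = within_group_df([[v for vals in mock.values() for v in vals],
                               [v for vals in colonized.values() for v in vals]])
# One averaged value per plant gives the df for independent roots
df_plants = within_group_df([list(mock_means.values()), list(colonized_means.values())])
```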

      - Statistics in Figure 4, L 69: what means 'adjusted p-value'? Which analysis?

      The material and method section of the manuscript has been adjusted as follows for clarification: “Differential gene expression analysis was performed using the R package DESeq2 (Love et al., 2014). Genes with an FDR adjusted p-value < 0.05 were considered as differentially expressed genes (DEGs). The adjusted p-value refers to the transformation of the p-value obtained with the Wald test after considering multiple testing. To visualize gene expression, gene expression levels were normalized as Transcript Per kilobase million (TPM).”
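
To make "FDR adjusted p-value" and "TPM" concrete, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is the DESeq2 default but is not named in the response, and it does not reproduce DESeq2's independent filtering. The TPM helper follows the usual definition (length-normalized counts rescaled to sum to one million); the function names and inputs are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and transcript lengths in kb."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Made-up Wald-test p-values for four genes
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
degs = [i for i, p in enumerate(padj) if p < 0.05]
```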

      - Statistics in Figure 5, L102-105: see above! Were the statistics correctly calculated with 7 reps, or wrongly with 30? # I guess each time point was normalized to the mean of WT? By the way, it is not clear if repeated measurements were done on the same plants. If repeated measurements were done on the SAME plants, then these data are statistically not independent anymore (time-series analysis), and e.g. MANOVA must be used and significant (!) before proceeding to ANOVA and Tukey.

      The statistics for quantifying intraradical colonization of Arabidopsis roots were calculated with 7 replicates. For each replicate, 30 plants were pooled to obtain sufficient material for RNA extraction and cDNA synthesis. Plants from the same genotype were harvested separately for each time point, ensuring that the time points are statistically independent from one another.

      Statistics Fig. S1, L 11-12: see above, '5 plants were imaged for each mock and ..., evaluating 4 pictures ...' That means you have means of 4 pictures for 5 biological replicates - the figure shows 20 replicates. However, the statistics must be based on 5 reps! You may indicate the 4 pictures per root by different colours. Change throughout all figures and calculate the statistics correctly (show this by indicating the correct df in your statistics as discussed above).

      We have conducted a re-evaluation of the statistical analysis of Evans blue staining for all figures presented throughout the manuscript. See comment above.

      Statistics Fig. S3, L 31: 'Relative quantification of ...' see above, relative to what? Explain this also clearly in Statistics in Materials and Methods.

      Relative quantification refers to normalizing data to the mean of the corresponding control group. Figure legends have been revised to clarify this point.

      Statistics Fig. S5, L 57/58: 'Genes are clustered using spearmen correlation as distance measure'. If I understand it correctly, Spearman correlation is NOT a distance measure. You used Spearman correlation to cluster gene expression. Now it would be interesting to know WHICH clustering method was used, e.g. a hierarchical or non-hierarchical clustering method? and which one, e.g. single linkage, complete linkage? The outcome depends very much on the clustering method. Therefore, this information is important.

      To perform gene clustering, we set the option clustering_distance_rows = "spearman" of the Heatmap function included in the ComplexHeatmap package. The function first computes the distance matrix using the formula 1 - cor(x, y, method) with Spearman as correlation method. It then performs hierarchical clustering using the complete linkage method by default.
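
The distance described here, 1 minus the Spearman correlation, can be sketched outside R as below. This is an illustrative re-implementation, not the ComplexHeatmap code: Spearman correlation is computed as the Pearson correlation of average ranks, and the complete-linkage rule is shown as the maximum pairwise distance between two clusters. All function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_distance(x, y):
    """Distance between two expression profiles: 1 - Spearman correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))

def complete_linkage(cluster_a, cluster_b, dist):
    """Complete linkage: the largest pairwise distance between two clusters."""
    return max(dist[i][j] for i in cluster_a for j in cluster_b)
```

With this definition, two genes with identical rank order across samples have distance 0 and two genes with exactly opposite rank order have distance 2.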

      # Arabidopsis is a genus name and by convention, this has to be written throughout the MS in italics - even if the authors define Arabidopsis thaliana (in italics) = Arabidopsis (without).

      # typos

      L 24: smb-3 mutants (must be explained)

      L 83 insert: ...two well-characterized SMB loss-of-function ...

      While smb-3 is an SMB loss-of-function mutant, bfn1-1 is a BFN1 loss-of-function mutant, independent of SMB.

      L 93: The switch between the biotrophic..

      L 119: distal border

      L 125: aggregates in the smb-3 mutant

      L 132: between the meristematic

      L 177/178: was observed at 6 dpi in Arabidopsis colonized by S. indica.

      L 250: colonization stages by S. indica.

      L 288: and root cell death (RCD)

      L 289: and towards...

      L 296: dPCD protects the

      L 304: This raises the

      L 351: to remove loose

      All the above-mentioned typos have been addressed in the manuscript.

      Materials and Methods

      L 327: give composition and supplier of MYP medium

      L 344 name supplier of MS medium

      L 338 name supplier of PNM medium

      L 353: replace 'Following,..' with 'Subsequently, ..'

      L 360: replace 'on plate' with 'on the agar plate' - change throughout the Materials and methods!

      L 360: name supplier of Alexa Fluor 488

      L 363: name supplier of (MS) square plate

      L 377: insert comma: After cleaning, the roots...

      L 394: explain the acronym and name supplier of PBS

      L 399: explain the acronym and name supplier of TBST

      All the above-mentioned comments in the material and methods have been addressed in the manuscript.  

      Figure 2G) x-axis, change order: Hoechst/Proteostat

      Figure 3, L53: propidium iodide: name supplier

      Figure 4, L68: Asterisks

      L 60: explain LRC

      L 67, L69, L70: explain the acronym TPM and how expression values were measured in Materials and Methods, the brief explanation in the figure is unclear and not sufficient

      All the above-mentioned comments in the figure legends have been addressed.

      Figure S5, L50: explain 'SynComs'

      L 51: corrects 30 plans => 30 plants

      L 56: vaules => values

      L 57: use capital letter: Spearman correlation

      All the above-mentioned comments in the supplemental figure legends have been addressed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the role of orexin receptors in dopamine neurons is studied. Considering the importance of both orexin and dopamine signalling in the brain, with critical roles in arousal and drug seeking, this study is important to understand the anatomical and functional interaction between these two neuromodulators. This work suggests that such interaction is direct and occurs at the level of SN and VTA, via the expression of OX1R-type orexin receptors by dopaminergic neurons.

      Strengths:

      The use of a transgenic line that lacks OX1R in dopamine-transporter-expressing neurons is a strong approach to dissecting the direct role of orexin in modulating dopamine signalling in the brain. The battery of behavioural assays to study this line provides a valuable source of information for researchers interested in the role of orexin-A in animal physiology.

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses:

      The choice of methods to demonstrate the role of orexin in the activation of dopamine neurons is not justified and the quantification methods are not described with enough detail. The representation of results can be dramatically improved and the data can be statistically analysed with more appropriate methods.

      We have further improved our description of the methods in the revised reviewed preprint, and here in the response letter, we respond point-by-point to ‘Reviewer #1 (Recommendations For The Authors)’ below. 

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examines the expression of orexin receptors in the midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that dopamine neurons predominantly express the orexin receptor 1 subtype and then go on to delete this receptor in dopamine neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that in the absence of this receptor orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin.

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus on changes in the dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from the midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths:

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) The distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from only dopamine neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially since two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood.

      We thank the reviewer for the careful summary and highlighting the novelty of our study.

      Weaknesses:

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies does not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. In addition, some more discussion on what the results tell us about orexin signaling to dopamine neurons under normal physiological conditions would be very useful. For example, what is the relevance of the orexin-dopamine interaction blunting novelty-induced locomotion under wildtype conditions?

      We agree that focusing on some orexin-dopamine target areas, such as dBNST or LPGi, is important to further reveal the anatomy-behavior links and underlying mechanisms. While we are very interested in further investigations, in the present manuscript we mainly aim to give an overview of the behavioral roles of the orexin-dopamine interaction and to propose some promising downstream pathways in a relatively broad and systematic way. 

      We have explained the physiological meaning of our results in more detail in the discussion in the revised reviewed preprint (lines 282-293, 318-332). Novelty-induced behavioral responses should be at proper levels under normal physiological conditions. The orexin-dopamine interaction blunting novelty-induced locomotion could be important to keep attention on the main task without being distracted too much by other random stimuli in the environment. When this balance is disrupted, behavioral deficits may occur, such as attention deficit hyperactivity disorder (ADHD).  

      In some places in the Results, insufficient explanation and reporting is provided. For example, when reporting the behavioral effects of the Ox1 deletion in two bottle preference, it is stated that "[mutant] mice showed significant changes..." without stating the direction in which preference was affected.

      For the reward-related behaviors described in this study, we did not find significant changes between [mutant] and control mice. We agree that describing the behavioral tests in more detail will be helpful for readers. In the revised reviewed preprint, we have described in more detail in the Results and Materials and Methods sections how the control and [mutant] mice respond to the reward (lines 162-165, 171-181, 526-528).  

      The cocaine CPP results are difficult to interpret because it is unclear whether any of the control mice developed a CPP preference. Therefore, it is difficult to conclude that the knockout animals were unaffected by drug reward learning. Similarly, the sucrose/sucralose preference scores are also difficult to interpret because no test of preference vs. water is performed (although the data appear to show that there is a preference at least at higher concentrations, it has not been tested).

      We described the CPP analysis in the Materials and Methods section (lines 523-528) as below: ‘The percentage of time spent in the reward-paired compartment was calculated: 100 x time spent in the compartment / (total time - time spent in the middle area). The CPP score was then analyzed using the calculated percentage of time: 100 x (time on the test day – time on pre-test days)/ time on pre-test days. The pre-test and test days were before and after the conditioning, respectively. Thus, the CPP score above zero indicates that the CPP preference has developed.’ Figure 2—figure supplement 4 C and F shows that most control and knockout mice had a CPP score above zero. The control and knockout groups both developed a preference and there was no significant difference between the groups. 
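
The two formulas quoted from the Materials and Methods can be written out as a short worked example. The function names and the times (in seconds) are hypothetical and serve only to show how a score above zero arises.

```python
def percent_time_in_paired(time_paired, total_time, time_middle):
    """100 x time in the reward-paired compartment / (total time - middle time)."""
    return 100.0 * time_paired / (total_time - time_middle)

def cpp_score(percent_pretest, percent_test):
    """100 x (test - pre-test) / pre-test, using the percentages of time."""
    return 100.0 * (percent_test - percent_pretest) / percent_pretest

# Made-up session: 900 s total, 100 s spent in the middle area
pre = percent_time_in_paired(300.0, 900.0, 100.0)   # before conditioning
post = percent_time_in_paired(400.0, 900.0, 100.0)  # after conditioning
score = cpp_score(pre, post)  # positive: more time in the paired compartment
```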

      For the sucrose/sucralose preference tests, in Figure 2—figure supplement 4 A and D, we present values as the percentage of sucrose/sucralose consumption in the total daily drinking amount (sucrose/sucralose solution + water). Thus, percentages above 50% indicate that mice prefer sucrose/sucralose to water. As shown in the figure, male mice showed only a weak preference for 0.5% sucrose compared to water, and under all other tested conditions, the mice showed a strong preference for the sweet solution. There was no significant difference between control and knockout mice. 
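
As a small illustration of the preference measure described above, the percentage can be computed as follows. The function name and the volumes are hypothetical.

```python
def sweet_preference_percent(sweet_ml, water_ml):
    """Sweet-solution intake as a percentage of total daily drinking."""
    total = sweet_ml + water_ml
    if total <= 0:
        raise ValueError("total intake must be positive")
    return 100.0 * sweet_ml / total

# Made-up daily intakes in millilitres
preference = sweet_preference_percent(3.0, 1.0)   # above 50: sweet preferred
indifferent = sweet_preference_percent(2.0, 2.0)  # exactly 50: no preference
```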

      We have described this in more detail in the Results and Materials and Methods sections in the revised reviewed preprint. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, A-I. It is difficult to depict the anatomical subdivision of VTA in Figure 1, panels A and B. It is recommended to add a panel showing a schematic illustration of the SNc and subregions of VTA: PN, PIF, PBP, IF (providing more detail than in Figure 1, panel J). It is also recommended to show lower magnification images (as in Figure 1 - supplement 1), including both hemispheres, and to delineate the outline of the different subregions using curved lines, based on reference atlases (similar to Figure 1, panel I, please include distance from bregma). It would be helpful to indicate in Figure 1 that panel A is a control mouse and panel B is a Ox1RΔDAT mouse and include C-F letters to show corresponding insets. Anatomically, the paraintrafasicular nucleus (PIF) is positioned between the paranigral nucleus (PN) and the parabrachial pigmented nucleus (PBP). The authors have depicted the PIF ventral to the PN in Figure 1 panels A, B, and I. These panels and the quantification of Ox1R/2R positive cells within the different subdivisions need to be corrected accordingly. The image analysis method used to quantify RNAscope fluorescent images is not described in sufficient detail. Please expand this section.

      According to the reviewer’s suggestions, we have refined Figure 1 in the revised reviewed preprint. We are now showing the schematic illustration of the SN and subregions of VTA in panel I, with blue squares to label the regions shown in panels A and B, and the distance from bregma is included. The outlines to delineate SN and the subregions of VTA are adjusted from straight to curved lines based on reference atlases. As suggested, we have also indicated that panel A is a control and panel B is an Ox1RΔDAT mouse and included C-F letters to show corresponding insets. We apologize for the mistake about labeling PIF and PN positions in Figure A. We have corrected the labeling of their positions and double checked the quantification accordingly. This does not change our discussion or conclusion since both PIF and PN are the medial part of VTA, where both Ox1R and Ox2R are observed. The description of the image analysis in the Materials and Methods section has been improved (lines 378-385). We decided not to show lower magnification images than in Figure 1—supplement 1 to include both hemispheres, in the interests of clarity and reader-friendliness.  

      (2) Figure 1, J-L. The claim that orexin activates dopaminergic SN and VTA neurons is weakly supported by the data provided. Calcium imaging of SN dopaminergic neurons in control mice suggests a discrete effect of 100 nM orexin-A application compared to baseline. Application of 300 nM shows a slightly bigger effect, but none of these results are statistically analysed. 

      We are surprised by this comment and thank the reviewer for pointing out our apparent lack of clarity in the previous version (lines 96-106 and legend of Figure 1K, L). We explain the data analysis in more detail in the new version (lines 119-133, 451-465, the legend of Figure 1K, L, and Figure 1-figure supplement 3).

      The main goal of this part of the project was to functionally validate the Ox1R knockout in dopaminergic (DAT-expressing) neurons. This was a prerequisite for the behavioral and PET imaging experiments. We used GCaMP-mediated Ca2+ imaging in acute brain slices to reach this goal. This analysis was performed on the dopaminergic SN neurons, which we used as an "indicator population" because a large number of these neurons express Ox1R, but only a few express Ox2R. 

      The analysis consisted of two parts:

      a) For each neuron, we tested whether it responded to orexin A. At the single cell level, a neuron was considered orexin A-responsive if the change in fluorescence induced by orexin A was three times larger than the standard deviation (3 σ criterion) of the baseline fluorescence, corresponding to a Z-score of 3. We found that 56% of the neurons tested responded to orexin A, while 44% of the neurons did not respond to orexin A (Figure 1L, top). These data agree with the number of Ox1R-expressing neurons (Figure 1J). 

      b) We also determined the orexin A-induced GCaMP fluorescence for each neuron, expressed as a percentage of GCaMP fluorescence induced upon application of high K+ saline. Accordingly, the "population response" of all analyzed neurons was expressed as the mean ± SEM of these responses. The significance of this mean response was tested for each group (control and Ox1R KO) using a one-sample t-test. We found a marked and highly significant (p < 0.0001, n = 71) response of control neurons to 100 nM orexin A, while the Ox1R KO neurons did not respond (p = 0.5, n = 86). Note that, as described in a), 44% of the neurons contributing to the mean do not respond to orexin. Thus, the orexin responses of most responders are significantly higher than the mean. This is also evident in the example recordings in Figure 1K and Figure1—figure supplement 3. The orexin A-induced change in fluorescence was increased by increasing the orexin A concentration to 300 nM.
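
The two-part analysis in a) and b) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes that the fluorescence change is taken as the difference between the mean signal in an orexin A window and the mean baseline signal, and it computes only the one-sample t statistic and its degrees of freedom (no p-value). All function names and traces are hypothetical.

```python
from statistics import mean, stdev

def z_score(baseline, orexin):
    """Orexin-induced change in units of the baseline standard deviation."""
    return (mean(orexin) - mean(baseline)) / stdev(baseline)

def is_responsive(baseline, orexin, threshold=3.0):
    """3-sigma criterion: responsive if the change exceeds 3 baseline SDs."""
    return z_score(baseline, orexin) > threshold

def percent_of_high_k(baseline, orexin, high_k):
    """Orexin response as a percentage of the high K+ saline response."""
    return 100.0 * (mean(orexin) - mean(baseline)) / (mean(high_k) - mean(baseline))

def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for testing mean(values) == mu0."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / n ** 0.5), n - 1

# Made-up fluorescence traces (arbitrary units) for two neurons
baseline = [1.0, 1.1, 0.9, 1.0, 1.0]
responder = [1.5, 1.6, 1.4]
non_responder = [1.05, 1.0, 1.1]
high_k = [3.0, 3.0]

# Population response: per-neuron percentages tested against zero
t_stat, df = one_sample_t([20.0, 30.0, 25.0])
```

Because non-responders enter the population mean with values near zero, the mean percentage is lower than the response of a typical responder, as noted in b).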

      Note: As mentioned above, the orexin A response was expressed for each neuron individually as a percentage of its high K+saline-induced GCaMP fluorescence. This value is a solid reference point, reflecting the GCaMP fluorescence at maximal voltage-activated Ca2+ influx. Obviously, the Ca2+ concentration at this point is extremely high and not typically reached under physiological conditions. Therefore, as shown in Figure1—figure supplement 3 for completeness, the physiologically relevant responses may appear relatively minor at first glance when presented together in one figure (compare Figure1—figure supplement 3 A and B).

      The authors should provide more evidence of the orexin-induced activation of dopaminergic neurons in the SN to support this claim and investigate whether a similar activation is observed in VTA neurons. 

      Following the reviewer's suggestion, we confirmed orexin A-induced activation of dopaminergic neurons in the mouse SN by using perforated patch clamp recordings (Figure 1—figure supplement 2).

      This finding is consistent with previous extracellular in vivo recordings in rats (Liu et al., 2018).

      The activation of dopaminergic neurons in the mouse VTA by orexin A has been shown repeatedly in earlier studies (e.g., Baimel et al., 2017; Korotkova et al., 2003; Tung et al., 2016).

      In addition, Figure 3-Figure Supplement 2 shows that injection of orexin does not induce c-Fos expression in SN and VTA dopaminergic neurons of control and Ox1RΔDAT mice, which further weakens the claim made by the authors.

      Figure 3—Figure Supplement 2 in the original submission is now Figure 3—Figure Supplement 3 in the revised reviewed preprint. It shows low c-Fos expression in SN and VTA dopaminergic neurons, and orexin-induced c-Fos expression was observed in Th-negative cells in SN and VTA. 

      Technically relatively straightforward, Fos analysis is widely (and successfully) used in studies to reveal neuronal activation. However, this approach has limitations, e.g., regarding sensitivity and temporal resolution. Electrophysiological or optical imaging techniques can circumvent these shortcomings. The electrophysiological and Ca2+ imaging studies presented here, along with previous electrophysiological studies by others, clearly show that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons.

      In vivo, the injection of orexin A induced a pronounced c-Fos activity in non-dopaminergic cells of the VTA and SN but not in dopaminergic neurons. This result shows that the detection of c-Fos has worked in principle. Whether the absent c-Fos staining in dopaminergic neurons is due to lack of sensitivity, whether other IEGs would have worked better here, or whether there are other, e.g., cell type-specific reasons for the absence of staining, cannot be determined from the current data.

      (3) Figure 2, I-L. The fact that ICV injection of both saline and orexin causes a sustained increase of locomotion (around 20 minutes in males, and over 30 minutes in females) is problematic and could mask the effects of orexin, particularly in females. It is unclear what panels J and L are showing. To be appropriately analysed, the authors should plot the pre- and post-injection AUC data for all groups and analyse it as a two-way mixed ANOVA, with the within-subjects factor "pre/post injection activity" and between-subjects factor "group". The authors can only warrant a statistically meaningful hyperlocomotor effect in Ox1RΔDAT mice if a significant interaction is found.

      Although the mice were habituated to the injection, some injection-induced increase in locomotion is to be expected. As described in the figure legend, the AUC was calculated for the period after orexin injection, i.e., 5 – 90 min in Figure 2 I, K. We observed clear, significant differences between genotypes and between saline and orexin application, which means that the genotype and orexin effects are strong enough to emerge despite the injection effect. 

      As the reviewer suggests, we have now plotted the pre- and post-injection AUC data for all groups and analyzed them with a two-way mixed ANOVA, with the within-subjects factor "pre/post injection activity" and the between-subjects factor "group". To match the pre- and post-injection durations, we now compare the AUC for around 60 min before and after the injection. A significant interaction was found. Panels I-L have been updated, and the differences induced by Ox1R knockout and orexin confirm the results shown in the initially submitted manuscript.  
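The matched pre- and post-injection AUC values that enter such a mixed ANOVA can be computed per animal as sketched below. This is only an illustration: the trapezoidal rule, the convention that the injection occurs at t = 0, and the function names are assumptions, not details taken from the manuscript.

```python
def auc(times, activity, start, end):
    """Trapezoidal area under the activity curve for start <= t <= end."""
    pts = [(t, a) for t, a in zip(times, activity) if start <= t <= end]
    return sum((t1 - t0) * (a0 + a1) / 2.0
               for (t0, a0), (t1, a1) in zip(pts, pts[1:]))

def pre_post_auc(times, activity, window=60.0):
    """Return (pre, post, post - pre) for equal windows around t = 0.

    times are in minutes relative to the injection (assumed at t = 0).
    """
    pre = auc(times, activity, -window, 0.0)
    post = auc(times, activity, 0.0, window)
    return pre, post, post - pre
```

With a two-level within-subjects factor, the group × pre/post interaction asks whether the per-animal `post - pre` difference differs between groups.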

      (4) Figure 3. The literature has robustly shown that one of the main projection areas of VTA and SN dopaminergic neurons is the striatum, in particular its ventral part. It is surprising to see that this region is not affected by the lack of OX1R or by the injection of orexin. How can the authors explain that identified regions with significantly different activity include neighbouring brain structures with heterogenous composition? See for example, in panel A, section bregma 0.62mm, a significant region is seen expanding across the cortex, corpus callosum, and striatum. While the data from PET studies is potentially interesting, it may not be adequate to provide enough resolution to allow examination of the anatomical distribution of orexin-mediated neuronal activation.

      While the striatum is a major projection area of dopaminergic neurons in VTA and SN, the projection and function of Ox1R-positive dopaminergic neurons is not clear. We have improved the description of dopamine function diversity in the revised reviewed preprint (lines 46-58), and it was reported before that the projection-defined dopaminergic populations in the VTA exhibited different responses to orexin A (Baimel et al., 2017). Moreover, the striatum activity is modulated by the indirect effect via other brain regions affected by Ox1R-positive dopaminergic neurons. It is unknown how the striatum activity should change after Ox1R deletion in dopaminergic neurons. We could not rule out the possibility that the striatum is indeed modulated by the Ox1R-positive dopaminergic neurons, though there was only a trend of genotype difference (Ox1RΔDAT vs. ctrl) in the ventral striatum in the section bregma 1.42 mm in Figure 3A. The ICV injection of orexin is potentially acting on Ox1R and Ox2R in the whole brain, so projections from other brain regions to the striatum also affect striatum activity and could have masked the effect of Ox1R-positive dopaminergic neurons. 

      The spatial resolution of the PET data is in the order of ~1 mm^3. As we also explained in the Materials and Methods section, the size of a voxel in the original PET data is 0.4mm x 0.4mm x 0.8 mm. All calculations were performed on this grid. The higher-resolved images shown in Figure 3 are for presentation purposes only inspired by a request of the reviewer who asked us to show this in the Jais et al. 2016 manuscript. To make this clearer we now added the p-map images with the original voxel size to the supplement (Figure 3—figure supplement 1). For the interest in specific brain areas, more precise identification of anatomical sub-regions requires using methods with higher spatial resolution such as staining of brain slices for c-Fos-positive cells as we do in Figure 4.

      PET is a powerful tool to identify global regions of activation/inhibition. In the results and discussion sections of the manuscript, we describe that activity was changed in brain regions with related functions. In panel A, Ox1RΔDAT mice showed increased activity in MPA, Pir and endopiriform claustrum, which are important for olfactory sensation; in the spinal trigeminal nucleus, sp5, and IRt, which regulate mastication and sensation of the oral cavity and the surface of the face; and in SubCV and Gi, which regulate sleeping and motion-related arousal and motivation. In panel B, changes in HDB, MCPO, Pir, DEn, S1, V2L and V1 are related to sensation, and changes in BNST, LPGi and M2 are important for emotion, exploration, and action selection. 

      (5) Figure 4. As in Figure 1, the authors should consider including a schematic illustration of the brain areas that are being analysed using a reference atlas. It is also recommended to provide more details describing the quantification of the images. Without such information, the data is not convincing, in particular, the claim that Ox1R depletion causes a decrease in DRD1 in BNST is unclear. Additional unbiased quantitative approaches could be used to strengthen this point.

      We have added Figure 4—figure supplement 1 as a schematic illustration of the brain areas that were being analyzed using a reference atlas. More details describing the unbiased quantification of the images have been added to Materials and Methods. We have added Figure 4—figure supplement 3, to show DRD1, DRD2 and the merged signal separately.  

      (6) The discussion starts by stating that the main findings of this study are based on RNAscope and optophysiological experiments, however, the latter are not presented anywhere in the manuscript. This sentence (line 192) should be revised. The authors state in line 193 that OX1R is the only orexin receptor in the SN, but they show in Figure 1 that in the SN, 3% of neurons express OX2R and 2% co-express both receptors. 

      We thank the reviewer for the input. We have rephrased the beginning of the discussion to clarify the objectives (lines 238-246). In doing so, we changed "optophysiological experiments" and "single orexin receptor" (lines 192 and 193 in the original manuscript) to "Ca2+ imaging experiments" and "main subtype of orexin receptors", respectively. In this context, it should be noted that Ca2+ imaging is considered an optophysiological method: optophysiology generally refers to techniques that combine optical methods with physiological measurements.

      The results of LPGi and BNST dopamine receptors in control and Ox1RΔDAT mice are poorly discussed. The authors should justify why these two regions were selected for further validation and how these may be related to the behavioural effects found in Ox1RΔDAT regarding exposure to a novel context.

      Ox1RΔDAT mice exhibited increased novelty- and orexin-induced locomotion compared to control mice. After orexin injection, PET imaging shows that the neural activity of BNST and LPGi was lower or higher than in control mice, respectively. We selected BNST and LPGi for further validation because we think their key functional roles in regulating emotion, exploratory behaviors and locomotor speed are related to novelty-induced locomotion. We confirmed the changes in neural activity by c-Fos staining and investigated the expression patterns of dopamine receptors in BNST and LPGi. Our findings suggest that Ox1R deletion in dopaminergic neurons results in the disinhibition of neural activity in LPGi via dopaminergic pathways and the decrease of dopamine-mediated neural activity in BNST. Emotion perception affects the decision of how to respond to novelty. It is possible that novelty activates the orexin system and that Ox1R signaling in dopaminergic neurons promotes emotion perception and inhibits exploration. Of course, further careful investigation is necessary to test this hypothesis in future experiments. We have improved the description of the rationale and the discussion in the ‘Results’ and ‘Discussion’ sections of the revised reviewed preprint (lines 210-213, 259-270, 293-308). 

      Reviewer #2 (Recommendations For The Authors):

      A major recommendation - if possible - would be to directly show that one or both of the two target areas - dBNST and LPGi - are associated with the behavioral effects caused by the deletion of the orexin receptor 1 in dopamine neurons.

      We completely agree that it would be very valuable to directly show dBNST and LPGi are associated with the behavioral effects caused by the deletion of Ox1R in dopaminergic neurons. While we are very interested in carefully investigating specific orexin-dopamine targeting areas and related neural circuits in the future, in the present manuscript, we mainly aim to give an overview of the behavioral roles of orexin-dopamine interaction and propose some promising downstream pathways. 

      The authors should state if data are corrected for multiple comparisons, e.g., in the PET study of different regions.

      We have included information about the post-hoc tests for all 2-way ANOVA analyses in the submitted manuscript. For the PET study, the p-values in the p-maps were not corrected for multiple comparisons; Figure 3—figure supplement 2 shows the raw data of each mouse and the analysis method (t-test). In the revised reviewed preprint, we include the information on the analysis method in the legend of Figure 3. 

      We consider that saline and orexin injections mimic the resting and active state of mice, respectively, and would like to study the genotype effect under each condition. A 2-way ANOVA takes into account the difference between orexin and saline injection, which could mask the genotype effect under a given condition. Therefore, we decided to perform t-tests for each condition in Figure 3. While we provide readers with full information in Figure 3—figure supplement 2 with the raw data of each individual mouse, below we present the p-maps after correction for multiple comparisons (Sidak’s post hoc test). After correction, we see changes in similar brain regions as in Figure 3, although significance is reduced, and under the orexin-injection condition we no longer see significantly higher activity around the lateral paragigantocellular nucleus (LPGi), nucleus of the horizontal limb of the diagonal band (HDB) and magnocellular preoptic nucleus (MCPO) in Ox1RΔDAT mice. To identify the anatomical locations more precisely, we performed additional experiments to confirm the changes revealed by PET. For example, LPGi is a relatively small region that was confirmed and identified more precisely by c-Fos immunostaining (Figure 4A, C). 

      Author response image 1.

      PET imaging studies comparing Ox1RΔDAT and control mice, with post-hoc t-test to correct for multiple comparisons. 3D maps of p-values in PET imaging studies comparing Ox1RΔDAT and control mice, after intracerebroventricular (ICV) injection of (A) saline (NS) and (B) orexin A. Control-NS, n = 8; control-orexin, n = 6; Ox1RΔDAT, n = 8. M2, secondary motor cortex; MPA, medial preoptic area; Pir, piriform cortex; IEn, intermediate endopiriform claustrum; DEn, dorsal endopiriform claustrum; VEn, ventral endopiriform claustrum; LSS, lateral stripe of the striatum; BNST, the dorsal bed nucleus of the stria terminalis; S1Sh, primary somatosensory cortex, shoulder region; S1HL, primary somatosensory cortex, hindlimb region; S1BF, primary somatosensory cortex, barrel field; S1Tr, primary somatosensory cortex, trunk region; V1, primary visual cortex; V2L, secondary visual cortex, lateral area; SubCV, subcoeruleus nucleus, ventral part; Gi, gigantocellular reticular nucleus; IRt, intermediate reticular nucleus; sp5, spinal trigeminal tract.
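For readers unfamiliar with the Sidak adjustment referred to above, the following sketch shows how it inflates each p-value as the number of comparisons grows. It illustrates the standard formula only; how many comparisons form one family in the authors' analysis is not stated in the response, so the family size used here is an assumption.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m independent comparisons.

    p_adj = 1 - (1 - p) ** m, where m = len(p_values).
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

def sidak_alpha(alpha, m):
    """Per-comparison significance level keeping the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)
```

A voxel that is significant at an uncorrected threshold can therefore fall above the threshold after adjustment, which is consistent with some regions no longer reaching significance in the corrected maps.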

      Provide a rationale for following up on BNST and LPGi and not any of the regions identified in the PET study.

      We thank the reviewer for the careful reading and important input. Ox1RΔDAT mice exhibited increased novelty- and orexin-induced locomotion compared to control mice. After orexin injection, PET imaging shows that the neural activity of BNST and LPGi was lower or higher than in control mice, respectively.

      We selected BNST and LPGi for further validation because we think their key functional roles in regulating emotion, exploratory behaviors and locomotor speed are related to novelty-induced locomotion. We confirmed the change in neural activity by c-Fos staining and investigated the expression patterns of dopamine receptors in BNST and LPGi. Our findings suggest that Ox1R deletion in dopaminergic neurons results in the disinhibition of neural activity in LPGi via dopaminergic pathways and the decrease of dopamine-mediated neural activity in BNST. Emotion perception affects the decision of how to respond to novelty. It is possible that novelty activates the orexin system and that Ox1R signaling in dopaminergic neurons promotes emotion perception and inhibits exploration. Of course, further investigation is necessary to test this hypothesis in the future. We have improved the description of the rationale and the discussion in the ‘Results’ and ‘Discussion’ sections of the revised reviewed preprint (lines 210-213, 259-270, 293-308). 

      Heatmap in Fig. 1K should not have smoothing across the y-axis, individual cells should be discrete.

      We thank the reviewer for bringing this issue to our attention. The data had not been intentionally smoothed (neither across the x-axis nor the y-axis), but it was probably a formatting issue. We have corrected this and separated individual cell traces with lines (Figure 1K, Figure 1—figure supplement 3).

      Dopamine cells are well known to lack Fos expression in most cases. Did the authors consider using another IEG to show neural activation, e.g., pERK?

      We did not use another IEG. The electrophysiological and Ca2+ imaging studies presented here, along with previous electrophysiological studies by others, clearly show that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons. Please see also the response to a related comment of Reviewer 1.

      Consider adding a lower magnification section to anatomical figures to aid the reader in orienting and identifying the location.

      We have added the schematic illustration of SN, VTA, BNST and LPGi in Figure 1I and Figure 4— figure supplement 1. We hope this helps the reader in orienting and identifying the location.  

      Data availability should be stated.

      There are no restrictions on data availability. We have added this section to the revised reviewed preprint.

      Line 50. Some more references both historical and recent could be given to support this statement about the function of dopamine.

      We have improved the description and references to support the statement about dopamine function, citing recent studies and some reviews in the revised reviewed preprint (lines 46-58). 

      The PET data (Fig. 3) might be easier to visualize and interpret if a white background was used. In addition, is there a more refined way of presenting the data in Fig 3, S1?

      It is common to present imaging data such as PET and MRI on a black background. We also have already applied this color scheme in multiple publications and would therefore prefer to stick to this color scheme. 

      While Figure 3 is the concise way to present PET data, we aim to show the original individual results of mice in Figure 3—figure supplement 2 and to demonstrate how we performed the statistical analysis. Therefore, we take an example voxel of the respective brain area, perform the t-test, and present the data as bars with individual dots. 

      Line 97. State what type of Ca imaging here, e.g., "we performed Ca imaging in ex vivo slices of VTA and SN".

      As the reviewer suggested, we have specified the type of Ca2+ imaging (line 112).

      Line 165. State which groups this post-mortem analysis was performed on and if any differences were to be found (not expected to find differences in this anatomical tracing experiment but good to report this as both groups were used).

      Postmortem analysis of c-Fos staining revealed low c-Fos expression in dopaminergic neurons in the VTA and SN of Ox1RΔDAT and control mice after ICV injection of saline or orexin A (1 nmol). No obvious changes were observed among the groups. We have improved the description in the revised reviewed preprint (lines 202-208).

      Line 192. What do you mean by optophysiological here? The Ca imaging (which is a fairly small, confirmatory element of the manuscript).

      We have changed ‘optophysiological experiments’ (line 192 in the initially submitted manuscript) to ‘calcium imaging experiments’ and rephrased the beginning of the discussion to clarify the objectives (lines 238-246).

      The protein level in the diet is substantially higher than in most rodent diets (34% here vs 14-20% in most commercial rodent chows). Please comment on this.

      This diet is for rat and mouse maintenance, purchased from ssniff Spezialdiäten GmbH (product V1554).

      The percentage of calories supplied by protein depends on the calculation method. The company previously calculated it with the pig equation, which gave the value of 34% in the old instruction data sheet. They have updated the value to 23% in the new data sheet, calculated with Atwater factors. We thank the reviewer for pointing this out and have updated the values in the revised reviewed preprint (lines 314-316). 

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We have provided the source data and the statistical reporting for each figure with the revision.

      References

      Baimel, C., Lau, B. K., Qiao, M., & Borgland, S. L. (2017). Projection-target-defined effects of orexin and dynorphin on VTA dopamine neurons. Cell Rep, 18(6), 1346-1355.  https://doi.org/10.1016/j.celrep.2017.01.030

      Korotkova, T. M., Eriksson, K. S., Haas, H. L., & Brown, R. E. (2002). Selective excitation of GABAergic neurons in the substantia nigra of the rat by orexin/hypocretin in vitro. Regul Pept, 104(1-3), 83-89. https://doi.org/10.1016/s0167-0115(01)00323-8 

      Korotkova, T. M., Sergeeva, O. A., Eriksson, K. S., Haas, H. L., & Brown, R. E. (2003). Excitation of ventral tegmental area dopaminergic and nondopaminergic neurons by orexins/hypocretins. J Neurosci, 23(1), 7-11. https://www.ncbi.nlm.nih.gov/pubmed/12514194

      Liu, C., Xue, Y., Liu, M. F., Wang, Y., Liu, Z. R., Diao, H. L., & Chen, L. (2018). Orexins increase the firing activity of nigral dopaminergic neurons and participate in motor control in rats. J Neurochem, 147(3), 380-394. https://doi.org/10.1111/jnc.14568 

      Tung, L. W., Lu, G. L., Lee, Y. H., Yu, L., Lee, H. J., Leishman, E., Bradshaw, H., Hwang, L. L., Hung, M. S., Mackie, K., Zimmer, A., & Chiou, L. C. (2016). Orexins contribute to restraint stress-induced cocaine relapse by endocannabinoid-mediated disinhibition of dopaminergic neurons. Nat Commun, 7, 12199. https://doi.org/10.1038/ncomms12199

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The authors investigated the anatomical features of the synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of each synapse, the macular or perforated appearance, the size of the synaptic active zone, the number and volume of the mitochondria, and the number of synaptic and dense core vesicles, also differentiating between the readily releasable, the recycling, and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The authors conclude that the subcellular morphology of the layer 1 synapses are suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow increased glutamate spillover from the synapses, enhancing synaptic crosstalk within this cortical layer. 

      Strengths: 

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable since this is a highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion. 

      We would like to thank reviewer #1 for his very positive comments on our manuscript, stating that such data about the fine structure of the human neocortex are highly relevant.

      Weaknesses: 

      There are several weaknesses in this work. First, the authors should check and review extensively for improvements to the use of English. Second, several additional analyses performed on the existing data could substantially elevate the value of the data presented. Much more information could be gained from the existing data about the functions of the investigated layer, of the cortical column, and about the information processing of the human neocortex. Third, several methodological concerns weaken the conclusions drawn from the results. 

      We would like to thank the reviewer for his critical and thus helpful comments on our manuscript. We have taken up the reviewer's first comment concerning the English and have improved our manuscript by rephrasing and shortening sentences. Secondly, according to the reviewer, several additional analyses should be performed on the existing data, which could substantially elevate the value of the data presented. We will implement some of the suggestions in the improved version of the manuscript where appropriate. We will give a more detailed answer to the reviewer’s queries in response to her/his suggestions to the authors (see below). However, the reviewer states himself: “The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion”.

      Reviewer #2 (Public review): 

      Summary: 

      The study of Rollenhagen et al. examines the ultrastructural features of Layer 1 of the human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as far as possible from the epilepsy focus, and as such considered to be non-epileptic. The analyses included 4 patients with different ages, sex, medication, and onset of epilepsy. The manuscript is a follow-on study with 3 previous publications from the same authors on different layers of the temporal cortex: 

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex.

      They find, that the L1 synaptic boutons mainly have a single active zone, a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage. 

      Strengths: 

      The manuscript is well-written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in the human brain are still very limited, the current manuscript has substantial relevance. The work appears to be generally well done, the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      We would like to thank the reviewer for his very positive evaluation of our paper and the comments that such data have a substantial relevance, in particular in the human neocortex. In contrast to reviewer#1, this reviewer’s opinion is that the manuscript is well written and easy to read.

      Weaknesses: 

      One of the main findings of this paper is that "low degree of astrocytic coverage of L1 SBs suggests that glutamate spillover and as a consequence synaptic cross-talk may occur at the majority of synaptic complexes in L1". However, the authors only quantified the volume ratio of astrocytes in all 6 layers, which is not necessarily the same as the glial coverage of synapses. In order to strengthen this statement, the authors could provide 3D data (that they have from the aligned serial sections) detailing the percentage of synapses that have glial processes in close proximity to the synaptic cleft, that would prevent spillover. 

      We agree with the reviewer that we only quantified the volume ratio of the astrocytic coverage but not necessarily the percentage of synapses that may or not contribute to the formation of the ‘tripartite’ synapse. As suggested, we will re-analyze our material with respect to the percentage of coverage for individual synaptic boutons in each layer and will implement the results in the improved version of the manuscript. However, since this is a completely new analysis that is time-consuming we would like to ask the reviewer for additional time to perform this task.

      A specific statement is missing on whether only glutamatergic boutons were analyzed in this MS, or GABAergic boutons were also included. There is a statement, that they can be distinguished from glutamatergic ones, but it would be useful to state it clearly in the Abstract, Results, and Methods section what sort of boutons were analyzed. Also, what is the percentage of those boutons from the total bouton population in L1? 

      We would like to thank the reviewer for this comment. Although our title clearly states that we focused on quantitative 3D-models of excitatory synaptic boutons, we will point this out more clearly in the Methods and Results sections. Our data support recent findings by others (see for example Cano-Astorga et al. 2023, 2024; Shapson-Coe et al. 2024) who evaluated the ratio between excitatory and inhibitory synaptic boutons in the temporal lobe neocortex, the same area as in our study, and found 10-15% inhibitory terminals, with significant layer- and region-specific differences. We will include the excitatory vs. inhibitory ratio and the corresponding citations in the Results section.

      Synaptic vesicle diameter (that has been established to be ~40nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (just like here the values are ~25 nm) as the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle, (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly. 

      We partially disagree with the reviewer on this point. Using high-resolution transmission electron microscopy, we measured the distance from the outer-to-outer membrane only on those synaptic vesicles that were round in shape with a clear ring-like structure to avoid double counts and discarded all those that were only partially cut according to criteria developed by Abercrombie (1946) and Boissonnat (1988). We assumed that within a 55±5 nm thick ultrathin section (silver to gray interference contrast) all clear-ring-like vesicles were distributed in this section assuming a vesicle diameter between 25 to 40nm. For large DCVs, double-counts were excluded by careful examination of adjacent images and were only counted in the image where they appeared largest.

      In addition, we have measured synaptic vesicles using TEM tomography and came to similar results. We will state in Material and Methods that both methods were used.

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone as readily releasable pool, recycling pool, and resting pool, as these are functional categories, and cannot directly be translated to vesicles at certain distances. Indeed, it is debated whether the morphologically docked vesicles are the ones, that are readily releasable, as further molecular steps, such as proper priming are also a prerequisite for release.

      We thank the reviewer for this comment. However, nobody before us had tried to define a morphological correlate for the three functionally defined pools of synaptic vesicles, since synaptic vesicles are usually distributed over the entire nerve terminal. As already mentioned above, after long and thorough discussions with Profs. Bill Betz, Chuck Stevens, Thomas Schikorski and other experts in this field, we tried to define the readily releasable (RRP), recycling (RP) and resting pools by measuring the distance of each synaptic vesicle to the presynaptic density (PreAZ). Using distance as a criterion, we defined the RRP as all vesicles located within a distance (perimeter) of 10 to 20 nm from the PreAZ, which is less than an average vesicle diameter (between 25 and 40 nm). The RP was defined as vesicles at a distance of 60-200 nm, still quite close and rapidly available on demand, and the remaining vesicles beyond 200 nm were suggested to belong to the resting pool. This concept was developed for our first publication (Sätzler et al. 2002), and this approximation has since been widely acknowledged by scientists working in synaptic neuroscience and by computational neuroscientists. Several labs worldwide have asked whether they can use our perimeter analysis data for modeling. We agree that our definition of the three pools can be seen as arbitrary, and we never claimed that our approach is the truth and nothing but the truth. Concerning the debate whether only docked vesicles or also those very close to the PreAZ should constitute the RRP, we have a paper in preparation that uses our perimeter analysis, EM tomography and simulations to try to clarify this debate. Our preliminary results suggest that the size of the RRP should be reconsidered.
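      As an illustration only (this is not the code used in the study), the distance criterion described above can be sketched as a simple classifier. The function names are hypothetical. Treating every vesicle at 20 nm or closer as RRP is an assumption, and vesicles between 20 and 60 nm are labeled "unassigned" because the stated criteria do not cover that range.

```python
def classify_vesicle(distance_nm):
    """Assign a vesicle to a putative pool from its distance to the PreAZ (in nm).

    Thresholds follow the perimeter criteria quoted in the text; the handling
    of the 20-60 nm range is an assumption made for this sketch.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm <= 20:
        return "RRP"          # within 10 to 20 nm of the PreAZ
    if 60 <= distance_nm <= 200:
        return "RP"           # recycling pool, 60-200 nm
    if distance_nm > 200:
        return "resting"      # beyond 200 nm
    return "unassigned"       # 20-60 nm: not covered by the stated criteria


def pool_counts(distances_nm):
    """Count vesicles per putative pool for a list of distances (in nm)."""
    counts = {"RRP": 0, "RP": 0, "resting": 0, "unassigned": 0}
    for d in distances_nm:
        counts[classify_vesicle(d)] += 1
    return counts
```

      Calling `pool_counts` on the measured distances of one bouton would give a pool size estimate per category, which is the kind of quantity the perimeter analysis reports.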

      Tissue shrinkage due to aldehyde fixation is a well-documented phenomenon that needs compensation when dealing with density values. The authors cite Korogod et al 2015 - which actually draws attention to the problem comparing aldehyde fixed and non-fixed tissue, still the data is non-compensated in the manuscript. Since all the previous publications from this lab are based on aldehyde fixed non-compensated data, and for this sake, this dataset should be kept as it is for comparative purposes, it would be important to provide a scaling factor applicable to be able to compare these data to other publications.

      We thank the reviewer for his suggestion. However, for several reasons we did not correct for shrinkage caused by aldehyde fixation. There are papers by Eyre et al. (2007) and the mentioned paper by Korogod et al. 2015 that have demonstrated that cryo-fixation reveals larger numbers of docked synaptic vesicles, a smaller glial volume, and a less intimate glial coverage of synapses and blood vessels compared to chemical fixation. Other structural subelements such as active zone size and shape and the total number of synaptic vesicles remained unaffected. In two further publications Zhao et al. (2012a, b) investigating hippocampal mossy fiber boutons using cryo-fixation and substitutions came to similar results with respect to bouton and active zone size and number and diameter of synaptic vesicles compared to aldehyde-fixation as described by Rollenhagen et al. 2007 for the same nerve terminal. This was one of the reasons not correcting for shrinkage. In addition, all cited papers state that chemical fixation in general provides a much better ultrastructural preservation of tissue samples when compared with cryo-fixation and substitution where optimal preservation is only regional within a block of tissue and therefore less suitable for large-scale ultrastructural analyses as we performed.

      Reviewer #3 (Public review): 

      Summary: 

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology, and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue. 

      Strengths: 

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented. 

      We are very thankful to the reviewer for his very positive comments about our data analysis and presentation.

      Weaknesses: 

      The study provides only morphological details, these can be useful in the future when combined with functional assessments or computational approaches. The authors emphasize the importance of their findings on astrocytic coverage and suggest important implications for glutamate spillover. However, the percentage of synapses that form tripartite synapses has not been quantified, the authors' functional claims are based solely on volumetric fraction measurements. 

      We thank the reviewer for his critical comments on our findings concerning the layer-specific astrocytic coverage, as also raised by reviewer#2. As already stated above, we will analyze the astrocytic coverage and the layer-specific percentage of astrocytic contribution to the ‘tripartite’ synapse in more detail. We are, however, a bit puzzled by the comment, one that structural anatomists often receive, that our study provides only morphological details. The structural and synaptic parameters of synaptic boutons that we analyzed underlie, and might even predict, the function of synaptic boutons in a given microcircuit or network. Our analysis will thus improve our understanding of the functional properties of these structures, in particular in the human brain, where such studies are still quite rare. The main goal of our studies in the human neocortex was the quantitative morphology of synaptic boutons and thus the synaptic organization of the cortical column, layer by layer, which to our knowledge is the first such detailed study undertaken in the human brain. Our efforts have set a gold standard in the analysis of synaptic boutons embedded in different microcircuits, which is now very well accepted internationally.

      The distinction between excitatory and inhibitory synapses is not clear, they should be analyzed separately. 

      As already stated above in response to reviewer#1 our study focused on excitatory synaptic boutons since they represent the majority of synapses. However, in the improved version of our manuscript in the Material and Method section we included a paragraph with structural criteria to distinguish excitatory from inhibitory terminals (see also our comment to reviewer#1 concerning this point) including appropriate citations.

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles are also more complex than suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data. 

      We thank the reviewer for this comment. However, numerous publications have now shown that the shape and size of the active zone, together with the pool of synaptic vesicles and the astrocytic coverage, critically determine synaptic transmission and synaptic strength, and can also contribute to the modulation of synaptic plasticity (see also citations within the text). It has been shown that synaptic boutons can switch, under certain stimulation conditions, between different modes of release (uni- vs. multiquantal, uni- vs. multivesicular release) and from asynchronous to synchronous release, leading also to the modulation of synaptic short- and long-term plasticity. To the second comment: when we started with our first paper about the Calyx of Held – principal neuron synapse in the MNTB (Sätzler et al. 2002), we tried to define a morphological correlate for the three functionally defined pools. As already mentioned above in our reply to the other two reviewers, this is rather difficult since synaptic vesicles are usually distributed over the entire nerve terminal. After long and thorough discussions with Bill Betz, Chuck Stevens and other leading scientists in the field of synaptic neuroscience, we, together with Bert Sakmann, tried to define a morphological correlate for the functionally defined pools using a perimeter analysis. We defined the readily releasable pool as vesicles 10 to 20 nm away from the presynaptic active zone, the recycling pool as those at a distance of 60-200 nm, and the remaining vesicles as belonging to the resting pool. It has been shown by capacitance measurements (see for example Hallermann et al 2003), FM1-43 investigations (see for example Henkel et al. 1996) and high-resolution electron microscopy (see for example Schikorski and Stevens 2001; Schikorski 2014) that our estimate of the RRP nearly perfectly matches the functionally defined pools at hippocampal and cortical synapses (Silver et al. 2003). In addition, in one of our own papers (Rollenhagen et al. 2018) we also estimated the RP functionally from trains of EPSPs using an exponential fit analysis and arrived at a size similar to that obtained with the perimeter analysis.

      Of course, as stated by the reviewer, the scenario could be more complex and other criteria could be used. We never claimed that our morphologically defined pools are the truth and nothing but the truth, but we believe they offer a quite good approximation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers and editors for their careful assessment and review of our article. The many detailed comments, questions and suggestions were very helpful in improving our analyses and presentation of data. In particular, our Discussion benefited enormously from the comments. 

      Below we respond in detail to every point raised. 

      We especially note that Reviewer #3’s small query on “trial where learning is defined to have occurred, we were not given the quantitative criterion operationalizing "learning" - please provide” led to deeper analyses and insights and a lengthy response.

      This analysis prompted the addition of a sentence (red) to the Abstract. 

      “Animals navigate by learning the spatial layout of their environment. We investigated spatial learning of mice in an open maze where food was hidden in one of a hundred holes. Mice leaving from a stable entrance learned to efficiently navigate to the food without the need for landmarks. We developed a quantitative framework to reveal how the mice estimate the food location based on analyses of trajectories and active hole checks. After learning, the computed “target estimation vector” (TEV) closely approximated the mice’s route and its hole check distribution. The TEV required learning both the direction and distance of the start to food vector, and our data suggests that different learning dynamics underlie these estimates. We propose that the TEV can be precisely connected to the properties of hippocampal place cells. Finally, we provide the first demonstration that, after learning the location of two food sites, the mice took a shortcut between the sites, demonstrating that they had generated a cognitive map. ”

      Note: we added, at the end of the manuscript, the legends for the Shortcut video (Video 1) and the main text figure legends; these are with a larger font and so easier to read. 

      Reviewer #1 (Public Review):

      Assessment:

      This important work advances our understanding of navigation and path integration in mammals by using a clever behavioral paradigm. The paper provides compelling evidence that mice are able to create and use a cognitive map to find "short cuts" in an environment, using only the location of rewards relative to the point of entry to the environment and path integration, and need not rely on visual landmarks.

      Thank you.

      Summary:

      The authors have designed a novel experimental apparatus called the 'Hidden Food Maze (HFM)' and a beautiful suite of behavioral experiments using this apparatus to investigate the interplay between allothetic and idiothetic cues in navigation. The results presented provide a clear demonstration of the central claim of the paper, namely that mice only need a fixed start location and path integration to develop a cognitive map. The experiments and analyses conducted to test the main claim of the paper -- that the animals have formed a cognitive map -- are conclusive. While I think the results are quite interesting and sound, one issue that needs to be addressed is the framing of how landmarks are used (or not), as discussed below, although I believe this will be a straightforward issue for the authors to address.

      We have now added detailed discussion on this important point. See below.

      Strengths:

      The 90-degree rotationally symmetric design and use of 4 distal landmarks and 4 quadrants with their corresponding rotationally equivalent locations (REL) lends itself to teasing apart the influence of path integration and landmark-based navigation in a clever way. The authors use a really complete set of experiments and associated controls to show that mice can use a start location and path integration to develop a cognitive map and generate shortcut routes to new locations.

      Weaknesses:

      I have two comments. The second comment is perhaps major and would require rephrasing multiple sentences/paragraphs throughout the paper.

      (1) The data clearly indicate that in the hidden food maze (HFM) task mice did not use external visual "cue cards" to navigate, as this is clearly shown in the errors mice make when they start trials from a different start location when trained in the static entrance condition. The absence of visual landmark-guided behavior is indeed surprising, given the previous literature showing the use of distal landmarks to navigate and neural correlates of visual landmarks in hippocampal formation. While the authors briefly mention that the mice might not be using distal landmarks because of their pretraining procedure - I think it is worth highlighting this point (about the importance of landmark stability and citing relevant papers) and elaborating on it in greater detail. It is very likely that mice do not use the distal visual landmarks in this task because the pretraining of animals leads to them not identifying them as stable landmarks. For example, if they thought that each time they were introduced to the arena, it was "through the same door", then the landmarks would appear to be in arbitrary locations compared to the last time. In the same way, we as humans wouldn't use clouds or the location of people or other animate objects as trusted navigational beacons. In addition, the animals are introduced to the environment without any extra-maze landmarks that could help them resolve this ambiguity. Previous work (and what we see in our dome experiments) has shown that in environments with 'unreliable' landmarks, place cells are not controlled by landmarks - https://www.sciencedirect.com/science/article/pii/S0028390898000537, https://pubmed.ncbi.nlm.nih.gov/7891125/. This makes it likely that the absence of these distal visual landmarks when the animal first entered the maze ensured that the animal does not 'trust' these visual features as landmarks.

      Thank you. We have added many references and discussion on exactly this point, including both direct behavioral experiments and discussion of the effects of landmark (in)stability on place cell encoding of “place”. See Page 18, third paragraph.

      “An alternate factor might be the lack of reliability of distal spatial cues in predicting the food location. The mice, during pretraining trials, learned to find multiple food locations without landmarks. In the random trials, the continuous change of relative landmark location may lead the mice to not identifying them as “stable landmarks”. This view is supported by behavioral experiments that showed the importance of landmark stability for spatial learning (32-34) and that place cells are not controlled by “unreliable landmarks” (35-38). Control experiments without landmarks (Fig. S6A,B) or in the dark (Fig. S6C-F) confirmed that the mice did not need landmarks for spatial learning of the food location.”

      (2) I don't agree with the statement that 'Exogenous cues are not required for learning the food location'. There are many cues that the animal is likely using to help reduce errors in path integration. For example, the start location of the mouse could act as a landmark/exogenous cue in the sense of partially correcting path integration errors. The maze has four identical entrances (90-degree rotationally symmetric). Despite this, it is entirely plausible that the animal can correct path integration errors by identifying the correct start entrance for a given trial, and indeed the distance/bearing to the others would also help triangulate one's location. Further, the overall arena geometry could help reduce PI error. For example, with a food source learned to be "near the middle" of the arena, the animal would surely not estimate the position to be near the far wall (and an interesting follow-on experiment would be to have two different-sized, but otherwise nearly identical arenas). As the mouse travels away from the start location, small path integration errors are bound to accumulate; these errors could be at least partially corrected based on entrance and distal wall locations. If this process of periodically checking the location of the entrance to correct path integration errors is done every few seconds, path integration would be aided 'exogenously' to build a cognitive map. While the original claim of the paper still stands, i.e. mice can learn the location of a hidden food site when their starting point in the environment remains constant across trials, I would advise rewording portions of the paper, including the discussion throughout the paper that states claims such as "Exogenous cues are not required for learning the food location", to account for the possibility that the start and the overall arena geometry could be used as helpful exogenous cues to correct for path integration errors.

      We agree with the referee that our claim was ill-phrased. Surely the behavior of the mouse must be constrained by the arena size to some extent. To minimize potential geometric cues from the arena, we carefully analyzed many preliminary experiments (each with a unique batch of 4 mice) having the target positioned at different locations. We added a paragraph to the section “Further controls” where we explain our choice for the target position. Page 12 last paragraph; Page 13 “Arena geometry” paragraph.

      Also, following the suggestion from the reviewer, we probed whether the hole checks accumulated near the center of the arena for the random entrance mice, as a potential sign that some spatial learning is going on. In fact, neither the density of hole checks, nor the distance of the hole checks to the center of the arena change with learning: panel A below shows the probability density of finding a hole check at a given distance from the center of the arena; both trial 1 and trial 14 have very similar profiles. Panel B shows the density of hole checks near (<20cm) and far (>20cm) from the arena’s center.

      Author response image 1.

      It also doesn’t show any significant differences between trials 1 and 14.

      So even though there’s some trend (in panel A, the peak goes from 60cm to a double peak, one at 30cm away from the center, and the other still at 60cm), the distance from the center is still way too large compared to the mouse’s body size and to the average inter-hole distance (<10cm). These panels are now in the Supplementary Figure S8B.

      Finally, we enhanced the wording in our claim. We now have a new section entitled: “What cues are required for learning the food location?”. There, we systematically cover all possible cues and how they might be affected by their stability under the perturbation of maze floor rotation. 

      Reviewer #2 (Public Review):

      Summary:

      This manuscript reports interesting findings about the navigational behavior of mice. The authors have dissected this behavior in various components using a sophisticated behavioral maze and statistical analysis of the data.

      Strengths:

      The results are solid and they support the main conclusions, which will be of considerable value to many scientists.

      Thank you.

      Weaknesses:

      Figure 1: In some trials the mice seem to be doing thigmotaxis, walking along the perimeter of the maze. This is perhaps due to the fear of the open arena. But, these paths along the perimeter would significantly influence all metrics of navigation, e.g. the distance or time to reward.

      Perhaps an analysis can be done that treats such behavior separately and then factors it out from the paths that are away from the perimeter.

      On Page 4, we added a small section entitled “Pretraining trials”. Our reference was suggested by Reviewer #3 (noted as “Golani” with first author “Fonio”). Our preliminary experiments used naïve mice, which typically took more than 2 days before they ventured into the arena center and found the single filled hole. This added unacceptable delays, and the Pretraining trials greatly diminished the extensive thigmotaxis (not quantified). The “near the walls” trajectories did continue in the first learning trial (Fig. 2A, 3A) but then diminished in subsequent trials. We found no evidence that thigmotaxis (trajectories adjacent to the wall) was a separate category of trajectory. 

      Figure 1c: the color axis seems unusual. Red colors indicate less frequently visited regions (less than 25%) and white corresponds to more frequently visited places (>25%)? Why use such a binary measure instead of a graded map as commonly done?

      Thank you; you are completely correct. We have completely changed the color coding. 

      Some figures use linear scale and others use logarithmic scale. Is there a scientific justification? For example, average latency is on a log scale and average speed is on a linear scale, but both quantify the same behavior. The y-axis in panel 1-I is much wider than the data. Is there a reason for this? Or can the authors zoom into the y-axis so that the reader can discern any pattern?

      We use logarithmic scale with the purpose of displaying variables that have a wide range of variation (mainly, distance, latency, and number of hole checks, since it linearly and positively correlates with both distance and latency – see new Fig. S4B,C). For example, Latency goes from hundreds of seconds (trial 1) to just a few seconds (trial 14). Similarly, the total distance goes from hundreds of centimeters (trial 1, sometimes more than 1000cm, see answer about the 10-fold variation of distance below) to just the start-target distance (which is ~100cm). These variables vary over a few orders of magnitude. We display speed in a linear axis because it does not increase for more than one order of magnitude.

      Moreover, fitting the wide-ranged data (distance, latency, nchecks) yields smaller error in log scale [i.e., fitting log(y) vs. trial, instead of y vs. trial]. In these cases, the log scale also helps visualize how well the data were fitted by the curve. Thus, presenting wide-ranged data in linear scale could be misleading regarding goodness of fit.
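      A minimal sketch of what fitting log(y) vs. trial means in practice is given below. It is not the analysis code of the study: the function names are hypothetical, the latency values are synthetic, and an ordinary least-squares line is assumed as the fitted curve.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_scale(trials, values):
    """Fit log10(value) against trial number instead of the raw value."""
    return fit_line(trials, [math.log10(v) for v in values])


# Synthetic latency (s) spanning more than one order of magnitude over 14 trials.
trials = list(range(1, 15))
latency = [300.0 * 10 ** (-0.12 * (t - 1)) for t in trials]
slope, intercept = fit_log_scale(trials, latency)
```

      In log scale every trial contributes to the fit on the same relative footing, whereas a fit of the raw values is dominated by the large early-trial values.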

      We now zoomed into the Y axis scale in Panels I of Fig. 2 and Fig. 3. We kept it in log-scale, but linear Y scale produces Author response image 2 for Figs. 3I and 2I, respectively.

      Author response image 2.

      Thus, we believe that the loglog-scale in these panels won’t compromise the interpretation of the phenomenon. In fact, the loglog of the static case suggests that the probability of hole checking distance increases according to a power law as the mouse approaches the target (however, we did not check this thoroughly, so we did not include this point in the discussion). Power law behavior is observed in other animals (e.g, ants: DOI: 10.1371/journal.pone.0009621) and is sometimes associated with a stochastic process with memory.

      1F shows no significant reduction in distance to reward. Does that mean there is no improvement with experience and all the improvement in the latency is due to increasing running speed with experience?

      Correct and in the section “Random Entrance experiments” under “Results” (Page 5) we explicitly note this point.

      “We hypothesize that the mice did not significantly reduce their distance travelled (Fig. 2A,B,F) because they had not learned the food location - the decrease in latency (Fig. 2D) was due to its increased running speed and familiarity with non-spatial task parameters.”

      Figure 3: The distance traveled was reduced by nearly 10-fold and speed increased by about 3-fold. So, the time to reach the reward should decrease by only 3-fold (t=d/v), but that too was reduced by 10-fold. How does one reconcile the 3-fold difference between the expected and observed values?

      The traveled distance is obtained by linearly interpolating the sampled trajectory points. In other words, the software samples a discrete set of positions $\mathbf{r}_i$, one for each recorded instant $t_i$. The total distance is

      $$d_{\text{total}} = \sum_i \lVert \mathbf{r}_{i+1} - \mathbf{r}_i \rVert,$$

      where $\lVert \mathbf{r}_{i+1} - \mathbf{r}_i \rVert$ is the Euclidean distance between two consecutively sampled points. However, the same result (within a fraction of a cm) can be obtained by integrating the sampled speed $v_i$ over time using the Simpson method:

      $$d_{\text{total}} = \int v(t)\,dt.$$

      Since Latency varies by 10-fold, it is expected that, given $d = vt$, the total distance will also vary by 10-fold (since $v$ is constant in each time interval $\Delta t$; replacing $v_i$ in the integral yields the discrete sum above).

      The correctness of our kinetic measurements can be simply verified by multiplying the data from the Latency panel with the data from the Velocity panel. If this results in the Distance plot, then there is no discrepancy. 
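      The equivalence between the two ways of computing distance can be sketched as follows. This is an illustration under stated assumptions, not the tracking software's code: the function names and sample points are hypothetical, and a plain sum of speed times time step (speed constant within each interval) stands in for the Simpson integration mentioned above.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutively sampled positions."""
    return sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def interval_speeds(points, times):
    """Speed in each sampling interval, assumed constant within the interval."""
    return [
        math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / (t1 - t0)
        for p0, p1, t0, t1 in zip(points, points[1:], times, times[1:])
    ]


def distance_from_speed(speeds, times):
    """Integrate piecewise-constant speed over time (rectangle rule)."""
    return sum(v * (t1 - t0) for v, t0, t1 in zip(speeds, times, times[1:]))


# Hypothetical trajectory: positions in cm, times in s.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (11.0, 16.0)]
times = [0.0, 0.5, 1.0, 2.0]
d_sum = path_length(points)
d_int = distance_from_speed(interval_speeds(points, times), times)
```

      Because the speed in each interval is the step length divided by the time step, the two quantities agree up to floating-point rounding.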

      In Author response image 3, we show the actual measured distance, $d_{\text{total}}$, for both conditions (random and static entrance), calculated with the discrete sum above (black filled circles). 

      Author response image 3.

      We compare this with two quantities: (a) average speed multiplied by average latency (red squares); and (b) average of the product of speed and latency (blue inverted triangles). The averages are taken over mice. Notice that if the multiplication is taken before the average (as it should be), then the product ⟨𝑣𝑡⟩ (averaged over mice) is indistinguishable from the total distance obtained by linear interpolation. Even taking the averages prior to the multiplication (which is physically incorrect, since speed and latency are properties of each individual mouse) yields almost exactly the same result (well within 1 standard deviation).
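      The difference between the two averaging orders can be made concrete with a short sketch. The per-mouse speeds and latencies below are invented for illustration and the function names are hypothetical.

```python
from statistics import mean


def mean_of_products(speeds, latencies):
    """Average over mice of each mouse's speed times its own latency."""
    return mean(v * t for v, t in zip(speeds, latencies))


def product_of_means(speeds, latencies):
    """Average speed times average latency (averages taken before multiplying)."""
    return mean(speeds) * mean(latencies)


# Hypothetical values for three mice: speed in cm/s, latency in s.
speeds = [10, 12, 11]
latencies = [20, 18, 19]
per_mouse_first = mean_of_products(speeds, latencies)
averages_first = product_of_means(speeds, latencies)
```

      With these made-up numbers the two estimates differ by less than 1 cm, which mirrors the kind of agreement described above; in general the two are equal only when speed and latency are uncorrelated across mice.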

      The only thing to keep in mind here is that the Distance panel in the paper presents the normalized distance according to the target distance to the starting point. This is necessary because in the random entrance experiments, each mouse can go to 1 of 4 possible targets (each of which has a different distance to the starting point).

      Figure 4: The reader is confused about the use of a binary color scheme here for the checking behavior: gray for a large amount of checking, and pink for small. But, there is a large ellipse that is gray and there are smaller circles that are also gray, but these two gray areas mean very different things as far as the reader can tell. Is that so? Why not show the entire graded colormap of checking probability instead of such a seemingly arbitrary binary depiction?

      Thank you. Our coloring scheme was indeed poorly thought out and we have changed it. Hopefully the reviewer now finds it easier to interpret. The frequency of hole checks is now encoded into only filled circles of varying sizes and shades of pink. Small empty circles represent the arena holes (empty because they have no food); The large transparent gray ellipse is the variance of the unrestricted spatial distribution of hole checks.

      Figure 4C: What would explain the large amount of checking behavior at the perimeter? Does that occur predominantly during thigmotaxis?

      Yes. As mentioned above, thigmotaxis still occurs in the first trial of training. The point to note is that the hole checking shown in Fig. 4C is pooled over all the mice, so that, per mouse, it does not appear so overwhelming. 

      Was there a correlation between the amount of time spent by the animals in a part of the maze and the amount of reward checking? Previous studies have shown that the two behaviors are often positively correlated, e.g. reference 20 in the manuscript. How does this fit with the path integration hypothesis?

      We thank the reviewer for pointing this out. Indeed, the time spent searching & the hole checking behavior are correlated. We added a new panel C to Fig. S4 showing a raw correlation plot between Latency and number of checks. 

      Also, in the last paragraph of the “Revealing the mouse estimate of target position from behavior” section under “Results”, we have now added a sentence relating the findings in Fig. 4H and 4K (spatial distribution of hole checks, and density of checks near the target, respectively) to note that these findings are in agreement with Fig 3C (time spent searching in each quadrant).

      “The mean position of hole checks near (20cm) the target is interpreted as the mouse estimated target (Fig. 4C,D,G,H; green + sign=mean position; green ellipses = covariance of spatial hole check distribution restricted to 20cm near the target). This finding together with the displacement and spatial hole check maps (Figs. 4F and 4H, respectively) corroborates the heatmap of time spent in the target quadrant (Fig. 3C), suggesting a positive correlation between hole checks and time searching (see also Fig. S4C).”
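      For readers who want to see the computation spelled out, the estimate described in the quoted sentence can be sketched as below. This is a minimal illustration, not the study's code: the function name and the hole-check coordinates are hypothetical, and the use of the sample covariance (dividing by n-1) is an assumption.

```python
import math


def estimate_target(checks, target, radius_cm=20.0):
    """Mean position and 2x2 covariance of hole checks within radius_cm of the target."""
    near = [
        (x, y)
        for x, y in checks
        if math.hypot(x - target[0], y - target[1]) <= radius_cm
    ]
    n = len(near)
    if n < 2:
        raise ValueError("need at least two hole checks near the target")
    mean_x = sum(x for x, _ in near) / n
    mean_y = sum(y for _, y in near) / n
    # Sample covariance; it defines the ellipse drawn around the mean position.
    sxx = sum((x - mean_x) ** 2 for x, _ in near) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in near) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in near) / (n - 1)
    return (mean_x, mean_y), [[sxx, sxy], [sxy, syy]]


# Hypothetical hole checks (cm); the last one is far from the target and is excluded.
checks = [(48.0, 50.0), (52.0, 50.0), (50.0, 47.0), (50.0, 53.0), (10.0, 90.0)]
center, cov = estimate_target(checks, target=(50.0, 50.0))
```

      The returned mean corresponds to the green + sign and the covariance to the green ellipse in the figure panels cited in the quoted sentence.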

      "Scratches and odor trails were eliminated by washing and rotating the maze floor between trials." Can one eliminate scratches by just washing the maze floor? Rotation of the maze floor between trials can make these cues unreliable or variable but will not eliminate them. Ditto for odor cues.

      The upper arena floor is rotated between trials so that any scratches will not be stable cues. We clarified this in the Discussion about potential cues. 

      See “What cues are required for learning the food location?”

      "Possible odor gradient cues were eliminated by experiments where such gradients were prevented with vacuum fans (Fig. S6E)" What tests were done to ensure that these were *eliminated* versus just diminished?

      "Probe trials of fully trained mice resulted in trajectories and initial hole checking identical to that of regular trials thereby demonstrating that local odor cues are not essential for spatial learning." As far as the reader can tell, probe trials only eliminated the food odor cues but did not eliminate all other odors. If so, this conclusion can be modified accordingly.

      We were most worried about odor cues guiding the mice and as now described at great length, we tried to mitigate this problem in many ways. As the reviewer notes, it is not possible to have absolute certainty that there are no odor cues remaining. The most difficult odor to eliminate was the potential odor gradient emanating from the mouse’s home cage. However, the 2 vacuum fans per cage were very powerful in first evacuating the cage air (150x in 5 minutes) and then drawing air from the arena, through the cage and out its top for the duration of each trial. We believe that we did at least vastly reduce any odor cues and perhaps completely eliminated them.

      The interpretation of direction selectivity is a bit tricky. At different places in this manuscript, this is interpreted as a path integration signal that encodes goal location, including the Consync cells. However, studies show that (e.g. Acharya et al. 2016) direction selectivity in virtual reality is comparable to that during natural mazes, despite large differences in vestibular cues and spatial selectivity. How would one reconcile these observations with path integration interpretation?

      Thank you. We had not been serious enough in considering the VR studies and their implications for optic flow as a cue for spatial learning. We now have a section (Optic flow cues) in the Discussion that acknowledges the potential role of such cues in spatial learning in our maze. 

      However, spatial learning in our maze can also occur in the dark. The next small section (Vestibular and proprioceptive cues) addresses this point. We cannot be certain about the precise cues used by the mouse to effectively learn to locate food in our maze, but it will take further behavioral and electrophysiological studies to go deeper into these questions. 

      An extended discussion is found in the sections entitled “What cues are required for learning the food location” and “A fixed start location and self-motion cues are required for spatial learning”.  We may have missed some references or ideas regarding VR maze learning with optic flow signals – the Acharya et al reference was an excellent starting point, and we would be grateful for additional pointers that would improve our discussion of this point.

      The manuscript would be improved if the speculations about place cells, grid cells, BTSP, etc. were pared down. I could easily imagine the outcome of these speculations to go the other way and some claims are not supported by data. "We note that the cited experiments were done with virtual movement constrained to 1D and in the presence of landmarks. It remains to be shown whether similar results are obtained in our unconstrained 2D maze and with only self-motion cues available." There are many studies that have measured the evolution of place cells in non- virtual mazes, look up papers from the 1990s. Reference 43 reports such results in a 2D virtual maze.

      We understand the reviewer’s concerns with the length of the manuscript. However, both the first and third reviewer did find this extensive section useful. We did not add the many papers on the evolution of place fields in real world mazes simply to prevent even greater expansion of the discussion, but relied on the very thorough review of Knierim and Hamilton instead. 

      Reviewer #3 (Public Review):

      Summary:

      How is it that animals find learned food locations in their daily life? Do they use landmarks to home in on these learned locations or do they learn a path based on self-motion (turn left, take ten steps forward, turn right, etc.). This study carefully examines this question in a well-designed behavioral apparatus. A key finding is that to support the observed behavior in the hidden food arena, mice appear to not use the distal cues that are present in the environment for performing this task. Removal of such cues did not change the learning rate, for example. In a clever analysis of whether the resulting cognitive map based on self-motion cues could allow a mouse to take a shortcut, it was found that indeed it could. The work nicely shows the evolution of the rodent's learning of the task, and the role of active sensing in the targeted reduction of uncertainty of food location proximal to its expected location.

      Strengths:

      A convincing demonstration that mice can synthesize a cognitive map for the finding of a static reward using body frame-based cues. This shows that the uncertainty of the final target location is resolved by an active sensing process of probing holes proximal to the expected location. Showing that changing the position of entry into the arena rotates the anticipated location of the reward in a manner consistent with failure to use distal cues.

      Thank you.

      Weaknesses:

      The task is low stakes, and thus the failure to use distal cues at most costs the animal a delay in finding the food; this delay is likely unimportant to the animal. Thus, it is unclear whether this result would generalize to a situation where the animal may be under some time pressure, urgency due to food (or water) restriction, or due to predatory threat. In such cases, the use of distal cues to make locating the reward robust to changing start locations may be more likely to be observed.

      We have added a section, “Combining trajectory direction and hole check locations yields a Target Estimation Vector”, that summarizes our main hypotheses. This section notes exactly this point and includes a reference to the excellent MacIver paper on “robot aggression”.

      The main point here follows the Knierim and Hamilton review and assumes that learning “heading direction” and “distance from start to food” require different cues and extraction mechanisms. “Here we follow a review by Knierim and Hamilton (12) suggesting independent mechanisms for extraction of target direction versus target distance information. Averaging across trajectories gave a mean displacement direction, an estimate of the average heading direction as the mouse ran from start to food. The heading direction must be continuously updated as the mouse runs towards the food, given that the mean displacement direction remains straight despite the variation across individual trajectories. Heading direction might be extracted from optic flow and/or vestibular system and be encoded by head direction cells. However, the distance from home to food is not encoded by head direction signals.”

      And

      “We hypothesize that path integration over trajectories is used to estimate the distance from start to food. The stimuli used for integration might include proprioception or acceleration (vestibular) signals as neither depends on visual input. Our conclusion is in accord with a literature survey that concluded that the distance of a target from a start location was based on path integration and separate from the coding of target heading direction (12). Our “in the dark” experiments reveal the minimal stimuli required for spatial learning – an anchoring starting point and directional information based on vestibular and perhaps proprioceptive signals. This view is in accord with recent studies using VR (47, 48). Under more naturalistic conditions, animals have many additional cues available that can be used for flexible control of navigation under time or predation pressure (51).”.

      Furthermore, we added panel G to Fig. S4, where we show the evolution of the heading angle along the trajectory, plotted as a function of the trials. We see that the mouse only steers towards the target in the last segment of the trajectory, consistent with the head direction being continuously updated along the path to the food.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      All three reviewers agreed during the consultation that the context in which distal cues are described in the manuscript would benefit significantly from refinement. The distal cues may be made completely useless from an ethological perspective e.g. if they are seen as "moving" relative to the entrance point (i.e. if the animal were to think it were entering the same location), then the cues would appear as unstable in the random entrance. As such, they may be so unlike natural experiences as to be potentially confusing to the animal. Moreover, as reported in some of the reviews, the animals may be using the entrances and boundaries as cues to help refine path integration. The results are still very interesting, but more refinement in the text on the interpretation of cues would greatly improve the manuscript. Thus, we recommend that you revise your manuscript to address the reviews.

      Thank you. We agree with this recommendation of the reviewers and have greatly expanded our discussion on cue stability, as already indicated above.

      Should you choose to revise your manuscript, please ensure the manuscript includes full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Done

      Lastly, I want to personally apologize for the long delay in editing this manuscript. All three reviews were unfortunately quite delayed, including my own review. I want to thank you for submitting your work to eLife and hope that we can be more efficient in editing your work in the future.

      It was a long review process, but we also appreciate that our article was dense and difficult to read. We tried to be comprehensive in our controls and analyses and we appreciate the considerable effort it must have taken to carefully review our paper.

      Reviewer #3 (Recommendations For The Authors):

      I quite enjoyed this paper and have some suggestions for further improvement.

      First, while I appreciate that the format of the journal has Methods at the end, there are some key details that need to be moved forward in the study for proper appreciation of the results. These include:

      (1) Location and size of distal cues.

      Done

      (2) Use of floor washing between mice.  

      Done

      (3) Use of food across the subfloor to provide some masking of the location of the food reward.

      Done

      (4) A scale bar on one of the early figures showing the apparatus would be beneficial.

      Done for Figure 1 where we also provide arena diameter and area.

      (5) Motivational state of the mouse with respect to the food reward (in this case, not food restricted, correct?).

      Done

      Although we are told the trial where learning is defined to have occurred, we were not given the quantitative criterion operationalizing "learning" - please provide (unless I missed it!).

      Thank you.  This question turned out to be of importance and led to more detailed analyses and related Discussion. We therefore answer in depth.

      We now realize that learning the distance to food versus learning the direction to food must be analyzed separately.

      On Page 5 second paragraph we provide a definition of “learning distance to food”.

      “Fitting the function dtotal = B*exp(-Trial/K) reveals the characteristic timescale of learning, K, in trial units (Fig. 2F). We obtained K = 26±24 giving a coefficient of variation (CV) of 0.92. The mean, K=26, is therefore very uncertain and far greater than the actual number of trials. Thus, we hypothesize that the mice did not significantly reduce their distance travelled (Fig. 2A,B,F) because they had not learned the food location – the decrease in latency (Fig. 2D) was due to its increased running speed and familiarity with non-spatial task parameters.”

      On Page 7 second paragraph the same analysis gives:

      “Now the fitting of the function dtotal = B*exp(-Trial/K) yielded K = 5.6±0.5 with a CV = 0.08; the mean is therefore a reliable estimate of total distance travelled. We interpret this to indicate that it takes a minimum number of K = 6 trials for learning the distance to the target (see also Fig. S4D,E,F,G).

      Learning is still not complete because it takes 14 trials before the trajectories become near optimal.”

      Learning of distance to food is evident by Trial 6 but is not complete.
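
      To make the two fits quoted above concrete, the sketch below estimates K for dtotal = B*exp(-Trial/K) and computes a CV as uncertainty divided by estimate. It is a minimal illustration, not the authors' procedure: it assumes a log-linear least-squares fit (the manuscript does not state the fitting method here), and the function names and the synthetic, noise-free distances are hypothetical.

```python
import math

def fit_learning_timescale(distances):
    """Fit d = B*exp(-trial/K) by linear least squares on log(d).

    Trials are numbered 1, 2, ...; returns (B, K).
    Assumes all distances are positive and not constant across trials.
    """
    trials = list(range(1, len(distances) + 1))
    logs = [math.log(d) for d in distances]
    n = len(trials)
    mean_t = sum(trials) / n
    mean_l = sum(logs) / n
    # Slope of log(d) versus trial equals -1/K.
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(trials, logs))
             / sum((t - mean_t) ** 2 for t in trials))
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -1.0 / slope

def coefficient_of_variation(estimate, uncertainty):
    """CV expressed as uncertainty divided by the estimate."""
    return uncertainty / estimate

# Synthetic, noise-free distances generated with B = 300, K = 5.6 (illustrative only).
synthetic = [300.0 * math.exp(-t / 5.6) for t in range(1, 15)]
B, K = fit_learning_timescale(synthetic)

# The random-entrance fit quoted above: K = 26 +/- 24.
cv_random = coefficient_of_variation(26.0, 24.0)
```

      With real, noisy trajectories a nonlinear fit with parameter uncertainties would be needed to obtain the ± values reported; the log-linear version is used here only because it is self-contained.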

      On Page 9 third paragraph we give a very precise answer to time taken to learn the direction from start to food. This was already very clear from Fig. 4I but we had missed the significance of this result. 

      “We compared the deviation between the TEV and the true target vector (that points from start directly to the food hole; Fig. 4I). While the random entrance mice had a persistent deviation between TEV and target of more than 70°, the static entrance mice were able to learn the direction of the target almost perfectly by trial 6 (TEV-target deviation in first trial mean±SD = 57.27° ± 41.61°; last trial mean±SD = 5.16° ± 0.20°; P=0.0166). A minimum of 6 trials is sufficient for learning both the direction and distance to food (Fig. 4I) (Fig. 3F) (see Discussion). The kinetics of learning direction to food are clearly different from learning distance to food since the direction to food remains stable after Trial 6 while the distance to food continues to approach the optimal value.”

      The direction from start to food is completely learned by Trial 6.
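
      The TEV-target deviation reported above is an angle between two vectors that share the start location as origin. A minimal sketch of that angle computation is given below; how the TEV itself is constructed is described in the manuscript and is not reproduced here, and the function name and example vectors are hypothetical.

```python
import math

def angular_deviation_deg(tev, target_vec):
    """Unsigned angle in degrees between two 2D vectors, in the range [0, 180]."""
    cross = tev[0] * target_vec[1] - tev[1] * target_vec[0]
    dot = tev[0] * target_vec[0] + tev[1] * target_vec[1]
    # atan2 of (cross, dot) gives the signed angle; abs() drops the sign.
    return abs(math.degrees(math.atan2(cross, dot)))

# Illustrative vectors only: an estimate pointing 45 degrees away from the target vector.
dev = angular_deviation_deg((1.0, 1.0), (1.0, 0.0))
```

      Using atan2 of the cross and dot products avoids the loss of precision that an arccosine of the normalized dot product shows for nearly parallel vectors, which matters when deviations are only a few degrees.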

      These analyses led to an addition to the Discussion on Page 20 (following the Heading).

      “Here we follow a review by Knierim and Hamilton (12) that hypothesized independent mechanisms for extraction of target direction versus target distance information. Our data strongly supports their hypothesis. Target direction is nearly perfectly estimated at trial 6 (Fig. 4I and Results). The deviation of the TEV from the start to food vector is rapidly reduced to its minimal value (5.16°) and with minimal variability (SD=0.20°). Learning the distance from start to food is also evident at trial 6 but only reaches an asymptotic near optimal value at trial 14 (Fig. 3F). The learning dynamics are therefore very different for target direction versus target distance. As noted below, the food direction is likely estimated from the activity of head direction cells. The neural mechanisms by which distance from start to food is estimated are not known (but see (49)).”

      We believe that this small addition summarizes the complicated answer to the reviewer’s question and is helpful in better connecting the Knierim and Hamilton paper to our data. However, if the reviewers and editors feel that we have gone too far or that this discussion is not clear, we can remove or alter the extra sentences as per any comments. 

      Reference #49 is to a review paper on spatial learning in weakly electric fish in the dark (https://doi.org/10.1016/j.conb.2021.07.002). The review summarizes data on a neural “time stamp” mechanism for estimating distance from start to food. In this review article, we explicitly hypothesized that rodents might utilize such a time stamp mechanism for finding food. We did not include this in the discussion because it was too distracting and would likely confuse readers but put in the reference in case some readers did want to access the “time stamp” hypothesis for spatial learning in the dark. 

      Second, the discussion was thoughtful and rich. I particularly enjoyed the segment describing the likely computations of the hippocampus. There are a few thoughts I have for the authors to think about that might be useful to potentially add to the discussion:

      "The remaining one, mouse 34, went from B to the start location and then, to A."

      This out-and-back pattern has been seen in the literature, such as multiple papers by Golani (here's one: https://www.pnas.org/doi/full/10.1073/pnas.0812513106). Would the authors speculate, given their suggested algorithm, what the significance of out and back may be? Is there something about the cell's encoding of direction and distance that requires a return to the start location, and would this be different if representation is based on self-motion versus based on distal cues in an allocentric representation?

      We do discuss this for pretraining trials but have no idea what this mouse is doing in this case.

      In a low-stakes task environment, for an animal that has a low acuity visual system, where the penalty for not using distal cues is at most some additional (likely enriching in itself to these mice who live a fairly unenriched life in small cages) search/learning/exploration time, perhaps it is not so surprising that body-frame cues are used. Considering the ethology of the animal, if it had multiple exits of an underground burrow, it might need to use distal cues to avoid confusion. The scenario you provide to the animal is essentially a deceptive one where it has no way of telling it is coming out to the arena from a different burrow hole, modulo some small landmarks on an otherwise uniform cylinder of space. This might be asking too much of an animal where the space it would enter normally would not be a uniform cylinder.

      What happens with a higher-stakes case? This is clearly a different study, but you may find some recent work with a mobile predatory robot of interest (https://www.sciencedirect.com/science/article/pii/S2211124723016820). Visual cues are crucial in the avoidance of threats in this case. Re-routing, as shown by multiple videos of that study, is after a brief pause, and seemingly takes into account the likely future position of the threat.

      Done. A fascinating paper that illustrates the unexpected “high level” behavior a rodent is capable of when placed in more naturalistic situations. I think our “two food location” experiments are along the same direction – unexpected rich behavior when the mice are challenged.

      Connected to the low-stakes vs high-stakes point, it might be nice for the paper to discuss situations in which cognitive-map-based spatial problem solutions make sense versus not.

      Here is an example of such a discussion, around page 496:

      https://www.dropbox.com/scl/fi/ayoo5w4jgnkblgfu7mpad/MacI09a_situated_cog.pdf?rlkey=2qhh89ii7jbkavt6ivevarvdk&dl=0.

      Right, a very relevant discussion by MacIver. However, when I tried to write it in, it took nearly half a page of dense writing to connect to the themes of our article. I figured that the already long discussion would try the patience of most readers and so decided not to include this extra discussion.

      Minor points/ queries

      Why the increase in sample density at about the 1/4 radius of arena distance? Static, trial 14, Figure 3I, shown also maybe Figure 4 H.

      We were also puzzled when this occurred but have no explanation. And there are, in our figures, many other examples of the mice hole checking near their exit site. See next answer.

      Why was the hole proximal to start so often probed in 7B?

      We were also puzzled when this occurred but have no explanation.

      Check Video 1 to see exactly this behavior. The mouse exits its home and immediately checks a nearby hole. It proceeds to Site B (empty) and then Site A (empty) with many hole checks along the way. After leaving Site A, the mouse proceeds to the wall located far from an entrance and does another hole check. The holes checked near the wall are in no way remarkable: a) they have never contained food; b) they are rotated between trials, and we wash the floor carefully, so no particular hole carries a distinctive “smell”; c) the food on the lower level floor is in no way “clumped” under that hole, etc.

      We have discussed this phenomenon quite a lot and LM was able to come up with only one hypothesis for this behavior. In analogy to the electric fish work (responses of diencephalic neurons to “leaving or encountering a landmark”), the “near the entrance” hole check might be an active sensing probe to “time stamp” the exit from home while finding food would “time stamp” the end of a successful trajectory. Path integration between time stamps would then provide the estimate for time/distance from start to food – exactly our hypothesis for weakly electric fish spatial learning in the dark. This hypothesis is exceedingly speculative and so we do not want to include it.  

      Normally I would cite a line number. Since I do not see line numbers, I will leave it to you to do a search:

      "A than the expected by chance" -> "than expected"

      Done. I apologize for the lack of line numbers. I have, so far, been unable to get Word to confine line numbers to selected text and not run over onto the Figure Legends. I have put in page numbers and hope this helps.

      RW, VR, MWM, etc - please expand the acronym on first use.

      Done

      It might be interesting to see differences in demand/reliance on active sensing in the individuals who learn the task less well than the animals who learn the task well. If the point is to expunge uncertainty, then does the need for such expunging increase with the poverty of internal representation resolution / fewer decimal places on the internal TEV calculation?

      We do have variation in the mice learning time but the numbers are not sufficient for this interesting extension. This is just one of many follow up studies we hope to carry out.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      In the article by Dearlove et al., the authors present evidence in strong support of nucleotide ubiquitylation by DTX3L, suggesting it is a promiscuous E3 ligase with capacity to ubiquitylate ADP ribose and nucleotides. The authors include data to identify the likely site of attachment and the requirements for nucleotide modification. 

      While this discovery potentially reveals a whole new mechanism by which nucleotide function can be regulated in cells, there are some weaknesses that should be considered. Is there any evidence of nucleotide ubiquitylation occurring in cells? It seems possible, but evidence in support of this would strengthen the manuscript. The NMR data could also be strengthened as the binding interface is not reported or mapped onto the structure/model, this seems of considerable interest given that highly related proteins do have the same activity.

      The paper is for the most part well-written and is potentially highly significant.

      Comments on revised version: 

      The revised manuscript has addressed many of the concerns raised and clarified a number of points. As a result the manuscript is improved. 

      The primary concern that remains is the absence of biological function for Ub-ssDNA/RNA and the inability to detect it in cells. Despite this the manuscript will be of interest to those in the ubiquitin field and will likely provoke further studies and the development of tools to better assess the cellular relevance. As a result this manuscript is important. 

      We agree with the reviewer’s assessment.

      Minor issue: 

      Figure 1A - the authors have now included the constructs used but it would be more informative if the authors lined up the various constructs under the relevant domains in the full-length protein. 

      Figure 1 will be fixed in the Version of Record.

      Reviewer #2 (Public Review):

      The manuscript by Dearlove et al. entitled "DTX3L ubiquitin ligase ubiquitinates single-stranded nucleic acids" reports a novel activity of a DELTEX E3 ligase family member, DTX3L, which can conjugate ubiquitin to the 3' hydroxyl of single-stranded oligonucleotides via an ester linkage. The findings that unmodified oligonucleotides can act as substrates for direct ubiquitylation and the identification of DTX3 as the enzyme capable of performing such oligonucleotide modification are novel, intriguing, and impactful because they represent a significant expansion of our view of the ubiquitin biology. The authors perform a detailed and diligent biochemical characterization of this novel activity, and key claims made in the article are well supported by experimental data. However, the studies leave room for some healthy skepticism about the physiological significance of the unique activity of DTX3 and DTX3L described by the authors because DTX3/DTX3L can also robustly attach ubiquitin to the ADP ribose moiety of NAD or ADP-ribosylated substrates. The study could be strengthened by a more direct and quantitative comparison between ubiquitylation of unmodified oligonucleotides by DTX3/DTX3L with the ubiquitylation of ADP-ribose, the activity that DTX3 and DTX3L share with the other members of the DELTEX family.

      Comment on revised version:

      In my opinion, reviewers' comments are constructively addressed by the authors in the revised manuscript, which further strengthens the revised submission and makes it an important contribution to the field. Specifically, the authors perform a direct quantitative comparison of two distinct ubiquitylation substrates, unmodified oligonucleotides and fluorescently labeled NADH and report that kcat/Km is 5-fold higher for unmodified oligos compared to NADH. This observation suggests that ubiquitylation of unmodified oligos is not a minor artifactual side reaction in vitro and that unmodified oligonucleotides may very well turn out to be the physiological substrates of the enzyme. However, the true identity of the physiological substrates and the functionally relevant modification site(s) remain to be established in further studies. 

      We agree with the reviewer’s assessment.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In the article by Dearlove et al., the authors present evidence in strong support of nucleotide ubiquitylation by DTX3L, suggesting it is a promiscuous E3 ligase with capacity to ubiquitylate ADP ribose and nucleotides. The authors include data to identify the likely site of attachment and the requirements for nucleotide modification. 

      While this discovery potentially reveals a whole new mechanism by which nucleotide function can be regulated in cells, there are some weaknesses that should be considered. Is there any evidence of nucleotide ubiquitylation occurring in cells? It seems possible, but evidence in support of this would strengthen the manuscript. The NMR data could also be strengthened as the binding interface is not reported or mapped onto the structure/model, this seems of considerable interest given that highly related proteins do have the same activity.

      The paper is for the most part well-written and is potentially highly significant, but it could be strengthened as follows:

      (1) The authors start out by showing DTX3L binding to nucleotides and ubiquitylation of ssRNA/DNA. While ubiquitylation is subsequently dissected and ascribed to the RD domains, the binding data is not followed up. Does the RD protein alone bind to the nucleotides? Further analysis of nucleotide binding is also relevant to the Discussion where the role of the KH domains is considered, but the binding properties of these alone have not been analysed. 

      We thank the reviewer for the suggestion. We have tested DTX3L RD for ssDNA binding using NMR (see Figure 4A and Figure S2), which showed that DTX3L RD binds ssDNA. We have now tested the DTX3L KH domains for RNA/ssDNA binding using an FP experiment. However, the FP experiment did not show significant changes upon titrating RNA/ssDNA, suggesting that the KH domains alone are not sufficient to bind RNA/ssDNA. We have opted to put this data in the response-to-review as future investigation will be required to examine whether other regions of DTX3L cooperate with RD to bind RNA/ssDNA. We have revised the Discussion on the KH domains. We now state that “Our findings show the DTX3L DTC domain binds nucleic acids but whether the KHL domains contribute to nucleic acid binding requires further investigation.”

      Author response image 1.

      Fold change of fluorescence polarisation of 6-FAM-labelled ssDNA D4 upon titrating with DTX3L variants. DTX3L KH domain fragments were expressed with an N-terminal His-MBP tag to increase the molecular weight to enhance the signal.

      (2) With regard to the E3 ligase activity, can the authors account for the apparent decreased ubiquitylation activity of the 232-C protein in Figure 1/S1 compared to FL and RD? 

      We found that the 232-C protein batch used in the assay was not pure and have subsequently re-purified the protein. We have repeated the ubiquitination of ssDNA and RNA (Fig. 1H and 1I) and 232-C exhibited similar activity as WT. Furthermore, we performed autoubiquitination (Fig. S1G) and E2~Ub discharge assay (Fig. S1H) to compare the activity. 232-C was slower in autoubiquitination (Fig. S1G), but showed similar activity in the E2~Ub discharge assay as WT. These findings suggest that the RING domain in 232-C is functional and 232-C likely lacks ubiquitination site(s) present in 1-231 region necessary for autoubiquitination.

      (3) Was it possible to positively identify the link between Ub and ssDNA/RNA using mass spectrometry? This would overcome issues associated with labels blocking binding rather than modification. 

      We have tried to use mass spectrometry to detect the linkage between Ub and ssDNA/RNA, but were unable to do so. We suspect that the oxyester linkage might be labile, posing a challenge for mass spectrometry techniques. Similarly, a recent preprint from the Ahel lab, which utilises LC-MS, detects the Ub-NMP product rather than the linkage (https://www.biorxiv.org/content/10.1101/2024.04.19.590267v1.full.pdf).

      (4) Furthermore, can a targeted MS approach be used to show that nucleotides are ubiquitylated in cells? 

      This will require future development and improvement of the MS approach, specifically the isolation of labile oxyester-linked products from cells and the optimisation of the MS detection method.

      (5) Do the authors have the assignments (even partial?) for DTX3L RD? In Figure 4 it would be helpful to identify the peaks that correspond to the residues at the proposed binding site. Also do the shifts map to a defined surface or do they suggest an extended site, particularly for the ssDNA.

      We only collected HSQC spectra, which were insufficient for assignments. We have performed a competition experiment using ADPr and labelled ssDNA, showing that ADPr competes against the ubiquitination of ssDNA (Figure 4D). We have also provided an additional experiment showing that ssDNA with a blocked 3’-OH can compete against ubiquitination of ADPr (Figure 4E). These data, together with our NMR analysis, further strengthen the evidence that ssDNA and ADPr compete for the same binding pocket in DTX3L RD. Understanding how DTX3L RD binds ssDNA/RNA is ongoing research in the lab.

      (6) Does sequence analysis help explain the specificity of activity for the family of proteins? 

      We have performed sequence alignment and structure comparison of DTX proteins using both RING and DTC domains (Fig. S3). These analyses showed that DTX3 and DTX3L RING domains lack an N-terminal helix and two loop insertions compared to DTX1, DTX2 and DTX4. These additions make the DTX1, DTX2 and DTX4 RING domains larger than those of DTX3L and DTX3. It is not clear how these would influence the orientation of the recruited E2~Ub. Comparison of the DTC domain showed that DTX1, DTX2 and DTX4 contain an Ala-Arg motif, which causes a bulge at one end of the DTC pocket. In the absence of the Ala-Arg motif, the DTC pockets of DTX3 and DTX3L contain an extended groove which might accommodate one or more of the nucleotides 5' to the targeted terminal nucleotide. It seems that both features of the RING and DTC domains might contribute to the specificity of DTX3L and DTX3. We have included these comparisons in the discussion and suggested that future structural characterization is necessary to unveil the specificity.

      (7) While including a summary mechanism (Figure 5I) is helpful, the schematic included does not necessarily make it easier for the reader to appreciate the key findings of the manuscript or to account for the specificity of activity observed. While this figure could be modified, it might also be helpful to highlight the range of substrates that DTX3L can modify - nucleotide, ADPr, ADPr on nucleotides etc. 

      We have modified this Figure to include the range of substrates.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Dearlove et al. entitled "DTX3L ubiquitin ligase ubiquitinates single-stranded nucleic acids" reports a novel activity of a DELTEX E3 ligase family member, DTX3L, which can conjugate ubiquitin to the 3' hydroxyl of single-stranded oligonucleotides via an ester linkage. The findings that unmodified oligonucleotides can act as substrates for direct ubiquitylation and the identification of DTX3 as the enzyme capable of performing such oligonucleotide modification are novel, intriguing, and impactful because they represent a significant expansion of our view of the ubiquitin biology. The authors perform a detailed and diligent biochemical characterization of this novel activity, and key claims made in the article are well supported by experimental data. However, the studies leave room for some healthy skepticism about the physiological significance of the unique activity of DTX3 and DTX3L described by the authors because DTX3/DTX3L can also robustly attach ubiquitin to the ADP ribose moiety of NAD or ADP-ribosylated substrates. The study could be strengthened by a more direct and quantitative comparison between ubiquitylation of unmodified oligonucleotides by DTX3/DTX3L with the ubiquitylation of ADP-ribose, the activity that DTX3 and DTX3L share with the other members of the DELTEX family. 

      Strengths: 

      The manuscript reports a novel and exciting observation that ubiquitin can be directly attached to the 3' hydroxyl of unmodified, single-stranded oligonucleotides by DTX3L. The study builds on the extensive expertise and the impactful previous studies by the Huang laboratory of the DELTEX family of E3 ubiquitin ligases. The authors perform a detailed and diligent biochemical characterization of this novel activity, and all claims made in the article are well supported by experimental data. The manuscript is clearly written and easy to read, which further elevates the overall quality of submitted work. The findings are impactful and will help illuminate multiple avenues for future follow-up investigations that may help establish how this novel biochemical activity observed in vitro may contribute to the biological function of DTX3L. The authors demonstrate that the activity is unique to the DTX3/DTX3L members of the DELTEX family and show that the enzyme requires at least two single-stranded nucleotides at the 3' end of the oligonucleotide substrate and that the adenine nucleotide is preferred in the 3' position. Most notably, the authors describe a chimeric construct containing RING domain of DTX3L fused to the DTC domain DTX2, which displays robust NAD ubiquitylation, but lacks the ability to ubiquitylate unmodified oligonucleotides. This construct will be invaluable in the future cell-based studies of DTX3L biology that may help establish the physiological relevance of 3' ubiquitylation of nucleic acids. 

      Weaknesses: 

      The main weakness of the study is in the lack of direct evidence that the ubiquitylation of unmodified oligonucleotides reported by the authors plays any role in the biological function of DTX3L. The study leaves plenty of room for natural skepticism regarding the physiological relevance of the reported activity, because, akin to other DELTEX family members, DTX3 and DTX3L can also catalyze attachment of ubiquitin to NAD, ADP ribose and ADP-ribosylated substrates. Unfortunately, the study does not offer any quantitative comparison of the two distinct activities of the enzyme, which leaves plenty of room for doubt. One is left wondering, whether ubiquitylation of unmodified oligonucleotides is just a minor and artifactual side activity owing to the high concentration of the oligonucleotide substrates and E2~Ub conjugates present in the in-vitro conditions and the somewhat lower specificity of the DTX3 and DTX3L DTC domains (compared to DTX2 and other DELTEX family members) for ADP ribose over other adenine-containing substrates such as unmodified oligonucleotides, ADP/ATP/dADP/dATP, etc. The intriguing coincidence that DTX3L, which is the only DTX protein capable of ubiquitylating unmodified oligonucleotides, is also the only family member that contains nucleic acid interacting domains in the N-terminus, is suggestive but not compelling. A recently published DTX3L study by a competing laboratory (PMID: 38000390), which is not cited in the manuscript, suggests that ADP-ribose-modified nucleic acids could be the physiologically relevant substrates of DTX3L. That competing hypothesis appears more convincing than ubiquitylation of unmodified oligonucleotides because experiments in that study demonstrate that ubiquitylation of ADP-ribosylated oligos is quite robust in comparison to ubiquitylation of unmodified oligos, which is undetectable. 
      It is possible that the unmodified oligonucleotides in the competing study did not have adenine in the 3' position, which may explain the apparent discrepancy between the two studies. In summary, a quantitative comparison of ubiquitylation of ADP ribose vs. unmodified oligonucleotides could strengthen the study. 

      We thank the reviewer for the constructive feedback. We agree that evidence for the biological function is lacking. While we have tried to detect Ub-ssDNA/RNA from cells, we found that isolating and detecting labile oxyester-linked Ub-ssDNA/RNA products remains challenging due to (1) low levels of Ub-ssDNA/RNA products, (2) the presence of DUBs and nucleases that rapidly remove the products during the experiments, and (3) our lack of a suitable MS approach to detect the product. For these reasons, we feel that discovering the biological function will require future effort and expertise and is beyond the scope of our current manuscript.

      In the manuscript (PMID: 38000390), the authors used PARP10 to catalyse ADP-ribosylation onto 5’-phosphorylated ssDNA/RNA. They used the following sequences, which lack a 3’-adenosine; this could explain the lack of ubiquitination.

      E15_5′P_RNA [Phos]GUGGCGCGGAGACUU

      E15_5′P_DNA [Phos]GTGGCGCGGAGACTT

      We have performed the experiment using this sequence to verify this (see Author response image 2 below). We had cited this manuscript, but for some reason PubMed has updated its publication date from mid-2023 to January 2024. We have updated the EndNote reference in the revised manuscript.
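      The point about the 3′ end of the two sequences quoted above can be checked with a minimal sketch. The helper names below are hypothetical and introduced only for illustration; the sequences are copied as written (5′ to 3′, with the [Phos] tag omitted), and the check says nothing about the experiment itself.

```python
# Illustrative check only: which base sits at the 3' end of each quoted sequence?
# Sequences are written 5'->3'; the 5' phosphate tag "[Phos]" is omitted.
SEQUENCES = {
    "E15_5′P_RNA": "GUGGCGCGGAGACUU",
    "E15_5′P_DNA": "GTGGCGCGGAGACTT",
}


def three_prime_base(seq: str) -> str:
    """Return the 3'-terminal base of a sequence written 5'->3'."""
    return seq[-1].upper()


def has_three_prime_adenine(seq: str) -> bool:
    """True if the 3'-terminal base is adenine (hypothetical helper)."""
    return three_prime_base(seq) == "A"


for name, seq in SEQUENCES.items():
    print(name, three_prime_base(seq), has_three_prime_adenine(seq))
```

      Neither sequence ends in adenine (the RNA ends in U and the DNA in T), which is the feature the response refers to.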

      Author response image 2.

      Fluorescently detected SDS-PAGE gel of in vitro ubiquitination catalysed by DTX3L-RD in the presence of ubiquitination components and 6-FAM-labelled ssDNA D4 or D31.

      We agree that it is crucial to compare ubiquitination of oligonucleotides and ADPr by DTX3L to find its preferred substrate. We have challenged oligonucleotide ubiquitination by adding excess ADPr and found that ADPr efficiently competes with oligonucleotide (Figure 4D). We have also performed an experiment showing that ssDNA with a blocked 3’-OH can compete against ubiquitination of ADPr (Figure 4E). These data support that ADPr and ssDNA compete for the same binding site on DTX3L.

      We also performed kinetic analysis of ubiquitination of fluorescently labelled ssDNA (D4) and NAD+ by DTX3L-RD (Fig. 4F and Fig. S2D–G) to assess substrate preferences. Here, we used fluorescently labelled NAD+ (F-NAD+) in place of ADPr, as labelled NAD+ is commercially available. With known concentrations of fluorescently labelled ssDNA and NAD+ as standards, we could estimate the rate of ubiquitinated product formation across different substrate concentrations. We have included this finding in the main text: “DTX3L-RD displayed a _k_cat value of 0.0358 ± 0.0034 min⁻¹ and a _K_m value of 6.56 ± 1.80 µM for Ub-D4 formation, whereas the Michaelis-Menten curve did not reach saturation for Ub-F-NAD+ formation (Fig. 4F and fig. S2, D-G). Comparison of the estimated catalytic efficiency (_k_cat/_K_m = 5457 M⁻¹ min⁻¹ for D4 and estimated _k_cat/_K_m = 1190 M⁻¹ min⁻¹ for F-NAD+; Fig. 4F) suggested that DTX3L-RD exhibited 4.5-fold higher catalytic efficiency for D4 than F-NAD+. This difference primarily results from a better _K_m value for D4 compared to F-NAD+. Although DTX3L-RD showed a weak _K_m for F-NAD+, it displays a higher rate for converting F-NAD+ to Ub-F-NAD+ at higher substrate concentration (Fig. 4F). Thus, substrate concentration will play a role in determining the preference.”
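      The arithmetic behind the quoted catalytic efficiencies can be retraced with a short sketch. It assumes that the _K_m for D4 is expressed in micromolar, since that is the unit that reproduces the quoted efficiency of 5457 M⁻¹ min⁻¹; the function and constant names are hypothetical, and the F-NAD+ efficiency is taken directly from the quoted estimate rather than recomputed.

```python
# Sketch of the kcat/Km arithmetic; not the authors' analysis code.
KCAT_D4_PER_MIN = 0.0358      # quoted kcat for Ub-D4 formation (min^-1)
KM_D4_MICROMOLAR = 6.56       # ASSUMPTION: Km taken in micromolar
EFFICIENCY_FNAD = 1190.0      # quoted estimate for F-NAD+ (M^-1 min^-1)


def catalytic_efficiency(kcat_per_min: float, km_molar: float) -> float:
    """Return kcat/Km in M^-1 min^-1."""
    return kcat_per_min / km_molar


def turnover_per_enzyme(substrate_molar: float, kcat_per_min: float,
                        km_molar: float) -> float:
    """Michaelis-Menten rate per enzyme: kcat * [S] / (Km + [S])."""
    return kcat_per_min * substrate_molar / (km_molar + substrate_molar)


km_d4_molar = KM_D4_MICROMOLAR * 1e-6
eff_d4 = catalytic_efficiency(KCAT_D4_PER_MIN, km_d4_molar)
fold_d4_over_fnad = eff_d4 / EFFICIENCY_FNAD

print(f"kcat/Km (D4): {eff_d4:.0f} M^-1 min^-1")
print(f"D4 / F-NAD+ efficiency ratio: {fold_d4_over_fnad:.2f}")
```

      Under this assumption the D4 efficiency rounds to the quoted 5457 M⁻¹ min⁻¹ and the ratio to F-NAD+ is roughly 4.6, in line with the approximately 4.5-fold difference stated in the text.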

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Writing/technical points: 

      (1) The introduction is relatively complex and the last paragraph, which reviews the discoveries on the paper, is long. It may be helpful to highlight the significance and frame the experiments as what they have addressed, rather than detailing each set of experiments completed. 

      We have modified the last paragraph in the introduction to highlight the major discovery of our work.

      (2) Line 24, Abstract. 'Its N-terminal region' is not obvious 

      We have changed “Its N-terminal region” to “the N-terminal region of DTX3L”.

      (3) Line 44 - split sentence to emphasize E3 ligase point? 

      We have modified the sentence as suggested.

      (4) Figures 1B and 1C could be larger - currently they are not very helpful. Also atoms (ADPr?) are shown, but not indicated in the legend or labelled on the panel. 

      We have enlarged Figures 1B and 1C and indicated RNA on the structure.

      (5) The structure of the D2 domain of DTX3L has recently been reported (Vela-Rodriguez et al). It might be helpful to comment on this manuscript. 

      We have now commented on D2 domain in the results section and in the discussion.

      (6) It would be helpful to indicate the DTX3L constructs used in Figure 1a. 

      We have included all DTX3L constructs used in Figure 1a.

      (7) Interpretation of Figure 4A is difficult, the authors may wish to consider other ways to visualize the data. 

      We have now removed the black arrow in Figure 4A as it was confusing. Instead, we drew a black box on the cross-peak where the close-up views are shown in Figures 4B and 4C.

      (8) Figure 4A. Please indicate which binding partner is highlighted by red/black arrows. 

      We have removed the black arrow. The red arrows indicate cross-peaks that undergo chemical shift perturbation when DTX3L-RD is titrated with ssDNA or ADPr, indicating that their binding sites on DTX3L-RD overlap.

      (9) Line 284 - please indicate the bulge in Figure S3. 

      We have indicated the bulge on Figure S3.

      (10) Aspects of the discussion are speculative, given that evidence of Ub conjugated to nucleotides in cells is yet to be obtained and the functional consequences of modification are uncertain. 

      We understand that the discussion on the potential roles of ubiquitination of ssNAs is speculative. We have now modified it to: “Based on the known functions of the DTX3L/PARP9 complex and the findings of this study, we propose several hypotheses for future research”, so that readers will understand that these are speculative.

      (11) Line 295 onwards - this paragraph discusses the role of the KH domains in nucleotide binding, but it is not clear that the authors have directly demonstrated that the KH domains bind nucleotides as all constructs used in the binding experiments in Figure 1/S1 include the RING-DTC domains. 

      We found that the KH domains alone did not bind ssDNA or RNA. We have modified line 295. This section now reads: “Typically, KH domains contain a GXXG motif within the loop between the first and second α helix (22). However, analysis of the sequence of the KHL domains in DTX3L shows these domains lack this motif. Multiple studies have shown that mutation in this motif abolishes binding to nucleic acids (23-26). Our findings show the DTX3L DTC domain binds nucleic acids but whether the KHL domains contribute to nucleic acid binding requires further investigation. Additionally, the structure of the first KHL domain was recently reported and shown to form a tetrameric assembly (20). Our analysis with DTX3L 232-C, which lacks the first KHL domain and RRM, indicates that it can still bind ssDNA and ssRNA. Despite this, a more detailed analysis will be required to determine whether oligomerization plays a role in nucleic acid binding and ubiquitination.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Tian et al. describe how TIPE regulates melanoma progression, stemness, and glycolysis. The authors link high TIPE expression to increased melanoma cell proliferation and tumor growth. TIPE causes dimerization of PKM2, as well as translocation of PKM2 to the nucleus, thereby activating HIF-1alpha. TIPE promotes the phosphorylation of S37 on PKM2 in an ERK-dependent manner. TIPE is shown to increase stem-like phenotype markers. The expression of TIPE is positively correlated with the levels of PKM2 Ser37 phosphorylation in murine and clinical tissue samples. Taken together, the authors demonstrate how TIPE impacts melanoma progression, stemness, and glycolysis through dimeric PKM2 and HIF-1alpha crosstalk.

      Strengths:

      The authors manipulated TIPE expression using both shRNA and overexpression approaches throughout the manuscript. Using these models, they provide strong evidence of the involvement of TIPE in mediating PKM2 Ser37 phosphorylation and dimerization. The authors also used mutants of PKM2 at S37A to block its interaction with TIPE and HIF-1alpha. In addition, an ERK inhibitor (U0126) was used to block the phosphorylation of Ser37 on PKM2. The authors show how dimerization of PKM2 by TIPE causes nuclear import of PKM2 and activation of HIF-1alpha and target genes. Pyridoxine was used to induce PKM2 dimer formation, while TEPP-46 was used to suppress PKM2 dimer formation. TIPE maintains stem cell phenotypes by increasing the expression of stem-like markers. Furthermore, the relationship between TIPE and Ser37 PKM2 was demonstrated in murine and clinical tissue samples.

      Weaknesses:

      The evaluation of how TIPE causes metabolic reprogramming can be better assessed using isotope tracing experiments and improved bioenergetic analysis.

      Thank you very much for your suggestions. Unfortunately, we could not complete the isotope tracing experiments because we lack the necessary instruments, and we were unable to outsource them after consulting several companies. We regret this limitation and have discussed it in our manuscript. Moreover, owing to our oversight, only three metabolites were presented in the previous manuscript. However, we have performed routine untargeted metabolomics to demonstrate how TIPE causes metabolic reprogramming. We have added the detailed results as a new figure, Figure S3, which shows that the glycolysis pathway, particularly pyruvate and lactic acid, is decreased after TIPE interference.

      Reviewer #2 (Public Review):

      In this article, Tian et al present a convincing analysis of the molecular mechanisms underpinning TIPE-mediated regulation of glycolysis and tumor growth in melanoma. The authors begin by confirming TIPE expression in melanoma cell lines and identify "high" and "low" expressing models for functional analysis. They show that TIPE depletion slows tumour growth in vivo, and using both knockdown and over-expression approaches, show that this is associated with changes in glycolysis in vitro. Compelling data using multiple independent approaches is presented to support an interaction between TIPE and the glycolysis regulator PKM2, and the over-expression of TIPE-promoted nuclear translocation of PKM2 dimers. Mechanistically, the authors also demonstrate that PKM2 is required for TIPE-mediated activation of HIF1a transcriptional activity, as assessed using an HRE-promoter reporter assay, and that TIPE-mediated PKM2 dimerization is p-ERK dependent. Finally, the dependence of TIPE activity on PKM2 dimerization was demonstrated on tumor growth in vivo and in the regulation of glycolysis in vitro, and ectopic expression of HIF1a could rescue the inhibition of PKM2 dimerization in TIPE overexpressing cells and reduced induction of general cancer stem cell markers, showing a clear role for HIF1a in this pathway. The main conclusions of this paper are well supported by data, but some aspects of the experiments need clarification and some data panels are difficult to read and interpret as currently presented.

      The detailed mechanistic analysis of TIPE-mediated regulation of PKM2 to control aerobic glycolysis and tumor growth is a major strength of the study and provides new insights into the molecular mechanisms that underpin the Warburg effect in cancer cells. However, despite these strengths, some weaknesses were noted, which if addressed will further strengthen the study.

      (1) The analysis of patient samples should be expanded to more directly measure the relationship between TIPE levels and melanoma patient outcome and progression (primary vs metastasis), to build on the association between TIPE levels and proliferation (Ki67) and hypoxia gene sets that are currently shown.

      Thank you for your suggestions. We have added the relationship between TIPE levels and progression (non-lymph node metastasis vs lymph node metastasis). In addition, we have added the association between TIPE and Ki67 or LDH levels as you advised, as shown in Figure 7.

      However, the relationship between TIPE levels and melanoma patient outcome is not presented in this article. One reason is that the tissue microarray lacks survival data. Interestingly, the TCGA dataset showed that higher TIPE expression is associated with a favorable prognosis in melanoma. We are also very curious about this. Our follow-up study indicates that TIPE might serve as a positive regulator of PD-L1. Therefore, higher TIPE expression may confer greater sensitivity to immunotherapy, resulting in a favorable prognosis in melanoma. The detailed mechanisms will be discussed in a subsequent article, and we hope this will be a continuing research topic for TIPE in melanoma.

      Here we disclose only that TIPE has a survival and immune signature similar to those of PD-L1 and PD-1 in melanoma, as follows:

      Author response image 1.

      (2) The duration of the in vivo experiments was not clearly defined in the figures, however, it was clear from the tumor volume measurements that they ended well before standard ethical endpoints in some of the experiments. A rationale for this should be provided because longer-duration experiments might significantly change the interpretation of the data. For example, does TIPE depletion transiently reduce or lead to sustained reductions in tumor growth?

      Thank you for your suggestions. We performed a pilot experiment before the formal experiments, and all time points were chosen on that basis. Furthermore, we have added the detailed time points to the figure legends as you suggested.

      (3) The analysis of general cancer stem cell markers is solid and interesting, however inclusion of neural crest stem cell markers that are more relevant to melanoma biology would greatly strengthen this aspect of the study.

      Thank you for your advice. We selected two neural crest stem cell markers, Nestin and Sox10, and tested their expression after overexpression of TIPE in G361 cells or interference of TIPE in A375 cells.

      (4) The authors should take care that all data panels are clearly readable in the figures to facilitate appropriate interpretation by the reader.

      Thank you for your suggestions. We have amended the data panels according to your advice to ensure they are clear and professionally presented.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) In Figure 1D, glucose, pyruvate, and lactate were measured at a steady state. However, metabolites at steady state do not accurately depict changes in pathway activity. An isotope tracing experiment (i.e., using labelled 13C glucose) can be used to study glucose catabolism into pyruvate, as well as tracing into lactate or into the TCA cycle following changes in TIPE expression. In addition, although the authors point towards changes in metabolic reprogramming, only three metabolites were measured. The use of isotope tracing to monitor metabolites from more than one pathway would be suggested to support the claim that metabolism is being reprogrammed due to TIPE.

      Thank you very much for your suggestions. Unfortunately, we could not complete the isotope tracing experiments because we lack the necessary instruments, and we were unable to outsource them after consulting several companies. We regret this limitation and have discussed it in our manuscript. Moreover, owing to our oversight, only three metabolites were presented in the previous manuscript. However, we have performed routine untargeted metabolomics to demonstrate how TIPE causes metabolic reprogramming. We have added the detailed results as a new figure, Figure S3, which shows that the glycolysis pathway, particularly pyruvate and lactic acid, is decreased after TIPE interference.

      (2) In Figure 1H, extracellular acidification was used to determine glycolytic activity. However, bicarbonate secretion can also greatly affect pH, and should be considered (PMID 25449966). Although total ATP content was measured, the contribution of ATP from glycolysis can be also determined (see PMID 28270511) to provide a more accurate representation of glycolytic ATP production.

      Thank you again for your suggestions. As described above, we will improve our measurement methods in the future, and we have discussed this weakness in the manuscript.

      (3) On page 5, lines 108-111, the authors show that "This process represents an important regulator of the TIPE family switching between oxidative phosphorylation and aerobic glycolysis, paving the way for cancer-specific metabolism in response to low-oxygen challenge." However, there is no data on oxidative phosphorylation. What is the effect of TIPE on oxygen consumption?

      Thank you for your careful and professional advice. We have reviewed the manuscript thoroughly for language accuracy and corrected this statement to eliminate confusion and ensure the text is clear and professionally presented.

      Minor points

      (1) On page 3, line 68, it is unclear what is increasing lactate levels, as lactate can be transported inside of cells.

      Thank you for your suggestions. We have corrected this inaccurate description to improve the overall quality and readability of the manuscript.

      (2) In Figure 1B, RNA sequencing was performed on TIPE overexpressing G361 cells. The "ribosome" pathway has the highest count and lowest p-value. However, there is no mention of this in the text.

      Thank you for your suggestions. We selected aerobic glycolysis as our main focus on the basis of the combined transcriptomics, metabolomics and Co-IP/MS results. The "ribosome" pathway you pointed out may be a topic of our future research.

      (3) It would be helpful to include the cell line in Figure S1B-C as well as in the figure legend.

      Thanks for your suggestions, we have added the cell line into Figure S1B-C as well as in the figure legend.

      (4) Concerning supplementary figures, it would be helpful to include the panel numbers when referring to them in the main text (see line 120 or 122 as an example).

      Thanks for your suggestions, we have added the panel numbers when referring to them in the main text.

      (5) The sentence on lines 127-131 is very confusing.

      Thank you for your suggestions. We have corrected the unclear descriptions you mentioned.

      (6) In Figure S3, qPCR is misspelled in the figure legend. Also, it would be helpful to include what is meant by "relative expression" on the y-axis of Figure S3A.

      Thank you for your suggestions. We have corrected the errors you pointed out. Because the y-axis represents the expression of both TIPE and HIF-1α, we consider the present label the more suitable one.

      (7) There is an extra space on line 196.

      Thank you for your suggestions. We have corrected this as you pointed out.

      (8) In Figure 7E LDH staining was performed. Which isoform of LDH was detected?

      We stained total LDH in Figure 7E.

      (9) On line 931, Warburg is misspelled.

      Thank you for your suggestion. We have corrected all the typos mentioned, including "Warburg" on line 931.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      - Supplementary Figure 2G. Unit of time measurement for tumor growth panel needs to be defined. If this refers to days, 5 days is a relatively short period to assess tumor growth differences in vivo, and indeed, 1000-1200mm3 is a standard ethical end-point for these types of models, and this experiment was concluded well before reaching these tumor sizes. Can the authors explain why they ended this experiment at this timepoint?

      Thank you for your suggestions. As you suggested, we have added the detailed time points to the figure legends. We performed a pilot experiment before the formal experiments, and all time points were chosen on that basis.

      - Supplementary Figure 2j - Correlation analysis between TIPE expression and overall survival outcome in melanoma patients is more relevant to support the experimental observations described in the paper than the correlation with Ki67. This analysis should also be provided. In addition, is there any difference in TIPE expression between primary and metastatic melanoma patients which would then more directly link TIPE with melanoma progression in patients?

      The relationship between TIPE levels and melanoma patient outcome is not presented in this article. One reason is that the tissue microarray lacks survival data. Interestingly, the TCGA dataset showed that higher TIPE expression is associated with a favorable prognosis in melanoma. We are also very curious about this. Our follow-up study indicates that TIPE might serve as a positive regulator of PD-L1. Therefore, higher TIPE expression may confer greater sensitivity to immunotherapy, resulting in a favorable prognosis in melanoma. The detailed mechanisms will be discussed in a subsequent article, and we hope this will be a continuing research topic for TIPE in melanoma.

      Furthermore, we have added the relationship between TIPE levels and progression (non-lymph node metastasis vs lymph node metastasis), and Ki67 in Figure 7.

      - Figure 2 - The A2 domain protein represents a substantial reduction in the size of PKM2, which would likely have other structural effects that could affect interactions with TIPE. This should be discussed by the authors because, in this reviewer's opinion, the data presented do not shed light on the specific TIPE domain requirements for the interaction with PKM2.

      Thank you for your suggestions. We have discussed this point in our manuscript.

      - Figure 4: The authors show that PKM2 recruitment to the promoters of GLUT1 and LDHA is induced by TIPE expression. Is HIF1a recruitment also induced by TIPE? This is a key gap in the detailed molecular analysis provided by the authors.

      Thank you for your suggestions. The phenomenon you mention is very interesting. However, the expression of GLUT1 and LDHA decreased completely when we overexpressed TIPE together with PKM2 (S37A), compared with overexpression of TIPE and wild-type PKM2. Therefore, we believe that the higher expression of GLUT1 and LDHA was primarily promoted by TIPE-induced PKM2 recruitment.

      - Figure 6: The authors present nice data for general pluripotency/stem cell markers however given melanocytes arise from the neural crest, and neural crest markers are expressed during melanoma initiation and response to therapies, analysis of neural crest stem cell markers would be appropriate to include in this analysis. For example, Sox10, Pax3, NGFR, and AQP2 have all been identified as neural crest stem cell markers expressed in both melanoma patients and experimental models.

      Thank you for your advice. We selected two neural crest stem cell markers, Nestin and Sox10, and tested their expression after overexpression of TIPE in G361 cells or interference of TIPE in A375 cells.

      Minor comments:

      - All Figure and Supplementary Figure legends should indicate how many replicate experiments the data represents, and all error bars should be defined (StDev vs SEM).

      We have added as you suggested.

      - Supplementary Figure S1C - can the authors confirm the densitometry values on the western, as the band looks to be considerably larger than 1.6 fold higher compared to the control?

      We repeated the densitometry measurement using ImageJ. However, the result was the same.

      - FACs panels in Supplementary Figure 2C-D are unreadable and should be enlarged.

      - Supplementary Figure S2i - quantification of Ki67 images appears warranted.

      - Supplementary Figure S2j - The text in the figure panel is too small and needs to be increased so the data can be interpreted accurately. Also, the authors should confirm the data is specifically from melanoma patients in the figure legend.

      We have improved the quality of the figures and revised their descriptions for greater clarity and coherence, ensuring that they effectively highlight the key results of our study.

      - Figure 1A - text on the heat map cannot be read. Gene-level information can be removed, and sample labels should be made larger. In panel D, no statistical analysis is shown for the metabolomics analysis. These should be added, or the authors should modify the text when referring to these data.

      We have improved the quality of the figures and revised their descriptions for greater clarity and coherence, ensuring that they effectively highlight the key results of our study.

      - Line 127: RNAseq data does not indicate a change in metabolites; text should be changed to say "TIPE dramatically promoted expression of genes...".

      We have corrected as you suggested.

      - Supplementary Figure S3c - Labels and correlation values are not readable.

      - Figure 2A - The text and details in the figure are difficult to read.

      - Figure S4 D-H - text in figure panels too small to read.

      Thank you for the three comments above. We have carefully reviewed the entire document to ensure that all figures are clear and correctly cited, preventing any confusion and maintaining the integrity of our research findings.

      - Figure 3 - the legend restates the major observations and interpretations of the figure, however does not contain enough information about what the data represents or how it was generated. The interpretation of the data should be made in the main text. For example, in panel 3. F-G the number of individual cells quantified for the analysis should be stated. In addition, given the data are generated from two completely independent cell lines, it would be more appropriate to have separate graphs for the A375 cells and G361 cells. The signal levels in the respective controls at baseline are very different, and plotted together without clear labels, making the reader question the validity of the data when this just reflects different basal signals in different cell models.

      We have separated the graphs for the A375 cells and G361 cells.

      - Figure 4 B-C - IgG controls are missing in Co-IP experiments.

      We have added the IgG controls as you suggested.

      - Figure 5F - The unit of measure of time should be indicated on the axes; is this days?

      We measured tumor volumes seven times, once every 5 days. We have added a detailed description to the materials and methods section.

      - Line 348: error in text, mammosphere which should presumably be tumorsphere if from melanoma cells.

      Thank you for your suggestion. We have corrected this term to "tumorsphere" and conducted a thorough language and grammar review of the manuscript to ensure its professional presentation.

      - Methods: more experimental details for the transcriptomic, mass spec, and metabolomics studies should be provided. There are insufficient details if readers wish to repeat these experiments.

      Thank you for your suggestion. We have revised the Methods as you advised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Lines 43 to 46 cannot be referred to as methodology: 

      "to investigate a) determinants of attribution; b) patterns of investigated events, including species and breed affected, history of previous abortion and recent stressful events, and the seasonality of cases; c) determinants of reporting, investigation and attribution; (d) cases in which zoonotic pathogens were detection". 

      The above should be deleted from the methodology.  

      The text is in the abstract and describes, in brief, analyses that we performed and the rationale for these analyses, which we consider relevant for understanding the approach.  As such, we think the text should remain.   

      Italicize et al. in the citations

      This has been done.

      Reviewer 2: 

      Data Presentation: While the analysis is comprehensive, the presentation of data could be enhanced with the use of more visual aids such as tables, graphs, or charts to illustrate key findings. 

      While further visualisation of findings would be possible, we consider the key results are captured effectively in the existing figures and tables.  Open access to the data also allows for further analyses that might be of interest to readers. 

      Discussion Section: The paper could benefit from a more in-depth discussion of the implications of the findings for disease control strategies and policy formulation in Tanzania. 

      We thank the reviewer for this important comment.  In most of the paragraphs of the Discussion we discuss the implications of the findings with specific reference, where relevant, to disease control in Tanzania.  For example, in the paragraph regarding human capacity building, we discuss how LFOs might be incentivised to report health events and how this could improve the reach and sensitivity of future surveillance platforms.  Similarly, these issues are discussed in other paragraphs of the Discussion. 

      Future Directions: Including recommendations for future research or areas for further investigation would add depth to the paper.

      This suggestion has been acted upon and we have added text in the conclusion to describe recommendations for future research.

      Reviewer 3:

      The thoughts of the authors on the topic and its significance are implied, and the methodological approach needs further clarity. The number of wards in the study area, the statistical selection of wards, the type of questionnaire (i.e. open or closed-ended), and the statistical analyses of outcomes were not clearly elucidated in the manuscript. 

      The number of wards and how they were selected (from randomly selected wards included in earlier cross-sectional exposure studies (Bodenham et al. 2021)) is described in the Abortion Surveillance Platform section of the Methods.  We have added a description of the questionnaire to indicate that it was a mixture of open and closed questions. We have reviewed the statistical analyses and consider that they have been fully and appropriately described and so have not changed this. 

      Fifteen wards were mentioned in the text but 13 were used; what were the exclusion criteria? 

      As described, the study focussed on fifteen wards; however, two wards did not report any cases. As such, investigations only took place in thirteen of the fifteen wards and this has been described in the text. 

      Observations were from pastoral, agropastoral, and smallholder agroecological farmers. No sample numbers or questionnaires were attributed to the above farming systems to correlate findings with management systems. 

      As described, the 15 wards comprised five wards that were expected to be predominantly pastoral, three that were expected to be predominantly agropastoral and seven that were expected to be predominantly smallholder, and these categories were assigned by the research team following discussion with local experts (typically the district level veterinary officer) (Bodenham et al. 2021). As such, we consider this to be described sufficiently.  

      The impacts of the research investigation output are not clearly visible as to warrant intervention methods. 

      The aim of this paper was to provide insights on the feasibility and value of establishing a livestock abortion surveillance platform. The aetiological data that could be used to inform specific disease control measures or interventions was the focus of a previous paper (Thomas et al. 2022) as described in the text.    

      What were the identified pathogens from laboratory investigation, particularly with the use of culture and PCR not even mentioning the zoonotic pathogens encountered if any? 

      An earlier published paper describing the aetiology of the cases was mentioned (Thomas et al. 2022).  This paper fully describes the identified pathogens and the methods used for identification and attribution. Additionally, in the Sample Analysis section we describe the pathogens that were tested and the methods used.  In the section Exposure to Zoonotic Pathogens we specifically list Brucella spp., C. burnetii, T. gondii and RVFV and so we consider that we have sufficiently described the pathogens tested for, the methods and the zoonotic pathogens detected. 

      The public health importance of any of the abortifacient agents was not highlighted. 

      The Introduction provides background information on the public health importance of abortifacient agents and we dedicate a whole section (Exposure to Zoonotic Pathogens) to the public health implications of the number of cases in which zoonotic pathogens were detected. Additionally, we discuss the implications of this in the Discussion. 

      Comments in manuscript itself:

      Line 230: Why are you estimating? The study was supposed to be based on real-time abortion events, or at least abortion events within 72 hours.

      We were estimating the sensitivity of the platform by dividing the number of investigated abortion cases by the number of abortions for the livestock population in each of the study wards that would have been expected over the study period.  Because the denominator in this calculation was an expected number, and not a measured count, we can only estimate.
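
      The calculation described above can be illustrated with a minimal sketch. The ward names and counts are hypothetical placeholders rather than study data, and the function name is introduced here only for illustration; the method itself is the one described in the manuscript.

```python
def estimate_sensitivity(investigated, expected):
    """Estimated share of expected abortions that were actually investigated.

    `expected` is a modelled number, not a measured count, so the result
    is an estimate rather than an observed proportion.
    """
    if expected <= 0:
        raise ValueError("expected number of abortions must be positive")
    return investigated / expected


# Hypothetical wards: (investigated cases, expected abortions over the study period)
wards = {"ward_A": (12, 240.0), "ward_B": (0, 150.0)}

sensitivity = {
    ward: estimate_sensitivity(investigated, expected)
    for ward, (investigated, expected) in wards.items()
}
```

      A ward with no investigated cases simply yields an estimated sensitivity of zero; no case is attributed to it.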

      236: In areas where there was no reported abortion event why will you estimate. This action will lead to false conclusion of abortion event in area that did have an event.

      We think there has been some misunderstanding of what this section of text was describing. We were not attributing a case to an area where there was none. Rather, as mentioned above, the aim of this particular analysis was to estimate the sensitivity of the platform. To achieve this, we needed to estimate what the expected number of abortion cases in each ward would have been. 

      279: Give a brief description of R

      A citation and some explanatory text have been added.

      348: Table 1: Your table did not show cases where estimate values were used

      We think this comment has resulted from the confusion described above regarding estimated cases.  Table 1 has summary data for the actual cases that were reported in the study and does not have the data for the estimated number of abortions that were expected to have occurred in each ward.  As described in line 247, this data is given in Supplementary Materials 3.

      404: Not clear, please rephrase

      This sentence has been re-drafted to improve clarity.

      467: Why are you numbering the findings of your investigation in your discussion? You have not told us about previous abortion events in your study area prior to this study or why you embarked on this study in this region. The current abortion situation in your country, based on other researchers' work, is missing, as is how your findings are important in relation to similar investigations elsewhere.

      We number the key findings for clarity and to make each finding distinct and so prefer to retain it. 

      The study area was chosen because it was the site of an earlier cross-sectional exposure study within which the wards were randomly selected.  As a result, thirteen of the fifteen wards targeted in the reported study were randomly selected.  Two additional wards were selected purposively because of strong existing relationships with the livestock-keeping community.  This was explained in the Methods in Lines 161 – 164. 

      Regarding livestock abortion in Tanzania, as explained in the Introduction (lines 112-114), there is little data on abortion in livestock in Tanzania and elsewhere. Nonetheless, in the Discussion, we do describe the results with respect to other abortion studies carried out in Ethiopia, Nigeria and India (lines 592-598). Moreover, as described in the Introduction (lines 90-94), the implementation of syndromic or event-based surveillance in livestock is rare and to the authors’ knowledge has mostly been implemented in Europe, North America or Australasia with only a single pilot project identified in Africa.  

      494: Why will you use an estimate for abortion events that were not reported?

      As described above, this comment reflects a misunderstanding of what was being described.  As written in line 494, an attempt was made to gauge the sensitivity of the surveillance platform by estimating the percentage of expected abortions that the investigated cases represented. That is, to estimate the percentage of abortions that the surveillance platform managed to detect, we divided the number of investigated abortions by the expected number of abortions (in each ward).  The method for this estimation was described in lines 228-238.  

      511: Why was farming pattern excluded? Livestock rearing conditions are equally critical for this type of investigation; for example, an animal reared under an intensive farming system will definitely experience different stresses than livestock in a nomadic free-range system.

      We agree with the reviewer that livestock rearing system might be expected to impact both the aetiology and incidence of livestock abortion.  However, because the number of wards was small and the distribution across systems was not equal, any association between investigated cases and livestock rearing system could not be assessed.  We have made this clearer with additional text in the same paragraph of the Discussion.

      529: Nothing was mentioned about educating the farmers or livestock owners to assist in some instances on possible sample collection during these abortion events and sending these samples as quickly as possible to the central laboratory in suitable condition for investigation, with the results communicated back to the farmers.

      Because abortions can be caused by zoonotic pathogens, we did not involve livestock keepers in the collection of samples.  Rather, sample collection was carried out by the research team and livestock field officers who had received appropriate training.  In addition, results were reported back to the livestock keepers within 10 days of the investigation and, where pathogens were detected, more specific advice provided as to management strategies that could minimise further transmission to livestock and people. This is all described in the Methods (lines 181-199).

      540: The livestock owner can be taught how to collect vaginal swab and send samples under suitable condition to the laboratory and the findings reported back to them.

      Please see above response.

      549: Please summarise.

      Lines 549-581 succinctly describe the attribution of cases to specific pathogens.  The text given is required for comprehension and any further summarisation could impact understanding. Consequently, we have left the text as it is. 

      584: Please summarise.

      Lines 584-626 describe the patterns of livestock abortion in Tanzania.  The text given is required to fully discuss the findings and any further reduction in text could impact understanding. Consequently, we have left the text as it is.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Deletion of the hrp2 and hrp3 loci in P. falciparum poses an immediate public health threat. This manuscript provides a more complete understanding of the dynamic nature with which these deletions are generated. By delving into the likely mechanisms behind their generation, the authors also provide interesting insight into general Plasmodium biology that can inform our broader understanding of the parasite's genomic evolution.

      Strengths:

      The sub-telomeric regions of P. falciparum (where hrp2 and hrp3 are located) are notoriously difficult to study with short-read sequence data. The authors take an appropriate, targeted approach toward studying the loci of interest, which includes read-depth analysis and local haplotype reconstruction. They additionally use both long-read and short-read data to validate their major findings. There is an extensive set of supplementary plots, which helps clarify several aspects of the data.

      Weaknesses:

      In this first version, there are a few factors that hinder a full assessment of the robustness and replicability of the results.

      Reviewer #1 (Recommendations For The Authors):

      Reviewer comment: First, a number of the analyses lack basic details in the methods; for instance, one must visit the authors' personal website to find some of the tools used.

      We have extensively updated the methods to clarify which tools were used and how they were run. All code and results for the analyses have been deposited in Zenodo at https://doi.org/10.5281/zenodo.12167687.

      Reviewer comment: Second, there are several tricky methodological points that are not fully documented. Read depths are treated (and plotted) discretely as 0/1/2 without any discussion of how thresholds were used and determined.

      We have added to the methods section the full details on how read depth was handled, including rounding normalized coverage to the nearest whole number for visualizations. To ensure that only highly confident deleted strains were analyzed, normalized coverage of 0.1 or more was rounded to 1 instead of 0. Samples were considered for potential genomic deletion if they had zero coverage after rounding on chromosome 8 from 1,375,557 to 1,387,982 for pfhrp2, on chromosome 13 from 2,841,776 to 2,844,785 for pfhrp3, and on chromosome 11 from 1,991,347 to 2,003,328. These coordinates were chosen after visual inspection of samples with any zero coverage within the genomic region of pfhrp2/3.
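
      The rounding rule can be sketched as follows. This is an illustrative reading of the description above, not the deposited analysis code: the function names are hypothetical, and the half-up rounding used for values above 1 is an assumption, since the response only specifies the 0.1 threshold.

```python
import math

MIN_PRESENT = 0.1  # threshold stated above: coverage >= 0.1 is never rounded to 0


def round_coverage(norm_cov):
    """Round normalized coverage to a whole copy number, conservatively.

    Values below MIN_PRESENT become 0; anything from MIN_PRESENT upward is
    rounded half-up (an assumption) but never to less than 1.
    """
    if norm_cov < MIN_PRESENT:
        return 0
    return max(1, math.floor(norm_cov + 0.5))


def is_candidate_deletion(window_coverages):
    """True only if every window in the region rounds to zero coverage."""
    return len(window_coverages) > 0 and all(
        round_coverage(c) == 0 for c in window_coverages
    )
```

      Under this rule a sample with even weak residual signal in one window of the region is not called deleted, which matches the stated aim of keeping only highly confident deletions.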

      Reviewer comment: For read mapping to standard vs hybrid chromosomes, there is no documentation on how assignments were made if partially ambiguous or how final sample calls were determined when some reads were discordant. There is no mention of how missing data were handled. Without this, it is difficult to know when conclusions were based on analyses that were more quantitative (for instance, using pre-determined read thresholds) or more subjective (with patterns being extracted visually).

      We have updated several parts of the methods section to explicitly state which thresholds and analysis pipelines were used, making our documentation clearer. For mapping the long reads to the hybrid vs standard chromosomes, reads spanning the duplicated region were required to extend 50bp upstream and downstream of the region. These flanking regions are significantly different between chromosomes 11 and 13, so requiring spanning reads to map to them prevented multi-mapping reads. Reads that started within the duplicated region were allowed to map to both the hybrid and standard chromosomes for visualization in Figure 4. Importantly, for both HB3 and SD01, no reads spanned from the duplicated region into chromosome 13, showing a complete lack of reads that contained the portion of chromosome 13 that came after the duplicated region. None of the other isolates had any spanning reads across the hybrid chromosomes. Deletion calls were based on initial visualization of pfhrp2/3 and then on read thresholds (see above response for details).
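
      The spanning-read requirement can be expressed as a simple coordinate check. The sketch below is illustrative only: the function name and the example coordinates are hypothetical, and it assumes that read and region coordinates use the same convention on the same reference.

```python
FLANK_BP = 50  # required extension on each side, as stated above


def spans_region(read_start, read_end, region_start, region_end, flank=FLANK_BP):
    """True if the alignment covers the region plus `flank` bp on both sides."""
    return (
        read_start <= region_start - flank
        and read_end >= region_end + flank
    )


# Hypothetical duplicated region and read alignments (not real coordinates)
region = (1_000, 16_000)
reads = {
    "read_spanning": (900, 16_100),        # extends 100 bp past both ends
    "read_short_flank": (960, 16_100),     # only 40 bp upstream
    "read_starts_inside": (5_000, 20_000), # begins within the region
}

spanning = [name for name, (s, e) in reads.items() if spans_region(s, e, *region)]
```

      Reads that begin inside the region fail the check, which is why they can map equally well to both the hybrid and standard chromosomes and are shown for visualization only.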

      Reviewer comment: Third, while a new method is employed for local haplotype reconstruction (PathWeaver), the manuscript does not include details on this approach or benchmarking data with which to evaluate its performance and understand any potential artifacts.

      We have added an analysis based on biallelic SNPs as a comparator; it produced results similar to those from PathWeaver, which helps validate the PathWeaver results. The PathWeaver manuscript is in preparation.

      Reviewer #2 (Public Review):

      This work investigates the mechanisms, patterns, and geographical distribution of pfhrp2 and pfhrp3 deletions in Plasmodium falciparum. Rapid diagnostic tests (RDTs) detect P. falciparum histidine-rich protein 2 (PfHRP2) and its paralog PfHRP3 located in subtelomeric regions. However, laboratory and field isolates with deletions of pfhrp2 and pfhrp3 that can escape diagnosis by RDTs are spreading in some regions of Africa. They find that pfhrp2 deletions are less common and likely occur through chromosomal breakage with subsequent telomeric healing. Pfhrp3 deletions are more common and show three distinct patterns: loss of chromosome 13 from pfhrp3 to the telomere with evidence of telomere healing at breakpoint (Asia; Pattern 13-); duplication of a chromosome 5 segment containing pfhrp1 on chromosome 13 through non-allelic homologous recombination (NAHR) (Asia; Pattern 13-5++); and the most common pattern, duplication of a chromosome 11 segment on chromosome 13 through NAHR (Americas/Africa; Pattern 13-11++). The loss of these genes impacts the sensitivity of RDTs, and knowing these patterns and geographic distribution makes it possible to make better decisions for malaria control.

      Reviewer #3 (Public Review):

      Summary:

      The study provides a detailed analysis of the chromosomal rearrangements related to the deletions of histidine-rich protein 2 (pfhrp2) and pfhrp3 genes in P. falciparum that have clinical significance since malaria rapid diagnostic tests detect these parasite proteins. A large number of publicly available short sequence reads for the whole genome of the parasite were analyzed, and data on coverage and discordant mapping allowed the authors to identify deletions, duplications, and chromosomal rearrangements related to pfhrp3 deletions. Long-read sequences showed support for the presence of a normal chromosome 11 and a hybrid 13-11 chromosome lacking pfhrp3 in some of the pfhrp3-deleted parasites. The findings support that these translocations have repeatedly occurred in natural populations. The authors discuss the implications of these findings and how they do or do not support previous hypotheses on the emergence of these deletions and the possible selective pressures involved.

      Strengths:

      The genomic regions where these genes are located are challenging to study since they are highly repetitive and paralogous and the use of long-read sequencing allowed to span the duplicated regions, giving support to the identification of the hybrid 13-11 chromosome.

      All publicly available whole-genome sequences of the malaria parasite from around the world were analysed which allowed an overview of the worldwide variability, even though this analysis is biased by the availability of sequences, as the authors recognize.

      Despite the reduced sample size, the detailed analysis of haplotypes and identification of the location of breakpoints gives support to a single origin event for the 13-5++ parasites.

      The analysis of haplotype variation across the duplicated chromosome-11 segment identified breakpoints at varied locations that support multiple translocation events in natural populations. The authors suggest these translocations may be occurring at high frequency in meiosis in natural populations but are strongly selected against in most circumstances, which remains to be tested.

      Weaknesses:

      Reviewer comment: Relying on sequence data publicly available, that were collected based on diagnostic test positivity and that are limited by sequencing availability, limits the interpretation of the occurrence and relative frequency of the deletions.

      However, we have uncovered more mechanisms than previously detected for hrp2 (involving MDR1) in SEA, and South American parasites were likely detected by microscopy, as RDTs were never introduced there due to the presence of the deletions.

      Reviewer comment: In the discussion, caution is needed when identifying the least common and most common mechanisms and their geographical associations. The identification of only one type of deletion pattern for Pfhrp2 may be related to these biases.

      We added a section in the Discussion on the limitations of our study, which states the following, “Limitations of this study include the use of publicly available sequencing data that were collected often based on positive rapid diagnostic tests, which limits our interpretation of the occurrence and relative frequency of these deletions. This could introduce regional biases due to different diagnostic methods as well as limit the full range of deletion mechanisms, particularly pfhrp2.”

      Reviewer comment: The specific objectives of the study are not stated clearly, and it is sometimes difficult to know which findings are new to this study. Is it the first study analyzing all the worldwide available sequences? Is it the first one to do long-read sequencing to span the entire duplicated region?

      In the Introduction, we added, “The objectives of this study were to determine the pfhrp3 deletion patterns along with their geographical associations and sequence and assemble the chromosomes containing the deletions using long-read sequencing.”

      We also added in the Discussion, “To the best of our knowledge, no prior studies have performed long-read sequencing to definitively span and assemble the entire segmental duplication involved in the deletions.”

      Reviewer comment: Another aspect that should be explained in the introduction is that there was previous information about the association of the deletions to patterns found in chromosomes 5 and 11. In the short-read sequences results, it is not clear if these chromosomes were analysed because of the associations found in this study (and no associations were found to putative duplications or deletions in other chromosomes), or if they were specifically included in the analysis because of the previous information (and the other chromosomes were not analysed).

      The former is correct. Chromosomes 5 and 11 were analyzed due to the associations found in this study, not from prior information. We have added the following sentence in the Results: “As a result of our short-read analysis demonstrating these three patterns and discordant reads between the chromosomes involved, chromosomes 5, 11, and 13 were further examined. No other chromosomes had associated discordant reads or changes in read coverage.”

      Reviewer comment: An interesting statement in the discussion is that existing pfhrp3 deletions in a low-transmission environment may provide a genetic background on which less frequent pfhrp2 deletion events can occur. Does it mean that the occurrence of pfhrp3 deletions would favor the pfhrp2 deletion events? How, and is there any evidence for that?

      We should have stated more explicitly that selection would be better able to act on the now doubly deleted parasite than on a parasite with HRP3 still intact and weakly detectable by RDTs. Fully RDT-negative parasites require a two-hit mechanism, in which both pfhrp2 and pfhrp3 need to be deleted. Since there appear to be more mechanisms and drivers for pfhrp3 deletions, this would create a population of parasites with one hit already, which would require only the additional hit of a pfhrp2 deletion to become RDT negative. So the point being made in the discussion is not that the pfhrp3 deletion would favor pfhrp2 deletion, but rather that there is a population circulating with one hit already, which would make it more likely that the less frequent pfhrp2 deletion would result in a dual-deleted and therefore RDT-negative parasite. The discussion has been modified to the following to make this point clearer: “In the setting of RDT use in a low-transmission environment, a pfhrp2 deletion occurring in the context of an existing pfhrp3 deletion may be more strongly selected for compared to pfhrp2 deletion occurring alone still detectable by RDTs.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer comment: In the text, clonal propagation is the proposed hypothesis for the presence of near-identical copies of the chromosome 11 duplicated region. Even among the parasites showing variation between chromosomes, Figure 5 shows 3 haplotype groups with multiple sample members, which is also suggestive that these are highly related parasites. In addition to confirming COI status, it would be straightforward to calculate the genome-wide relatedness between/among parasites belonging to the same haplotype group. The assumption is that they are clones or highly related. A different finding would require more thought into potential genomic artifacts driving the pattern.

      Thank you for this helpful suggestion. We confirmed the COI of each sample using THE REAL McCOIL. Six samples were not monoclonal, and we removed them to exclude any contribution of polyclonal samples to the downstream haplotype analysis. Then, by using hmmIBD on whole-genome biallelic SNPs, we determined the whole-genome relatedness between the parasites. The haplotype groups do appear clonal, though there appear to be several clonal groups within the larger clusters 01 (n=28) and 03 (n=12). Combined with the variation seen within the 15.2kb region on chromosome 11/13, this suggests that different events led to the same duplicated chromosome 11.

      Reviewer comment: By way of validating the PathWeaver results, it could be useful to use another comparator method on the samples that are COI=1 or 2.

      We have added an analysis based on biallelic SNPs as a comparator; it produced results similar to those from PathWeaver, which helps validate the PathWeaver results. We continued to use PathWeaver (Hathaway, in preparation), which is better able to detect variation relative to standard GATK4 analyses due to the refined local alignments from assembled haplotypes.

      Questions regarding Methods:

      Reviewer comment: Were any metrics of genome quality factored into sample selection?

      Yes, samples were removed if they had less than 5x median whole-genome coverage. Additionally, several subsets of sWGA samples were removed based on visual inspection. These details have been added to the methods section.
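
      As an illustration of the coverage criterion only, a filter of this kind might look like the sketch below. The sample names and per-window depths are hypothetical, and summarizing whole-genome coverage as the median over windows is an assumption about how the median was computed.

```python
from statistics import median

MIN_MEDIAN_COVERAGE = 5  # samples below 5x median coverage are removed


def passes_coverage_filter(window_depths, minimum=MIN_MEDIAN_COVERAGE):
    """Keep a sample only if its median depth reaches the minimum."""
    return median(window_depths) >= minimum


# Hypothetical per-window read depths for two samples
samples = {
    "sample_1": [8, 10, 0, 12, 9],  # median 9: kept despite one empty window
    "sample_2": [1, 3, 0, 4, 2],    # median 2: removed
}

kept = [name for name, depths in samples.items() if passes_coverage_filter(depths)]
```

      Using the median rather than the mean keeps a well-covered sample with a genuine local deletion from being discarded by this filter.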

      Reviewer comment: How were polyclonal samples treated to ensure they did not produce analysis artifacts?

      The read-depth analysis required zero coverage across the regions of pfhrp2/pfhrp3, which meant that most of the samples analyzed were monoclonal (or polyclonal infections of only deleted strains). We have now used THE REAL McCOIL on whole-genome SNPs to determine COIs. Six samples were identified as polyclonal, and we removed them from the analysis and updated the manuscript. Their removal did not significantly impact the results or conclusions.

      Reviewer comment: How was local realignment of short-read data performed? Was this step informed by the conserved, non-paralogous genomic regions, or were these only used for downstream variant analysis?

      No local realignment of short-read data was performed. The analysis was either read depth or de novo assembly from reads from specific regions. Regarding the de novo assembly, variant calls were replaced by complete local haplotypes, and a region was typed based on the haplotype called for the region.

      Reviewer comment: For read-depth estimation, what cutoffs were used to classify windows as deletion, WT, or duplication? How much variability was present in the data? The plot legends imply a continuous scale, but in reality, only 3 discrete colors are used (0, 1, 2), so these must represent the data after rounding.

      These have been added to the manuscript. See the responses to Reviewer #1 questions #2 and #3 above.

      Reviewer comment: Similarly, what thresholds were used for mapping the long-reads? In Fig S21, it appears there is a high proportion of discordant reads.

      Long reads were mapped using minimap2 with default settings. Figure S21 shows the mappings to 3D7 chromosome 11 and the hybrid 3D7 13-11 chromosome. The sequence of the duplicated region (marked by the blue bar underneath) is identical in both, so reads are expected to map to both. The significance of this figure and Figure 4 lies in the number of long reads that span the whole chr11/13 duplicated region on both 3D7 chromosome 11 and the hybrid, proving that there are reads that start with chromosome 13 sequence and end with chromosome 11 sequence, and in the lack of reads that span from chromosome 13 into the 3D7 chromosome 13.

      Reviewer comment: The section on the mdr1 breakpoints is too vague.

      We have updated the methods section to be more explicit about how these breakpoints were determined.

      Reviewer comment: I assume that the "Homologous Genomic Structure" section of the Methods is the nucmer analysis that was alluded to in the Results? As with other sections, this needs more information on exact methods and tools.

      We have now updated the methods section to include exactly how the nucmer commands were run.

      Smaller comments:

      Reviewer comment: Introduction sub-header: "Precise *pfhrp2* and..."

      We have corrected the sub-header.

      Reviewer comment: Results (p.5) cite Table S4 instead of S3

      We have corrected this to Table S3.

      Reviewer comment: Results (p.5) "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions." This sentence makes it sound like they are 3 mutually exclusive categories. I'd suggest a rewording like "We identified 27 parasites with pfhrp2 deletion and 172 with pfhrp3 deletion. Of these, 21 contained both deletions."

      We have re-worded this sentence to the following: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern.”

      Reviewer comment: The annotations used for the deletions differ between the text and the figures. It would be easier for the reader to harmonize the two if these matched.

      The figures have been updated to reflect the annotations of the text.

      Reviewer comment: Figure numbering does not match the order they are first referenced in the text

      The figure numbers have been updated to match the order in which they are first referenced.

      Reviewer comment: Results (p. 8) there is no Table S4

      This has been changed to Table S3.

      Reviewer comment: Results (p.8) mention a genome-wide nucmer analysis, but I couldn't find these results. The referenced figure is for the duplicated region only.

      We have added a supplemental table with the nucmer results and updated the text to point to this table and to the correct figure.

      Reviewer comment: Discussion typo: "Here, we used publicly available short-read and long-read *short-read sequencing data* from..."

      This was not a typo, as we used publicly available PacBio long-read data and then generated new Nanopore long-read data. However, we did clarify this in the sentence.

      Reviewer #2 (Recommendations For The Authors):

      Introduction

      Reviewer comment: "(...) suggesting the genes have important infections in normal infections and their loss is selected against". The word "infections" is in place of "role", etc.

      We have changed the word accordingly.

      Results

      Reviewer comment: In the section "Pfhrp2 and pfhrp3 deletions in the global P. falciparum genomic dataset" it is mentioned the number of parasites with each deletion and where it is more common. "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions." and "Across all regions, pfhrp3 deletions were more common than pfhrp2 deletions; specifically, pfhrp3 deletions and pfhrp2 deletions were present in Africa in 43 and 12, Asia in 53 and 4, and South America in 76 and 11 parasites." It is not clear where the 21 parasites with both pfhrp2 and pfhrp3 deletions are located.

      We have specified the following in the Results section: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern”

      Reviewer comment: "It should be noted that these numbers are not accurate measures of prevalence given that most WGS specimens have been collected based on RDT positivity." This, combined with the fact that subtelomeric regions are difficult to sequence and assemble, means these numbers are underestimated. I believe it should be more stressed in the text.

      We have added the following sentence, “Furthermore, subtelomeric regions are difficult to sequence and assemble, meaning these numbers may be significantly underestimated.”

      Reviewer comment: In the section "Pattern 13-11++ breakpoint occurs in a segmental duplication of ribosomal genes on chromosomes 11 and 13", Figures 2a and 2b should be mentioned in the text instead of just Figure 2.

      We have specified Figures 2a and 2b in the text now.

      Figures and Tables:

      Reviewer comment: Figure 2: I believe the color scale for percentage of identity is unnecessary given that the goal is to show that the paralogs are highly similar, and not that there is a significant difference between 0.99 and 0.998.

      We have updated the color scale to represent the number of variants between segments (which ranges from 55 to 133) rather than percent identity, so that it represents something more discrete than 0.99 and 0.998.

      Reviewer comment: Adjust Figure 2b and the size of supplementary figure legends. Supplementary Figure 5-15: the legends are hard to read.

      All legends have been adjusted to be much more readable.

      Reviewer #3 (Recommendations For The Authors):

      Some minor suggestions:

      Reviewer comment: The order of the figures should follow the flow of the text, for example, Figure 5 appears in the text between Figure 1 and Figure 2.

      We have reordered the figures according to the order in which they appear in the text.

      Reviewer comment: Page 3 - "deleted parasites" - better to use: pfhrp2/3-deleted parasites.

      We have edited this accordingly.

      Reviewer comment: Define the acronyms the first time they are used, e.g. SEA.

      We have defined the acronyms accordingly.

      Reviewer comment: In the figures where pfmdr1 appears, indicate the correspondence to the full name of the gene that appears in the legend (multidrug resistance protein 1).

      Legends updated.

      Reviewer comment: Page 5 - Table S4 is missing.

      We apologize for our typo. There is no Table S4. We meant to refer to Table S3, which has been updated accordingly.

      Reviewer comment: Page 5 - "We identified 27 parasites with pfhrp2 deletion, 172 with pfhrp3 deletion, and 21 with both pfhrp2 and pfhrp3 deletions" - is it "and 21..." OR "from which, 21..."?

      We have reworded the sentence to the following: “We identified 26 parasites with pfhrp2 deletion and 168 with pfhrp3 deletion. Twenty field samples contained both deletions; 11 were found in Ethiopia, 6 in Peru, and 3 in Brazil, and all had the 13-11++ pfhrp3 deletion pattern.”

      Reviewer comment: Page 5 - "most WGS specimens have been collected based on RDT positivity." - explain better which tests are done - to detect pfhrp2, pfhrp3 or both?

      Co-occurrence is not detected?

      We used all publicly available WGS data that spanned over 30 studies, and the exact details of what RDTs were used are not readily available to fully answer this question. Though the exact details of RDTs are not known, this does not affect the deletion patterns found in the genomic data but does limit the ability to comment on how this affects prevalence. We have updated the manuscript to the following to be more explicit that we don’t have the full details: “It should be noted that these numbers are not accurate measures of prevalence, given that the publicly available WGS specimens utilized in this analysis come from locations and time periods that commonly used RDT positivity for collection”

      Reviewer comment: Supplementary Figure 1 - Legend for "Pattern" - what is the white?

      “Pattern” refers to the pfhrp3 deletion pattern, with white indicating no pfhrp3 deletion. The annotation title has been changed to “pfhrp3- Pattern” to make this clearer, and the following has been added to the text of the legend: “Of the 6 parasites without HRP3 deletion (marked as white in pfhrp3- Pattern column for having no pfhrp3 deletion),...”

      Reviewer comment: Supplementary Figure 8 - explain the haplotype rank. How was it obtained?

      The haplotype rank is based on the prevalence of the haplotype. To clarify this, the following has been added to the caption: “Each column contains the haplotypes for that genomic region colored by the haplotype prevalence rank (more prevalent have a lower rank number, with most prevalent having rank 1) at that window/column. Colors are by frequency rank of the haplotypes (most prevalent haplotypes have rank 1 and colored red, 2nd most prevalent haplotypes are rank 2 and colored orange, and so forth). Shared colors between columns do not mean they are the same haplotype. If the column is black, there is no variation at that genomic window.”
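
      The ranking described in the caption can be sketched as follows. This is a minimal illustration that assumes the haplotypes observed in one genomic window are available as a list of strings; the function name and the alphabetical tie-breaking rule are assumptions and may differ from what was used to produce the figure.

```python
from collections import Counter
from typing import Dict, List


def haplotype_ranks(haplotypes: List[str]) -> Dict[str, int]:
    """Map each haplotype in one genomic window to its prevalence rank.

    Rank 1 is the most prevalent haplotype. Ties are broken alphabetically,
    which is an arbitrary assumption made only for this sketch.
    """
    counts = Counter(haplotypes)
    ordered = sorted(counts, key=lambda hap: (-counts[hap], hap))
    return {hap: rank for rank, hap in enumerate(ordered, start=1)}


# Hypothetical haplotype calls for six samples in a single window.
window = ["ACG", "ACG", "ATG", "ACG", "TTG", "ATG"]
ranks = haplotype_ranks(window)
print(ranks)
```

      Because ranks are computed per window, the same rank (and therefore the same color) in two columns does not imply the same haplotype, as the caption notes.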

      Reviewer comment: Figure 1 - Pattern in legend appears 11++13- but in text it is always referenced as 13-11++

      The figure legend has been updated to reflect the annotation used in the text.

      Reviewer comment: Page 6 - pattern 13- is which one(s) in Figure 1?

      This refers to pattern 13- with TARE1 sequence detected. The text has been updated to “(pattern 13-TARE1)”, and the legend of Figure 1 has been updated so that these statements match more closely.

      Reviewer comment: Page 7 - states "The 21 parasites with pattern 13-" and refers to Supplementary Figure 3 which presents "50 parasites with deletion pattern 13-". I believe this is pattern 13- unassociated with other rearrangements but it should be made clear in the text and legend of the supplementary figure.

      Thank you, you are correct. The manuscript has been updated in two locations for better clarity. The text has been updated to be “The 20 parasites with pattern 13-TARE1 without associated other chromosome rearrangements had deletions of the core genome averaging 19kb (range: 11-31kb). Of these 13-TARE1 deletions, 19 out of 20 had detectable TARE1 (pattern 13-TARE) adjacent to the breakpoint, consistent with telomere healing.” The Supplemental Figure 3 legend has been updated to “for the 48 parasites with pfhrp3 deletions not associated with pattern 13-11++”

      Reviewer comment: Supplementary figure 25 - "regions containing the pfhrp genes (lighter blue bars below chromosomes 11 and 13)" - the light blue bars are shown below chromosome 8 and 13; what is the difference between yellow and pink bars (telomere associates repetitive elements in the truncated legend)?

      The yellow bars are associated with the telomere-associated repetitive element 3 and the pink bars are telomere-associated repetitive element 1. To add clarity the legend has been updated to be “The yellow (TARE3) and pink (TARE1) bars on the bottom of the chromosomes represent the telomere-associated repetitive elements found at the end of chromosomes.”

      Reviewer comment: It would be helpful to have a positioning scale in the figures.

      Most plots have x- and y-axes labeled with genomic positions, which can serve as a positioning scale, so we opted not to add more to the figures in order to keep them less crowded. Other plots show regions in genomic order but positioned only relative to one another, which prevents the use of a positioning scale; we have tried to clarify this by adding more details to the captions of these figures.

      Reviewer comment: Legend of Figure 6 - The last paragraph seems to be out of place

      We have deleted the last sentence in the legend of Figure 6 accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      I understand that the only spermatids observed in cKO testes are coming from cells that escaped the Cre system. However, I do think that the authors could provide sperm counts data also showing decreased sperm counts in the mutant, to make their claim stronger. This is a very common fertility assessment.

      All round spermatids isolated from Arid1acKO testes appeared only to express the normal transcript associated with the floxed allele (Fig. S4A).

      [New Data - Lines 154-159] Our evaluation of the first round of spermatid development based on DNA content (1C, 2C, and 4C) revealed a significantly reduced abundance of round spermatids (1C) in mutant testes compared to wild-type testes. This finding, obtained through flow cytometry, supports the observed meiotic block at the pachytene stage (new Fig. S5A-B).

      Reviewer 3:

      Lines 154-5: Currently read 'inefficient Stra8-cre inefficiency'. Should read 'inefficient Stra8-cre activity.' I see that this was noted in the first round of review but the original wording has persisted.

      The nucleolin antibody used should be listed in Supplementary table 3.

      'inefficient Stra8-cre inefficiency' now reads “inefficient Stra8-Cre activity”  [Line 158]

      Nucleolin antibody is now listed in Supplementary Table 3

    1. Author response:

      Response to Public Comment of Reviewer 1: We thank the Reviewer for the positive assessment of the manuscript. We also are grateful to the Reviewer for pointing out that providing alternatives to our model is a strength, and not a weakness, potentially stimulating future experiments that could falsify our model.  

      Response to Public Comment of Reviewer 2: We thank the Reviewer for the positive assessment of the manuscript. 

      In our manuscript, we already provide some references to evidence supporting reversible β-cell inactivation in a high-glucose environment. In the revision, we will expand this discussion, emphasize it, and add additional references that we have discovered recently. 

      In the revision, we will additionally expand our discussion of what is and is not known about the features of β-cell dysfunction in KPD, the relevant timescales, and so on. We will expand on how little is known about the possible pre-KPD state: individuals with KPD usually show up in a hospital with a new onset of diabetes, and often have had little access to medical care prior to this presentation. Thus, prior medical records are often unavailable. We hope this theoretical work will help justify appropriate future studies of the clinical history of KPD patients. 

      In the revision of the manuscript, we plan to briefly discuss how our model might, indeed, account for the honeymoon phase of type 1 diabetes, as well as for some phenomenology of gestational diabetes, and progression of type 2 diabetes in youth. In other words, the model developed for explaining KPD is potentially much broader, explaining many other phenomena. However, we prefer to leave the detailed modeling of these conditions, and comparisons to alternate hypotheses of their pathogenesis, to a future publication.

    1. Author response:

      We’d like to thank the reviewers for their fair, thoughtful, and critical review of our manuscript.

      We acknowledge that the small number of specimens limits the impact of our findings. While we are unable to expand the study, we are optimistic that more cases with insulitis will be made available for research and spatial technologies will become more cost-effective over time. We hope that the design and analyses in our study are useful to future efforts and that our findings can be validated and revised.

      We intend to revise the manuscript to address all other points raised by reviewers. These include a) adding HLA genotype information for each patient, b) analyzing how key immune signatures relate to the clinical variables, diabetes duration and age of onset, and c) measuring the relationship between IDO+ islets and HLA-ABC expression. We will also revise the text and figures for clarity in specific places and discuss important considerations including stem cell memory T cells and the potential impact of prolonged stays in the ICU.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment  

      This manuscript compiles existing algorithms into an open-source software package that enables real-time motor unit decomposition from muscle activity collected via grids of surface electrodes and indwelling electrode arrays. The software package is valuable given that many motor neuroscience labs are using such algorithms and that there exist a host of potential real-time applications for such data. Validation of the software package is generally solid but incomplete in some important areas: the primary data is narrow in scope and only from male participants, and there is a lack of ground truth tests on synthetic data. The impact of the software package could be strengthened by making it less tied to specific electrode hardware and by expanding it to easily permit offline analysis.

      We thank the reviewers and editors for their comments and suggestions after reading the initial version of our manuscript. In this second iteration, we have performed a validation of the algorithm using synthetic EMG signals. We have also added experimental data collected in female participants. Finally, the new version of I-Spin is compatible with the Open Ephys GUI that can interface with devices such as the Open Ephys and Intan acquisition boards. Another version has been developed for interfacing with the devices provided by the TMSi company (https://info.tmsi.com/blog/ispin-saga-real-time-motor-unit-decomposition-tool). We believe that such changes will make I-Spin more accessible for a broad range of experimental setups and research teams. Please find below the specific answers to the reviewers’ comments.

      Reviewer #1 (Public Review):  

      Many labs worldwide now use the blind source deconvolution technique to identify the firing patterns of multiple motor units simultaneously in human subjects. This technique has had a truly transformative effect on our understanding of the structure of motor output in both normal subjects and, increasingly, in persons with neurological disorders. The key advance presented here is that the software provides real-time identification of these firing patterns. The main strengths are the clarity of the presentation and the great potential that real-time decoding will provide. Figures are especially effective and statistical analyses are excellent. 

      We thank the reviewer for this positive appreciation of our work. 

      The main limitation of the work is that only male subjects were included in the validation of the software. The reason given - that yield of number of motor units identified is generally larger in males than females - is reasonable in the sense that this is the first systematic test of this real-time approach. At a minimum, however, the authors should clearly commit to future work with female subjects and emphasize the importance of considering sex differences. 

      As emphasised by the reviewer, the number of identified motor units is typically higher in males than in females when using surface EMG (Taylor et al., 2022), which is currently the main limitation to implementing offline EMG decomposition in a broad and representative sample of research participants. These sex-related differences are less pronounced when using intramuscular EMG, as the signals are less affected by the filtering effect of the volume conductor separating the motor units from the recording electrodes. Besides the different yields expected between males and females, we do not expect differences in terms of the accuracy of the motor unit identification algorithm, which is the main outcome of this paper.

      Nevertheless, we acknowledge the importance of understanding the reasons for this difference, and the imperative to refine algorithms and/or surface electrode design to mitigate this major limitation of surface EMG.

      To support this point, the discussion has been updated (P20; L480):

      ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of identified motor units from surface EMG is twice as low in females as in males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent developments of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’

      Finally, we have completed new experiments including males and females in this new iteration (P.12; L.295):

      ‘Application of motor unit filters in experimental data

      We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC). 

      When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real-time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
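
      For readers less familiar with these accuracy metrics, the sketch below shows one common way to compute them from two lists of discharge times. It assumes the widely used definitions (rate of agreement = TP / (TP + FP + FN), sensitivity = TP / (TP + FN), precision = TP / (TP + FP)); the matching tolerance, the function name, and the example values are hypothetical and are not taken from the I-Spin code.

```python
from typing import Dict, Sequence


def decomposition_accuracy(
    identified: Sequence[int],
    reference: Sequence[int],
    tolerance: int = 0,
) -> Dict[str, float]:
    """Compare identified discharge times with a reference (in samples).

    Two spikes are counted as the same discharge if they differ by at most
    `tolerance` samples. Both inputs are assumed to be non-empty. This is an
    illustrative sketch, not the authors' implementation.
    """
    ident = sorted(identified)
    ref = sorted(reference)
    i = j = true_pos = 0
    while i < len(ident) and j < len(ref):
        diff = ident[i] - ref[j]
        if abs(diff) <= tolerance:
            true_pos += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # identified spike with no reference partner (false positive)
        else:
            j += 1  # reference spike that was missed (false negative)
    false_pos = len(ident) - true_pos
    false_neg = len(ref) - true_pos
    return {
        "rate_of_agreement": 100.0 * true_pos / (true_pos + false_pos + false_neg),
        "sensitivity": 100.0 * true_pos / (true_pos + false_neg),
        "precision": 100.0 * true_pos / (true_pos + false_pos),
    }


# Hypothetical discharge times in samples.
reference_spikes = [100, 300, 500, 700, 900]
identified_spikes = [101, 299, 500, 905]
metrics = decomposition_accuracy(identified_spikes, reference_spikes, tolerance=2)
print(metrics)
```

      In this toy example two reference discharges are missed and one identified spike has no partner, so sensitivity drops further than precision, which mirrors the pattern reported when filters from 20% MVC are applied at 10% MVC.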

      A second weakness is that the Introduction does a poor job of establishing the potential importance of the real-time approach. 

      The introduction has been modified to highlight the importance of identifying the spiking activity of motor units in real time. Specifically, the first paragraph has been rewritten to read (P3; L67): 

      ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Reviewer #2 (Public Review):  

      Rossato et al present I-spin live, a software package to perform real-time blind-source separation-based sorting of motor unit activity. The core contribution of this manuscript is the development and validation of a software package to perform motor unit sorting, apply the resulting motor unit filters in real-time during muscle contractions, and provide real-time visual feedback of the motor unit activity. I have a few concerns with the work as presented: 

      I found it challenging to specifically understand the technical contributions of this manuscript. The authors do not appear to be claiming anything novel algorithmically (with respect to spike sorting) or methodologically (with respect to manual editing of spikes before the use of the algorithms in real-time). My takeaway is that the key contributions are C1) development of an open-source implementation of the Negro algorithm, C2) validating it for real-time application (evaluating its sorting efficacy, and closed-loop performance, etc), and developing a software package to run in closed-loop with visual feedback. I will comment on each of these items separately below. It would be great if the authors could more explicitly lay out the key contributions of this manuscript in the text. 

      The main objective of this work was to provide an open-source implementation of the real-time identification of motor units, together with a user interface that allows researchers to easily process the data and display the firing activity of motor units through several forms of visual feedback. We have explicitly laid out these key contributions in the introduction: ‘Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Related to the above, much of the validation of the algorithms in this manuscript has a "trust me" feel. The authors note that the Negro et al. algorithm has already been validated, so very few details or presentations of primary data showing the algorithm's performance are shown. Similarly, the efficacy of the decomposition approach is evaluated using manual editing of the sorting output as a reference, which is a subjective process, and users would greatly benefit from explicit guidance. There are very few details of manual editing shown in this manuscript (I believe the authors reference the Hug et al. 2021 paper for these details), and little discussion of the core challenges and variability of that process, even though it seems to be a critical step in the proposed workflow. So this is very hard to evaluate and would be challenging for readers to replicate. 

      To address the reviewer’s comment, we added a validation step using synthetic EMG data (P.10; L.235). 

      ‘Validation of the algorithm

      We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29), Konstantin et al. (36) for a full description of the model]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid with an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units. 

      Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.

      Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10% MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which resulted in a merge in the same pulse train (Figure 3D).’

      In addition, we added a new paragraph in the Method section to describe the manual editing process (P.26; L.658). 

      ‘There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units over eight operators with various experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).’
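
      Step i) above can be illustrated with a small sketch that flags candidate outliers for an operator to inspect. It assumes discharge times are given in samples, and it flags any spike whose preceding inter-spike interval implies an instantaneous firing rate above the mean plus `n_sd` standard deviations; this threshold rule, the function name, and the example values are assumptions for illustration and are not a description of the I-Spin editing panel.

```python
from statistics import mean, stdev
from typing import List


def flag_rate_outliers(
    discharge_samples: List[int], fs: float, n_sd: float = 3.0
) -> List[int]:
    """Return discharge times that follow an implausibly short interval.

    The instantaneous rate for each spike is fs / (interval to previous spike).
    A spike is flagged when this rate exceeds mean + n_sd * SD of all rates.
    The rule and its default are illustrative assumptions only.
    """
    rates = [
        fs / (later - earlier)
        for earlier, later in zip(discharge_samples, discharge_samples[1:])
    ]
    limit = mean(rates) + n_sd * stdev(rates)
    return [t for t, rate in zip(discharge_samples[1:], rates) if rate > limit]


# Hypothetical pulse train: regular firing every 200 samples at fs = 2048 Hz,
# plus one spurious spike 10 samples after a true discharge.
spikes = sorted(list(range(0, 4001, 200)) + [2010])
flagged = flag_rate_outliers(spikes, fs=2048.0)
print(flagged)
```

      A flagged spike is only a candidate for removal: as the quoted passage stresses, an edit should be kept only after the separation vector has been recalculated and reapplied, and only if the silhouette value improves or stays above the chosen threshold.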

      I found the User Guide in the Github package to be easy to follow. Importantly, it seems heavily tied to the specific hardware (Quattrocento). I understand it may be difficult to make the full software package work with different hardware, but it seems important to at least make an offline analysis of recorded data possible for this package to be useful more broadly. 

      The software was updated to perform real-time decomposition with signals recorded from the Quattrocento and the Open Ephys GUI, which is compatible with Intan and Open Ephys acquisition boards. I-Spin has also been adapted by TMSi to perform real-time decomposition with their devices (https://info.tmsi.com/blog/ispin-saga-real-time-motor-unit-decomposition-tool). 

      Moreover, the manual editing panel of the software can now import any files from these devices and allow users to reformat data in mat files to perform offline analyses.

      While this may be a powerful platform, it is also very possible that without more details and careful guidance for users on potential pitfalls, many non-experts in sorting could use this as a platform for somewhat sloppy science. 

      We fully agree with the reviewer that real-time EMG decomposition - with a different approach here than spike sorting - may yield unreliable results if not applied properly. As outlined in the introduction of our initial manuscript, assessing the accuracy and limitations of real-time decomposition was a primary motivation for this study. Specifically, we compared accuracy between contraction intensities, muscles, and electrode types (see Results section). 

      We also demonstrated that manual editing of the decomposition outputs should be done after the training phase to improve the motor unit filters, thereby improving the accuracy of real-time decomposition. We also outlined the importance of never blindly accepting the result of the decomposition without visual inspection and manual editing (P8; L214).

      ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove the motor unit pulse trains of poor quality a priori. Thus, these two steps were performed in all the subsequent analyses. Importantly, the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’

      We have also included more detailed information about the manual editing process (see above).

      The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data. 

      A link to the data on figshare was added to the GitHub repository.

      Given the centrality of the real-time visual feedback to their system, the authors should show some examples of the actual display etc. so readers can understand what the system in action actually looks like (I believe there is no presentation of the actual system in the manuscript, just in the User Guide). Similarly, it would be helpful to have a schematic figure outlining the full workflow that a user goes through when using this system. 

      A figure of the workflow is present in the user manual. Additionally, we now display traces of visual feedback in Figure 5, and we added videos of the software during each type of visual feedback to the supplemental materials.

      The authors note all data was collected with male subjects because more motor units can be decomposed from male subjects relative to females. But what is the long-term outlook for the field if studies avoid female subjects because their motor units may be harder to decompose? This should at least be discussed - it is an important challenge for the field to solve, and it is unacceptable if new methods just avoid this problem and are only tested on male subjects. 

      This point was rightly raised by each of the three reviewers. To solve this, we added data collected on four females, and discussed future developments to make the decomposition of surface EMG equally performant for everyone (P.20; L.480).

      ‘An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of motor units identified from surface EMG in females is about half of that in males (32, 49, 50). The cause of this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent development of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.’

      Specific comments on the core contributions of this paper:  

      C1. Development of an open-source implementation of the Negro algorithm 

      This seems an important contribution and useful for the community. There are very few figures showing any primary data, the efficacy of sorting, raw traces showing the waveforms that are identified, cluster shapes, etc. I realize the high-level algorithm has been outlined elsewhere, but the implementation in this package, and its efficacy, is a core component of the system and the claims being made in this paper. Much more presentation of data is needed to evaluate this. 

      It is worth noting that the approach used here is based on blind source separation, which is different than spike-sorting algorithms as it relies on the statistical properties of the spike trains (their sparseness) rather than the profiles of the action potentials. In short, we optimise separation vectors that are applied onto the whitened signal to generate a sparse motor unit pulse train. The discharge times are then directly estimated from the high peaks of this pulse train (Section 1 of the results; overview of the approach).

      We are thus displaying motor unit pulse trains in three figures with the automatically detected discharge times, with cases of successful separation in figure 1 and merged motor units in the same pulse train in figures 3 and 4.

      We also validated the algorithm with synthetic EMG to provide objective data on the accuracy of the algorithm. These results are shown in the section ‘Validation of the algorithm’ and displayed in figure 3.

      Similarly, more information on the offline manual editing process (e.g. showing before/after examples with primary data) would be important to gain confidence in the method. The current paper shows application to both surface EMG and intramuscular EMG, but I could not find IM EMG examples in the Hug paper (apologies if I missed them). Surface and IM data are very, very different, so one would imagine the considerations when working with them should also be different. 

      In response to another comment from the reviewer, we have included more detailed information about the manual editing process (see above). As stated above, the decomposition approach used in our software differs from a spike sorting approach. Therefore, even though intramuscular and surface EMG signals are different, the decomposition and manual editing process is the same. 

      All descriptions of math/algorithms are presented in text, without any actual math, variable definitions, etc. This presentation makes it difficult to understand what is done. I would strongly recommend writing out equations and defining variables where possible. 

      More details on how the level of sparseness is controlled during optimization would be helpful.

      And how this sparseness penalty is weighed against other optimization costs. 

      A mathematical description of the model has been added in the methods (P25; L620)

      ‘Mathematical modelling of the recorded spike trains.

      The spike train of a motor neuron recorded over time 𝑡 ∈ [0, 𝑇] can be described as the result of a convolution between a delta function (d) representing the firing times (j), and finite impulse responses (h) representing action potentials of duration L: . In practice, the nature of h and the duration L depend on the type of recordings. For electrophysiological measurements, h characterises the local electrical field generated by the spike and conducted through the surrounding tissues. 

      As the recorded volume of tissue comprises many active neurons, each recording can be considered as a convolutive mixture of multiple sources, and the previous equation can be expressed in the form of a matrix to also consider all the electrodes of an array: given , where is a matrix of m electrophysiological signals, is a matrix of n motor neurons’ spike trains, and 𝐻(𝑙) is a m by n matrix containing the lth sample of action potentials from n neurons and m signals. In this situation, we can reformulate the model as an instantaneous mixture of an extended set of sources, that is, the motor neurons’ spike trains and their delayed versions. This allows us to simply write the previous equation as a multiplication of matrices, in which each source is delayed L times, L being the duration of the impulse response h. This model can be inverted for neural decoding with source-separation approaches.’
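
For readers following the verbal description above, the convolutive model and its instantaneous reformulation can be sketched as follows. The notation is an assumption made for illustration ($x$ for recorded signals, $s$ for spike trains, $\varphi_k$ for firing times, tildes for the stacked quantities); it is not copied from the manuscript's equations.

```latex
% Spike train of one motor neuron: delta functions at the firing times \varphi_k
s(t) = \sum_{k} \delta(t - \varphi_k)

% Single recording: convolution with an impulse response h of duration L
x(t) = \sum_{l=0}^{L-1} h(l)\, s(t - l)

% Array of m signals and n motor neurons: convolutive mixture
\mathbf{x}(t) = \sum_{l=0}^{L-1} \mathbf{H}(l)\, \mathbf{s}(t - l),
\qquad \mathbf{x}(t) \in \mathbb{R}^{m},\quad
\mathbf{s}(t) \in \mathbb{R}^{n},\quad
\mathbf{H}(l) \in \mathbb{R}^{m \times n}

% Equivalent instantaneous mixture of the sources and their delayed versions
\mathbf{x}(t) = \tilde{\mathbf{H}}\, \tilde{\mathbf{s}}(t),
\qquad \tilde{\mathbf{H}} = \bigl[\mathbf{H}(0)\;\; \mathbf{H}(1)\;\; \cdots\;\; \mathbf{H}(L-1)\bigr],
\qquad \tilde{\mathbf{s}}(t) = \bigl[\mathbf{s}(t)^{\top}\;\; \mathbf{s}(t-1)^{\top}\;\; \cdots\;\; \mathbf{s}(t-L+1)^{\top}\bigr]^{\top}
```

The last line is the convolutive sum rewritten as a single matrix product, which is the form that source-separation approaches invert.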

      The rest of the decomposition approach was rewritten to make it clearer for the reader:

      ‘The monopolar EMG signals collected during the baseline contractions were extended with an extension factor of 1000/m (21), where m is the number of channels free of any noise or artifact. The signals were then demeaned and whitened. A contrast function was iteratively applied to estimate a separation vector that maximised the level of sparseness of the motor unit pulse train (Figure 1B). This loop stopped when the variation of the separation vector between two successive iterations reached a predefined lower bound. After the application of a peak detection algorithm, the motor unit pulse train contained high peaks (i.e., the spikes from the identified motor unit) and low peaks from other motor units and noise. High peaks were separated from low peaks and noise using K-means classification with two classes (Figure 1B). The peaks from the class with the highest centroid were considered as spikes of the identified motor unit. A second algorithm refined the estimation of the discharge times by iteratively recalculating the separation vector and repeating the steps with peak detection and K-means classification until the coefficient of variation of the inter-spike intervals was minimised. The accuracy of each estimated spike train was assessed by computing the silhouette (SIL) value between the two classes of peaks identified with K-means classification (24). When the SIL exceeded a predetermined threshold, the motor unit filter was saved for the real-time decomposition, together with the centroids of the ‘spikes’ and ‘noise’ classes (Figure 2A).’
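
The peak classification and quality check described in this paragraph can be illustrated with a short standard-library sketch. It is not the code of the software: the function names are hypothetical, and the SIL formula used here (sum of squared distances of the spike-class peaks to the noise centroid minus the same sum to their own centroid, normalised by the larger of the two) is an assumed form of the metric.

```python
def two_class_kmeans(peaks, max_iter=100):
    """1-D k-means with two classes, initialised at the lowest and highest peak."""
    low, high = min(peaks), max(peaks)
    if low == high:
        raise ValueError("peaks must contain at least two distinct values")
    for _ in range(max_iter):
        # Assign each peak to the nearest centroid (ties go to the noise class).
        spikes = [p for p in peaks if abs(p - high) < abs(p - low)]
        noise = [p for p in peaks if abs(p - high) >= abs(p - low)]
        new_high = sum(spikes) / len(spikes)
        new_low = sum(noise) / len(noise)
        if new_high == high and new_low == low:
            break
        low, high = new_low, new_high
    return spikes, noise, high, low


def silhouette(spikes, spike_centroid, noise_centroid):
    """Assumed SIL form: (between - within) / max(within, between)."""
    within = sum((p - spike_centroid) ** 2 for p in spikes)
    between = sum((p - noise_centroid) ** 2 for p in spikes)
    return (between - within) / max(within, between)


# Hypothetical peak amplitudes taken from a motor unit pulse train.
peaks = [0.1, 0.12, 0.08, 0.9, 1.0, 0.95, 0.11]
spikes, noise, spike_centroid, noise_centroid = two_class_kmeans(peaks)
sil = silhouette(spikes, spike_centroid, noise_centroid)
keep_filter = sil > 0.9  # threshold value used here for illustration only
```

In this toy case the high peaks are far from the low peaks, so the SIL value is close to 1 and the filter would be kept; a pulse train with overlapping classes would yield a lower value.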

      Overall the paper is not very rigorous about the accuracy of motor unit identification. For example, the authors note that SIL of 0.9 is generally used for offline evaluation (why is this acceptable?), but it was lowered to 0.8 for particular muscles in this study. But overall, it is unclear how sorting accuracy/inaccuracy affects performance in the target applications of this work. 

      In the section mentioned by the reviewer, we aimed to show how this metric can help to automatically select motor units that are likely to have a higher accuracy of spike detections as the peaks of their pulse train are easily separable from the noise. 

      We reformulated the conclusion of this section to make it clearer (P8; L214):

      ‘These results show how manual editing can improve the accuracy of spike detection from the motor unit pulse trains. Moreover, a SIL value around 0.9 can be used as a threshold to automatically remove the motor unit pulse trains of poor quality a priori. Thus, these two steps were performed in all the subsequent analyses. Importantly, the motor unit pulse train must always be visually inspected after the session to check for errors in the automatic identification of discharge times.’

      C2. For real-time experiments, variability/jitter is important to characterize. Fig. 4 seems to be presenting mean computational times, etc, but no presentation of variability is shown. It would be helpful to depict data distributions somehow, rather than just mean values. 

      The variability in computational time was added to this section (P.28; L.730):

      ‘The standard deviation of computational times across windows reached 5.4 ± 4.0 ms (raster plot), 4.0 ± 3.2 ms (smoothed firing rate), and 2.8 ± 2.5 ms (quadrant)’

      The computational time minimally varied between the successive windows, except when the labels of the x-axis were updated in real-time with scrolling feedback. It was overall always well below the duration of the window.

      Author response image 1.

      Computational time for each iteration of the algorithm in one participant. The top panels display the continuous computation time through the recording, while the bottom panels display the distribution of computational times. The dashed line represents the duration of a window of EMG signals.

      There is some description about the difference between units identified during baseline contractions, and how they might be misidentified during online contractions ("Accuracy of the real-time identification..."). This should be described in more detail. 

      We added an additional section in the results to clarify the concept of motor unit filters, and the reapplication of motor unit filters on signals in real-time. We highlighted how each motor unit must have a unique spatio-temporal signature to be accurately identified by our algorithms, in opposition to merged motor units sharing the same spatio-temporal features. This section shows how motor units accurately identified during baseline contractions can be misidentified during online contractions (P12; L295).

      ‘Application of motor unit filters in experimental data

      We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).  

      When the filters identified at 20% MVC were applied on signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units.

      When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
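
To make the three accuracy metrics concrete, the sketch below computes them from two lists of discharge times expressed in samples. The definitions are assumptions based on common usage: rate of agreement as TP / (TP + FP + FN), sensitivity as TP / (TP + FN), and precision as TP / (TP + FP). The function names and the matching tolerance of one sample are hypothetical.

```python
def match_discharges(identified, reference, tolerance=1):
    """Count true positives, false positives and false negatives.

    An identified discharge time is a true positive if an unmatched reference
    discharge time lies within `tolerance` samples (an assumed criterion).
    """
    unmatched = sorted(reference)
    tp = 0
    for t in sorted(identified):
        hit = next((r for r in unmatched if abs(r - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(identified) - tp
    fn = len(unmatched)
    return tp, fp, fn


def accuracy_metrics(identified, reference, tolerance=1):
    """Rate of agreement, sensitivity and precision, in percent."""
    if not identified or not reference:
        raise ValueError("both spike trains must contain discharge times")
    tp, fp, fn = match_discharges(identified, reference, tolerance)
    return {
        "rate_of_agreement": 100 * tp / (tp + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "precision": 100 * tp / (tp + fp),
    }


reference = [100, 200, 300, 400]  # manually edited discharge times
# Missed spikes only: sensitivity drops while precision stays at 100%.
missed = accuracy_metrics([100, 200], reference)
# One missed spike and one false spike: both metrics drop.
mixed = accuracy_metrics([100, 201, 300, 350], reference)
```

The first case mirrors the situation reported at the lower force level, where missed discharge times lower sensitivity without affecting precision; the second mirrors the higher force level, where falsely identified spikes also lower precision.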

      Fig. 6: Given that a key challenge in sorting should be that collisions occur during large contractions, much more primary data should be presented/visualized to show how the accuracy of sorting changes during larger contractions in online experiments. 

      As indicated above, the decomposition approach implemented in our software is not based on spike sorting, so it does not need to separate overlapping action potential profiles (see Methods). 

      Fig.7: In presenting the accuracy of biofeedback, it is very hard to gain any intuition for performance by just looking at RMSE values. Showing the online decoded and edited trajectories would help readers understand the magnitude of errors. 

      We updated the figure to display examples of visual feedback before and after manual editing.

      Reviewer #3 (Public Review):  

      In this manuscript, Rossato and colleagues present a method for real-time decoding of EMG into putative single motor units. Their manuscript details a variety of decision points in their code and data collection pipeline that led to a final result of recording on the order of ~10 putative motor units per muscle in human males. Overall, the manuscript is highly restricted in its potential utility but may be of interest to aficionados. For those outside the field of human or nonhuman primate EMG, these methods will be of limited interest.

      We thank the reviewer for his/her thorough evaluation of our manuscript. We recognise that this tool/resource will immediately benefit groups working with human or nonhuman primate models. However, the recent development of intramuscular thin films with various designs adapted to rodents and smaller animals could expand the range of future users (Chung et al., 2023, eLife). Nonetheless, decoding motor units in humans could be useful for many fields, e.g. in the domains of movement restoration and augmentation. The following paragraph has been added in the introduction section to highlight the importance of real-time decoding of motor unit activity (P3; L67):  

      ‘The activity of motor neurons – in the form of spike trains – represents the neural code of movement to muscles. Decoding this firing activity in real-time during various behaviours can thus substantially enhance our understanding of movement control (2-5). Real-time decoding is also essential for interfacing with external devices (6) or virtual limbs (7) when activity is present at the periphery of the nervous system. For example, individuals with a spinal cord injury can control a virtual hand with the residual firing activity of the motor units in their forearm (7). Furthermore, sampling the activity of motor units receiving a substantial portion of independent synaptic inputs may pave the way for movement augmentation – specifically, extending a person’s movement repertoire through the increase of controllable degrees of freedom (8). In this way, Formento et al. (3) showed that individuals can intuitively learn to independently control motor units within the same muscle using visual cues. Having access to open-source tools that perform the real-time decoding of motor units would allow an increasing number of researchers to improve and expand the range of these applications.’

      Notes 

      (1) Artificial data should be used with this method to provide ground truth performance evaluations. Without it, the study assumptions are unchallenged and could be seriously flawed.

      A new section on the validation of the algorithm has been added. We verified the accuracy of the algorithm by comparing the series of identified discharge times with the ground truth, i.e., the simulated discharge times. (P10; L235)

      ‘Validation of the algorithm

      We first validated the accuracy of the algorithm using synthetic EMG signals generated with an anatomical model entailing a cylindrical muscle volume with parallel fibres [see Farina et al. (29) and Konstantin et al. (36) for a full description of the model]. In this model, subcutaneous and skin layers separate the muscle from a grid of 65 surface electrodes (5 columns, 13 rows), while an intramuscular array of electrodes is directly inserted in the muscle under the grid with an angle of 30 degrees. 150 motor units were distributed within the cross section of the muscle. Recruitment thresholds, firing rate/excitatory drive relations, and twitch parameters were assigned to each motor unit using the same procedure as Fuglevand et al. (37). During each simulation, a proportional-integral-derivative controller adjusted the level of excitatory drive to minimise the error between a predefined target of force and the force generated by the active motor units. 

      Figure 3A displays the raster plots of the active motor units during simulated trapezoidal isometric contractions with plateaus of force set at 10%, 20%, and 30% MVC. A sinusoidal isometric contraction ranging between 15 and 25% MVC at a frequency of 0.5 Hz was also simulated. We identified on average 10 ± 1 and 12 ± 2 motor units with surface and intramuscular arrays, respectively (Figure 3A). During the offline decomposition, the rate of agreement between the identified discharge times and the ground truth, that is, the simulated discharge times, reached 100.0 ± 0.0% for intramuscular EMG signals and 99.2 ± 1.8% for surface EMG signals (Figure 3B). The offline estimation of motor unit filters was therefore highly accurate, independently of the level of force or the pattern of the isometric contraction.

      Motor unit filters estimated during a baseline contraction at 20% MVC were then applied in real-time on signals simulated during a contraction with a different pattern (sinusoidal; Figure 3C). The rates of agreement between the online decomposition and the ground truth reached 96.3 ± 4.6% and 98.4 ± 2.3% for surface and intramuscular EMG signals, respectively. Finally, we tested whether the accuracy of the online decomposition changed when the level of force decreased or increased by 10% MVC when compared to the calibration performed at 20% MVC (Figure 3D). The rate of agreement remained high when applying the motor unit filters on signals recorded at 10% MVC: 99.8 ± 0.2% (surface EMG) and 99.5 ± 0.3% (intramuscular EMG). It is worth noting that only 3 out of 10 motor units identified from surface EMG at 20% MVC were active at 10% MVC, while 8 out of 12 motor units identified from intramuscular EMG were active at 10 % MVC. This shows how the decomposition of EMG signals tends to identify the last recruited motor units, which often innervate a larger number of fibres than the early recruited motor units (38). On the contrary, the application of motor unit filters on signals simulated at 30% MVC led to a decrease in the rate of agreement, with values of 88.6 ± 14.0% (surface EMG) and 80.3 ± 19.2% (intramuscular EMG). This decrease in accuracy did not impact all the motor units, with 5 motor units keeping a rate of agreement above 95% in both signals. For the other motor units, we observed a decrease in precision, which estimates the ratio of true discharge times over the total number of identified discharge times. This was caused by the recruitment of two motor units sharing a similar space within the muscle, which resulted in a merge in the same pulse train (Figure 3D).’

      (2) From the point of view of a motor control neuroscientist studying movement in animals other than humans or non-human primates, the title was misleadingly hopeful. The use case presented in this study requires human participants to perform isometric contractions, facilitating spatially redundant recordings across the muscle for the algorithm to work. It is unclear whether these methods will be of utility to use cases under more physiological conditions (ie. dynamic movement). 

      We modified the title to read: “I-Spin live: An open-source software based on blind-source separation for real-time decoding of motor unit activity in humans”. 

      (3) The text states that "EMG signals recorded with an array of electrodes can be considered an instantaneous mixture of the original motor unit spike trains and their delayed versions." While this may be a true statement, it is not a complete statement, since motor units at distal sites may be shared, not shared, or novel. It was not clear to me whether the diversity of these scenarios would affect the performance of the software or introduce artifacts. In other words, if at site 1 you can pick up the bulk signal of units 1,2,3,4; at site two you pick up the signals of units 2,3,4,5 and site three you pick up the signal of units 3,4,5,6, what does the algorithm assume is happening and what does it report and why?

      This section has been rewritten to clarify this point. The EMG signal indeed represents the sum of the activity of the active motor units within the recorded muscle volume. In other words, deep motor units, or motor units whose innervated fibres lie far away from the grid, may fall outside this recorded volume and thus be non-identifiable. Another necessary condition for a motor unit to be identifiable is that it has a unique spatio-temporal signature within the signal. This means that two motor units close to each other within the muscle volume will be merged by the model. This point was clarified in the results during the validation and the application of filters on experimental data.

      (P5; L115)

      ‘An EMG signal represents the sum of trains of action potentials from all the active motor units within the recorded muscle volume (Figure 1A). During stationary conditions, e.g., isometric contractions, the train of motor unit action potentials can be modelled as the convolution of series of discrete delta functions, representing the discharge times, and motor unit action potentials that have a consistent shape across time. When EMG signals are recorded with an array of electrodes, the shape of the recorded potential of each motor unit differs across electrodes. This is due to 1) the varying conduction velocity of action potentials among the muscle fibres, and 2) the location/depth of the muscle fibres that belong to each motor unit relative to the electrodes, which impacts the low pass filtering effect of the tissue on the recorded potential. Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29). The uniqueness of motor unit action potential profiles is necessary for the blind source separation to accurately estimate the motor unit discharge times. Conversely, the spike trains of two motor units with similar action potential profiles will be merged by the model.

      Our software uses a fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a separation vector (i.e., the motor unit filter) for each motor unit [Figure 1B; (24-26)]. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and a smaller number of samples significantly greater than zero (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units). During the decomposition in real-time, short segments of EMG signals are projected on the saved separation vectors, and the peaks are classified as discharge times if they are closer to the centroid of the class ‘spikes’ than to the centroid of the class ‘noise’ (Figure 1C). The algorithm used to identify motor unit discharge activity is based on that proposed by Negro et al. (24) and Barsakcioglu et al. (26).’
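
The real-time step described above (projecting a short segment on a saved separation vector, then classifying peaks by their distance to the two saved centroids) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the window is assumed to be already extended and whitened, and peak detection is reduced to a plain local-maximum search.

```python
def project(window, separation_vector):
    """Project each multichannel sample of the window on the separation vector."""
    return [sum(w * x for w, x in zip(separation_vector, sample))
            for sample in window]


def local_peaks(pulse_train):
    """Indices of local maxima (a simplified stand-in for peak detection)."""
    return [i for i in range(1, len(pulse_train) - 1)
            if pulse_train[i - 1] < pulse_train[i] >= pulse_train[i + 1]]


def classify_window(window, separation_vector, spike_centroid, noise_centroid):
    """Return the sample indices classified as discharge times in this window."""
    pulse_train = project(window, separation_vector)
    return [i for i in local_peaks(pulse_train)
            if abs(pulse_train[i] - spike_centroid)
            < abs(pulse_train[i] - noise_centroid)]


# Toy window of six samples over two (hypothetical) channels.
window = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0],
          [0.9, 0.0], [0.05, 0.0], [0.0, 0.0]]
discharge_samples = classify_window(
    window,
    separation_vector=[1.0, 0.0],
    spike_centroid=0.95,   # centroid saved from the baseline contraction
    noise_centroid=0.1,
)
```

Because the centroids are fixed after the baseline contraction, no clustering is needed at run time. This keeps the per-window cost low, but it also means that a newly recruited motor unit with a similar profile can push peaks across the decision boundary.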

      (4) I could not fully appreciate the performance gap solved by the current methods. What was not achievable before that is now achievable? The 125 ms speed of deconvolution? What was achievable before? Intro text around ln 85 states that 'most of the current implementations of this approach rely on offline processing, which restricts its ability to be used..." but no reference is provided here about what the non 'most' of can achieve. 

      (8) The authors might try to add text to be more circumspect about the contributions of this method. I would recommend emphasizing the conceptual advances over the specifics of the performance of the algorithm since processor speed and implementation of the ideas in a faster environment (Matlab can be slow) will change those outcomes in a trivial way. Yet, much of the results section is very focused on these metrics. 

      The main contribution of this work, submitted to the ‘Tools and Resources’ section of eLife, is to provide a user interface that enables researchers to decompose EMG signals recorded with multichannel systems into motor unit activities, to perform this process in real-time, and to translate it into visual feedback. The user interface is fully open source and does not require coding experience. If necessary, the users can inspect the commented code and even modify it for their own experimental setup. The toolbox is now compatible with various acquisition boards, which can expand its use to novel surface and intramuscular arrays of electrodes.

      (5) Relatedly, it would have been nice to see a proof of concept using real-time feedback for some kind of biofeedback signal. If that is the objective here, why not show us this? I found the actual readout metrics of performance rather esoteric. They may be of interest to very close experts so I will defer to them for input.

      We agree with the reviewer. Videos were added to the supplemental materials to show the different forms of feedback, together with a case scenario where the participant tries to separate the activity of two motor units from the same muscle.

      (6) I was disappointed to see that only male participants are used because of some vague statement that 'it is widely known in the field' that more motor units can be resolved in males, without thorough referencing. It seems that the objective of the algorithm is the speed of analysis, not the number of units, which makes the elimination of female participants not justified. 

      The reviewer is right and that was corrected in the new version of the manuscript. We first performed additional experiments in both males and females focused on the accuracy of the approach, and further discussed the differences in yield between men and women in the discussion together with research perspectives to solve this issue.

      Results (P12; L296):

      ‘We then asked eight participants (4 males and 4 females) to perform trapezoidal isometric contractions with plateaus of force set at 10% and 20% MVC during which surface EMG signals were recorded from the TA with 256 electrodes separated by 4 mm. The aim of this experiment was to confirm the results of the simulation; specifically, to test the accuracy of the online decomposition when the level of force was below, equal to, or above the level of force produced during the baseline contraction used to estimate the motor unit filters (Figure 4). We assessed the accuracy of the motor unit spike trains identified in real time using their manually edited version as reference. 144 motor units were identified at both 10 and 20% MVC. When the test signals were recorded at the same level of force as the baseline contraction, we obtained rates of agreement of 95.6 ± 6.8% (10% MVC) and 93.9 ± 5.9% (20% MVC). The sensitivity reached 95.9 ± 6.7% (10% MVC) and 94.4 ± 5.6% (20% MVC), and the precision reached 99.6 ± 1.3% (10% MVC) and 99.4 ± 1.9% (20% MVC).  

      When the filters identified at 20% MVC were applied to signals recorded at a lower level of force (10% MVC), the rates of agreement decreased to 87.9 ± 16.2%. The sensitivity also decreased to 88.0 ± 16.2%, but the precision remained high (99.4 ± 4.3%). Thus, the decrease in accuracy was mostly caused by missed discharge times rather than the false identification of artifacts or spikes from other motor units. When the filters identified at 10% MVC were applied to signals recorded at a higher level of force, the rates of agreement decreased to 83.3 ± 13.5%. The sensitivity decreased to 90.7 ± 8.1%, and the precision also decreased to 90.9 ± 12.6%. This result confirms what was observed with synthetic EMG, that is, motor units recruited between 10 and 20% MVC can substantially disrupt the accuracy of the decomposition in real time, as highlighted in Figure 4 (lower panel). Importantly, this situation does not happen for all the motor units, as suggested by the distribution of the values in Figure 4.’
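
To make the three accuracy metrics quoted above concrete, here is a minimal sketch of how they could be computed from two discharge trains. The formulas (rate of agreement = TP / (TP + FP + FN), sensitivity = TP / (TP + FN), precision = TP / (TP + FP)), the one-sample matching tolerance, and the function names are assumptions made for illustration; the response letter does not spell them out, and this is not the toolbox's own code.

```python
def match_discharges(reference, online, tolerance=1):
    """Pair discharge times (in samples) lying within +/- tolerance samples.

    Returns (true_positives, false_positives, false_negatives).
    The tolerance value is a hypothetical choice for this sketch.
    """
    ref = sorted(reference)
    est = sorted(online)
    i = j = tp = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            tp += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            j += 1           # online discharge with no reference partner
        else:
            i += 1           # reference discharge missed online
    return tp, len(est) - tp, len(ref) - tp


def accuracy_metrics(reference, online, tolerance=1):
    """Return rate of agreement, sensitivity and precision in percent."""
    tp, fp, fn = match_discharges(reference, online, tolerance)

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "rate_of_agreement": pct(tp, tp + fp + fn),
        "sensitivity": pct(tp, tp + fn),
        "precision": pct(tp, tp + fp),
    }


# Toy example: manually edited train vs. train identified in real time.
reference = [100, 300, 500, 700, 900]
online = [100, 301, 500, 900, 950]
print(accuracy_metrics(reference, online))
```

With these definitions, a drop in sensitivity with preserved precision corresponds to missed discharges, whereas a drop in precision corresponds to falsely identified ones, which is the distinction the quoted passage draws.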

      Discussion (P20; L480):

      “An important consideration regarding the implementation of offline or real-time surface EMG decomposition is the difference between individuals, with an overall lower yield in number of identified motor units in females (here: 9 ± 12) than in males (here: 30 ± 13). Typically, the number of motor units identified from surface EMG in females is about half that in males (32, 49, 50). The cause for this difference remains unclear. It may be related to variations in properties of the tissues separating the motor units from the recording electrodes, or to differences in the morphological and physiological properties of muscle fibres, as well as to the innervation ratios of motor units. These sex-related differences have so far only been supported by data extracted from animal experiments (51). However, the recent development of simulation frameworks capable of generating highly realistic EMG signals for anthropometrically diverse populations may help in understanding the impact of sex-related differences in humans (52). Specifically, these simulations can account for diverse anatomical (e.g. muscle volume and architecture, thickness of subcutaneous tissues) and physiological characteristics (e.g. innervation ratio, number of motor units, fibre cross sectional area, fibre conduction velocity, contribution of rate coding vs. spatial recruitment). Generating such datasets could help identify the primary factors affecting EMG decomposition performance, ultimately enabling the refinement of algorithms and/or surface electrode design.”

      (7) Human curation is often used in spike sorting, but the description of criteria used in this step or how the human curation choices are documented is missing. 

      To address the reviewer’s comment, we added a new paragraph in the Method section to describe the manual editing process: (P26; L657)

      “There is a consensus among experts that automatic decomposition should be followed by visual inspection and manual editing (55). Manual editing involves the following steps: i) removing spikes that result in erroneous firing rates (outliers), ii) adding discharge times that are clearly distinguishable from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the EMG signals (either a selected window or the entire signal), and v) repeating this procedure until no outliers are present and all clearly distinguishable spikes have been selected. Importantly, the manual editing of potentially missed or falsely identified discharge times should not be accepted before the application of the updated motor unit separation vector, thereby generating a new pulse train. Manual edits should be accepted only if the silhouette value improves following this operation or remains well above the preestablished threshold. A more extensive description of the manual editing of motor unit pulse trains can be found in (32). Even though some of the aforementioned steps involve subjective decision-making, evidence suggests that manual editing after EMG decomposition with blind source separation approaches remains highly reliable across operators (33). Specifically, the median rate of agreement calculated for 126 motor units over eight operators with varying experience in manual editing was 99.6%. All raw and processed data have been made available on a public data repository so that they can be used for training new operators (10.6084/m9.figshare.13695937).”

      Minor 

      Ln 115, "inversing" is not a word. "inverse" is not a verb 

      Changed as suggested

      Ln 186, typo, bioadhesive 

      Changed as suggested

      MVC should be defined on first use. It is currently defined on 3rd use or so. 

      The term rate is used in a variety of places without units. Eg line 465 but not limited to that 

      Changed as suggested

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Two minor comments: Para 125: it is not clear what is meant by "spatial distribution" of recording electrodes. 

      ‘Density’ was used instead of ‘spatial distribution’ to now read:

      ‘Increasing the number and density of recording electrodes increases the likelihood that each motor unit will have a unique motor unit action potential profile (shape), i.e., a temporal and spatial profile that differs from all the other active motor units within the recorded volume (16, 29).’

      Para 545: perhaps a bit more explanation about why low spatial overlap is better would be appropriate. 

      We added a section in the results showing how motor units with similar spatial signatures are merged by our model, leading to a lower precision. We therefore changed this sentence to now read:

      ‘Therefore, the likelihood of having spatially overlapping motor unit action potentials - and thus merged motor units - is lower, which explains why the rate of agreement of motor units identified from intramuscular arrays of electrodes is much higher than that from grids of surface electrodes (12, 13).’

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention that data is included with the Github software package. I could not find any included data, or instructions on how to run the software offline on example data. (Apologies if I missed this - it would be helpful to make it more prominent)

      The link to the data on figshare was added to the GitHub repository, together with data samples to run the algorithm offline and test manual editing.

      Minor comments: 

      Not sure what is meant by "boundary capabilities of online decomposition" 

      This was removed to only discuss the accuracy of online decomposition.

      CoV for ISIs is not formally defined or justified.

      This was added to the caption of figure 2:

      ‘The CoV of ISI estimates the regularity of spiking for each motor unit, an expected behaviour during isometric contractions at consistent levels of force.’
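
As an illustration of the quantity described in that caption, the following minimal sketch computes the coefficient of variation of the interspike intervals (CoV of ISI) for one motor unit. The function name, the millisecond units, and the use of the sample standard deviation are assumptions made for this example rather than details taken from the manuscript.

```python
import statistics


def cov_of_isi(discharge_times_ms):
    """Coefficient of variation (%) of the interspike intervals.

    discharge_times_ms: discharge times of one motor unit, in milliseconds.
    Uses the sample standard deviation (an assumption of this sketch).
    """
    times = sorted(discharge_times_ms)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(isis) < 2:
        raise ValueError("at least three discharges are needed")
    return 100.0 * statistics.stdev(isis) / statistics.mean(isis)


# Perfectly regular discharges give 0%; jittered discharges give a larger value.
print(cov_of_isi([0, 100, 200, 300]))
print(cov_of_isi([0, 90, 200, 300]))
```

Lower values indicate more regular discharges, which is what one expects during isometric contractions at a steady force level.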

      Fig. 4: slope units should be ms/motor unit, perhaps? 

      Changed as suggested.

      In some places, the manuscript uses "edition" to describe the editing process. I am not familiar with this usage, "editing" may be more common. 

      Editing is now used throughout the manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      I would recommend that the authors revise their manuscript to conform to eLife formatting guidelines, including moving the methods to the end of the manuscript. This change may entail substantial editing since many ideas are presented in order from the beginning of the methods. While this suggestion may seem superficial, the success of the new publishing model might benefit from general uniformity in manuscript style.

      We edited the draft to follow the standard format of eLife papers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.

      We thank the editors for their helpful comments.

      Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9 and further characterized the differential Dhx9-bound G4s and R-loops in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSS or gene bodies of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, and also their RNA levels (Fig. 7I). We consider the above evidence sufficient to support the view that Dhx9 regulates mESC cell fate at the transcriptional level by directly modulating the G4s or R-loops within key regulators of mESCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.

      Strengths:

      HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.

      We appreciate your valuable points.

      Weaknesses:

      This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites and G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs and that 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincides with the significantly differential levels of G4s or R-loops around the TSS or gene bodies of these genes (Fig. 7). The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.

      There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels in different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To identify the specific signals, we included the non-label control and used this control to call confident HepG4 peaks in all HepG4-seq assays.

      The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm generated by cytoplasmic RNA G4s.

      In the revised version, we have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. To manipulate R-loop levels in vivo and test how this affects HEPG4-seq signals would have been helpful.

      Single-molecule fluorescence studies have shown the existence of a positive feedback mechanism of G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 resulted in 6,171 Dhx9-bound co-localized G4s and R-loops with significantly altered levels of G4s or R-loops, only 276 of them (~4.5%) harbored both altered G4s and altered R-loops, suggesting that interacting G4s and R-loops are rare in living cells. At present, the genome-wide co-occurrence of two factors is mainly determined by bioinformatic intersection analysis. We agreed that F We will carefully discuss this point in the revised version. At the same time, we will make efforts to develop a new method to map co-localized G4s and R-loops on the same molecule in future studies.

      This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.

      In this study, we used the recombinant streptavidin monomer and anti-GP41 nanobody fusion protein (mSA-scFv) to specifically recognize hemin-G4-induced biotinylated G4s and then recruit the recombinant GP41-tagged Tn5 protein to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits the recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the spatial distance of Tn5 to the target sites is well controlled and very short, and the recruitment of Tn5 is specifically determined by the existence of G4s in HepG4-seq and R-loops in HBD-seq. In addition, RNase treatment markedly abolished the HBD-seq signals and the non-label controls exhibited an obvious reduction of HepG4-seq signals, demonstrating that the HBD-seq and HepG4-seq signals did not arise from tagmentation of accessible DNA.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.

      The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.

      Strengths:

      (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.

      (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.

      (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.

      We appreciate your valuable points.

      Weaknesses:

      (1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels in different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity.

      (2) Other methods exploring a catalytic dead RNAseH or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-CHIP, etc) should be provided.

      Thank you for the suggestions. We have included a new Fig. 9 and a new section “Comparisons of HepG4-seq and HBD-seq with previous methods” to carefully compare our methods to other methods.

      (3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.

      Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that the new dataset and previous studies are enough to support the capability of Dhx9 in regulating co-localized G4s and R-loops.

      (4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.

      Thank you for the suggestions. We have included the discussion in the revised version.

      (5) The manuscript lacks appropriate statistical analyses to support the major conclusions.

      We apologize for this. Although we applied careful statistical analyses in this study, some statistical details were missing, which made certain conclusions hard to follow. We have now added the details of all statistical analyses.

      (6) The discussion could be expanded to address potential limitations and alternative explanations for the results.

      Thank you for the suggestions. We have included the discussion about this point in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.

      Strengths:

      By utilizing the peroxidase activity of G4-hemin complex and combining proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing fusion protein and tag, the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.

      The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.

      Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.

      Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.

      In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.

      We appreciate your valuable points.

      Weaknesses:

      As we know, at least two structures of the S9.6 antibody have been reported very recently, and the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should be considered settled. The references the authors cite (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against the S9.6 antibody also needs to change. However, since the authors question the specificity of the S9.6 antibody, they should compare their data in parallel with data generated by the widely used S9.6 antibody.

      Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for RNA:DNA hybrids than for dsRNA and dsDNA. However, Fabian K. et al. have reported that the binding affinity of S9.6 for RNA:DNA hybrids exhibits an obvious sequence-dependent bias, ranging from no detectable binding to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig. 9 and the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.

      Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s. Thus, the dynamics of G4 conformation could affect the HepG4-seq signals (PMID: 32329781). In the future, HepG4-seq and BG4-seq may need to be combined to carefully interpret endogenous G4s. We have discussed this point in the revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figures 1A&1G. Although no merge images were provided, it seems that the biotin signals are strongly enriched outside the nucleus. This suggests that hemin is not specific for G4s in DNA. Does it mean that Hemin can also recognise G4 on RNAs? How do the authors understand the cytoplasmic signal?

      Hemin can indeed interact with RNA G4s to acquire peroxidase activity, like the DNA G4-hemin complex (PMID: 27422869, 32329781, 31257395). The cytoplasmic signals in Figures 1A and 1G were derived from RNA G4s.

      Figure 1A: The fact that there is no Alexa647 signal without hemin or Bio-An does not actually demonstrate that the signals are specific. These controls do not actually test for the specificity of the G4-Hemin interaction.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In this study, we performed the IF to confirm this phenomenon.

      Figure 1C: It looks like the HepG4-seq signals are simply an amplification of the noise given by the Tn5 (the non-label ctrl has the same pattern, albeit weaker). It is unclear why this happens but it might happen if somehow hemin increased the probability that the Tn5 is close to chromatin in an unspecific manner (it would cut G-rich, nucleosome-poor, accessible sites in an unspecific manner). To discard this possibility, it would be interesting to investigate directly which loci are biotinylated. For this, the authors could extract and sonicate the genomic DNA and use streptavidin to enrich for biotinylated fragments. Strand-specific DNA sequencing could then be used to map the biotinylated loci.

      The cell culture medium contained a certain amount of hemin from serum and a low dose of biotin from the basal medium (DMEM), which could not be avoided. This residual hemin and biotin generates the background signals observed in the non-label control samples. The biotinylated sites were specifically recognized by the recombinant streptavidin monomer, which further recruits Tn5 to the biotinylated sites with the help of the Moon-tag. In contrast to the HEK293 samples, much more robust HepG4-seq signals were observed in the mESC samples, and these signals were also abolished in the non-label control samples. Thus, the relatively low signal-to-noise ratio in the HEK293 samples suggests a low abundance of endogenous G4s in HEK293 cells. We therefore respectfully disagree that hemin increased the non-specific recruitment of Tn5. In addition, CUT&Tag technology has been widely demonstrated to have a much lower background, a higher signal-to-noise ratio, and higher sensitivity. We therefore also respectfully disagree with replacing CUT&Tag with the traditional DNA library preparation method.

      Figure 1H: No spike-in was added and the data are not quantitative. The number of replicates is unclear. 70000 extra peaks (10x) after inhibition of BLM or WRN seems enormous. These extra peaks should be better characterised: do they contain G4 motifs? Are they transcribed? etc...; again what kind of controls should be used here, in case the inhibition of BLM and WRN has a global impact on chromatin accessibility?

      To quantitatively compare different samples, we normalized all samples according to their numbers of de-duplicated, uniquely mapped reads. Given that the inhibitors were dissolved in DMSO, we used DMSO as the control. Since Tn5 was specifically recruited to the biotinylated G4 sites through the recombinant streptavidin monomer protein and the Moon-tag system, chromatin accessibility should not affect Tn5 targeting in the way that is normally observed in ATAC-seq.

      As suggested, we have analyzed the enriched motifs of the extra peaks induced by BLM or WRN inhibition and show in Supplementary Fig. 1E that the top enriched motifs are also G-rich. In addition, we analyzed the RNA-seq levels of the genes associated with these extra peaks. As shown in the figure below, the majority of these genes are actively transcribed.
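
The depth normalization mentioned here can be sketched as a simple rescaling of binned coverage by the number of de-duplicated, uniquely mapped reads. The per-million scale factor, the function name, and the toy numbers below are assumptions for illustration; the response does not state the exact scaling the authors applied.

```python
def normalize_coverage(bin_counts, n_unique_dedup_reads, scale=1_000_000):
    """Rescale raw per-bin read counts to counts per `scale` usable reads.

    n_unique_dedup_reads: number of de-duplicated, uniquely mapped reads
    in the library. The per-million scale is a hypothetical choice.
    """
    if n_unique_dedup_reads <= 0:
        raise ValueError("read count must be positive")
    factor = scale / n_unique_dedup_reads
    return [count * factor for count in bin_counts]


# Two libraries with the same raw counts but different sequencing depth.
control = normalize_coverage([10, 20], n_unique_dedup_reads=2_000_000)
treated = normalize_coverage([10, 20], n_unique_dedup_reads=4_000_000)
print(control, treated)
```

Note that this kind of normalization corrects for sequencing depth only; it does not, by itself, address the reviewer's point about the absence of a spike-in.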

      Author response image 1.

      Figure 2: The mutated version of HBD should have been used as a control. As shown clearly in PMID: 37819055, the HBD domain does interact in an unspecific manner with chromatin at low levels. As above, this might be enough to increase the local concentration of the Tn5 close to chromatin in the Cut&Tag approach and to cleave accessible sites close to TSS in an unspecific manner.

      As shown in Fig.2B and Fig.4A, we have included the RNase treatment as the control and showed that the HBD-seq-identified R-loops signals are dramatically attenuated (Fig.2B) or almost completely abolished after the RNase treatment (Fig.4A). These data demonstrate the specificity of HBD-seq.

      Figure 2: What fraction of the HEPG4-seq signal is sensitive to RNase treatment? The authors used a combination of RNase A and RNase H but previous data have shown that the RNase A treatment is sufficient to remove the HBD-seq signal (which means that it is not actually possible on this sole basis to claim or disclaim that the signals do correspond to genuine R-loops). Do the authors have evidence that the RNase H treatment alone does impact their HBD-seq or HEPG4-seq signals?

      As shown in Fig. 2B and Fig. 4A, the HBD-seq-identified R-loop signals are all dramatically attenuated (Fig. 2B) or almost completely abolished after the RNase treatment (Fig. 4A). The specificity of HBD in recognizing R-loops has been carefully demonstrated in a previous study (PMID: 33597247). In this study, we used the same two copies of HBD (2xHBD) and replaced the GST tag with EGFP-V5 to reduce the possibility of variable high-molecular-weight aggregates caused by the GST tag. In addition, RNase H treatment has been shown to fail to completely abolish the CUT&Tag signals, since a subset of DNA-RNA hybrids with high GC skew are partially resistant to RNase H (PMID: 32544226, 33597247). In consideration of the high GC skew of co-localized G4s and R-loops, we combined RNase A and RNase H. We currently do not have samples treated with RNase H alone.

      Figure 3A: "RNA-seq analysis revealed that the RNA levels of co-localized G4s and R-loops-associated genes are significantly higher": the differences are not very convincing.

      In Figure 3A of the revised manuscript, we performed the Mann-Whitney test to examine significance. RNA levels of genes associated with co-localized G4s and R-loops are indeed significantly higher than those of all genes, G4-associated genes, or R-loop-associated genes (Mann-Whitney test, p < 2.2E-16).
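
      For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is an illustration, not the authors' pipeline: the function names and the toy expression values are hypothetical, and the p-value uses a normal approximation without tie or continuity correction, which is only reasonable for large samples such as genome-wide gene sets.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p

# Hypothetical expression values for two gene sets (not data from the study).
colocalized = [8.1, 9.4, 7.7, 10.2, 9.9]
all_genes = [3.2, 4.1, 5.0, 2.8, 6.3]
u_stat, p_value = mann_whitney_u(colocalized, all_genes)
```

      Because the test is rank-based, it compares the two distributions without assuming normally distributed expression values.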

      Figure 3B: the patterns for "G4" and "co-localised G4 and R-loop" are extremely similar, suggesting that nearly all G4s mapped here could also form R-loops. If this is the case, most of the HEPG4-seq signals should be sensitive to exogenous RNase H treatment or to the in vivo over-expression of RNase H1. This should be tested (see above).

      The percentage of G4 peaks that co-localize with R-loops is 80.3% (5,459 out of 6,799) in HEK293 cells and 72.0% (68,482 out of 95,128) in mESCs. Co-localization does not mean that the G4 and the R-loop interact with each other. We have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we do not think it is necessary to perform the RNase H test.
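
      The quoted percentages follow directly from the peak counts given above. A short check, with the dictionary layout and function name chosen here purely for illustration:

```python
# Peak counts quoted in the response: (co-localized G4 peaks, all G4 peaks).
counts = {
    "HEK293": (5_459, 6_799),
    "mESC": (68_482, 95_128),
}

def overlap_percent(colocalized: int, total: int) -> float:
    """Percentage of G4 peaks that also overlap an R-loop peak."""
    return 100.0 * colocalized / total

percentages = {cell: round(overlap_percent(*pair), 1) for cell, pair in counts.items()}
print(percentages)
```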

      Figure 3C: there is no correlation between the FC of G4 and the FC of RNA; this is not really consistent with the idea that the stabilisation of G4 is the driver rather than a consequence of the transcriptional changes.

      Given that inhibition of WRN or BLM induced a large amount of G4 accumulation (Fig.1H-I), we examined the transcriptional effect on genes associated with these accumulated G4s in Fig.3C. We indeed observed an effect of G4 accumulation on the transcription of G4-associated genes. That G4 stabilization triggers transcriptional changes does not mean that the transcriptional changes should be highly correlated with the increase in G4 levels. To our knowledge, this type of correlation has not been observed in previous studies.

      l279: the overlap with H3K4me1 is really not convincing.

      For all G4 peaks, the H3K4me1 signal indeed exhibits a high background around the center of the G4 peaks, but we could still observe a clear peak at the center.

      Figure 5C: it should be clearly indicated here that the authors compare Cut&Tag and ChIP data. The origin of the ChIP-seq data is also unclear and should be indicated.

      Thank you for the suggestions. We have clarified this point.

      For the ChIP data, we have described the origin of ChIP-seq data in the “Data availability” section as below: “The ChIP-seq data of histone markers and RNAP are openly available in GNomEx database (accession number 44R) (Wamstad et al., 2012).”

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1A. An experimental condition lacking H2O2 (-H2O2) should be included.

      We have added this control in Fig.1A.

      (2) Does RNAse H affect G4 profiles?

      We have not tested the effect of RNase H on G4 formation. However, we have shown that only a small proportion of co-localized G4s and R-loops displayed differential G4s and R-loops at the same time in the dhx9KO mESCs (Fig. 6D, Supplementary Fig. 3B), suggesting that the majority of co-localized G4s and R-loops do not interact with each other. Thus, we do not think it is necessary to perform the RNase H test on G4s. In addition, to treat cells with RNase H, we would have to permeabilize the cells first to let RNase H enter the nuclei. In doing so, we would lose the picture of endogenous G4s.

      (3) Figure 2G. R-loops are detected upstream of the KPNB1 gene. What is this region? Is it transcribed?

      We apologize for the mistake made when preparing this figure. We have replaced it with the correct one in Fig. 2G. The R-loop is around the TSS of KPNB1. We also show the RNA-seq data for this region in Author response image 2 below. This region is indeed transcribed.

      Author response image 2.

      (4) Did BLM and WRN inhibition specifically affect the expression of genes containing colocalized G4s and R-loops? Was the effect seen in other genes as well? Appropriate statistical analyses are needed.

      In Fig.3, we have shown that the accumulation of co-localized G4s and R-loops induced by the inhibition of BLM or WRN caused significant expression changes in genes (480 in BLM inhibition, 566 in WRN inhibition) containing these structures, most of which are localized at the promoter-TSS regions. We indeed detected the effect in other genes as well. There were 918 and 1020 genes with significant changes (padjust <0.05 & FC >=2 or FC <=0.5) in BLM and WRN inhibition, respectively.
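
      The significance criterion quoted above can be written as a simple filter. The sketch below assumes the criterion reads as "adjusted p-value below 0.05 AND (fold change at least 2 OR at most 0.5)"; the `GeneResult` record, the gene names, and the numbers are hypothetical and are not taken from the study.

```python
from typing import NamedTuple

class GeneResult(NamedTuple):
    gene: str
    fold_change: float  # treated / control, on a linear (not log2) scale
    padj: float         # multiple-testing-adjusted p-value

def is_significant(r: GeneResult, padj_cutoff: float = 0.05, fc_cutoff: float = 2.0) -> bool:
    """True if the gene passes both the adjusted p-value and fold-change cutoffs."""
    changed = r.fold_change >= fc_cutoff or r.fold_change <= 1.0 / fc_cutoff
    return r.padj < padj_cutoff and changed

# Hypothetical differential-expression results.
results = [
    GeneResult("geneA", 2.5, 0.001),   # up-regulated, significant
    GeneResult("geneB", 0.4, 0.010),   # down-regulated, significant
    GeneResult("geneC", 1.3, 0.0001),  # small change, fails fold-change cutoff
    GeneResult("geneD", 3.0, 0.200),   # large change, fails p-value cutoff
]
significant = [r.gene for r in results if is_significant(r)]
```

      Making the parentheses explicit avoids the ambiguity of reading the criterion as "(padj < 0.05 and FC >= 2) or FC <= 0.5", which would admit down-regulated genes regardless of their p-value.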

      (5) The claim that "The co-localized G4s and R-loops-mediated transcriptional regulation in HEK293 cells" (title of Figure 3) is not supported by the presented data. A causality link is not established in this study, which only reports correlations between G4s/R-loops and transcription regulation.

      We politely disagree with this point. BLM and WRN are the best-characterized DNA G4-resolving helicases (Fry and Loeb, 1999; Mendoza et al., 2016; Mohaghegh et al., 2001). Here, we used selective small molecules to specifically inhibit their ATPase activity and observed a dramatic induction of G4 accumulation. Notably, the accumulated G4s that trigger the transcriptional changes are mainly located at the promoter-TSS region. If the transcriptional changes triggered the G4 accumulation, we would not observe such a biased distribution, and more accumulated G4s would be detected in the gene body.

      (6) The effect of Dhx9 KO on colocalized G4s/R-loops and transcription is not clear. The suggestion that Dhx9 could regulate transcription by modulating G4s, R-loops, and co-localized G4s and R-loops is not supported by the presented data. Additional experiments and statistical analyses are needed to conclude the role of Dhx9 on colocalized G4s/R-loops and transcription.

      Dhx9 has been extensively studied and reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Thus, it is not necessary to repeat these assays. To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs, and 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators exist as well. These data clearly show the role of Dhx9 in directly modulating the stability of G4s and R-loops. Furthermore, we showed that loss of Dhx9 caused 816 genes associated with Dhx9-bound co-localized G4s and R-loops to be significantly differentially expressed, supporting transcriptional regulation by Dhx9. We performed the differential analysis following the standard pipeline: DESeq2 for RNA-seq and DiffBind for HepG4-seq and HBD-seq. The statistical details are described in the figure legends.

      (7) The conclusion that Dhx9 regulates the self-renewal and differentiation capacities of mESCs is vague. Additional experiments are needed to elucidate the exact contribution of Dhx9.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. We have shown that depletion of Dhx9 significantly attenuated the proliferation of mESCs and also influenced their capacity to differentiate into the three germ layers in the EB assay. In addition, we showed that depletion of Dhx9 significantly reduced the protein levels of the mESC pluripotency markers Nanog and Lin28a. The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.

      Reviewer #3 (Recommendations For The Authors):

      The study on the involvement of native co-localized G4s and R-loops in transcriptional regulation further enriches the readers' understanding of genomic regulatory networks, and the functional dissection of Dhx9 also lays a good foundation for the study of the dynamic regulatory mechanisms of co-localized G4s and R-loops. Unfortunately, however, the authors lack a strong basis for questioning the widely used BG4 and S9.6 antibodies, and the co-localized G4s and R-loops sequencing data obtained by the developed and optimized method also lack parallel comparison with existing sequencing technologies, which cannot indicate that HepG4-seq and HBD-seq are more reliable and superior than BG4 and S9.6 antibody-based sequencing technologies. There are also some minor errors in the manuscript that need to be corrected.

      Thank you for the constructive comments. We have added a new section (Comparisons of HepG4-seq and HBD-seq with previous methods) and a new figure 9 to compare our methods in parallel with other widely used methods.

      (1) This work mainly focuses on co-localized G4s and R-loops, but in the introduction section, the interplay between G4s and R-loops is only briefly mentioned. It is suggested that the importance of the interplay of G4s and R-loops for gene regulation should be further expanded to help readers better understand the significance of studying co-localized G4s and R-loops.

      Thank you for the comments. Current studies on the interplay between G4s and R-loops are limited. We have summarized all that we could find in the literature.

      (2) The authors mentioned that "a steady state equilibrium is generally set at low levels in living cells under physiological conditions (Miglietta et al., 2020) and thus the addition of high-affinity antibodies may pull the equilibrium towards folded states", in my understanding this is one of the important reasons why the authors optimized the G4s and R-loops detection assays, I wonder if there is a reliable basis for this statement. If there is, I suggest that the authors can supplement it in the manuscript.

      The main reason we developed the new method was to provide an antibody-free way to label endogenous G4s in living cells. We previously tried to capture endogenous G4s using tet-on-controlled BG4. Unfortunately, we found that even a short induction of BG4 in living cells was toxic. The traditional antibody-based methods rely on permeabilizing cells first to let the antibodies enter the nuclei. In this case, it is easy to lose the physiological picture of endogenous G4s. We will add more discussion about this point. For R-loops, we further optimized the GST-2xHBD-mediated method to avoid the problem of the GST tag. GST-fusion proteins are prone to form variable high-molecular-weight aggregates, and these aggregates often undermine the reliability of the fusion proteins.

      (3) Some questions about HepG4-seq:

      Is there a difference in hemin affinity for intramolecular G quadruplexes, interstrand G quadruplexes, and their different topologies? If so, does this bias affect the accuracy of sequencing results based on G4-hemin complexes?

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s (PMID: 32329781). Thus, the dynamics of G4 conformation possibly affect the HepG4-seq signals. In the future, researchers may need to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We have discussed this point in the revised version.

      HepG4-seq is based on proximity labeling and peroxidase activity of the G4-hemin complex. The authors tested and confirmed that the addition of hemin and Bio-An in the experiment had no significant influences on sequencing results, but the effect of exogenous H2O2 treatment may also need to be taken into account since ROS can mediate the formation of G4s.

      In the HepG4-seq protocol, we treat cells with H2O2 for only one minute. Thus, we expect the side effects of the H2O2 treatment to be limited in such a short time.

      (4) As we know, there have been at least two structure data of the S9.6 antibody very recently, and the questions about the specificity of the S9.6 antibody on RNA:DNA hybrids should be finished. The authors referred to (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the author's bias against S9.6 antibodies needs also to be changed. However, as the authors had questioned the specificity of the S9.6 antibody, they should compare in parallel with the data they have and the data generated by the widely used S9.6 antibody.

      Thank you for the updated information about the structural data of the S9.6 antibody. We politely disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show the superior specificity of S9.6 for RNA:DNA hybrids over dsRNA and dsDNA. However, Konig et al. have reported that the binding affinity of S9.6 for RNA:DNA hybrids exhibits an obvious sequence-dependent bias, ranging from no binding to the nanomolar range (PMID: 28594954). We have included the comparison between S9.6-derived data and our HBD-seq data in Fig.9 and in the section “Comparisons of HepG4-seq and HBD-seq with previous methods”.

      (5) It is hoped that the results of immunofluorescence experiments can be statistically analyzed.

      We have performed the statistical analysis and included the data in the new figure.

      (6) Some minor errors:

      Line 168, "G4-froming" should be "G4-forming";

      Figure 5E, the color of the "Repressed" average signal at the top of the HepG4-seq heatmap should be blue;

      Figure 7C, the abbreviation "Gloop" should be indicated in the text or in the figure caption.

      Thank you for pointing out these issues. We are sorry for these mistakes. We have corrected them in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this useful study, a solid machine learning approach based on a broad set of systems to predict the R2 relaxation rates of residues in intrinsically disordered proteins (IDPs) is described. The ability to predict the patterns of R2 will be helpful to guide experimental studies of IDPs. A potential weakness is that the predicted R2 values may include both fast and slow motions, thus the predictions provide only limited new physical insights into the nature of the relevant protein dynamics.

      Fast motions are less sequence-dependent (e.g., as shown by R1). Hence the sequence-dependent part of R2 singles out slow motion.

      Public Reviews:

      Reviewer #1 (Public Review):

      Solution state 15N backbone NMR relaxation from proteins reports on the reorientational properties of the N-H bonds distributed throughout the peptide chain. This information is crucial to understanding the motions of intrinsically disordered proteins and as such has focussed the attention of many researchers over the last 20-30 years, both experimentally, analytically and using numerical simulation.

      This manuscript proposes an empirical approach to the prediction of transverse 15N relaxation rates, using a simple formula that is parameterised against a set of 45 proteins. Relaxation rates measured under a wide range of experimental conditions are combined to optimize residue-specific parameters such that they reproduce the overall shape of the relaxation profile. The purely empirical study essentially ignores NMR relaxation theory, which is unfortunate, because it is likely that more insight could have been derived if theoretical aspects had been considered at any level of detail.

      NMR relaxation theory is very valuable in particular regarding motions on different timescales. However, it has very little to say about the sequence dependence of slow motions, which is the focus of our work.

      Despite some novel aspects, in particular the diversity of the relaxation data sets, the residue-specific parameters do not provide much new insight beyond earlier work that has also noted that sidechain bulkiness correlated with the profile of R2 in disordered proteins.

      The novel insight from our work is that R2 can mostly be predicted based on the local sequence.

      Nevertheless, the manuscript provides an interesting statistical analysis of a diverse set of deposited transverse relaxation rates that could be useful to the community.

      Thank you!

      Crucially, and somewhat in contradiction to the authors stated aims in the introduction, I do not feel that the article delivers real insight into the nature of IDP dynamics. Related to this, I have difficulty understanding how an approximate prediction of the overall trend of expected transverse relaxation rates will be of further use to scientists working on IDPs. We already know where the secondary structural elements are (from 13C chemical shifts which are essential for backbone assignment) and the necessary 'scaling' of the profile to match experimental data actually contains a lot of the information that researchers seek.

      Again, the novel insight is that slow motions that dictate the sequence dependence of R2 can mostly be predicted based on the local sequence. The scaling factor may contain useful information but does not tell us anything about the sequence dependence of IDP dynamics.

      This reviewer brings up a lot of valuable points, clearly from an NMR spectroscopist’s perspective. The emphasis of our paper is somewhat different from that perspective. For example, we were interested in whether tertiary contacts make significant contributions to R2, as sometimes claimed. Our results show that, in general, they do not; instead local contacts dominate the sequence dependence of R2.

      (1) The introduction is confusing, mixing different contributions to R2 as if they emanated from the same physics, which is not necessarily true. 15N transverse relaxation is said to report on 'slower' dynamics from 10s of nanoseconds up to 1 microsecond. Semi-classical Redfield theory shows that transverse relaxation is sensitive to both adiabatic and non-adiabatic terms, due to spin state transitions induced by stochastic motions, and dephasing of coherence due to local field changes, again induced by stochastic motions. These are faster than the relaxation limit dictated by the angular correlation function. Beyond this, exchange effects can also contribute to measured R2. The extent and timescale limit of this contribution depends on the particular pulse sequence used to measure the relaxation. The differences in the pulse sequences used could be presented, and the implications of these differences for the accuracy of the predictive algorithm discussed.

      Indeed pulse sequences affect the measured R2 values. We make the modest assumption that such experimental idiosyncrasy would not corrupt the sequence dependence of IDP dynamics. As for exchange effects, our expectation is that the current SeqDYN may not do well for R2s where slow exchange plays a dominant role in generating sequence dependence, as tertiary contacts would be prominent in those cases; we now present one such case (new Fig. S5).

      (2) Previous authors have noted the correlation between observed transverse relaxation rates and amino acid sidechain bulkiness. Apart from repeating this observation and optimizing an apparently bulkiness-related parameter on the basis of R2 profiles, I am not clear what more we learn, or what can be derived from such an analysis. If one can possibly identify a motif of secondary structure because raised R2 values in a helix, for example, are missed from the prediction, surely the authors would know about the helix anyway, because they will have assigned the 13C backbone resonances, from which helical propensity can be readily calculated.

      We think that a sequence-based method demonstrated to predict well the R2 values otherwise obtained from expensive NMR experiments is significant. That pi-pi and cation-pi interactions are prominent features of local contacts, and may seed tertiary contacts and mediate the inter-chain contacts that drive phase separation, is a valuable insight.

      (3) Transverse relaxation rates in IDPs are often measured to a precision of 0.1s-1 or less. This level of precision is achieved because the line-shapes of the resonances are very narrow and high resolution and sensitivity are commonly measurable. The predictions of relaxation rates, even when applying uniform scaling to optimize best-agreement, is often different to experimental measurement by 10 or 20 times the measured accuracy. There are no experimental errors in the figures. These are essential and should be shown for ease of comparison between experiment and prediction.

      Again, our focus is not the precision of the absolute R2 values, but rather the sequence dependence of R2.

      (4) The impact of structured elements on the dynamic properties of IDPs tethered to them is very well studied in the literature. Slower motions are also increased when, for example the unfolded domain binds a partner, because of the increased slow correlation time. The ad hoc 'helical boosting' proposed by the authors seems to have the opposite effect. When the helical rates are higher, the other rates are significantly reduced. I guess that this is simply a scaling problem. This highlights the limitation of scaling the rates in the secondary structural element by the same value as the rest of the protein, because the timescales of the motion are very different in these regions. In fact the scaling applied by the authors contains very important information. It is also not correct to compare the RMSD of the proposed method with MD, when MD has not applied a 'scaling'. This scaling contains all the information about relative importance of different components to the motion and their timescales, and here it is simply applied and not further analysed.

      Actually, applying the boost factor achieves the effect of a different scaling factor for the secondary structure element than for the rest of the protein.

      Regarding the comparison of RMSEs between SeqDYN and MD, it is true that SeqDYN applies a scaling factor whereas MD does not. However, even if we apply scaling to the MD results, it will not change the basic conclusion that “SeqDYN is very competitive against MD in predicting R2, but without the significant computational cost.”

      (5) Generally, the uniform scaling of all values by the same number is serious oversimplification. Motions are happening on all timescales they are giving rise to different transverse relaxation. It is not possible to describe IDP relaxation in terms of one single motion. Detailed studies over more than 30 years, have demonstrated that more than one component to the autocorrelation function is essential in order to account for motions on different timescales in denatured, partially disordered or intrinsically unfolded states. If one could 'scale' everything by the same number, this would imply that only one timescale of motion were important and that all others could be neglected, and this at every site in the protein. This is not expected to be the case, and in fact in the examples shown by the authors it is also never the case. There are always regions where the predicted rates are very different from experiment (with respect to experimental error), presumably because local dynamics are occurring on different timescales to the majority of the molecule. These observations contain useful information, and the observation that a single scaling works quite well probably tells us that one component of the motion is dominant, but not universally. This could be discussed.

      The reviewer appears to equate a single scaling factor with a single type of motion -- this is not correct. A single scaling factor just means that we factor out effects (e.g., temperature or magnetic field) that are uniform across the IDP sequence.
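
      To make the role of a uniform scaling factor concrete, the sketch below fits one multiplicative factor between a predicted profile and a measured profile. It is an assumption-laden illustration: the text does not state how the scaling factor is determined, so a least-squares fit is used here as a stand-in, and the function name and all numbers are hypothetical.

```python
def best_scale(predicted, measured):
    """Least-squares scale s minimizing sum((m - s * p)**2) over residues."""
    numerator = sum(p * m for p, m in zip(predicted, measured))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator

# Hypothetical unscaled predicted profile (unitless) and measured R2 values (s^-1).
predicted = [1.0, 1.5, 2.0, 1.2]
measured = [2.0, 3.0, 4.0, 2.4]

scale = best_scale(predicted, measured)
scaled_prediction = [scale * p for p in predicted]

# The ratio between any two residues is the same before and after scaling,
# so the residue-to-residue (sequence-dependent) pattern is untouched.
ratio_before = predicted[2] / predicted[0]
ratio_after = scaled_prediction[2] / scaled_prediction[0]
```

      A single factor of this kind rescales every residue equally, so it cannot create or remove peaks in the profile; it only absorbs effects that are uniform along the sequence.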

      (6) With respect to the accuracy of the prediction, discussion about molecular detail such as pi-pi interactions and phase separation propensity is possibly a little speculative.

      It is speculative; we now add more support to this speculation (p. 18 and new Fig. S6).

      (7) The authors often declare that the prediction reproduces the experimental data. The comparisons with experimental data need to be presented in terms of the chi2 per residue, using the experimentally measured precision which as mentioned, is often very high.

      Again, our interest is the sequence dependence of R2, not the absolute R2 value and its measurement precision.

      Reviewer #2 (Public Review):

      Qin, Sanbo and Zhou, Huan-Xiang created a model, SeqDYN, to predict nuclear magnetic resonance (NMR) spin relaxation spectra of intrinsically disordered proteins (IDPs), based primarily on amino acid sequence. To fit NMR data, SeqDYN uses 21 parameters, 20 that correspond to each amino acid, and a sequence correlation length for interactions. The model demonstrates that local sequence features impact the dynamics of the IDP, as SeqDYN performs better than a one residue predictor, despite having similar numbers of parameters. SeqDYN is trained using 45 IDP sequences and is retrained using both leave-one-out cross validation and five-fold cross validation, ensuring the model's robustness. While SeqDYN can provide reasonably accurate predictions in many cases, the authors note that improvements can be made by incorporating secondary structure predictions, especially for alpha-helices that exceed the correlation length of the model. The authors apply SeqDYN to study nine IDPs and a denatured ordered protein, demonstrating its predictive power. The model can be easily accessed via the website mentioned in the text.
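
      The summary above gives only the ingredients of the model: one parameter per amino acid type, a sequence correlation length, a Lorentzian functional form, and a separate scaling factor. The sketch below shows one way such ingredients could be combined. The multiplicative form, the function name `seq_profile`, and the parameter values are assumptions made for illustration and are not confirmed by the text; they should not be read as the published SeqDYN formula or its fitted parameters.

```python
def seq_profile(sequence, q, corr_length, scale=1.0):
    """Illustrative sequence-based profile.

    Each residue m contributes a factor to position n that equals q[residue]
    when m == n and decays toward 1 with a Lorentzian in the sequence
    separation |m - n| (assumed form, for illustration only).
    """
    b = 1.0 / corr_length ** 2
    profile = []
    for n in range(len(sequence)):
        value = scale
        for m, residue in enumerate(sequence):
            separation = abs(m - n)
            value *= 1.0 + (q[residue] - 1.0) / (1.0 + b * separation ** 2)
        profile.append(value)
    return profile

# Hypothetical per-residue parameters; a real model would have 20 of them.
q_example = {"G": 0.9, "A": 1.0, "W": 1.3}

flat = seq_profile("AAAAAAA", q_example, corr_length=5.0)
peaked = seq_profile("AAAWAAA", q_example, corr_length=5.0)
```

      In this toy form, a residue with a parameter above 1 raises the profile at its own position and, more weakly, at its neighbors, which is the kind of local sequence effect the review describes.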

      While the conclusions of the paper are primarily supported by the data, there are some points that could be extended or clarified.

      (1) The authors state that the model includes 21 parameters. However, they exclude a free parameter that acts as a scaling factor and is necessary to fit the experimental data (lambda). As a result, SeqDYN does not predict the spectrum from the sequence de-novo, but requires a one parameter fitting. The authors mention that this factor is necessary due to non-sequence dependent factors such as the temperature and magnetic field strength used in the experiment.

      Given these considerations, would it be possible to predict what this scaling factor should be based on such factors?

      There are still too few data to make such a prediction.

      (2) The authors mention that the Lorentzian functional form fits the data better than a Gaussian functional form, but do not present these results.

      We tested the different functional forms at the early stage of the method development. The improvement of the Lorentzian over the Gaussian was slight and we simply decided on the Lorentzian and did not go back and do a systematic analysis.

      (3) The authors mention that they conducted five-fold cross validation to determine if differences between amino acid parameters are statistically significant. While two pairs are mentioned in the text, there are 190 possible pairs, and it would be informative to more rigorously examine the differences between all such pairs.

      We now present t-test results for other pairs in new Fig. S3.
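
      The figure of 190 pairs is the number of unordered pairs among 20 amino acid types. The sketch below enumerates those pairs and computes a t statistic for one pair from five cross-validation folds. The specific t-test variant is not stated in the text, so Welch's form is assumed here, and the per-fold parameter values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard types
pairs = list(combinations(AMINO_ACIDS, 2))

def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances allowed)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical parameter values from five cross-validation folds.
q_trp = [1.30, 1.32, 1.28, 1.31, 1.29]
q_gly = [0.90, 0.92, 0.88, 0.91, 0.89]
t_stat = welch_t(q_trp, q_gly)
```

      With 190 comparisons, some correction for multiple testing would normally be considered before calling individual differences significant.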

      Reviewer #3 (Public Review):

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

      Our method predicts the sequence dependence of R2, which is dominated by slower dynamics.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Should make sure to define abbreviations such as NMR and SeqDYN.

      We now spell out NMR at first use. SeqDYN is the name of our method and is not an abbreviation.

      (2) The authors do not mention how the curves in Figure 2A are calculated.

      As we stated in the figure caption, these curves are drawn to guide the eye.

      (3) May be interesting to explore how the model parameters (q) correlate with different measures of hydrophobicity (especially those derived for IDPs like Urry). This may point to a relationship between amino acid interactions and amino acid dynamics

      We now present the correlation between q and a stickiness parameter refined by Tesei et al. (new ref 45) and used for predicting phase separation equilibrium (new Fig. S6).

      (4) The authors demonstrate that secondary structure cannot be fully accounted for by their model. They make a correction for extended alpha-helices, but the strength of this correction seems to only be based on one sequence. Would a more rigorous secondary structure correction further improve the model and perhaps allow its transferability to ordered proteins?

      We have four test cases (Figs. 4E, F and 5H, I). However, we doubt that the SeqDYN method will be transferable to ordered proteins.

      Reviewer #3 (Recommendations For The Authors):

      Changes that could strengthen the manuscript substantially.

      (1) The authors do not really define what they mean by dynamics, but given that they train and benchmark on R2 measurements, they directly probe whatever goes into the measured R2. Using a direct measurement is a strength since it makes it clear what they are predicting. It also, however, makes it difficult to interpret. This is made clear in the text when the authors, for example, write "𝑅2 is the one most affected by slower dynamics (10s of ns to 1 μs and beyond)." First, with the "and beyond" it could literally mean anything. Second, the "normal" R2 rate is limited to motions up to the (local) "tumbling/reorganization" time (which is much faster), so any slow motions that go into R2 would be what one would normally call "exchange". The authors should thus make it clearer what exactly it is they are probing. In the end, this also depends on the origin of the experimental data, and whether the "R2" measurements are exchange-free or not. This may be a mixture, which hampers interpretations and which may also explain some of the rescaling that needs to be done.

      We now remove “and beyond”, and also raise the possibility that R2 measurements based on 15N relaxation may have relatively small exchange contributions (p. 17).

      (2) Related to the above, the authors might consider comparing their predictions to the relaxation experiments from Kriwacki and colleagues on a fragment of p27. In that work, the authors used dispersion experiments to probe the dynamics on different timescales. The authors would here be able to compare both to the intrinsic R2 rates (when slow motions are pulsed away) as well as the effective R2 rates (which would be the most common measurement). This would help shed light on (at least in one case) which type of R2 the prediction model captures. https://doi.org/10.1021/jacs.7b01380

      We now report this comparison in new Fig. S5 and discuss its implications (p. 17-18).

      (3) In some cases, disagreement between prediction and experiments is suggested to be due to differences in temperature, and hence is used as an argument for the rescaling done. Here, the authors use a factor of 2.0 to explain a difference between 278K and 298K, and a factor of 2.4 to explain the difference between 288K and 298K. It would be surprising if the temperature effect from 288K->298K is larger than from 278K->298K. Does this not suggest that the differences come as much from other sources?

      Note that the scaling factors 2.0 and 2.4 were obtained on two different IDPs. It is most likely that different IDPs have different scaling factors for temperature change. As a simple model, the tumbling time for a spherical particle scales with viscosity and the particle volume; correspondingly the scaling factor for temperature change should be greater for a larger particle than for a smaller particle.

      (4) The authors find (as have others before) aromatic residues to be common at/near R2 peaks. They suggest this to be indicative of Pi-Pi interactions. Could this not be other types of interactions since these residues are also "just" more hydrophobic? Also, can the authors rule out that the increased R2 rates near aromatic residues are not due to increased dynamics, but simply due to increased Rex-terms due to greater fluctuations in the chemical shifts near these residues (due to the large ring current effects)?

      We noted both pi-pi and cation-pi as possible interactions that raise R2. There can be other interactions involving aromatic residues, but it’s unlikely to be only hydrophobic as Arg is also in the high-q end. For the same reason, a ring-current based explanation would be inadequate.

      (5) The authors write: "We found that, by filtering PsiPred (http://bioinf.cs.ucl.ac.uk/psipred) (35) helix propensity scores (𝑝,-.) with a very high cutoff of 0.99, the surviving helix predictions usually correspond well with residues identified by NMR as having high helix propensities." It would be good to show the evidence for this in the paper, and quantify this statement.

      The cases of most interest are the ones with long predicted helices, of which there are only 3 in the training set. For Sev-NT and CBP-ID4, we already summarize the NMR data for helix identification in the first paragraph of Results; the third case is KRS-NT, which we elaborate in p. 14.

      (6) When analysing the nine test proteins, it would be very useful for the reader to get a number for the average accuracy on the nine proteins and a corresponding number for the training proteins. The numbers are maybe there, but hard to find/compare. This would be important so that one can understand how well the model works on the training vs testing data.

      We now present the mean RMSE comparison in p. 14.

      (7) The authors write: "The 𝑞 parameters, while introduced here to characterize the propensities of amino acids to participate in local interactions, appear to correlate with the tendencies of amino acids to drive liquid-liquid phase separation." It would be good to show this data and quantify this.

      We now list supporting data in p. 18 and present new Fig. S6 for further support.

      (8) It is great that the authors have made a webservice available for easy access to the work. They should in my opinion also make the training code and data available, as well as the final trained model. Here it would also be useful to show the results from the use of a Gaussian that was also tested, and also state whether this model was discarded before or after examining the testing data.

      We have listed the IDP characteristics and sequences in Tables S1 and S2. We’re unsure whether we can disseminate the experimental R2 data without the permission of the original authors. As for the Gaussian function, as stated above, it was abandoned at an early stage, before examining the testing data.

      Changes that would also be useful

      (1) The authors should make it clearer what they predict and what they don't. They mention transient helix formation and various contacts, but there isn't a one-to-one relationship between these structural features and R2 rates. Hence, they should make it clearer that they don't predict secondary structure and that an increased R2 rate may be indicative of many different structural/dynamical features on many different time scales.

      We clearly state that we apply a helix boost after the regular SeqDYN prediction.

      (2) The authors write "Instead, dynamics has emerged as a crucial link between sequence and function for IDPs" and cite their own work (reference 1) as reference for this statement. As far as I can see, that work does not study function of IDPs. Maybe the authors could cite additional work showing that the dynamics (time scales) affects function of IDPs beyond "just" structure? Otherwise, the functional consequences are not clear. Maybe the authors mean that R2 rates are indicative of (residual) structure, but that is not quite the same. Also, even in that case, there are likely more appropriate references.

      Ref. 1 summarized a number of scenarios where dynamics is related to function.

      (3) The authors might want to look at some of the older literature on interpreting NMR relaxation rates and consider whether some of it is worth citing.

      Fitting/understanding R2 profiles https://doi.org/10.1021/bi020381o https://doi.org/10.1007/s10858-006-9026-9

      MD simulations and comparisons to R2 rates without ad hoc reweighting (in addition to the papers from the authors themselves). https://doi.org/10.1021/ja710366c https://doi.org/10.1021/ja209931w

      The R2 data for the two unfolded proteins are very helpful! We now present the comparison of these data to SeqDYN prediction in Fig. 6C, D. The MD papers are superseded by more recent studies (e.g., refs. 1 and 14).

      There are more like these.

      (4) In the analysis of unfolded lysozyme, I assume that the authors are treating the methylated cysteines (which are used in the experiments) simply as cysteine. If that is the case, the authors should ideally mention this specifically.

      Treatment of methylated cysteines is now stated in the Fig. 6 caption.

      (5) The authors write "Pro has an excessively low ms𝑅2 [with data from only two IDPs (32, 33)], but that is due to the absence of an amide proton." It would be useful with an explanation why lacking a proton gives rise to low 15N R2 rates.

      That assertion originated from ref. 32.

      (6) When applying the model, the authors predict msR2 and then compare to experimental R2 by rescaling with a factor gamma. It would be good to make it clearer whether this parameter is always fitted to the experiments in all the comparisons. It would be useful to list the fitted gamma values for all the proteins (e.g. in Table S1).

      We already give a summary of the scaling factors (“For 39 of the 45 IDPs, Υ values fall in the range of 0.8 to 2.0 s–1”, p. 10).

      (7) p. 14 "nineth" -> "ninth"

      Corrected

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript proposes an alternative method by SDS-PAGE calibration of Halo-Myo10 signals to quantify myosin molecules at specific subcellular locations, in this specific case filopodia, in epifluorescence datasets compared to the more laborious and troublesome single molecule approaches. Based on these preliminary estimates, the authors developed further their analysis and discussed different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia. 

      Strengths: 

      I confirm my previous assessment. Overall, the paper is elegantly written and the data analysis is appropriately presented. Moreover, the novel experimental approach offers advantages to labs with limited access to high-end microscopy setups (super-resolution and/or EM in particular), and the authors proved its applicability to both fixed and live samples. 

      Weaknesses: 

      Myself and the other two reviewers pointed to the same weakness, the use of protein overexpression in U2OS. The authors claim that Myosin10 is not expressed by U2OS, based on Western blot analysis. Does this completely rule out the possibility that what they observed (the polarity of filopodia and the bulge accumulation of Myo10) could be an artefact of overexpression? I am afraid this still remains the main weakness of the paper, despite being properly acknowledged in the Limitations.

      Respectfully, our observations do not capture an “artefact” of overexpression but rather the “response” to overexpression. Our goal in this project was to overexpress Myo10 in a situation where it is the limiting reagent for generating filopodia. As Reviewer 3 notes below, overexpression shows that filopodial tips “can accommodate a surprisingly (shockingly) large number of motors.” This is exactly the point. Reviewer 2 considered our handling of this issue to be a strength of the paper. As far as whether bulges occur in endogenous Myo10 systems, please see our comments to Reviewer 3. 

      I consider all the remaining issues I expressed during the first revision solved. 

      Reviewer #2 (Public Review): 

      Summary: 

      The paper sought to determine the number of myosin 10 molecules per cell and localized to filopodia, where they are known to be involved in formation, transport within, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO tagged Myo10 in U2OS cells and generated cell lysates of a certain number of cells and detected Myo10 after SDS-PAGE, with fluorescence and a stain-free method. They used a purified HALO tagged standard protein to generate a standard curve which allowed for determining Myo10 concentration in cell lysates and thus an estimate of the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 in each filopodium, which suggests some filopodia have more Myo10 than actin binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology at the tips, and dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are also broad for understanding the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing. 
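      The standard-curve step summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function names, the example band intensities, and the number of cells loaded are all hypothetical, and the corrections for incomplete HaloTag labeling and transfection efficiency discussed elsewhere in the review are omitted.

```python
# Illustrative sketch only: hypothetical names and numbers, not the authors' code.
AVOGADRO = 6.02214076e23  # molecules per mole


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def molecules_per_cell(lysate_signal, slope, intercept, cells_loaded):
    """Read fmol off the standard curve, then convert to molecules per cell."""
    fmol = (lysate_signal - intercept) / slope
    molecules = fmol * 1e-15 * AVOGADRO  # fmol -> mol -> molecules
    return molecules / cells_loaded


# Hypothetical standard curve: gel band signal versus fmol of purified standard.
standard_fmol = [10.0, 20.0, 40.0, 80.0]
standard_signal = [600.0, 1100.0, 2100.0, 4100.0]
slope, intercept = fit_line(standard_fmol, standard_signal)

# Hypothetical lysate lane: band signal 2600 from 30,000 cells loaded.
estimate = molecules_per_cell(2600.0, slope, intercept, cells_loaded=30_000)
```

      The sketch assumes the band signal is linear in the amount of labeled protein over the range of the standards; a lysate signal outside that range would need a different dilution rather than extrapolation.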

      Strengths: 

      The paper addresses an important fundamental biological question about how many molecular motors are localized to a specific cellular compartment and how that may relate to other aspects of the compartment such as the actin cytoskeleton and the membrane. The paper demonstrates a method of estimating the number of myosin molecules per cell using the fluorescently labeled HALO tag and SDS-PAGE analysis. There are several important conclusions from this work in that it estimates the number of Myo10 molecules localized to different regions of the filopodia and the minimum number required for filopodia formation. The authors also establish a correlation between the number of filopodia-localized Myo10 molecules and the number of filopodia in the cell. Only a small percentage of Myo10 is tip-localized relative to the total amount in the cell, suggesting that Myo10 has to be activated to enter the filopodial compartment. The localization of Myo10 is log-normal, which suggests that clustering of Myo10 is a feature of this motor. 

      One of the main critiques of the manuscript was that the results were derived from experiments with overexpressed Myo10 and therefore are hard to extrapolate to physiological conditions. The authors counter this critique with the argument that their results provide insight into a system in which Myo10 is a limiting factor for controlling filopodia formation. They demonstrate that U2OS cells do not express detectable levels of Myo10 (supplementary Figure 1E) and thus introducing Myo10 expression demonstrates how triggering Myo10 expression impacts filopodia. An example is given of how melanoma cells often heavily upregulate Myo10. 

      In addition, the revised manuscript addresses the concerns about the method to quantitate the number of Myo10 molecules per cell and therefore puncta in the cell. The authors have now made a good faith effort to correct for incomplete labeling of the HALO tag (Figure 2A-C, supplementary Figure 2D-E). The authors also address the concerns about variability in transfection efficiency (Figure 1D-E). 

      A very interesting addition to the revised manuscript was the quantitation of the number of Myo10 molecules present during an initiation event when a newly formed filopodium just starts to elongate from the plasma membrane. They conclude that 100s of Myo10 molecules are present during an initiation event. They also examined other live cell imaging events in which growth occurs from a stable filopodial tip and correlated these with elongation rates. 

      Weaknesses: 

      The authors acknowledge that a limitation of the study is that all of the experiments were performed with overexpressed Myo10. They address this limitation in the discussion but also provide important comparisons for how their work relates to physiological conditions, such as melanoma cells that only express large amounts of Myo10 when they are metastatic. Also, the speculation about how fascin can outcompete Myo10 should include a mechanism for how the physiological levels of fascin can compete with the overabundance of Myo10 (page 10, lines 401-408). 

      We have expanded the discussion about fascin competing with high concentrations of Myo10 in filopodial tips on pg. 15. The key feature is that fascin binding in a bundle is essentially irreversible, so it wins if any space opens up and it manages to bind before the next Myo10 arrives.

      Reviewer #3 (Public Review): 

      Summary 

      The work represents progress in quantifying the number of Myo10 molecules present in the filopodia tip. It reveals that, in cells overexpressing fluorescently labeled Myo10, the tip can accommodate a wide range of Myo10 motors, up to hundreds of molecules per tip. 

      The revised, expanded manuscript addresses all of this reviewer's original comments. The new data, analysis and writing strengthen the paper. Given the importance of filopodia in many cellular/developmental processes and the pivotal, as yet not fully understood role of Myo10 in their formation and extension, this work provides a new look at the nature of the filopodial tip and its ability to accommodate a large number of Myo10 motor proteins through interactions with the actin core and surrounding membrane. 

      Specific comments - 

      (1) One of the comments on the original work was that the analysis here is done using cells ectopically expressing HaloTag-Myo10. The authors' response is that cells express a range of Myo10 levels and some metastatic cancer cells, such as breast cancer, have significantly increased levels of Myo10 compared to non-transformed cell lines. It is not really clear how much excess Myo10 is present in those cells compared to what is seen here for ectopic expression in U2OS cells, making a direct correspondence difficult.

      We agree, a direct correspondence is difficult, and is further complicated by other variables (e.g., expression levels of Myo10 activators, cargoes, fascin, or other filopodial components) that may differ among cell lines. Properly sorting this out will require additional work in a few key cellular systems.

      However, there are two points to keep in mind that somewhat mitigate this concern. First, because ectopic expression of Myo10 causes an ~30x increase in the number of filopodia, the activated Myo10 population is divided over that larger filopodial population. Second, the log-normal distribution of Myo10 across filopodia has a long tail, which means that some cells with low levels of Myo10 will concentrate that Myo10 in a few filopodia. 

      In response to comments about the bulbous nature of many filopodia tips the authors point out that similar-looking tips are seen when cells are immunostained for Myo10, citing Berg & Cheney (2002). In looking at those images as well as images from papers examining Myo10 immunostaining in metastatic cancer cells (Arjonen et al, 2014, JCI; Summerbell et al, 2020, Sci Adv) the majority of the filopodia tips appear almost uniformly dot-like or circular. There is not too much evidence of the elongated, bulbous filopodial tips seen here.

      Yes, the tips in Berg and Cheney are circular, but their size varies considerably (just as a balloon is roughly circular, its size varies with the amount of air it contains). Non-bulbous filopodial tips have a theoretical radius of ~100 nm, which is below the diffraction limit. However, many of the filopodial tips are larger than the diffraction limit in Berg and Cheney, Fig. 1a. We cropped and zoomed in the images to show each fully visible filopodial tip

      We attempted to perform a similar analysis of the images in Arjonen and Summerbell. Unfortunately, their images are too small to do so. 

      However, in reconsidering the approach and results, it is the case that the findings here do establish the plasticity of filopodia tips that can accommodate a surprisingly (shockingly) large number of motors. The authors discuss that their results show that targeting molecules to the filopodia tip is a relatively permissive process (lines 262 - 274). That could be an important property that cells might be able to use to their advantage in certain contexts. 

      (2) The method for arriving at the intensity of an individual filopodium punctum (starting on line 532 and provided in the Response), and how this is corrected for transfection efficiency and the cell-to-cell variation in expression level is still not clear to this reviewer. The first part of the description makes sense - the authors obtain total molecules/cell based on the estimation on SDS-PAGE using the signal from bound Halo ligand. It then seems that the total fluorescence intensity of each expressing cell analyzed is measured, then summed to get the average intensity/cell. The 'total pool' is then arrived at by multiplying the number of molecules/cell (from SDS-PAGE) by the total number of cells analyzed. After that, then: 'to get the number of molecules within a Myo10 filopodium, the filopodium intensity was divided by the bioreplicate signal intensity and multiplied by 'total pool.' ' The meaning of this may seem simple or straightforward to the authors, but it's a bit confusing to understand what the 'bioreplicate signal intensity' is and then why it would be multiplied by the 'total pool'. This part is rather puzzling at first read.

      We agree, such information is critical. We have now revised this description with more precise terms and have included a formula on pg. 20.

      Since the approach described here leads the authors to their numerical estimates every effort should be made to have it be readily understood by all readers. A flow chart or diagram might be helpful. 

      We have added a diagram of the calculations to the supplemental material (Figure 1—figure supplement 3). We hope that both changes will make it easier for others to follow our work.
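      The calculation as the reviewer paraphrases it in comment (2) can be sketched as below. This is a reading of that paraphrase, not the formula from pg. 20 of the revised manuscript: it assumes that 'bioreplicate signal intensity' means the summed fluorescence of all analyzed cells in the bioreplicate, and every name and number here is hypothetical.

```python
# Illustrative sketch only: one reading of the reviewer's paraphrase,
# with hypothetical names and numbers.

def molecules_in_region(region_intensity, cell_intensities, molecules_per_cell):
    """Scale a region's share of the total signal by the total molecule pool.

    region_intensity   -- summed fluorescence of one filopodium or punctum
    cell_intensities   -- summed fluorescence of each analyzed cell
    molecules_per_cell -- average molecules per cell from the SDS-PAGE estimate
    """
    # 'Total pool': molecules per cell times the number of cells analyzed.
    total_pool = molecules_per_cell * len(cell_intensities)
    # Assumed meaning of 'bioreplicate signal intensity': sum over all cells.
    bioreplicate_signal = sum(cell_intensities)
    # The region's fraction of the total signal, times the total pool.
    return region_intensity / bioreplicate_signal * total_pool


# Three hypothetical cells and one hypothetical filopodium.
cells = [2.0e6, 1.0e6, 3.0e6]
n_molecules = molecules_in_region(600.0, cells, molecules_per_cell=1.0e6)
```

      Read this way, the ratio of total pool to bioreplicate signal acts as a single molecules-per-intensity-unit conversion factor for the whole bioreplicate, which is why the filopodium intensity is multiplied by it; cell-to-cell differences in expression enter only through each cell's own intensity.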

      (3) The distribution of Myo10 punctae around the cell are analyzed (Fig 2E, F) and the authors state that they detect 'periodic stretches of higher Myo10 density along the plasma membrane' (line 123) and also that there is correlation and anti-correlation of molecules and punctae at opposite ends of the cells. 

      In the first case, it is hard to know what the authors really mean by the phrase 'periodic stretches'. It's not easy to see a periodicity in the distribution of the punctae in the many cells shown in Supp Fig 3. Also, the correlation/anti-correlation is not so easily seen in the quantification shown in Fig 2F. Can the authors provide some support or clarification for what they are stating? 

      The periodic pattern that we refer to is most apparent in the middle panels of Fig. 2E, F. These panels show the density of Myo10 puncta. These puncta numbers closely correspond to filopodia counts, with the caveat that some filopodia might have multiple puncta. This periodic density might not be as apparent in the raw data shown in Supp. Fig. 3. We have therefore rewritten this paragraph to clarify our observations (pg. 6).

      (4) The authors are no doubt aware that a paper from the Tyska lab, which employs a completely different method of counting molecules and arrives at a much lower number of Myo10 molecules at the filopodial tip than is reported here, was just posted (Fitz & Tyska, 2024, bioRxiv, DOI: 10.1101/2024.05.14.593924). 

      While it is not absolutely necessary for the authors to provide a detailed discussion of this new work given the timing, they may wish to consider adding a note briefly addressing it. 

      We are aware of this manuscript and that it uses a different approach for calibrating the fluorescence signal in microscopy. However, we are not comfortable commenting on that manuscript at this time, given that it has not yet been peer reviewed with the chance for author revisions.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript the authors are now presenting does not comply with the formatting limits of a Short report, but it is instead presented as a full article type. I believe the authors could shorten the Discussion, and meet the criteria for a more appropriate Short Report format. 

      For instance, I continue to believe that the study of truncation variants could sustain the claim that membrane binding represents the driving force that leads to Myo10 accumulation. I understand the authors want to address these mechanisms in a follow-up story, for this reason, I encourage them to shorten the discussion, which seems unnecessarily long for a technique-based manuscript.

      In the first round of review, Reviewer 3 asked us to expand the discussion. Given that, we are happy with where we have landed on the length of the discussion.

      Figure 2 could include some images to help readers follow the different messages of the two rose plots E and F, for example by picking one of the examples from supplementary Figure 3. 

      We have now added a supplemental figure showing an example cell (Fig. 2 figure supplement 2). But please note that the averaging of ~150 cells (Fig. 2E, F) should be more reliable to show these overall trends.

      Reviewer #2 (Recommendations For The Authors): 

      Also, the speculation about how fascin can outcompete Myo10 should include a mechanism for how the physiological levels of fascin can compete with the overabundance of Myo10 (page 10, lines 401-408). 

      As noted above, we have now clarified this point. 

      Reviewer #3 (Recommendations For The Authors): 

      line 495 - what is GOC? 

      We have now defined this oxygen scavenger system in the main text.

      lines 603/604 - it is stated that 'velocity analysis does not only account for Myo10 punctum that moved away from the starting point of the trajectory.' It's not clear what this really means. 

      The sentence now reads: "For Figure 4 parts G-H, note that velocity analysis includes a few Myo10 puncta that switch direction within a single trajectory (e.g., a retracting punctum that then elongates)."

      References #4 and #14 are the same. 

      Thank you for catching that; it has now been corrected.

      Fig 1C - the plot for signal intensity versus fmol of protein has numbers for the standard and then live and fixed cells. While the R2 value is quite good, it seems a bit odd that the three (?) data points for live cells are all quite small relative to the fixed cells and all bunched together at the left side of the plot. 

      As mentioned in the main text, the time post-transfection has a noticeable effect on the level of Myo10 expression. The three fixed-cell bioreplicates had higher Myo10 expression because they were analyzed 48 hours post-transfection compared to the three live-cell bioreplicates (24 hours). Therefore, the fixed cell data points are larger in value because they represent more molecules, and the live cell data points are on the left side of the plot because they represent fewer molecules.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Strengths: 

      The paper clearly presents the resource, including the testing of candidate enhancers identified from various insects in Drosophila. This cross-species analysis, and the inherent suggestion that training datasets generated in flies can predict a cis-regulatory activity in distant insects, is interesting. While I can not be sure this approach will prevail in the future, for example with approaches that leverage the prediction of TF binding motifs, the SCRMShaw tool is certainly useful and worth consideration for the large community of genome scientists working on insects. 

      We thank the reviewer for the positive comments, and would just like to point out that we agree: while we cannot of course know if other methods will overtake SCRMshaw for enhancer prediction—we assume they will, at some point (although motif-based approaches have not fared as well in the past)—for now, SCRMshaw provides strong performance and is a useful part of the current toolkit.

      Weaknesses: 

      While the authors made the effort to provide access to the SCRMshaw annotations via the REDfly database, the usefulness of this resource is somewhat limited at the moment. First, it is possible to generate tables of annotated elements with coordinates, but it would be more useful to allow downloads of the 33 genome annotations in GFF (or equivalent) format, with SCRMshaw predictions appearing as a new feature. Also, I should note that, unlike most species, some annotations seem to have issues in the current REDfly implementation. For example, Vcar and Jcoen turn up empty. 

      We have addressed these weaknesses in several ways:

      (1) We have created GFF versions of the SCRMshaw predictions and provide them standalone and also merged into the available annotation GFFs for each of the 33 species

      (2) We have made these GFF files, and also the original SCRMshaw output files, available for download in a Dryad repository linked to the publication (https://doi.org/10.5061/dryad.3j9kd51t0).

      (3) We have added the inadvertently omitted species to the REDfly/SCRMshaw database.

      We agree that the database functions are still somewhat limited, but note that database development is ongoing and we expect functionality to increase over time. In the meantime, the Dryad repository ensures that all results reported in this paper are directly available.
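      For readers who want to see what predictions look like as GFF features, the following minimal sketch writes a coordinate table as GFF3 lines. It is not the authors' conversion code and does not reflect the actual layout of the files in the Dryad repository: the input fields, the 0-based half-open input coordinates, the `training_set` attribute, and the example record are all assumptions made for illustration. Only the nine tab-separated GFF3 columns and the 1-based inclusive coordinates come from the GFF3 format itself.

```python
# Illustrative sketch only: hypothetical input fields, not the authors' pipeline.

def to_gff3_line(seqid, start0, end, score, name, training_set):
    """Format one prediction as a GFF3 feature line.

    start0 and end are assumed to be 0-based, half-open coordinates, so the
    start is shifted by one to give GFF3's 1-based, inclusive interval.
    """
    attributes = f"ID={name};training_set={training_set}"
    columns = [
        seqid,            # column 1: sequence (scaffold or chromosome) ID
        "SCRMshaw",       # column 2: source of the feature
        "enhancer",       # column 3: feature type
        str(start0 + 1),  # column 4: start, 1-based
        str(end),         # column 5: end, inclusive
        f"{score:.3f}",   # column 6: score
        ".",              # column 7: strand (not applicable here)
        ".",              # column 8: phase (only meaningful for CDS features)
        attributes,       # column 9: semicolon-separated tag=value pairs
    ]
    return "\t".join(columns)


def to_gff3(records):
    """Return GFF3 text for an iterable of prediction tuples."""
    lines = ["##gff-version 3"]
    lines.extend(to_gff3_line(*record) for record in records)
    return "\n".join(lines) + "\n"


# One hypothetical prediction record.
example = to_gff3([("chr2L", 999, 1500, 12.3456, "pred_0001", "example_set")])
```

      Keeping the predictions as their own feature type with a distinct source column is what lets them be merged into an existing annotation GFF without colliding with gene models.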

      Reviewer #2 (Public Review): 

      Summary: 

      … Upon identification of predicted enhancer regions, the authors perform post-processing step filtering and identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene. …

      We respectfully point out a small misunderstanding here on the part of the reviewer. We stress that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences, i.e., they are not “based on the proximity of an orthologous target gene.” Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We have added language on page 8 (lines 189ff) to make this more clear in the text.

      Weaknesses:

      This work provides predicted enhancer annotations across many insect species, with reporter gene analysis being conducted on selected regions to test the predictions. However, the code for the SCRMshaw analysis pipeline used in this work is not made available, making reproducibility of this work difficult. Additionally, while the authors claim the predicted enhancers are available within the REDfly database, the predicted enhancer coordinates are currently not downloadable as Supplementary Material or from a linked resource. 

      We have placed all the code for this paper into a GitHub repository “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife) to address this concern. As described in our response to Reviewer 1, above, all results are now available in multiple formats in a linked Dryad repository in addition to the REDfly/SCRMshaw database.

      The authors do not validate or benchmark the application of SCRMshaw against other published methods, nor do they seek to apply SCRMshaw under a variety of conditions to confirm the robustness of the returned predicted enhancers across species. Since SCRMshaw relies on an established k-mer enrichment of the training loci, its performance is presumably highly sensitive to the selection of training regions as well as the statistical power of the given k-mer counts. The authors do not justify their selection of training regions by which they perform predictions. 

      Our objective in this study was not to provide proof-of-principle for the SCRMshaw method, as we have established the efficacy of the approach at this point in several previous publications. Rather, the objective here was to make use of SCRMshaw to provide an annotation resource for insect regulatory genomics. Note that the training regions we used here are the same as those we have used in earlier work. Naturally, we performed various assessments to establish that the method was working here, but we make no claims in this work about SCRMshaw’s relative efficiency compared to other methods. Some of our prior publications include assessments of the sort the reviewer references, which suggest that SCRMshaw is at least comparable to other enhancer discovery approaches. We note that benchmarking of such methods is in fact extremely complicated due to the fact that there are no established true positive/true negative data sets against which to benchmark (we have explored this in Asma et al. 2019 BMC Bioinformatics).

      While there is an attempt made to report and validate the annotated predicted enhancers using previously published data and tools, the validation lacks the depth to conclude with confidence that the predicted set of regions across each species is of high quality. In vivo reporter assays were conducted to anecdotally confirm the validity of a few selected regions experimentally, but even these results are difficult to interpret. There is no large-scale attempt to assess the conservation of enhancer function across all annotated species. 

      We respectfully disagree that there is insufficient validation. We bring several different lines of evidence to bear suggesting that our results fall into the accuracy range (roughly 75%) established both here and in previous work. We are also clear about the fact that these are predictions only and need to be viewed as such (e.g. line 638). Although “large-scale” in vivo validation assays would certainly be both interesting and worthwhile, the necessary resources for such an assessment place it beyond our present capability.

      Lastly, it is suggested that predicted regions are derived from the shared presence of sequence features such as transcription factor binding motifs, detected through k-mer enrichment via SCRMshaw. This assumption has not been examined, although there are public motif discovery tools that would be appropriate to discover whether SCRMshaw is assigning predicted regions based on previously understood motif grammar, or due to other sequence patterns captured by k-mer count distributions. Understanding the sequence-derived nature of what drives predictions is within the scope of this work and would boost confidence in the predicted enhancers, even if it is limited to a few training examples for the sake of clarity of interpretation. 

      Again, we respectfully disagree that “this assumption has not been examined.” Although we did not undertake this analysis here, we have in the past, where we have shown that known TFBS motifs can be recovered from sets of SCRMshaw predictions (e.g., Kazemian et al. 2014 Genome Biology and Evolution). We return to this point when we address the Comments to Authors, below.

      Reviewer #3 (Public Review): 

      Weaknesses:  

      The rates of predicted true positive enhancer identification vary widely across the genomes included here based on the simulations and comparison to datasets of accessible chromatin in a manner that doesn't map neatly onto phylogenetic distance. At this point, it is unclear why these patterns may arise, although this may become more clear as regulatory annotation is undertaken for more genomes. 

      We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.

      Functional assessment of predicted enhancers was performed through reporter gene assays primarily in Drosophila melanogaster imaginal discs, a system amenable to transgenics. Unfortunately, this mode of canonical imaginal disc development is only representative of a subset of all holometabolous insects; therefore, it is difficult to interpret reporter gene expression in a fly imaginal disc as evidence of a true positive enhancer that would be active in its native species whose adult appendages develop differently through the larval stage (for example, Coleopteran and Lepidopteran legs). However, the reporter gene assays from other tissues do offer strong evidence of true positive enhancer detection, and constraints on transgenic experiments in other systems mean that this approach is the best available. 

      Please see an extensive discussion of this point in our response to Reviewer 3, below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Major Concerns: 

      (1) While the GitHub source code for SCRMshaw is provided, the authors do not provide a repository of manuscript-specific code and scripts for readers. This is a barrier to reproducibility and the code used to perform the analysis should be made available. Additionally, links to available scripts do not work, see Line 690. Post-processing scripts point to a general lab folder, but again, no specific analysis or code is sourced for the work in this specific manuscript (e.g. Line 637). 

      As noted above, we have corrected this oversight and established a specific GitHub repository for this manuscript “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife). 

      (2) On lines 479-488, there is a discussion about the annotations being provided on REDfly, though no link is provided. 

      We have included a link in the text at this point (now line 515).

      Additionally, for transparency, it would be valuable to provide in Supplementary Table 1 the genomic coordinates of the original training sets in addition to their identity. 

      These coordinates have been added to Supplementary Table 1 as suggested.

      Also, it is suggested to provide genomic coordinates of the predicted enhancers for each training set across all species, perhaps with a column denoting a linked ID of one genomic coordinate in a species to another species (i.e. if there is a linked region found from D. melanogaster to J. coenia, labeling this column in both coordinate sets as blastoderm.mapping1_region1). Providing these annotations directly in the work enhances the transparency of the results. 

      We are unsure exactly what the reviewer means here by “a linked region.” It is critical to understanding our approach to recognize that the genome sequences have diverged to the point where there is no alignment of non-coding regions possible. Thus there is no way to directly “link” coordinates of a predicted enhancer from one species to those of a predicted enhancer in another species. The coordinates for each prediction are available on a per-species basis either through the database or in the files now available in the linked Dryad repository; these can be filtered for results from a specific training set. The database will allow users to select all results for a given orthologous locus, from any subset of species. More complex searches will continue to become available as we improve functionality of the database, an ongoing project in collaboration with the REDfly team.

      (3) Figure 2B: It is unclear what this figure shows. Are the No Fly Orthologs false positives, Orthology pipeline issues, or interesting biology? 

      We have clarified this in the Figure 2 legend. “No Mapped Fly Orthologs” indicates that our orthology mapping pipeline did not identify clear D. melanogaster orthologs. For any given gene, this could reflect either a true lack of a respective ortholog, or failure of our procedure to accurately identify an existing ortholog.

      (4) SCRMshaw appears to be a versatile tool, previously published in a variety of works. However, in this manuscript, there is little discussion of the sensitivity of SCRMshaw to different initial parameters, how the selection of training loci can impact outcomes, or how SCRMshaw k-mer discovery methods compare to other similar tools.

      - This paper would be strengthened by addressing this weakness. Some specific suggestions below: 

      In order to strengthen confidence that SCRMshaw is a reliable predictor of enhancer regions in other species, it is suggested that you benchmark against other k-mer-derived methods to assign enhancers, such as gkm-SVM developed by the Beer Lab in 2016 (https://www.beerlab.org/gkmsvm/, https://www.biorxiv.org/content/10.1101/2023.10.06.561128v1). 

      We have established the effectiveness of SCRMshaw as an enhancer discovery method in previous work, and the main goal of this study was to make use of the established method to annotate numerous insect genomes as a community resource. Our claim here is that SCRMshaw works well for this purpose; we do not attempt a strong claim about whether other approaches may work equally well or marginally better (although we do not believe this is the case, based on prior work). Benchmarking enhancer discovery is challenging, as we point out in Asma et al. 2019 (BMC Bioinformatics), and, while important, best left for a dedicated comprehensive study. A major problem is that there are no independent objective “truth” sets for enhancers from the various species we interrogate here. Thus, while we could also run, e.g., gkm-SVM, what criteria would we use to establish which method had better accuracy for a given species? Note that the work from Beer’s lab took advantage of the ability to match human-mouse orthologous (or syntenic) regions and available open-chromatin data to assess whether conserved enhancers were discovered, but this is not possible given the degree of divergence, limited synteny, and relative lack of additional data for the insect genomes we are annotating.

      - In Table S1, we see that 7-146 regions are used as training sets, which is a huge variety. Does an increase in training set size provide a greater "rate of return" for predicted regions? Is the opposite true? Addressing this question would allow readers to understand if they wish to use SCRMshaw, a reasonable scope for their own training region selections. 

      - Within a training set, does subsampling provide the same outcomes in terms of prediction rates? There is no exploration of how "brittle" the training sets are, and whether the generalized k-mer count distributions that are established in a training set are consistent across randomly selected subgroups. Performing this analysis would raise confidence in the method applied and the resulting annotations. 

      These are interesting and important questions, but again we feel they are beyond the scope of this particular study, which is focused primarily on using SCRMshaw and not on optimizing various search parameters. That said, this is of course something we have investigated, although as with other aspects of enhancer discovery, the absence of a true gold standard enhancer set makes evaluation difficult. We have not found a clear correlation between training set size and performance beyond the very general finding that performance appears to be best when training set size is moderate, e.g. 20-40 initial enhancers. We suspect that larger training sets often contain too many members that don’t fit the core regulatory model and thus add noise, whereas sets that are too small may not contain enough signal for best performance (although small sets can still be useful, especially if used in an iterative cycle; see Weinstein et al. 2023 PLoS Genetics). However, establishing this rigorously is highly challenging given the limitations with assessing true and false positive rates at scale.

      (5) In Figure 2C, when plotting hexMCD, IMM, pacRC, and then the merged set, it is unclear whether the score-specific bar allows coordinate redundancy, though this is implied. What might be more useful is a revision of this plot where the hexMCD/IMM/pacRC-specific loci are plotted, with the merged set alongside as is currently reported. This would give the reader a clearer understanding of the variability between these scoring methods and why this variability occurs. 

      We have added the breakdowns between IMM, hexMCD, and pacRC in Supplementary Table S2, and made more complete reference to this in the text (lines 682ff). Both the database and the data files in the Dryad repository allow exploration of the overlap between the different methods and contain both separate and merged (for overlap and redundancy) results.

      Additionally, there is no information in the Methods section of these three SCRMshaw scores and what they represent, even colloquially. While SCRMshaw has been applied in several papers previously, it would help with scientific clarity to describe in a sentence or two what each score is meant to represent and why one is different from another. 

      We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.

      (6) When describing results in Figure 2, an important question arises: "Is there an anti-correlation between the number of predicted regions and evolutionary distance?" This would be an expected result that could complement Figure 4's point that shared orthology across 16 species is rarer than across 10 species. Visualizing and adding this to Figure 2 or Figure 4 would be a powerful statement that would boost confidence in the returned predicted enhancers and/or orthologous regions. 

      This is an important question and one in which we are very interested. Unfortunately, we do not have sufficient data at this time to address this with proper statistical rigor. As we remarked above in response to Reviewer 3, “We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.”

      (7) In Figure 3, the authors seek to convey that SCRMshaw predicts enhancer regions that are mapped nearby one another, across different loci widths, and that this occurrence of nearby predicted regions occurs more than a randomly selected control. This is presumably meant to validate that SCRMshaw is not providing predictions with low specificity, but rather to highlight the possibility that SCRMshaw is identifying groups of shadow enhancers. However, these plots are extremely difficult to decipher and do not strongly support the claims due to the low resolution and difficult interpretability of the boxplot interquartile distributions.

      Additionally, as the majority of predicted regions are around ~750bp, how does that address loci groups of <1000bp? This suggests that predicted regions are overlapping, and therefore cannot be meaningfully interpreted as shadow enhancers. This plot should either be moved to the supplements or reworked to more effectively convey the point that "SCRMshaw is detecting predicted regions that are proximal to one another and that this proximity is not due to chance". 

      - A suggestion to rework this plot is to change this instead to a bar plot, where the y-axis instead represents "number of predictions with at least 2 predicted regions proximal to one another" divided by "total number of predictions", separating bar color by simulated/observed values. The x-axis grouping can remain the same. Because this plot is a broad generalization of the statement you're trying to make above, knowing whether a few loci have 2 versus 4 proximal predicted enhancers doesn't enhance your point. 

      We agree with the reviewer that these are not the clearest plots, and thank them for the suggestions regarding revision. We tried many variations on visualizing these complex data, including those suggested by the reviewer, and have concluded that despite their weaknesses, these plots are still the best visualization. The main problem is that the observed data cluster heavily around zero, so that the box plots are very squat and mainly only the outlier large values are observed. The key point, however, is that the expected values almost never give values much greater than one, so that the observed outlier points are the only points seen in the upper ranges of the y-axis. This is true across the three species, across the bins of locus sizes, and across training sets (averaged into the box plots). The reviewer is correct as well about the bins where locus size is < 1000. However, inspection of the data shows that this is not a large concern, as very few data points lie in this range and we never see multiple predicted enhancers there. Thus we believe that, while not the prettiest of graphs, Figure 3 does effectively support the claims made in the text. In keeping with our view that it is preferable to have data in the main paper whenever possible, we choose to keep the figure in place rather than move it to the Supplement.

      - Label the species for the reader's understanding of each subplot on the plot. 

      We apologize for this oversight and have now labeled each plot with its relevant species.

      (8) SCRMshaw operates on k-mer count distributions compared to a genomic background across different species, allowing it to assign predicted regions without prior knowledge of an organism's cis-regulatory sequences. This is powerful and boosts the versatility of the method. However, understanding the cis-regulatory origins of the kinds of k-mers that are driving the detection of orthologous regions across species is crucial and absolutely within the scope of the paper, particularly for the justification of the provided annotations. Is SCRMshaw making use of enriched motifs within the training region set to assign regions in other species? One would presume so, but it is necessary to show this. There are many motif discovery tools that are readily available and require little up-front knowledge and little to no use of a CLI, such as MEMESuite (https://meme-suite.org/meme/tools/meme). It is highly recommended that, even for a few training pairs that are well understood (e.g. mesoderm.mapping1, dorsal_ectoderm.mapping1), the authors assess the motif enrichment within the original sequence set, then see whether motif enrichments are reflected in the predicted enhancers. As evolutionary distance increases between D. melanogaster and the species of interest, is the assignment of enriched motifs more sparse? Is there a loss of a key motif? These are the kinds of questions that will allow readers to understand how these annotations are assigned as well as boost confidence in their usage. 

      This is a very important point and a subject of significant interest to us. We have demonstrated in earlier work (e.g., Kazemian et al. 2014 Genome Biol. Evol.) that SCRMshaw-predicted enhancers do contain expected TFBS motifs, across multiple species—and that even an overall arrangement of sites is sometimes conserved. Thus we have previously answered, in part, the reviewer’s question. 

      What we also learned from our previous work is that filtering out relevant motifs from the noise inherent in motif-finding is both arduous and challenging. As the reviewer is no doubt aware, while using motif discovery tools is simple, interpreting the output is much less so. In response to the reviewer’s comments, we revisited this issue with data from a small sample of training sets. We can discover motifs; we can see that the motif profiles are different between different training sets; and we can observe the presence of expected motifs based on the activity profile of the enhancers (e.g., Single-minded binding sites in our mesectoderm/midline training and result data). However, to do this cleanly and with appropriate statistical rigor is beyond what we feel would be practical for this paper. We hope to return to this important question in the future when we have a larger and phylogenetically more evenly-distributed set of species, and the time and resources to address it appropriately.

      (9) Figures 5-7 need to have better descriptions. 

      We have added to the Figure 6 and 7 legends in response to this comment; please note as well that there is substantial detail provided in the text. If there are specific aspects of the figures that are not clear or which lack sufficient description, we are happy to make additional changes.

      Minor Concerns 

      (1)  In Figure 1A, it is implied that "k-mer count distributions" are actually only "5-mer count distributions". However, in the published documentation of SCRMshaw, it is suggested that k-mers between 1-6 bp are involved in establishing sequence distributions. Please add a justification for the selection of these criteria. It would be helpful to understand the implications of using up to a 3-mer versus a 12-mer when assessing k-mer counts using SCRMshaw.

      We have clarified in the Figure 1 legend that this is just an example, and that k-mers of different sizes are used in the IMM method; we have also expanded the description of the basic method in the Methods section. To be clear, the hexMCD sub-method is 6-mer based (5th-order Markov chain), as is pacRC, while the IMM method considers Markov chains of orders 0-5.
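
      To make the relationship between k-mer size and Markov chain order concrete, the following is a minimal sketch of a log-likelihood-ratio score under a fixed-order Markov chain, in which an order-5 chain conditions each base on the preceding five and therefore uses 6-mer counts. It is an illustration only and not the SCRMshaw implementation: the function names, the add-one pseudocount, and the simple foreground/background ratio are all assumptions made here.

```python
import math
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def cond_prob(counts, context, base, pseudo=1.0):
    """P(base | context) from (order+1)-mer counts, with a pseudocount
    (an assumption of this sketch) so unseen k-mers never give zero."""
    num = counts[context + base] + pseudo
    den = sum(counts[context + b] for b in "ACGT") + 4 * pseudo
    return num / den

def log_ratio_score(seq, fg_counts, bg_counts, order=5):
    """Sum of log(P_foreground / P_background) over all positions.
    order=5 means each base is conditioned on the previous 5 (6-mers)."""
    k = order + 1
    score = 0.0
    for i in range(len(seq) - k + 1):
        context, base = seq[i:i + order], seq[i + order]
        score += math.log(cond_prob(fg_counts, context, base)
                          / cond_prob(bg_counts, context, base))
    return score

# Toy usage with order=1 (2-mers) so the example stays readable.
fg = kmer_counts(["ACACACAC"], 2)   # hypothetical "training" sequences
bg = kmer_counts(["AAAACCCC"], 2)   # hypothetical "background" sequences
like_training = log_ratio_score("ACAC", fg, bg, order=1)
like_background = log_ratio_score("AACC", fg, bg, order=1)
```

      In this toy case a sequence resembling the training set scores above zero and one resembling the background scores below zero. An interpolated model such as IMM would combine orders 0-5 rather than fixing a single order; that weighting is not shown here.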

      (2) Control the y-axis to remove white space from Figure 2D. 

      We have amended the figure as suggested.

      Additionally, expand in the manuscript on expected results from SCRMshaw. Given training regions of 750 bp, is the expectation that you return predicted enhancers of the same length? This is not explicitly stated, only a description of outliers. 

      The scoring is not dependent on the length of the training sequences, and there is no direct expectation of predicted enhancer length. Scores are calculated on 10-bp intervals, and a peak-calling algorithm is used to determine the endpoints of each prediction based on where the scores drop below a cutoff value. Thus there is no explicit minimum prediction length beyond the smallest possible length of 10-bp. That said, the initial scoring takes place over a 500-bp sequence window (for reasons of computational efficiency), which does influence scores away from the smaller end of the possible range. We correct for this in part by reducing scores below a certain threshold to zero, to prevent multiple low-scoring regions from combining to give a low but positive score over a long interval. Indeed, we found that in the original version of SCRMshawHD (Asma et al. 2019), multiple low-scoring but above-threshold intervals would get concatenated together in broad peaks, leading to an unrealistically large average prediction length. In the version used here, described in Supplementary Figure S6, low-scoring windows are now first reset to zero and a new threshold is calculated before overlapping scores are summed. This helps to prevent the broad peak problem, and we find that it results in a median prediction length ~750 bp, more in line with expected enhancer sizes.
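
      The effect of resetting low-scoring windows to zero before summing can be sketched as follows. This is a simplified illustration under stated assumptions, not the SCRMshawHD code: the function names, the toy window length, and the cutoff values are hypothetical, and the recalculation of the threshold described in Supplementary Figure S6 is omitted.

```python
def per_interval_scores(window_scores, window_floor, window_len=500, step=10):
    """Sum the scores of all overlapping windows covering each step-sized
    interval, after treating windows scoring below window_floor as zero."""
    per_window = window_len // step            # intervals spanned by one window
    totals = [0.0] * (len(window_scores) + per_window - 1)
    for w, score in enumerate(window_scores):
        if score < window_floor:
            continue                           # low-scoring window contributes nothing
        for k in range(w, w + per_window):
            totals[k] += score
    return totals

def call_peaks(interval_scores, cutoff, step=10):
    """Return (start, end) coordinates of contiguous runs of intervals whose
    summed score is at or above cutoff; a peak ends where the score drops below it."""
    peaks, start = [], None
    for i, s in enumerate(interval_scores):
        if s >= cutoff and start is None:
            start = i
        elif s < cutoff and start is not None:
            peaks.append((start * step, i * step))
            start = None
    if start is not None:
        peaks.append((start * step, len(interval_scores) * step))
    return peaks

# Toy data: two strong windows, two weak ones, then another strong one.
windows = [1.0, 1.0, 0.2, 0.2, 1.0]
filtered = per_interval_scores(windows, window_floor=0.5, window_len=20, step=10)
unfiltered = per_interval_scores(windows, window_floor=0.0, window_len=20, step=10)
peaks_filtered = call_peaks(filtered, cutoff=0.3)
peaks_unfiltered = call_peaks(unfiltered, cutoff=0.3)
```

      With the toy values above, the unfiltered scores merge into a single broad peak because the weak windows sum to a value above the cutoff, whereas zeroing them first yields two separate, shorter peaks.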

      Reviewer #3 (Recommendations For The Authors): 

      Line 161: Given that the SCRMshaw HD method is the basis for the pipeline, the methodology deserves at least an "in brief" recapitulation in this manuscript. 

      As we remark in our response to Reviewer 2, above, “We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.” 

      Line 219: Throughout the reporting of the results, there appeared to be a bit of inconsistency/potential typos regarding whether threshold or exact P values were reported. In lines 219, 222, 265, 696, and 811, the reported values seem to clearly be thresholds (< a standard cutoff), while in lines 291, 293, 297, 300, values appear to be exact but are reported as thresholds (<). 

      This is not an error but rather reflects two different types of analysis. The predictions per locus (originally lines 219, 222 etc) are evaluated using an empirical P-value based on 1000 permutations. As such, they are thresholded at 1/1000. The overlap with open chromatin regions, on the other hand, is based on a z-score, with the P-values taken from a standard conversion of z-scores to P-values.
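
      The difference between the two kinds of P-value can be illustrated with a short sketch. The function names are hypothetical, and the one-sided (upper-tail) normal conversion is an assumption made here for illustration; the response does not state which tail was used.

```python
import math

def empirical_p_value(observed, permuted_values):
    """Fraction of permuted values at least as large as the observed value.
    With n permutations the smallest resolvable value is 1/n, so a result
    with zero hits is returned as (1/n, True), meaning 'P < 1/n'."""
    n = len(permuted_values)
    hits = sum(1 for v in permuted_values if v >= observed)
    if hits == 0:
        return 1.0 / n, True
    return hits / n, False

def z_to_p_upper_tail(z):
    """Upper-tail P-value of a standard normal distribution for a z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# 1000 permutations, none reaching the observed value: reported as P < 0.001.
p_perm, is_threshold = empirical_p_value(10, [0] * 1000)

# A z-score converts to an exact P-value rather than a threshold.
p_from_z = z_to_p_upper_tail(1.96)
```

      This is why permutation-based values can only be stated relative to the 1/1000 floor, while z-score-derived values are continuous and can be arbitrarily small.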

      Page 13/Table 2: At face value, it seems surprising that the overlap between Dmel SCRMshaw predictions with open chromatin is so much smaller than the overlap between predictions and open chromatin in other species, both in raw % (Tcas, D plexippus, H. himera) and fold enrichment (Tcas), given that the training sets for SCRMshaw are all derived from Dmel data. The discussion here does not touch on this aspect of the results, and the interpretation of this approach, in general, would be strengthened if the authors could comment on potential reasons why this pattern may be arising here, or at least acknowledge that this is an open question.

      There are many variables at play here, as the data are from different species, from different tissues, and from different methods. Thus we think it is difficult to read too much into the precise results from these comparisons—the main take-home is really just that there is a significant amount of overlap. In acknowledgment of this, we have slightly modified the text in this section so that it now notes (line 302ff): “These comparisons are imperfect, as the tissues used to obtain the chromatin data do not precisely correspond to the training sequences used for SCRMshaw, and the data were obtained using a variety of methods.”

      Line 318-329: The inferences from the reporter gene assay deserve a more nuanced treatment than they are given here. The important nuance that was not addressed by the discussion here is that the imaginal disc mode of development in Drosophila is not broadly representative of the development of larval/adult epithelial tissues across Holometabola; thus, inference of a true positive validation becomes complicated in cases where predicted enhancers from a species were tested and shown to drive expression in a fly imaginal disc that the native species have no direct disc counterpart to. For example, in line 388 a Tcas enhancer is reported to drive expression in the eye-antennal disc, and in lines 404 and 423 additional Tcas enhancers were reported to drive expression in the leg discs; however, Tribolium larvae do not possess antennal discs or leg discs set aside during embryogenesis in the sense that flies do - instead the homologous epithelial tissues form larval antennae and larval legs external to the body wall that are actively used at this life stage and are starkly different in morphology than an internally invaginated epithelial disc, that will directly give rise to adult tissues in subsequent molts. Is the interpretation of an expression pattern driven in a fly disc as a true positive really as straightforward as it was presented here, when in the native species the expression pattern driven by the enhancer in question would be in the context of an extremely different tissue morphology? That said, I understand and am deeply sympathetic to the constraints on the authors in performing transgenic experiments outside of the model fly; but these divergent modes of development across Holometabola deserve a mention and nuance in the interpretation here. 

      This is indeed a very important point, and we greatly appreciate Reviewer 3 pointing out this caveat when interpreting the outcomes of our cross-species reporter assay. Reviewer 3 is correct that the imaginal disc mode of adult tissue (i.e. imaginal) development found in Diptera does not represent the imaginal development across Holometabola. 

      In fact, imaginal development is quite diverse among Holometabola. For instance, larval leg and antennal cells appear to directly develop into the adult legs and antennae in Coleoptera (i.e. primordial imaginal cells function as larval appendage cells), while some cells within the larval legs and antennae are set aside during larval development specifically for adult appendages in Lepidopteran species (i.e. imaginal cells exist within the larval appendages but do not contribute to the formation of larval appendages). In contrast, an almost entire set of cells that develop into adult epithelia are set aside as imaginal discs during embryogenesis in Diptera. Furthermore, the imaginal disc mode of development appears to have evolved independently in Hymenoptera. Therefore, determining how imaginal primordial tissues correspond to each other among Holometabola has been a challenging task and a topic of high interest within the evo-devo and entomology communities.

      Nevertheless, despite these differences in mode of imaginal development, decades of evo-devo studies suggest that the gene regulatory networks (GRNs) operating in imaginal primordial tissues appear to be fairly well conserved among holometabolan species (for example, see Tomoyasu et al. 2009 regarding wing development and Angelini et al. 2012 regarding leg development between flies and beetles). These outcomes imply that a significant portion of the transcriptional landscape might be conserved across different modes of imaginal development. Therefore, an enhancer functioning in the Tribolium larval leg tissue (which also functions as adult leg primordium) could be active even in the leg imaginal disc of Drosophila, if the trans factors essential for the activation of the enhancer are conserved between the two imaginal tissues. 

      That being said, we fully expect there to be both false negative and false positive results in our cross-species reporter assay. We are optimistic about the biological relevance of the positive outcomes of our cross-species reporter assay, especially when the enhancer activity recapitulates the expression of the corresponding gene in Drosophila (for example, Am_ex Fig6B and Tc_hth Fig7B). Nonetheless, the biological relevance of these enhancer activities needs to be further verified in the native species through reporter assays, enhancer knock-outs, or similar experiments.

      In recognition of the Reviewer’s important point, we added the following caveat in our Discussion (lines 549-553): “Furthermore, the unique imaginal disc mode of adult epithelial development in D. melanogaster might have prevented some enhancers of other species from working properly in D. melanogaster imaginal discs, likely producing additional false negative results. Evaluating enhancer activities in the native species will allow us to address the degree of false negatives produced by the cross-species setting.” We moreover mention this caveat in the Results section when we first introduce the reporter assays (line 342).

      Line 580: This is the first time that the weakness of the closest-gene pairing approach is mentioned. This deserves mention earlier in the manuscript, as unfortunately, this is one of the major bottlenecks to this and any other approaches to investigating enhancer function. Could the authors address this earlier, perhaps pages 7-8, and provide citations for current understanding in the field of how often closest-gene pairing approaches correctly match enhancers to target genes? 

      We have added text as suggested on p.7-8 acknowledging the shortcomings of the closest-gene approach. We also clarify at the end of that section (lines 173-181) that target gene assignments, while useful for interpretation, have no bearing on the enhancer predictions themselves (which are generated prior to the target gene assignment steps).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The additional data included in this revision nicely strengthens the major claim.

      I apologize that my comment about K+ concentration in the prior review was unclear. The cryoEM structure of KCNQ1 with S4 in the resting state was obtained with lowered K+ relative to the active state. Throughout the results and discussion it seems implied that the change in voltage sensor state is somehow causative of the change in selectivity filter state while the paper that identified the structures attributes the change in selectivity filter state not to voltage sensors, but to the change in [K+] between the 2 structures. Unless there is a flaw in my understanding of the conditions in which the selectivity filter structures used in modeling were generated, it seems misleading to ignore the change in [K+] when referring to the activated vs resting or up vs down structures. My understanding is that the closed conformation adopted in the resting/low [K+] is similar to that observed in low [K+] previously and is more commonly associated with [K+]-dependent inactivation, not resulting from voltage sensor deactivation as implied here. The original article presenting the low [K+] structure also suggests this. When discussing conformational changes in the selectivity filter, I strongly suggest referring to these structures as activated/high [K+] vs resting/low [K+] or something similar, as the [K+] concentration is a salient variable.

      There seems to be some major confusion here, and we will try to explain our thinking. Note that in the Mandala and MacKinnon paper, there is no significant difference in the amino acid positions in the selectivity filter between low and high K+ when S4 is in the activated position (see Mandala and MacKinnon, PNAS Suppl. Fig S5 C and D). There are only fewer K+ ions in the selectivity filter in low K+. So, the structure with the distorted selectivity filter is not due to low K+ by itself. Note that there is no real difference between macroscopic currents recorded in low and high K+ solutions (except what is expected from changes in driving force) for KCNQ1/KCNE1 channels (Larsen et al., Bioph J 2011), suggesting that low K+ does not promote the non-conductive state (Author response image 1). We now include a section in the Discussion about high/low K+ in the structures and the absence of effects of K+ on the function of KCNQ1/KCNE1 channels.

      Author response image 1.

      Macroscopic KCNQ1/KCNE1 currents recorded in different K+ conditions. Note that there is no difference between currents recorded in low K+ (2 mM) conditions and high K+ (96 mM) conditions (n=3 oocytes). Currents were normalized with respect to high K+.

      Note also that, in the previous version of the manuscript, we did not propose that the position of S4 is what determines the state of the selectivity filter. We only reported that the CryoEM structure with S4 resting shows a distorted selectivity filter. It seems that our text led the reviewer to think that we proposed that S4 determines the state of the selectivity filter, when we did not. We previously did not want to speculate too much about this, but we have now included a section in the Discussion to make our view clear in light of the reviewers' confusion.

      It is clear from our data that the majority of sweeps are empty (which we assume is with S4 up), suggesting that the selectivity filter can be (and is in the majority of sweeps) in the non-conducting state even with S4 up.  We think that the selectivity filter switches between a non-conductive and a conductive conformation both with S4 down and with S4 up. The cryoEM structure in low K+ and S4 down just happened to catch the non-conductive state of the selectivity filter.  We have now added a section in the Discussion to clarify all this and explain how we think it works.

      However, S4 in the active conformation seems to stabilize the conductive conformation of the selectivity filter, because during long pulses the channel seems to stay open once opened (See Suppl Fig S2). So, one possibility is that the selectivity filter goes more readily into the non-conductive state when S4 is down (and maybe, or not, low K+ plays a role) and then when S4 moves up the selectivity filter sometimes recovers into the conductive state and stays there. We now have included a section in the Discussion to present our view. Since this whole discussion was initiated and pushed by the reviewer, we hope that the reviewers will not demand more data to support these ideas. We think that this addition makes sense since other readers might have the same questions and ideas as the reviewer, and we would like to prevent any confusion about this topic.

      Figure 1

      It remains unclear in the manuscript itself what "control" refers to. Are control patches the same patches that later receive LG?

      Yes, the control means the same patch before LG. We now indicate that in legends and text throughout.

      Supplementary Figure S1

      Unclear if any changes occur after addition of LG in left panel and if the LG data on right is paired in any way to data on left.

      Yes, in all cases the left and right panel in all figures are from the same patch. We now indicate that in legends and text throughout.

      The letter p is used both to represent open probability from the all-point amplitude histogram and as a p-value statistical probability indicator, sometimes lower case, sometimes upper case. This was confusing.

      We now exclusively use lower case p for statistical probability and Po for open probability.

      "This indicates that mutations of residues in the more intracellular region of the selectivity filter do not affect the Gmax increases and that the interactions that stabilize the channel involve only residues located near the external region part of the selectivity filter. "

      Seems too strongly worded, it remains possible that mutations of other residues in the more intracellular region of the selectivity filter could affect the Gmax increases.

      We have changed the text to: "Mutations of residues in the more intracellular region of the selectivity filter do not affect the Gmax increases, as if the interactions that stabilize the channel involve residues located near the external region part of the selectivity filter. "

      Supplementary Figure S7

      Please report Boltzmann fit parameters. What are "normalized" uA?

      We removed the uA, which was mistakenly inserted. The lines in the graphs are just lines connecting the dots and not Boltzmann fits, since we don’t have saturating curves in all panels to make unique fits.

      "We have previously shown that the effects of PUFAs on IKs channels involve the binding of PUFAs to two independent sites." Was binding to the sites actually shown? Suggest changing to: "We have previously proposed models in which the effects of PUFAs..."

      We have now changed this as the Reviewer suggested: " We have previously proposed models in which the effects of PUFAs on IKs channels involve the binding of PUFAs to two independent sites."

      Statistics used not always clear. Methods refer to multiple statistical tests but it is not clear which is used when.

      We use two different tests and it is now explained in figure legends when either was used.

      n values confusing. Sometimes # of sweeps used as n. Sometimes # patches used as n. In one instance "The average current during the single channel sweeps was increased by 2.3 ± 0.33 times (n = 4 patches, p =0.0006)" ...this seems a low p value for this n=4 sample?

      We have now more clearly indicated what n stands for in each case. There was an extra 0 in the p value, so now it is p = 0.006. Thanks for catching that error.

      Reviewer #2 (Recommendations For The Authors):

      I still have some comments for the revised manuscript.

      (1) (From the previous minor point #6) Since D317E and T309S did not show statistical significance in Figure 5A, the sentences such as "This data shows that Y315 and D317 are necessary for the ability of Lin-Glycine to increase Gmax" or "the effect of Lin-Glycine on Gmax of the KCNQ1/KCNE1 mutant was noticeably reduced compared to the WT channel showing the this residue contributes to the Gmax effect (Figure 5A)." may need to be toned down. Alternatively, I suggest the authors refer to Supplementary Figure S7 to confirm that Y315 and D317 are critical for increasing Gmax.

      We have redone the analysis and statistical evaluation in Fig 5. We now use the more appropriate value of the fitted Gmax (which uses the whole dose response curve instead of only the 20 mM value) in the statistical evaluation, and now Y315F and D317E are statistically different from wt.

      (2) Supplementary Fig. S1. All control diary plots include the green arrows to indicate the timing of lin-glycine (LG) application. It is a bit confusing why they are included. Is it to show that LG application did not have an immediate effect? Are the LG-free plots not available?

      We are not sure what the Reviewer is asking. In the previous review round, the Reviewers asked specifically for this. The arrow shows when LG was applied and the plot on the right shows the effect of LG from the same patch.

      (3) The legend to Supplementary Figure S4, "The side chain of residues ... are highlighted as sticks and colored based on the atomic displacement values, from white to blue to red on a scale of 0 to 9 Å." They look mostly blue (or light blue). Which one is colored white? It might be better to use a different color code. It would also be nice to link the color code to the colors of Supplementary Figure S5, which currently uses a single color.

      We have removed “from white to blue to red on a scale of 0 to 9 Å” and instead now include a color scale directly in Fig S4 to show how much each atom moved based on the color.

      We feel it is not necessary to include color in Fig S5 since the scale of how much each atom moves is shown on the y axis.

      (4) Add unit (pA) to the y-axis of Supplementary Figure S2.

      pA has been added.

      Reviewer #3 (Recommendations For The Authors):

      Some issues on how data support conclusions are identified. Further justifications are suggested.

      186: "The decrease in first latency is most likely due to an effect of Lin-Glycine on Site I in the VSD and related to the shift in voltage dependence caused by Lin-Glycine." The results in Fig S1B do not seem to support this statement since the mutation Y315F in the pore helix seemed to have eliminated the effect of Lin-Glycine in reducing first latency. The authors may want to show that a mutation eliminating Site I would eliminate the effect of Lin-Glycine on first latency. On the other hand, it would also be interesting to examine whether another pore mutation, such as P320L (Fig 5), also reduces the effect of Lin-Glycine on first latency.

      These experiments are very hard and laborious, and we feel these are outside the scope of this paper which focuses on Site II and the mechanism of increasing Gmax. Further studies of the voltage shift and latency will have to be for a future study.

      The mutation D317E did not affect the effect of Lin-Glycine on Gmax significantly (Fig 5A, and Fig S7F comparing with Fig S7A), but the authors conclude that D317 is important for Lin-Glycine association. This conclusion needs a better justification.

      We have redone the analysis and statistical evaluation in Fig 5. We now use the more appropriate value of the fitted Gmax (which uses the whole dose response curve instead of only the 20 mM value) in the statistical evaluation, and now D317E is statistically different from wt.
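
      The difference between a fitted maximum and the value at a single concentration can be sketched as follows. This is only an illustration under stated assumptions: the response does not say which dose-response model was fitted, so a Hill-type curve is assumed here, and the function names (`hill`, `fit_max_effect`), the concentrations, and the parameter values are all hypothetical and noise-free.

```python
def hill(conc, max_effect, ec50, n=1.0):
    """Fold change in conductance at a given concentration (1.0 means no effect).

    Assumed Hill-type model; not taken from the manuscript.
    """
    return 1.0 + (max_effect - 1.0) * conc**n / (conc**n + ec50**n)


def fit_max_effect(concs, responses, ec50_grid, n=1.0):
    """Fit the whole curve: grid search over EC50, closed-form amplitude.

    For each candidate EC50 the amplitude has a closed-form least-squares
    solution. Returns (max_effect, ec50) for the candidate with lowest error.
    """
    best = None
    for ec50 in ec50_grid:
        x = [c**n / (c**n + ec50**n) for c in concs]
        y = [r - 1.0 for r in responses]
        amp = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        sse = sum((yi - amp * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, 1.0 + amp, ec50)
    return best[1], best[2]


# Synthetic data in arbitrary concentration units (invented values).
concs = [0.2, 0.7, 2.0, 7.0, 20.0]
responses = [hill(c, max_effect=3.0, ec50=5.0) for c in concs]
ec50_grid = [0.5 * k for k in range(1, 41)]

fitted_max, fitted_ec50 = fit_max_effect(concs, responses, ec50_grid)
print("fitted maximum:", fitted_max)
print("response at highest concentration:", responses[-1])
```

      In this toy case the response at the highest concentration has not yet saturated, so it underestimates the fitted maximum; a fit over all concentrations uses every data point rather than one.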

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary - This study was designed to investigate changes in gene expression and associated chromatin accessibility patterns in spermatogonia in mice at different postnatal stages from pups to adults. The objective was to describe dynamic changes in these patterns that potentially correlate with functional changes in spermatogonia as a function of development and reproductive maturation. The potential utility of this information is to serve as a reference against which similar data from animals subjected to various disruptive environmental influences can be compared.

      Major Strengths and Weaknesses of the Methods and Results - A strength of the study is that it reviews previously published datasets describing gene expression and chromatin accessibility patterns in mouse spermatogonia. A weakness of the study is that it is not clear what new information is provided by the data provided that was not already known from previously published studies (see below). Specific weaknesses include the following:

      • Terminology - in the Abstract and first part of the Introduction the authors use the generic term "spermatogonial cells" in a manner that seems to be referring primarily to spermatogonial stem cells (SSCs) but initially ignores the well-known heterogeneity among spermatogonia - particularly the fact that only a small proportion of developing spermatogonia become SSCs - and ONLY those SSCs and NOT other developing spermatogonia - support steady-state spermatogenesis by retaining the capacity to either self-renew or contribute to the differentiating spermatogenic lineage throughout the male reproductive lifespan. The authors eventually mention other types of developing male germ cells, but their description of prospermatogonial stages that precede spermatogonial stages is deficient in that M-prospermatogonia - which occur after PGCs but before T1-prospermatogonia - are not mentioned. This description also seems to imply that all T2-prospermatogonia give rise to SSCs which is far from the case. It is the case that prospermatogonia give rise to spermatogonia, but only a very small proportion of undifferentiated spermatogonia form the foundational SSCs and ONLY SSCs possess the capacity to either self-renew or give rise to sequential waves of spermatogenesis.

      We thank Reviewer 1 for the comments and clarifications. As suggested in the previous revision, we use the term spermatogonial cells (SPGs) to make it clear that our cell preparations do not exclusively contain SSCs but all SPGs since they derive from a FACS enrichment strategy. This is explained in the manuscript. Further, we conducted deconvolution analyses on the datasets to examine the composition of the enriched SPGs preparations and provide new sequencing information confirming the presence of SSCs and differentiating SPGs.

      • Introduction - Statements regarding distinguishing transcriptional signatures in spermatogonia at different postnatal stages appear to refer to ALL subtypes of spermatogonia present at each stage collectively, thereby ignoring the well-known fact that there are distinct spermatogonial subtypes present at each postnatal stage and that some of those occur at certain stages but not at others. This brings into question the usefulness of the authors' discussion of what types of genes are expressed and/or what types of changes in chromatin accessibility are detected in spermatogonia at each stage.

      We agree that our data do not provide information about the transcriptional program of each subtype of SPGs. Rather they provide information about the dynamics of transcriptional programs in the transition from postnatal stage to adulthood in an enriched population of SPGs. The datasets are comprehensive and contain mRNA and non-coding RNA (with and without a polyA+ tail), which provides more precise transcriptomic information than classical single cell methods.

      • Methodology - The authors based recovery (enrichment) of spermatogonia from male pups on FACS sorting for THY1 and RMV-1. While sorting total testis cells for THY1+ cells does enrich for spermatogonia, this approach is now known to not be highly specific for spermatogonia (somatic cells are also recovered) and definitely not for SSCs. There are more effective means for isolating SSCs from total testis cells that have been validated by transplantation experiments (e.g. use of the Id4/eGFP transgene marker).

      We acknowledge the technical limitations of our enrichment strategy and made them clear in our revised manuscript.

      The authors then used "deconvolution" of bulk RNA-seq data in an attempt to discern spermatogonial subtype-specific transcriptomes. It is not clear why this is necessary or how it is beneficial given the availability of multiple single-cell RNA-seq datasets already published that accomplish this objective quite nicely - as the authors essentially acknowledge. Beyond this concern, a potential flaw with the deconvolution of bulk RNA-seq data is that this is a derivative approach that requires assumptions/computational manipulations of apparent mRNA abundance estimates that may confound interpretation of the relative abundance of different cellular subtypes within the heterogeneous cell population from which the bulk RNA-seq data is derived. Bottom line, it is not clear that this approach affords any experimental advantage over use of the publicly available scRNA-seq datasets and it is possible that attempts to employ this approach may be flawed yielding misleading data.

      The deconvolution analyses were necessary to address the question of the cell composition of our preparations raised by reviewers. These analyses were highly beneficial because they clarify the presence of different SPGs including SSCs in the samples. They are also advantageous because the datasets they are conducted upon have significantly higher sequencing coverage than published single cell datasets. They contain the full transcriptome and not just polyA+ transcripts, as 10x datasets do; thus, they provide considerably richer and more comprehensive transcriptomic information. This is very important to correctly interpret the results and to gain additional biological information. For the deconvolution analyses, we used state-of-the-art methods with proper computational controls for calibration. We selected published single-cell RNA-seq datasets of the highest quality. These analyses are extremely useful because they confirm the predominance of SSCs in the postnatal and adult cell samples and a minimal contamination by somatic cells. Our approach also provides a useful workflow that can easily be used by other researchers who cannot afford single-cell RNA-seq and allows them to gain more information about the cellular composition of their samples. Finally, the execution of any computational analysis, including analyses of single-cell RNA-seq datasets, requires making assumptions during the development and the use of a method. The assumptions made for deconvolution analyses are not special in this respect and do not introduce more confounds than other methods. What is critical for such analyses is to include proper controls for calibration, which we carefully did and validated using our own previously published datasets for Sertoli cells.
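
      The principle behind deconvolution of a bulk profile with a cell-type signature matrix can be sketched as follows. This is a deliberately simplified stand-in, not the method used in the manuscript: CIBERSORTx relies on support vector regression, whereas the sketch below uses plain non-negative least squares solved by projected gradient descent. The signature matrix, the cell-type labels, and the function names (`mix`, `deconvolve`) are hypothetical, and the data are noise-free.

```python
# Hypothetical signature matrix: rows are marker genes, columns are cell types
# (SSC, progenitor, somatic). All values are invented for illustration.
SIGNATURE = [
    [10.0, 2.0, 0.0],
    [1.0, 8.0, 0.0],
    [0.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]


def mix(signature, fractions):
    """Simulate a bulk profile as the fraction-weighted sum of cell-type profiles."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def deconvolve(signature, bulk, n_iter=2000):
    """Estimate non-negative cell-type fractions by projected gradient descent.

    Minimises ||signature @ f - bulk||^2 subject to f >= 0, then rescales f
    so that the fractions sum to 1.
    """
    n_types = len(signature[0])
    # Safe step size: the squared Frobenius norm bounds the largest
    # eigenvalue of signature^T signature.
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * fk for s, fk in zip(row, f)) - b
                    for row, b in zip(signature, bulk)]
        grad = [sum(row[k] * r for row, r in zip(signature, residual))
                for k in range(n_types)]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, fk - step * gk) for fk, gk in zip(f, grad)]
    total = sum(f)
    return [fk / total for fk in f] if total > 0 else f


true_fractions = [0.7, 0.2, 0.1]
bulk_profile = mix(SIGNATURE, true_fractions)
estimated = deconvolve(SIGNATURE, bulk_profile)
print([round(x, 3) for x in estimated])
```

      With real data the recovered fractions depend on how well the signature matrix separates closely related cell types, which is why calibration against samples of known composition matters.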

      • Results & Discussion - In general, much of the information reported in this study is not novel. The authors' discussion of the makeup of various spermatogonial subtypes in the testis at various ages does not really add anything to what has been known for many years on the basis of classic morphological studies. Further, as noted above, the gene expression data provided by the authors on the basis of their deconvolution of bulk RNA-seq data does not add any novel information to what has been shown in recent years by multiple elegant scRNA-seq studies - and, in fact, as also noted above - represents an approach fraught with potential for misleading results. The potential value of the authors' report of "other cell types" not corresponding to major somatic cell types identified in earlier published studies seems quite limited given that they provide no follow-up data that might indicate the nature of these alternative cell types. Beyond this, much of the gene expression and chromatin accessibility data reported by the authors - by their own admission given the references they cite - is largely confirmatory of previously published results. Similarly, results of the authors' analyses of putative factor binding sites within regions of differentially accessible chromatin also appear to confirm previously reported results. Ultimately, it is not at all novel to note that changes in gene expression patterns are accompanied by changes in patterns of chromatin accessibility in either related promoters or enhancers. The discussion of these observations provided by the authors takes on more of a review nature than that of any sort of truly novel results. As a result, it is difficult to discern how the data reported in this manuscript advance the field in any sort of novel or useful way beyond providing a review of previously published studies on these topics.

      • Likely impact - The likely impact of this work is relatively low because, other than the value it provides as a review of previously published datasets, the new datasets provided are not novel and so do not advance the field in any significant manner.

      We acknowledge that much of the reported information is not novel, but this is not necessarily a drawback, as sequencing datasets on the same tissues or cells produced by different groups using comparable methods are common. This does not diminish the validity and usefulness of the datasets but rather enriches the respective fields, as omics methods and data analyses can deliver different findings. Thus, our study cannot be criticized and disqualified because other datasets have been published; instead, it should be acknowledged for providing high-resolution full transcriptome information from SPGs at different postnatal stages and in adulthood that other studies do not provide. In this respect, the subjective nature of Reviewer 1’s statements is of concern. For instance, the statement: “…represents an approach fraught with potential for misleading results”. Such a declaration suggests that all studies that previously used enrichment strategies are “fraught with potential for misleading results”, which disqualifies the work of many colleagues. Further, this wrongly assumes that newer technologies are exempt from “potential for misleading results”, which is not the case. Single-cell RNA-seq methods, extensively used to study SPGs, have been questioned for their limitations and potential biases due to low sequencing coverage, issues with transcript detection, low capture efficiency and a higher degree of noise than bulk RNA datasets. Thus, caution is needed to interpret single-cell datasets on SPGs and these datasets also have their biases. For our datasets, we made major efforts to address the criticisms raised by the reviewer and reduce any potential misleading information by conducting additional analyses, by providing more details on the methods and enrichment strategy and by being careful with data interpretation. We would be grateful if these efforts could be acknowledged and the improvements on the manuscript and the value of the datasets be evaluated with objectivity.

      Reviewer #2 (Public Review):

      This revised manuscript attempts to explore the underlying chromatin accessibility landscape of spermatogonia from the developing and adult mouse testis. The key criticism of the first version of this manuscript was that bulk preparations of mixed populations of spermatogonia were used to generate the data that form the basis of the entire manuscript. To address this concern, the authors applied a deconvolution strategy (CIBERSORTx (Newman et al., 2019)) in an attempt to demonstrate that their multi-parameter FACS isolation (from Kubota 2004) of spermatogonia enriched for PLZF+ cells recovered spermatogonial stem cells (SSCs). PLZF (ZBTB16) protein is a transcription factor known to mark all or nearly all undifferentiated spermatogonia and some differentiating spermatogonia (KIT+ at the protein level) - see Niedenberger et al., 2015 (PMID: 25737569). The authors' deconvolution using single-cell transcriptomes produced at postnatal day 6 (P6) argue that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults. Quite frankly given the established overlap between PLZF and KIT and known identity of spermatogonia at these developmental stages, this is impossible. Indeed - the authors' own analysis of the reference dataset demonstrates abundant PLZF mRNA in P6 progenitor spermatogonia - what is the authors' explanation for this observation? The same is essentially true in the use of adult references for celltype assignment. The authors found 63-82% of SSCs using this different definition of types (from a different dataset), begging the question of which of these results is true.

      For full transparency, we provided information about the deconvolution analyses for all libraries that use cell-type specific matrices generated from PND6 and adult single-cell RNA-seq reference datasets in our previous response (Fig1-3, response to reviewer 1). However, we don’t claim “that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults”. Of these percentages, the ones that correspond to our postnatal libraries are the ones reported in our updated manuscript (Please see FigS2). Importantly, we never claimed that these percentages correspond to “PLZF+ spermatogonia” exclusively. Rather, they were inferred using gene expression-specific signature matrices (Fig1-c response to Reviewer 1 as example). As clearly evident in feature maps in FigS2 of our updated manuscript, the cellular population identified as SSCs using the dataset from Hermann et al., 2018 shows overlap for the expression of Ddx4, Zbtb16 (PLZF), Gfra1 and Id4 but minimal Kit. In agreement with the reviewer’s observation, progenitors also show a signal for Zbtb16 but have a different gene expression signature matrix (see Fig.1c and 2c for an example of gene signature matrices from PND6 and adult samples from the same publication).

      Regarding the question of which of these results are true, we observed that deconvolution analyses of our postnatal libraries using two different single-cell postnatal RNA-seq reference datasets consistently suggest a high contribution (>90%) by SSCs (defined using cell-specific expression matrices following identification of cell-types that match the closest ones reported by each study; see FigS2 updated manuscript). The analyses of our adult libraries using published adult datasets from the same group (Hermann et al., 2018; Fig1 response to Reviewer 1 and FigS2 updated manuscript) suggest that the contribution of adult SSCs to the cell population is lower than at postnatal stages, but SSCs still are the most abundant cell stage identified in our libraries (FigS2g). We reported these analyses and acknowledge that in our adult samples, we also likely have differentiating SPGs.

      In their rebuttal, the authors also raise a fair point about the precision of differential gene expression among spermatogonial subsets. At the mRNA level, Kit is definitely detectable in undifferentiated spermatogonia, but it is never observed at the protein level until progenitors respond to retinoic acid (see Hermann et al., 2015). I agree with the authors that the mRNAs for "cell type markers" are rarely differentially abundant at absolute levels (0 or 1), but instead, there are a multitude of shades of grey in mRNA abundance that "separate" cell types, particularly in the male germline and among the highly related spermatogonial subtypes of interest (SSCs, progenitor spermatogonia and differentiating spermatogonia). That is, spermatogonial biology should be considered as a continuous variable (not categorical), so examining specific cell populations with defined phenotypes (markers, function) likely oversimplifies the underlying heterogeneity in the male germ lineage. But, here, the authors have ignored this heterogeneity entirely by selecting complex populations and examining them in aggregate. We already know that PLZF protein marks a wide range of spermatogonia, complicating the interpretation of aggregate results emerging from such samples. In their rebuttal, the authors nicely demonstrate the existence of these mixtures using deconvolution estimation. What remains a mystery is why the authors did not choose to perform single-cell multiome (RNA-seq + ATAC-seq) to validate their results and provide high-confidence outcomes. This is an accessible technique and was requested after the initial version, but essentially ignored by the authors.

      We agree with the reviewer that the male germ lineage should be considered as a continuous variable and that examining specific cell populations with defined features oversimplifies its heterogeneity. Regarding the use of single-cell multiome (RNA-seq + ATAC-seq), we also agree that this technology can provide additional insight by integrating RNA and chromatin accessibility in the same cells. However, it is a refined method that is expensive, time-consuming and requires human resources that are beyond our capacity for this project.

      A separate question is whether these data are novel. A prior publication by the Griswold lab (Schleif et al., 2023; PMID: 36983846) already performed ATAC-seq (and prior data exist for RNA-seq) from germ cells isolated from synchronized testes. These existing data are higher resolution than those provided in the current manuscript because they examine germ cells before and after RA-induced differentiation, which the authors do not base on their selection methods. Another prior publication from the Namekawa lab extensively examined the transcriptome and epigenome in adult testes (Maezawa et al., 2020; PMID: 32895557; and several prior papers). The authors should explain how their results extend our knowledge of spermatogonial biology in light of the preceding reports.

      Our data do extend previous studies because they provide high-resolution transcriptomic (full transcriptome) and chromatin accessibility profiling in postnatal and adult stages. They now also provide an approach for deconvolution analyses of bulk RNA datasets that can be of use to the community. Novelty in the field of omics is usually not a prime feature and it is common that datasets on the same tissues or cells be published by different groups using comparable methods and analyses.

      The authors are also encouraged to improve their use of terminology to describe the samples of interest. The mitotic male germ cells in the testis are called spermatogonia (not spermatogonial cells, because spermatogonia are cells). Spermatogonia arise from Prospermatogonia. Spermatogonia are divisible into two broad groups: undifferentiated spermatogonia (comprised of few spermatogonial stem cells or SSCs and many more progenitor spermatogonia - at roughly 1:10 ratio) and differentiating spermatogonia that have responded to RA. The authors also improperly indicate that SSCs directly produce differentiating spermatogonia - indeed, SSCs produce transit-amplifying progenitor spermatogonia, which subsequently differentiate in response to retinoic acid stimulation. Further, the use of Spermatogonial cells (and SPGs) is imprecise because these terms do not indicate which spermatogonia are in question. Moreover, there have been studies in the literature which have used similar terms inappropriately to refer to SSCs, including in culture. A correct description of the lineage and disambiguation by careful definition and rigorous cell type identification would benefit the reader.

      Overall, my concern from the initial version of this manuscript stands - critical methodological flaws prevent interpretation of the results and the data are not novel. Readers should take note that results in essentially all Figures do not reflect the biology of any one type of spermatogonium.

      We revised and improved the terminology wherever possible, also taking into account requests from other reviewers about terminology.

      Reviewer #3 (Public Review):

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound, but there are several concerns with the cell population isolated from testes and a lack of acknowledgement of previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings.

      I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      This was suggested in the previous review by Reviewer 1 and was modified in the revised version of the manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice. The FACS sorting approach used was based on cell surface proteins that are not germline specific so there was undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ cell specific and single cell RNA-seq analyses of testicular tissue has revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for interpretation of the RNA-seq and ATAC-seq data that was generated.

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced to in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. In addition, there are several published studies on scATAC-seq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      These points have been addressed in our previous response and in the revised manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Weaknesses:

      There appears to be a lack of basic knowledge of the process of spermatogenesis. For instance, the statement that "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." is inaccurate. The Aal cells are the spermatogonial chains, the two are not distinct from one another. In addition, the authors fail to mention spermatogonial stem cells which form the basis for steady-state spermatogenesis. The authors also do not acknowledge the well-known fact that, in the mouse, the first wave of spermatogenesis is distinct from subsequent waves. Finally, the authors do not mention the presence of both undifferentiated spermatogonia (aka - type A) and differentiating spermatogonia (aka - type B). The premise for the study they present appears to be the implication that little is known about the dynamics of chromatin during the development of spermatogonia. However, there are published studies on this topic that have already provided much of the information that is presented in the current manuscript.

      Regarding the inaccuracy and incompleteness of some of the statements about spermatogonial cells and spermatogenesis: in the Introduction, we replaced the following statement: "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." with: “Spermatogonial cells (SPGs) are the initiators and supporting cellular foundation of spermatogenesis in testis in many species, including mammals. In the mammalian testis, the founding germ cells are primordial germ cells (PGCs), which give rise sequentially to different populations of SPGs: primary transitional (T1)-prospermatogonia (ProSG), secondary transitional (T2)-ProSG, and then spermatogonial stem cells (SSCs) (McCarrey, 2013; Rabbani et al., 2022; Tan et al., 2020). The ProSG population is exhausted by postnatal day (PND) 5 (Drumond et al., 2011) and by PND6-8, distinct SPG subtypes can be distinguished on the basis of specific marker proteins and regenerative capacity (Cheng et al., 2020; Ernst et al., 2019; Green et al., 2018; Hermann et al., 2018; Tan et al., 2020).

      SSCs represent an undifferentiated population of SPGs that retain regenerative capacity and divide to either self-renew or generate progenitors that initiate spermatogenic differentiation, giving rise to differentiating SPGs (diff-SPGs). Diff-SPGs form chains of daughter cells that become primary and secondary spermatocytes around PND10 to 12. Spermatocytes then undergo meiosis and give rise to haploid spermatids that develop into spermatozoa. Spermatozoa are then released into the lumen of seminiferous tubules and continue to mature in the epididymis until becoming capable of fertilization by PND42-48 in mice (Kubota and Brinster, 2018; Rooij, 2017).”

      Regarding the premise and implications of our findings: we clarified the premise of our findings in the revised manuscript. The following statement was included in the Discussion: "our findings complement existing datasets on spermatogonial cells by providing parallel transcriptomic and chromatin accessibility maps at high resolution from the same cell populations at early postnatal, late postnatal and adult stages collected from single individuals (for adults)".

      It is not clear which spermatogonial subtype the authors intended to profile with their analyses. On the one hand, they used PLZF to FACS sort cells. This typically enriches for undifferentiated spermatogonia. On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected. The authors cite multiple previously published studies of gene expression during spermatogenesis, including studies of gene expression in spermatogonia. It is not at all clear what the authors' data adds to the previously available data on this subject.

      The authors analyzed cells recovered at PND 8 and 15 and compared those to cells recovered from the adult testis. The PND 8 and 15 cells would be from the initial wave of spermatogenesis whereas those from the adult testis would represent steady-state spermatogenesis. However, as noted above, there appears to be a lack of awareness of the well-established differences between spermatogenesis occurring at each of these stages.

      We applied computational deconvolution to our bulk RNA-seq datasets, employing publicly available single-cell RNA-seq datasets, to estimate and identify cellular composition. Trained on high-quality RNA-seq datasets from pure or single-cell populations, deconvolution algorithms create expression matrices reflecting the cellular diversity in reference datasets. These cell-type-specific expression matrices are subsequently used to determine the cellular composition of bulk RNA-seq samples with unknown cellular components (Cobos et al., 2023).

      For our analysis, we chose CIBERSORTx (Newman et al., 2019), recognized as the most advanced deconvolution algorithm to date, employing it with three high-quality, publicly available single-cell RNA-seq datasets. First, we assessed the cellular composition of all our RNA-seq libraries using datasets generated by Hermann et al. (2018), which characterized the single-cell transcriptomes of testicular cells and various populations of spermatogonial progenitor cells (SPGs) in early postnatal (PND6) and adult stages. This enabled us not only to address potential somatic cell contamination but also to analyse the composition of isolated SPGs using a unified dataset source.
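      The principle behind signature-based deconvolution can be illustrated with a minimal sketch. This is not CIBERSORTx, which fits the mixture with ν-support vector regression; the sketch below substitutes a plain non-negative least-squares fit solved by projected gradient descent, purely to show the idea of expressing a bulk profile as a non-negative mixture of cell-type signatures. The function name, the cell-type labels and all numbers are hypothetical and are not taken from the study.

```python
# Minimal illustration of signature-based deconvolution (NOT CIBERSORTx).
# Assumption: the bulk profile is modelled as a non-negative mixture of
# cell-type signature columns, fitted by non-negative least squares.

def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions for one bulk sample.

    signature: list of gene rows, each a list of expression values,
               one value per cell type (genes x cell types).
    bulk:      list of expression values, one per gene.
    Returns a list of proportions (one per cell type) that sums to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # A safe step size: 1 / (squared Frobenius norm of the signature matrix).
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit for every gene.
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S w - b||^2 with respect to each weight.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * gk) for w, gk in zip(weights, grad)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all estimated weights are zero; check the inputs")
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical signature matrix: 4 genes x 3 cell types.
    cell_types = ["SSC", "diff_SPG", "Sertoli"]
    signature = [
        [10.0, 1.0, 0.0],
        [1.0, 8.0, 0.0],
        [0.0, 0.0, 12.0],
        [5.0, 5.0, 5.0],
    ]
    # Hypothetical bulk profile built as a 0.80 / 0.15 / 0.05 mixture.
    bulk = [8.15, 2.0, 0.6, 5.0]
    for name, p in zip(cell_types, estimate_proportions(signature, bulk)):
        print(f"{name}: {p:.3f}")
```

      In this toy setting the fit recovers the mixture used to build the bulk profile; with real data, the quality of the estimate depends on how well the reference signatures match the cell types actually present in the sample.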

      Author response image 1.

      Deconvolution analysis of bulk RNA-seq samples using PND6 single-cell RNA-seq from Hermann et al., 2018. a. Seurat clusters from PND6 single-cell RNA-seq. b. Feature maps of gene expression for markers of SPGs and somatic cells. c. Gene expression signature matrix from PND6 single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      By re-analyzing the single-cell RNA-seq datasets, we identified distinct cell-type clusters, marked by specific cellular markers as reported in the original and subsequent studies (Author response image 1a,b and Author response image 2a,b). Then, CIBERSORTx generated gene-expression signature matrices and estimated the cell-type proportions within our 18 bulk RNA-seq libraries. Evaluation of our postnatal libraries (PND8 and 15) against a PND6 signature matrix revealed a predominant derivation from SPGs, with average estimated proportions of spermatogonial stem cells (SSCs) being 0.99 and 0.85 for PND8 and PND15 samples, respectively (Author response image 1c-e). Notably, the analysis of PND15 libraries also suggested the presence of additional SPGs types, including progenitors and differentiating SPGs (Author response image 1d), albeit at lower frequency. 

      Similarly, evaluation of our adult RNA-seq libraries, using an adult signature matrix, showed an average SSC proportion of 0.82, indicating a primary derivation from SSC cells. Consistent with the findings from PND15 libraries, our deconvolution analysis also suggests the presence of additional SPG types, including progenitors and differentiating SPGs (Author response image 1d). However, unlike our early and late postnatal stage libraries, the deconvolution analysis of adult libraries indicated the presence of other cell types (labeled "Other"), not corresponding to the major somatic cell types identified by Hermann et al. 2018. The estimated average proportion of these cells was less than 0.05 in two adult libraries and 0.10 in the others. This variance in cellular composition underlines the deconvolution method's effectiveness in dissecting complex cellular compositions in bulk RNA-seq samples.

      Author response image 2.

      Deconvolution analysis of bulk RNA-seq samples using adult single-cell RNA-seq (Hermann et al., 2018). a. Seurat clusters from adult single-cell RNA-seq. b. Feature maps of gene expression for markers of SPG and somatic cells. c. Gene expression signature matrix from adult single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      To further validate our observations, we re-analyzed two additional testicular single-cell RNA-seq datasets derived from an early postnatal stage (PND7) (Tan et al., 2020) and adult (Green et al., 2018) (Author response image 3a,b and Author response image 4a,b). We identified distinct cell-type clusters, marked by specific cellular markers (Author response image 3a,b and Author response image 4a,b), and proceeded with the deconvolution analysis using CIBERSORTx. Evaluation of our postnatal libraries (PND8 and 15) against the PND7 signature matrix from Tan et al., 2020 confirmed a derivation from germ cells (Author response image 3d,e), in particular from SSCs (Author response image 3g), with average estimated proportions of SSCs being 0.93 and 0.86 for PND8 and PND15 samples, respectively, and the rest estimated to be in origin from differentiating SPGs (Author response image 3g,h). In the case of the adult samples, evaluation against the adult signature matrix from Green et al., 2018 confirmed a predominant derivation from SSCs, with average estimated proportions of SSCs being 0.79, consistent with the 0.82 estimated proportion from Hermann et al., 2018. 

      Author response image 3.

      Deconvolution analysis of bulk RNA-seq samples with additional single-cell datasets. a. Seurat clusters from PND7 single-cell RNA-seq (Tan et al., 2020). b. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. c. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study. d. Re-clustering of germ cell cluster shown in a. e. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. f. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study. g. Seurat clusters from adult single-cell RNA-seq (Green et al., 2018). h. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. i. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study.

      To further validate our deconvolution strategy, we interrogated the cellular composition of bulk RNA-seq libraries derived from cellular populations enriched in Sertoli cells, generated by our group using a similar enrichment/sorting strategy (Thumfart et al., 2022). As expected, our results show that all our libraries are mainly composed of Sertoli cells suggesting that the deconvolution strategy employed is accurate in detecting cell-type composition (Author response image 4).

      Author response image 4.

      Deconvolution analysis of Sertoli bulk RNA-seq samples. Barplots of estimated cellular proportions for bulk RNA-seq libraries reported in Thumfart et al., 2022. Expression matrices were derived from the analysis of single-cell RNA-seq datasets used to assess cellular composition of the SPG bulk libraries.

      Author response image 5.

      Id4 and Kit are transcribed in SSCs. Seurat clusters from PND6 single-cell RNA-seq (left) and feature maps of gene expression for Id4 (center) and Kit (right). Insets zoom in on SSCs (red).

      Finally, regarding the following observation by the reviewer: "On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected." It was recently shown using single-cell RNA-seq that “nearly all differentiating spermatogonia at P3 (delineated as c-KIT+) are ID4-eGFP” (Law et al., 2019). While this finding does not exclude the possibility that we have a mixture of SPG cells, it supports the possibility that SPG cells express markers of both undifferentiated and differentiated cells, particularly in the early stages of postnatal development. Indeed, we observe that some cells labeled as SSCs show signals for both Id4 and Kit in single-cell RNA-seq data from Hermann et al., 2018 (Author response image 5).

      Therefore, the results from the deconvolution analysis and our immunofluorescence data showing 85-95% PLZF+ cells in our cellular preparations underscore that our bulk RNA-seq libraries are mainly composed of SPGs. The deconvolution analysis also suggests a cellular composition predominantly of SSCs and, to a lesser degree, of differentiating SPGs. Our adult RNA-seq libraries show a small proportion of somatic cells (<0.10).

      In the revised manuscript, we compiled the deconvolution analyses and present them in a condensed version in Supplementary Fig 2. 

      In general, the authors present observational data of the sort that is generated by RNA-seq and ATAC-seq analyses, and they speculate on the potential significance of several of these observations. However, they provide no definitive data to support any of their speculations. This further illustrates the fact that this study contributes little if any new information beyond that already available from the numerous previously published RNA-seq and ATAC-seq studies of spermatogenesis. In short, the study described in this manuscript does not advance the field.

      We acknowledge that RNA-seq and ATAC-seq datasets like ours are observational and that their interpretation can be speculative. Nevertheless, our datasets represent an additional useful resource for the community because they are comprehensive and high resolution, and can be exploited, for instance, for studies in environmental epigenetics and epigenetic inheritance examining the immediate and long-term effects of postnatal exposure and their dynamics. The depth of our RNA sequencing allowed us to detect transcripts with a high dynamic range, which has been limited with classical RNA sequencing analyses of spermatogonial cells and with single-cell analyses (which have comparatively low coverage). Further, our experimental pipeline is affordable (more so than single-cell sequencing approaches) and, in the case of adults, provides data per animal, informing on the intrinsic variability in transcriptional and chromatin regulation across males. These points will be discussed in the revised manuscript.

      Relevant information for both points was included in the Discussion of the revised manuscript.  

      The phenomenon of epigenetic priming is discussed, but then it seems that there is some expression of surprise that the data demonstrate what this reviewer would argue are examples of that phenomenon. The authors discuss the "modest correspondence between transcription and chromatin accessibility in SCs." Chromatin accessibility is an example of an epigenetic parameter associated with the primed state. The primed state is not fully equivalent to the actively expressing state. It appears that certain histone modifications along with transcription factors are critical to the transition between the primed and actively expressing states (in either direction). The cell types that were investigated in this study are closely related spermatogenic, and predominantly spermatogonial cell types. It is very likely that the differentially expressed loci will be primed in both the early (PND 8 or 15) and adult stages, even though those genes are differentially expressed at those stages. Thus, it is not surprising that there is not a strict concordance between +/- chromatin accessibility and +/- active or elevated expression.

      Relevant information was included in the Discussion of the revised manuscript.

      Reviewer #2:

      The objective of this study from Lazar-Contes et al. is to examine chromatin accessibility changes in "spermatogonial cells" (SCs) across testis development. Exactly what SCs are, however, remains a mystery. The authors mention in the abstract that SCs are undifferentiated male germ cells and have self-renewal and differentiation activity, which would be true for Spermatogonial STEM Cells (SSCs), a very small subset of total spermatogonia, but then the methods they use to retrieve such cells using antibodies that enrich for undifferentiated spermatogonia encompass both undifferentiated and differentiating spermatogonia. Data in Fig. 1B prove that most (85-95%) are PLZF+, but PLZF is known to be expressed both by undifferentiated and differentiating (KIT+) spermatogonia (Niedenberger et al., 2015; PMID: 25737569). Thus, the bulk RNA-seq and ATAC-seq data arising from these cells constitute the aggregate results comprising the phenotype of a highly heterogeneous mixture of spermatogonia (plus contaminating somatic cells), NOT SSCs. Indeed, Fig. 1C demonstrates this by showing the detection of Kit mRNA (a well-known marker of differentiating spermatogonia - which the authors claim on line 89 is a marker of SCs!), along with the detection of markers of various somatic cell populations (albeit at lower levels).

      The reviewer is correct that our spermatogonial cell populations are mixed and include undifferentiated and differentiated cells, hence the name of spermatogonia (SCs), and probably also contains some somatic cells. We acknowledge that this is a limitation of our isolation approach. To circumvent this limitation, we will conduct in silico deconvolution analysis using publicly available single-cell RNA sequencing datasets to obtain information about markers corresponding to undifferentiated and differentiated spermatogonia cells, and somatic cells. These additional analyses will provide information about the cellular composition of the samples and clarify the representation of undifferentiated and differentiated spermatogonial cells and other cells.

      This admixture problem influences the results - the authors show ATAC-seq accessibility traces for several genes in Fig. 2E (exhibiting differences between P15 and Adult), including Ihh, which is not expressed by spermatogenic cells, and Col6a1, which is expressed by peritubular myoid cells. Thus, the methods in this paper are fundamentally flawed, which precludes drawing any firm conclusions from the data about changes in chromatin accessibility among spermatogonia (SCs?) across postnatal testis development.

      The reviewer raises concern about the lack of correspondence between chromatin accessibility and expression observed for some genes, arguing that this precludes drawing firm conclusions. However, a dissociation between chromatin accessibility and gene expression is normal and expected since chromatin accessibility is only a readout of protein deposition and occupancy e.g. by transcription factors, chromatin regulators, or nucleosomes, at specific genomic loci that does not give functional information of whether there is ongoing transcriptional activity or not. A gene that is repressed or poised for expression can still show a clear signal of chromatin accessibility at regulatory elements. The dissociation between chromatin accessibility and transcription has been reported in many different cells and conditions (PMID: 36069349, PMID: 33098772) including in spermatogonial cells (PMID: 28985528) and in gonads in different species (PMID: 36323261). Therefore, the dissociation between accessibility and transcription is not a reason to conclude that our data are flawed.

      In addition, there already are numerous scRNA-seq datasets from mouse spermatogenic cells at the same developmental stages in question.

      This is true, but full transcriptomic profiling like ours on cell populations provides different transcriptional information that is deeper and more comprehensive. Our datasets identified >17,000 genes, while scRNA-seq typically identifies a few thousand genes. Our analyses also identified full-length transcripts, variants, isoforms, and low-abundance transcripts. These datasets are therefore a valuable addition to existing scRNA-seq datasets.

      Moreover, several groups have used bulk ATAC-seq to profile enriched populations of spermatogonia, including from synchronized spermatogenesis which reflects a high degree of purity (see Maezawa et al., 2018 PMID: 29126117 and Schlief et al., 2023 PMID: 36983846 and in cultured spermatogonia - Suen et al., 2022 PMID: 36509798) - so this topic has already begun to be examined. None of these papers was cited, so it appears the authors were unaware of this work.

      We apologize for not mentioning these studies in our manuscript; we will do so in the revised version.

      The authors' methodological choice is even more surprising given the wealth of single-cell evidence in the literature since 2018 demonstrating the exceptional heterogeneity among spermatogonia at these developmental stages (the authors DID cite some of these papers, so they are aware). Indeed, it is currently possible to perform concurrent scATAC-seq and scRNA-seq (10x Genomics Multiome), which would have made these data quite useful and robust. As it stands, given the lack of novelty and critical methodological flaws, readers should be cautioned that there is little new information to be learned about spermatogenesis from this study, and in fact, the data in Figures 2-5 may lead readers astray because they do not reflect the biology of any one type of male germ cell. Indeed, not only do these data not add to our understanding of spermatogonial development, but they are damaging to the field if their source and identity are properly understood. Here are some specific examples of the problems with these data:

      Fig. 2D - Gata4 and Lhcgr are not expressed by germ cells in the testis.

      Fig. 3A - WT1 is expressed by Sertoli cells, so the change in accessibility of regions containing a WT1 motif suggests differential contamination with Sertoli cells. Since Wt1 mRNA was differentially high in P15 (Fig. 3B) - this seems to be the most likely explanation for the results. How was this excluded?

      Fig. 3D - Since Dmrt1 is expressed by Sertoli cells, the "downregulation" likely represents a reduction in Sertoli cell contamination in the adult, like the point above. Did the authors consider this?

      Regarding concerns about contamination by somatic cells (Transcription). In addition to the results of our deconvolution analysis (see response to Reviewer #1), we addressed the specific concern of the paradoxical expression of genes considered markers of somatic cells in the testis. For instance, we plotted the expression values of Ihh, Lhcgr, Gata4, Col16a1, Wt1, and Dmrt1 along with the expression values of Ddx4 and Zbtb16. We observe that the expression levels of Ddx4 and Zbtb16, genes expressed predominantly in SPGs, are orders of magnitude higher than those observed for the rest of the genes, with the notable exception of Dmrt1, which is also highly expressed (Author response image 6). Indeed, our analysis of publicly available single-cell RNA-seq datasets shows that Dmrt1 is robustly expressed in germ cells (Author response image 7) and, as also noted by the reviewer, in Sertoli cells in postnatal stages. Notably, we observe a significant stepwise decrease in the expression of Dmrt1 across the postnatal maturation of SPG cells. This is highly unlikely to be a result of major contamination by Sertoli cells of just our postnatal libraries. We base this statement on three observations. First, the deconvolution analysis of all our RNA-seq libraries using four different expression signature matrices from high-quality single-cell RNA-seq from testis showed that our libraries are largely derived from SPGs. Second, the evaluation of our adult libraries with the PND6 signature matrix from Green et al., 2018 suggested that the proportion of Sertoli cells in our adult libraries, if any, would be higher than in our postnatal libraries (Author response image 3d, blue bars). This makes it unlikely that the observed decrease in expression of Dmrt1 in adult samples is due to prominent somatic contamination of the postnatal libraries. Third, the stepwise decrease in Dmrt1 expression seems to correlate with progression during postnatal development (Author response image 7), as feature maps of Dmrt1 expression derived from public single-cell RNA-seq experiments show a reduction in expression in adult SPGs in comparison with early postnatal stages (Author response image 7, last two panels). Thus, the observed effects are likely the result of gene regulatory processes that operate during the developmental maturation of SPGs.

      Author response image 6.

      Expression of germ and somatic cell markers in our RNA-seq datasets. Boxplots of log2(CPM) (top) and CPM (bottom) values for selected genes from our RNA-seq datasets. Each point in the boxplots represents the expression value of a biological replicate.

      Author response image 7.

      Expression of germ and somatic cell markers in publicly available single-cell RNA-seq datasets. Seurat clusters from all analyzed single-cell RNA-seq datasets (first column from left) and feature maps of gene expression for Zbtb16, Dmrt1 and Wt1.

      Consistent with the reviewer’s observation, Ihh is not expressed in germ cells, and indeed we do not detect signal at this locus or at Lhcgr. Furthermore, while we indeed observe a significant increase in the expression of Wt1 in PND15 samples, its expression level is considerably lower than that of SPG markers. This is even more evident when plotting expression data on a linear scale rather than as a log2 transformation of the expression values. Whether such transcriptional profiles reflect developmentally regulated transcription, stochastic effects on gene expression, or potential somatic contamination is difficult to determine. However, based on our deconvolution data, we believe it is unlikely that major contamination could account for our observations.
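      The contrast between linear and log2 scales mentioned above can be made concrete with a small sketch of counts-per-million (CPM) normalization. The pseudocount added before taking the logarithm is an assumption chosen for illustration, and the gene labels and counts are hypothetical; they are not values from our datasets.

```python
import math


def cpm(counts):
    """Convert raw read counts per gene into counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount (an assumption) avoids log2(0)."""
    return {gene: math.log2(v + pseudocount) for gene, v in cpm(counts).items()}


if __name__ == "__main__":
    # Hypothetical library of one million reads with a highly expressed
    # marker and a weakly expressed one.
    counts = {"marker_high": 9000, "marker_low": 90, "all_other_genes": 990_910}
    linear = cpm(counts)
    logged = log2_cpm(counts)
    # On the linear scale the two markers differ by a factor of 100 ...
    print(linear["marker_high"] / linear["marker_low"])
    # ... whereas on the log2 scale the same gap is only a few units.
    print(logged["marker_high"] - logged["marker_low"])
```

      A hundred-fold difference in CPM corresponds to a difference of fewer than seven units on the log2 scale, which is why low-level transcripts can look closer to marker genes in a log-scaled plot than they are in absolute terms.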

      Notably, while Wt1 is robustly expressed in nearly all Sertoli cells across postnatal development (Author response image 7), it is also detected in other cell types including SPGs (although in fewer cells and at lower expression levels), consistent with our observations (Author response image 6 and 8). Therefore, the assignment of a gene as a marker of a particular cell type does not imply that the gene is expressed uniquely in that cell type; rather, it is expressed in more cells of that type and likely at higher levels.

      Author response image 8.

      Expression of Wt1 in publicly available single-cell RNA-seq datasets. Feature maps of gene expression for Wt1. Dashed boxes show a zoom-in on the germ cell cluster, in which some cells express Wt1.

      Regarding concerns about contamination by somatic cells (chromatin accessibility). In Figure 2 of our manuscript, we show the chromatin accessibility landscape of different genes, including genes either not expressed in testicular cells (Ihh) or believed to be expressed exclusively in somatic cells (Lhcgr, Gata4, Col16a1, Wt1). For some of these genes, we reported changes in chromatin accessibility at specific sites between PND15 and adults (e.g. Wt1 and Col16a1). The observation of "traces of chromatin accessibility" at these loci and the reported changes in accessibility raised concerns of potential contamination which "fundamentally flaw" our results, as stated by the reviewer. While we acknowledge that all enrichment methods have a margin of potential contamination, we fundamentally disagree with the reviewer's observations.

      The term chromatin accessibility can be misleading. In principle, the term accessibility might suggest the literal lack of protein deposition at a given place in the genome. Rather, chromatin accessibility as evaluated by ATAC-seq (as in this case) must be interpreted as a measure of protein occupancy genome-wide (PMID: 30675018). Depending on the type of fragments analyzed, we can obtain information regarding the occupancy of transcription factors (TFs), nucleosomes, and other chromatin-associated proteins that are present at genomic locations at a given time within a population of cells. The detection of chromatin accessibility at a given locus does not necessarily indicate transcription of the gene in a given cell type. A gene can be repressed or poised for expression and still show a clear signal of chromatin accessibility at its regulatory elements or along the gene body. For instance, in agreement with the reviewer's observation, neither Ihh nor Lhcgr is expressed in our datasets (Author response image 6 and Author response image 9); however, they show a distinctive pattern of chromatin accessibility in our datasets and in publicly available ATAC-seq data derived from undifferentiated (Id4-bright) and differentiating SPGs (Id4-dim) (Cheng et al., 2020) (Author response image 9). A similar argument can be applied to other loci such as Wt1 and Col6a1, for which we also observe extremely low levels of transcription. Therefore, the lack of transcription does not exclude that these loci display clear patterns of chromatin accessibility (Author response image 9). Notably, while traces of chromatin accessibility can also be observed in ATAC-seq datasets from embryonic Sertoli cells (Garcia-Moreno et al., 2019) and other somatic stem cells (hematopoietic stem cells; HSCs) (Xiang et al., 2020) (Author response image 9), the pattern of chromatin accessibility differs markedly from that observed in SPG cells. Therefore, the observed changes in chromatin accessibility are unlikely to result from contaminating somatic cells.

      To strengthen our observation, we identified regions of chromatin accessibility in SPGs, Sertoli cells, and HSCs using both our datasets and publicly available ATAC-seq datasets. Overlap analysis revealed at least four groups of ATAC-seq peaks: 1) peaks shared among all analyzed cell types, 2) peaks shared just among SPG cells, 3) peaks specific to Sertoli cells and 4) peaks specific to HSCs (Author response image 10). Peaks shared among all tested cell types are predominantly located at promoters of genes involved in translation and DNA replication (GO analysis adj p-value<0.05). In contrast, cell-type specific peaks are localized at intergenic and intragenic regions, suggesting localization at enhancer elements (Author response image 10). Indeed, GO analysis of cell-type specific peaks revealed enrichment for genes involved in male meiosis for SPGs, vesicle-mediated transport for Sertoli cells and immune system process for HSCs, consistent with cell-type specific functions. If contamination by somatic cells, such as Sertoli cells, were prominent as stated by the reviewer, we would expect to observe prominent ATAC-seq signal from our datasets at peaks specific to Sertoli cells. Notably, we do not observe ATAC-seq signal at peaks specific to Sertoli cells in our ATAC-seq samples. However, we observe robust signals at shared peaks and at peaks specific to SPG cells. This observation strongly argues against the possibility of major contamination by somatic cells.
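
      The grouping of peaks into shared and cell-type specific sets described above can be illustrated with a minimal sketch. It assumes peaks are given as (chromosome, start, end) tuples and that any positional overlap counts as a shared peak; the function names and toy coordinates are hypothetical and this is not the pipeline used for Author response image 10.

```python
from collections import defaultdict

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def classify_peaks(peaks_by_cell_type):
    """Group every peak by the set of cell types that have an overlapping peak.

    Assumption: a simple all-against-all comparison, fine for a toy example
    but too slow for genome-wide peak sets.
    """
    groups = defaultdict(list)
    for cell_type, peaks in peaks_by_cell_type.items():
        for peak in peaks:
            present_in = frozenset(
                other
                for other, other_peaks in peaks_by_cell_type.items()
                if other == cell_type or any(overlaps(peak, q) for q in other_peaks)
            )
            groups[present_in].append((cell_type, peak))
    return dict(groups)

# Toy coordinates, invented for illustration only.
toy = {
    "SPG": [("chr1", 100, 200), ("chr1", 1000, 1100)],
    "Sertoli": [("chr1", 150, 250), ("chr2", 500, 600)],
    "HSC": [("chr1", 120, 180), ("chr3", 10, 90)],
}
groups = classify_peaks(toy)
for cell_types, members in groups.items():
    print(sorted(cell_types), len(members))
```

      Under this logic, substantial Sertoli contamination of an SPG library would show up as signal at the peaks that fall in the Sertoli-only group.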

      Author response image 9.

      Chromatin accessibility profiles at specific loci differ between SPG cells and other cell types. Genome-browser tracks for Ihh, Wt1, Col16a1 and Zbtb16. For each gene, an extended locus view is presented with RNA-seq data (this study) and normalized ATAC-seq tracks from our study and public sources (SPG Id4; GSE131657; Sertoli; GSM3346484; HSC; ENCFF204JEE). Public ATAC-seq datasets were generated using enrichment methods similar to the one employed in our study.

      Author response image 10.

      Shared and cell-type specific ATAC-seq peaks among SPGs, Sertoli cells and HSCs. Top, normalized ATAC-seq signal heatmaps of shared and unique ATAC-seq peaks. PND15 and Adult samples are derived from our study. ATAC-seq signal is plotted +/- 500 bp from peak center. Bottom, pie charts of the genomic distribution of ATAC-seq peaks.

      Reviewer #3:

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using a cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound but there are several concerns with the cell population isolated from testes and lack of acknowledgment for previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings. I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      We thank the reviewer for the suggestion and will rename SCs into SPG cells in the revised manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice.

      We will provide a rationale for the use of postnatal day 8 and 15 stages in the revised manuscript. Briefly, these stages are interesting to study because early to mid postnatal life is a critical window of development for germ cells during which environmental exposure can have strong and persistent effects. The possibility that changes in germ cells can happen during this period and persist until adulthood is an important area of research linked to disciplines like epigenetic toxicology and epigenetic inheritance.

      The FACS sorting approach used was based on cell surface proteins that are not germline-specific so there were undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ-cell specific and single-cell RNA-seq analyses of testicular tissue have revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ-cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for the interpretation of the RNA-seq and ATAC-seq data that was generated.

      In agreement with the reviewer’s observation, Zbtb16 (PLZF) is expressed in germ cells but also in somatic cells, in particular in the dataset derived from Green et al., 2018 (Author response image 11). However, when evaluating the expression patterns of Ddx4, we noticed that similar to Zbtb16, it is expressed both in the germ line and in the somatic compartment (Author response image 11). Notably, we observe expression of Ddx4 in SSC but also in progenitors and differentiating SPGs (Author response image 11g). These observations suggest that at least at the transcript level, both genes are transcribed in germ cells and to a lesser degree in somatic cells. 

      Author response image 11.

      Single-cell expression of Ddx4 and Zbtb16. Seurat clusters from all analyzed single-cell RNA-seq datasets (a, c, e, g, i) and feature maps of gene expression for Ddx4 and Zbtb16 (b, d, f, h, j).

      Finally, our deconvolution analysis using gene expression signature matrices for different cellular populations suggests that our RNA-seq and ATAC-seq libraries are largely derived from SPG cells, and in particular from SSCs.

      Furthermore, while this analysis suggested the presence of somatic cells, their proportion is minimal in comparison with germ cells (Author response images 1-4). This is also supported by ATAC-seq analysis of somatic cells from testis (Author response images 9 and 10). 

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. In addition, there are several published studies on scATACseq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      We compared our ATAC-seq datasets with the ones from Maezawa et al., 2017 and those from Cheng et al., 2020. All these datasets were generated from FACS-sorted cells enriched for undifferentiated and differentiating SPGs. Sequencing files from Cheng et al., 2020 were processed as described in our methods section, while our pipeline was adjusted to process files from Maezawa et al., 2017, as they were single-end sequencing files. We generated a reference set of peaks from SPGs and calculated signal scores for all peaks across all samples. We then calculated the Pearson correlation for all pairwise comparisons and generated a heatmap of correlations (Author response image 12). Two clusters emerge that separate the SPG samples from the pachytene spermatocytes and round spermatids reported by Maezawa et al., 2017. As expected, SPG samples clustered together based on study of origin. Consistently, our postnatal samples formed one cluster next to but separated from the adult one. Similarly, the Id4-bright samples clustered together and next to the Id4-dim samples, and the same applied for the Thy1 and cKit samples. Notably, our samples and the ones from Cheng et al., 2020 have a higher correlation with each other than with the ones from Maezawa et al., 2017. Given the fundamental difference in library sequencing (single-end instead of the widely used paired-end for ATAC-seq experiments), we reasoned that a comparison with the Maezawa et al., 2017 datasets is not optimal. Nevertheless, these data, in addition to those presented before (see response to Reviewers 1 and 2), strongly support a predominantly SPG derivation of all our sequencing libraries.
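
      The correlation step can be sketched as follows. This is a minimal illustration using NumPy on a simulated peak-by-sample score matrix, not the Deeptools-based analysis reported in Author response image 12; the sample layout and noise level are assumptions chosen only to show the computation.

```python
import numpy as np

def pairwise_pearson(signal):
    """Pearson correlation between samples.

    signal: array of shape (n_peaks, n_samples) holding one score per
    reference peak and sample. Returns an (n_samples, n_samples) matrix.
    """
    return np.corrcoef(signal, rowvar=False)

# Simulated scores: two samples sharing one accessibility profile and a
# third sample with an unrelated profile (illustrative values only).
rng = np.random.default_rng(0)
n_peaks = 500
profile_a = rng.normal(size=n_peaks)
profile_b = rng.normal(size=n_peaks)
signal = np.column_stack([
    profile_a + 0.1 * rng.normal(size=n_peaks),
    profile_a + 0.1 * rng.normal(size=n_peaks),
    profile_b + 0.1 * rng.normal(size=n_peaks),
])
corr = pairwise_pearson(signal)
print(np.round(corr, 2))
```

      Hierarchical clustering of such a matrix is what produces the grouping of samples by cell type and study of origin described above.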

      Author response image 12.

      Pearson correlation at the peak level among different ATAC-seq datasets. a) Our ATAC-seq libraries and ATAC-seq libraries from b) Cheng et al., 2020 and c) Maezawa et al., 2017. Thy1-1 and cKit libraries correspond to undifferentiated and differentiating SPGs, respectively. PS, pachytene spermatocytes; RS, round spermatids. Correlation analysis was done using Deeptools.

      References

      Cheng K, Chen I-C, Cheng C-HE, Mutoji K, Hale BJ, Hermann BP, Geyer CB, Oatley JM, McCarrey JR. 2020. Unique Epigenetic Programming Distinguishes Regenerative Spermatogonial Stem Cells in the Developing Mouse Testis. iScience 23:101596. doi:10.1016/j.isci.2020.101596

      Cobos FA, Panah MJN, Epps J, Long X, Man T-K, Chiu H-S, Chomsky E, Kiner E, Krueger MJ, Bernardo D di, Voloch L, Molenaar J, Hooff SR van, Westermann F, Jansky S, Redell ML, Mestdagh P, Sumazin P. 2023. Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes. Genome Biol 24:177. doi:10.1186/s13059-023-03016-6

      Drumond AL, Meistrich ML, Chiarini-Garcia H. 2011. Spermatogonial morphology and kinetics during testis development in mice: a high-resolution light microscopy approach. Reproduction 142:145–155. doi:10.1530/rep-10-0431

      Ernst C, Eling N, Martinez-Jimenez CP, Marioni JC, Odom DT. 2019. Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nat Commun 10:1251. doi:10.1038/s41467-019-09182-1

      Garcia-Moreno SA, Futtner CR, Salamone IM, Gonen N, Lovell-Badge R, Maatouk DM. 2019. Gonadal supporting cells acquire sex-specific chromatin landscapes during mammalian sex determination. Dev Biol 446:168–179. doi:10.1016/j.ydbio.2018.12.023

      Green CD, Ma Q, Manske GL, Shami AN, Zheng X, Marini S, Moritz L, Sultan C, Gurczynski SJ, Moore BB, Tallquist MD, Li JZ, Hammoud SS. 2018. A Comprehensive Roadmap of Murine Spermatogenesis Defined by Single-Cell RNA-Seq. Dev Cell 46:651-667.e10. doi:10.1016/j.devcel.2018.07.025

      Hermann BP, Cheng K, Singh A, Cruz LR-DL, Mutoji KN, Chen I-C, Gildersleeve H, Lehle JD, Mayo M, Westernströer B, Law NC, Oatley MJ, Velte EK, Niedenberger BA, Fritze D, Silber S, Geyer CB, Oatley JM, McCarrey JR. 2018. The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids. Cell Rep 25:1650-1667.e8. doi:10.1016/j.celrep.2018.10.026

      Kubota H, Brinster RL. 2018. Spermatogonial stem cells†. Biol Reprod 99:52–74. doi:10.1093/biolre/ioy077

      Law NC, Oatley MJ, Oatley JM. 2019. Developmental kinetics and transcriptome dynamics of stem cell specification in the spermatogenic lineage. Nat Commun 10:2787. doi:10.1038/s41467-019-10596-0

      Maezawa S, Yukawa M, Alavattam KG, Barski A, Namekawa SH. 2017. Dynamic reorganization of open chromatin underlies diverse transcriptomes during spermatogenesis. Nucleic Acids Res 46:gkx1052-. doi:10.1093/nar/gkx1052

      McCarrey JR. 2013. Toward a More Precise and Informative Nomenclature Describing Fetal and Neonatal Male Germ Cells in Rodents1. Biol Reprod 89:Article 47, 1-9. doi:10.1095/biolreprod.113.110502

      Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, Diehn M, Alizadeh AA. 2019. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–782. doi:10.1038/s41587-019-0114-2

      Rabbani M, Zheng X, Manske GL, Vargo A, Shami AN, Li JZ, Hammoud SS. 2022. Decoding the Spermatogenesis Program: New Insights from Transcriptomic Analyses. Annu Rev Genet 56:339–368. doi:10.1146/annurev-genet-080320-040045

      Rooij DG de. 2017. The nature and dynamics of spermatogonial stem cells. Development 144:3022–3030. doi:10.1242/dev.146571

      Tan K, Song H-W, Wilkinson MF. 2020. Single-cell RNAseq analysis of testicular germ and somatic cell development during the perinatal period. Development 147:dev183251. doi:10.1242/dev.183251

      Thumfart KM, Lazzeri S, Manuella F, Mansuy IM. 2022. Long-term effects of early postnatal stress on Sertoli cells. Front Genet 13:1024805. doi:10.3389/fgene.2022.1024805

      Xiang G, Keller CA, Heuston EF, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. 2020. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res 30:gr.255760.119. doi:10.1101/gr.255760.119

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Fig. 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 can no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced fold increase in insulin secretion under these circumstances, as shown in Fig. 1F. We interpret these results as a defect in the capacity of GLP-1R to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. As cholesterol-loaded MβCD increases basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well. A simple experiment to rule out this possibility would be to test insulin secretion following KCl stimulation under these conditions to determine whether maximal stimulation has been reached. We will perform this control experiment in the revised manuscript to clarify this point. We will also include absolute insulin results as well as percentages of secretion to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Fig. 1H, I) are relatively variable and we therefore plan to perform further biological repeats of this experiment for the paper revision to consolidate our current findings. 

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1RA responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We will highlight this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We do however agree that it would be interesting to perform further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol, and we will therefore add this experiment to the paper revisions.

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      To clarify the intent of the Laurdan experiments early in the manuscript, we will add the following text to the methods section in the paper revisions: “Laurdan, 6-dodecanoyl-2-dimethylaminonaphthalene (product D250) was purchased from ThermoFisher. Laurdan (40 μM) was excited using a 405 nm solid state laser and SNAP/FLAG-hGLP-1R labelled with SNAP-Surface Alexa Fluor 647 with a pulsed (80 MHz) super-continuum white light laser at 647 nm. Laurdan emission was recorded in the ranges of 420–460 nm (IB) and 470–510 nm (IR), and the general polarisation (GP) formula, GP = (IB − IR)/(IB + IR), used to retrieve the relative lateral packing order of lipids at the plasma membrane. Values of GP vary from 1 to −1, where higher numbers reflect lower fluidity or higher lateral lipid order, whereas lower numbers indicate increasing fluidity.”
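
      As a worked illustration of the GP formula quoted above, the following minimal sketch computes GP per pixel from the two emission channels. The function name and the toy intensities are hypothetical, and masking pixels with zero total intensity is an assumption made here to avoid division by zero rather than a step described in the methods text.

```python
import numpy as np

def general_polarisation(i_blue, i_red):
    """Laurdan GP = (IB - IR) / (IB + IR), computed element-wise.

    i_blue: intensity in the 420-460 nm channel (IB)
    i_red:  intensity in the 470-510 nm channel (IR)
    Pixels with no signal in either channel are returned as NaN (assumption).
    """
    i_blue = np.asarray(i_blue, dtype=float)
    i_red = np.asarray(i_red, dtype=float)
    total = i_blue + i_red
    gp = np.full(total.shape, np.nan)
    np.divide(i_blue - i_red, total, out=gp, where=total > 0)
    return gp

# Toy intensities: ordered membrane, fluid membrane, empty pixel, balanced.
gp = general_polarisation([3.0, 1.0, 0.0, 2.0], [1.0, 3.0, 0.0, 2.0])
print(gp)
```

      Values closer to 1 indicate higher lateral lipid order and values closer to −1 indicate higher fluidity, as stated in the quoted methods text.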

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F. 

      Figs. 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We will add a schematic of this assay to the methods section in the paper revisions to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, in the paper revisions we will include ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Fig. 1E.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We will include these data and compare islet cholesterol levels after the high cholesterol diet with those of HFD-fed mouse islets in the paper revisions.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we will perform exendin-4 binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells for the paper revisions.

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we will add this worthy point to the discussion and hypothesise on possible effects of the V229A mutation on central GLP-1R effects in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editorial team for a thoughtful and constructive assessment. We appreciate all comments, and we try our best to respond appropriately to every reviewer’s queries below. It appears to us that one main worry was regarding appropriate modelling of the complex and rich structure of confounding variables in our movie task. 

      One recent approach fits large feature vectors that include confounding variables alongside the variable(s) of interest to the activity of each voxel in the brain to disentangle the contributions of each variable to the total recorded brain response. While these encoding models have yielded some interesting results, they have two major drawbacks which make using them unfeasible for our purposes (as we explain in more detail below): first, by fitting large vectors to individual voxels, they tend to over-estimate effect size; second, they are very ineffective at unveiling group-level effects due to high variability between subjects. Another approach able to deal with at least the second of these worries is “inter-subject-correlation”. In this technique, brain responses are recorded from multiple subjects while they are presented with natural stimuli. For each brain area, response time courses from different subjects are correlated to determine whether the responses are similar across subjects. Our “peak and valley” analysis is a special case of this analysis technique, as we explain in the manuscript and below.
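
      To make the inter-subject correlation idea concrete, here is a minimal sketch for a single brain area. It uses the leave-one-out variant (each subject correlated with the average of all other subjects), which is one common way of computing it and an assumption here; the simulated time courses and function name are illustrative only and do not reproduce our analysis.

```python
import numpy as np

def leave_one_out_isc(data):
    """Inter-subject correlation for one brain area.

    data: array of shape (n_subjects, n_timepoints).
    Returns one Pearson r per subject: that subject's time course
    against the mean time course of the remaining subjects.
    """
    data = np.asarray(data, dtype=float)
    n_subjects = data.shape[0]
    total = data.sum(axis=0)
    isc = np.empty(n_subjects)
    for s in range(n_subjects):
        others_mean = (total - data[s]) / (n_subjects - 1)
        isc[s] = np.corrcoef(data[s], others_mean)[0, 1]
    return isc

# Simulated data: a stimulus-driven area (shared signal plus noise)
# and an area with no shared response (noise only).
rng = np.random.default_rng(1)
shared = np.sin(np.linspace(0, 8 * np.pi, 200))
driven = shared + 0.3 * rng.normal(size=(10, 200))
undriven = rng.normal(size=(10, 200))
isc_driven = leave_one_out_isc(driven)
isc_undriven = leave_one_out_isc(undriven)
print(isc_driven.mean(), isc_undriven.mean())
```

      High values indicate that the stimulus drives a response that is similar across subjects, without requiring an explicit model of the stimulus features.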

      For estimating individual-level brain activation, we opted for an approach that adapts a classical method of analysing brain data, convolution, to naturalistic settings. Amplitude modulated deconvolution extends classical brain analysis tools in several ways to handle naturalistic data:

      (1) The method does not assume a fixed hemodynamic response function (HRF). Instead, it estimates the HRF over a specified time window from the data, allowing it to vary in amplitude based on the stimulus. This flexibility is crucial for naturalistic stimuli, where the timing and nature of brain responses can vary widely. 

      (2) The method only models the modulation of the amplitude of the HRF above its average with respect to the intensity or characteristics of the stimulus. 

      (3) By allowing variation in the response amplitude, non-linear relationships between the stimulus and brain-response can be captured. 
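
      The three points above can be sketched with a simplified regression. The example below is an illustration under stated assumptions, not our actual implementation: it uses one free parameter per lag (a finite impulse response basis) so that no HRF shape is fixed in advance, and a mean-centred modulator so that the second set of estimates captures only amplitude changes around the average response. The function name, simulated events and noise level are hypothetical.

```python
import numpy as np

def am_fir_design(event_trs, modulator, n_trs, n_lags):
    """Design matrix with two blocks of n_lags columns each.

    Block 1: average response to an event at each lag (free HRF shape).
    Block 2: the same lags scaled by the mean-centred modulator, i.e.
             amplitude modulation around the average response.
    """
    centred = np.asarray(modulator, dtype=float) - np.mean(modulator)
    design = np.zeros((n_trs, 2 * n_lags))
    for onset, amp in zip(event_trs, centred):
        for lag in range(n_lags):
            row = onset + lag
            if row < n_trs:
                design[row, lag] += 1.0
                design[row, n_lags + lag] += amp
    return design

# Simulate a voxel whose response amplitude scales with a word property.
rng = np.random.default_rng(0)
n_trs, n_lags = 400, 8
event_trs = np.sort(rng.choice(n_trs - n_lags, size=60, replace=False))
modulator = rng.normal(size=event_trs.size)
true_mean = np.array([0.0, 0.2, 0.6, 1.0, 0.8, 0.4, 0.1, 0.0])
true_mod = 0.5 * true_mean

design = am_fir_design(event_trs, modulator, n_trs, n_lags)
bold = design @ np.concatenate([true_mean, true_mod])
bold += 0.05 * rng.normal(size=n_trs)

# Ordinary least squares recovers both response shapes from the data.
estimate, *_ = np.linalg.lstsq(design, bold, rcond=None)
est_mean, est_mod = estimate[:n_lags], estimate[n_lags:]
print(np.round(est_mean, 2), np.round(est_mod, 2))
```

      Each additional modulator or nuisance variable adds another block of columns, which illustrates why the number of regressors grows quickly with every variable included.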

      It is true that amplitude modulated deconvolution is not without flaws; for example, including more than a few nuisance regressors becomes computationally very costly. Getting to grips with naturalistic data (especially with fMRI recordings) continues to be an active area of research and presents a new and exciting challenge. We hope that we can convince reviewers and editors with this response, and with the additional analyses and controls performed, that the evidence presented for the visual context dependent recruitment of brain areas for abstract and concrete conceptual processing is not incomplete.

      Overview of Additional Analyses and Controls Performed by the Authors:

      (1) Individual-Level Peaks and Valleys Analysis (Supplementary Material, Figures S3, S4, and S5)

      (2) Test of non-linear correlations of BOLD responses related to features used in the Peak and Valley Analysis (Supplementary Material, Figures S6, S7)

      (3) Comparison of Psycholinguistic Variables Surprisal and Semantic Diversity between groups of words analysed (no significant differences found)  

      (4) Comparison of Visual Variables Optical Flow, Colour Saturation, and Spatial Frequency for 2s Context Window between groups of words analysed (no significant differences found)

      These controls are in addition to the five low-level nuisance regressors included in our model, which are luminance, loudness, duration, word frequency, and speaking rate (calculated as the number of phonemes divided by duration) associated with each analysed word. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Peaks and Valleys Analysis: 

      (1) Doesn't this method assume that the features used to describe each word, like valence or arousal, will be linearly different for the peaks and valleys? What about non-linear interactions between the features and how they might modulate the response? 

      Within-subject variability in BOLD response delays is typically about 1 second at most (Neumann et al., 2003). As individual words are presented briefly (a few hundred ms at most) and the BOLD response to these stimuli falls within that window (1s/TR), any nonlinear interactions between word features and a participant’s BOLD response within that window are unlikely to significantly affect the detection of peaks and valleys.

      To quantitatively address the concern that non-linear modulations could manifest outside of that window, we include a new analysis in Figure S6, which compares the average BOLD responses of each participant in each cluster and each combination of features. Only very few of all possible comparisons differ significantly from each other (~5,000 combinations of features were significantly different from each other out of ~130,000 comparisons between BOLD responses to features, which amounts to 3.85%), suggesting that there are no relevant non-linear interactions between features. For a full list of the most non-linearly interacting features see Figure S7.

      (2) Doesn't it also assume that the response to a word is infinitesimal and not spread across time? How does the chosen time window of analysis interact with the HRF? From the main figures and Figures S2-S3 there seem to be differences based on the timelag. 

      The Peak and Valley (P&V) method does not assume that the response to a word is infinitesimal or confined to an instantaneous moment. The units of analysis (words) fall within one TR, as they are at most a few hundred ms long; for this reason, we look at one TR only. The response of each voxel at that TR will be influenced by the word of interest, as well as all other words that have been uttered within the 1s TR, and the multimodal features of the video stimulus that fall within that timeframe. So, in our P&V, we are not looking for an instantaneous response but rather changes in the BOLD signal that correspond to the presence of linguistic features within the stimuli. 

      The chosen time window of analysis interacts with the hemodynamic response function (HRF) in the following way: the HRF unfolds over several seconds, typically peaking around 5-6 seconds after stimulus onset and returning to baseline within 20-30 seconds (Handwerker et al., 2004).

      Our P&V is designed to match these dynamics of fMRI data with the timing of word stimuli. We apply different lags (4s, 5s, and 6s) to account for the delayed nature of the HRF, ensuring that we capture the brain's response to the stimuli as it unfolds over time, rather than assuming an immediate or infinitesimal effect. We find that the P&V yields our expected results for a 5s and a 6s lag, but not a 4s lag. This is in line with literature suggesting that the HRF for a given stimulus peaks around 5-6s after stimulus onset (Handwerker et al., 2004). As we are looking at very short stimuli (a few hundred ms), it makes sense that the distribution of features would change significantly with different lags. The fact that we find converging results for both a 5s and a 6s lag suggests that the delay is somewhere between 5s and 6s. However, we cannot test this hypothesis at the temporal resolution of our brain data (1 TR). 
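
      The lagging step can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions rather than the analysis code itself: it assumes a 1s TR, treats a peak (valley) as a simple local maximum (minimum) of a cluster's averaged time series, and uses hypothetical function names (`peaks_and_valleys`, `words_at_extrema`). The subsequent statistical comparison of word ratings at peaks versus valleys is omitted.

```python
import numpy as np


def peaks_and_valleys(ts):
    """Return indices of local maxima (peaks) and local minima (valleys)."""
    ts = np.asarray(ts, dtype=float)
    d = np.diff(ts)
    # A peak is a rise followed by a fall; a valley is a fall followed by a rise.
    peaks = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    valleys = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return peaks, valleys


def words_at_extrema(ts, word_onsets_s, lag_s=5.0, tr_s=1.0):
    """Shift each word onset by the assumed lag and report which words
    land on a peak TR and which land on a valley TR."""
    peaks, valleys = peaks_and_valleys(ts)
    peak_set, valley_set = set(peaks.tolist()), set(valleys.tolist())
    at_peak, at_valley = [], []
    for i, onset in enumerate(word_onsets_s):
        tr = int((onset + lag_s) // tr_s)  # TR containing the lagged onset
        if tr in peak_set:
            at_peak.append(i)
        elif tr in valley_set:
            at_valley.append(i)
    return at_peak, at_valley
```

      Calling `words_at_extrema` with `lag_s` set to 4.0, 5.0, and 6.0 in turn would reproduce the idea of comparing lags; because the lag is applied in whole-TR steps here, delays between 5s and 6s cannot be distinguished.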

      (3) Were the group-averaged responses used for this analysis? 

      Yes, the response for each cluster was averaged across participants. We now report a participant-level overview of the Peak and Valley analysis (lagged at 5s) in the supplementary material, with results similar to those of the main analysis (see Figures S3, S4, and S5).

      (4) Why don't the other terms identified in Figure 5 show any correspondence to the expected categories? What does this mean? Can the authors also situate their results with respect to prior findings as well as visualize how stable these results are at the individual voxel or participant level? It would also be useful to visualize example time courses that demonstrate the peaks and valleys. 

      The terms identified in Figure 5 are sensorimotor and affective features from the combined Lancaster and Brysbaert norms. As for the main P&V analysis, we only recorded a cluster as processing a given feature (or term) when there were significantly more instances of words highly rated in that dimension occurring at peaks rather than valleys in the HRF. For some features/terms, there were never significantly more words highly rated on that dimension occurring at peaks compared to valleys, which is why some terms identified in Figure 5 do not show any significant clusters. We have now also clarified this in the figure caption. 

      We situate the method in previous literature in lines 289 – 296. In essence, it is a variant of the well-known method called “reverse correlation”, first detailed in Hasson et al., 2004 (reference from the manuscript) and later adapted to a peak and valley analysis in Skipper et al., 2009 (reference from the manuscript). 

      We now present a more fine-grained characterisation of each cluster on an individual participant level in the supplementary material. We doubt that it would be useful to present an actual example time-course as it would only represent a fraction of over one hundred thousand analysed time-series. We do already present an exemplary time-course to demonstrate the method in Figure 1. 

      Estimating contextual situatedness: 

      (1) Doesn't this limit the analyses to "visual" contexts only? And more so, frequently recognized visual objects? 

      Yes, it was the point of this analysis to focus on visual context only, and it may be true that conducting the analysis in this way results in limiting it to objects that are frequently recognized by visual convolutional neural networks. However, the state-of-the-art strength of visual CNNs in recognising many different types of objects has been attested in several ways (He et al., 2015). Therefore, it is unlikely that the use of CNNs would bias the analysis towards any specific “frequently recognised” objects. 

      (2) The measure of situatedness is the cosine similarity of GloVe vectors that depend on word co-occurrence while the vectors themselves represent objects isolated by the visual recognition models. Expectedly, "science" and the label "book" or "animal" and the label "dog" will be close. But can the authors provide examples of context displacement? I wonder if this just picks up on instances where the identified object in the scene is unrelated to the word. How do the authors ensure that it is a displacement of context as opposed to the two words just being unrelated? This also has a consequence on deciding the temporal cutoff for consideration (2 seconds). 

      The cosine similarity is between the GloVe vectors of the word (that is situated or displaced) and the words referring to the objects identified by the visual recognition model. Therefore, the correlation is between more than just two vectors and both correlated representations depend on co-occurrence. The cosine similarity value reported is not from a comparison between GloVe vectors and vectors that are (visual) representations of objects from the visual recognition model. 

      A word is displaced if all the identified object-words in the defined context window (2s before word-onset) are unrelated to the word (see lines 105-110 (pg. 5); lines 371-380 (pg. 15-16); and Figure 2 caption). Thus, a word is considered to be displaced if all identified objects (not just two as claimed by the reviewer) in the scene are unrelated to the word. Given a context of 60 frames and an average of 5 identified objects per frame (i.e. an average candidate set of 300 objects that could be related) per word, the bar for “displacement” is set high. We provide some further considerations justifying the context window below in our responses to reviewers 2 and 3. 
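
      The displacement criterion described above can be sketched as follows. This is a minimal illustration, not the implementation used in the manuscript: the function names are hypothetical, the `threshold` that separates related from unrelated labels is an assumed parameter whose value is not specified here, and any vectors passed in stand in for GloVe embeddings of the spoken word and of the object labels found in the 2s context window.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def is_displaced(word_vec, label_vecs, threshold):
    """A word counts as displaced only if EVERY object label in its context
    window falls below the (assumed) relatedness threshold."""
    sims = [cosine_similarity(word_vec, v) for v in label_vecs]
    return all(s < threshold for s in sims)
```

      With roughly 60 frames times 5 labels per frame, `label_vecs` would hold about 300 vectors per word, so a single related label is enough for the word to count as situated rather than displaced. Note that an empty `label_vecs` returns `True` in this sketch; how windows without any recognised object should be handled is left open.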

      (3) While the introduction motivated the problem of context situatedness purely linguistically, the actual methods look at the relationship between recognized objects in the visual scene and the words. Can word surprisal or another language-based metric be used in place of the visual labeling? Also, it is not clear how the process identified in (2) above would come up with a high situatedness score for abstract concepts like "truth". 

      We disagree with the reviewer that the introduction motivated the problem of context situatedness purely linguistically, as we explicitly consider visual context in the abstract as well as the introduction. Examples in text include lines 71-74 and lines 105-115. This is also reflected in the cited studies that use visual context, including Kalenine et al., 2014; Hoffmann et al., 2013; Yee & Thompson-Schill, 2016; Hsu et al., 2011. However, we appreciate the importance of being very clear about this point, so we added various mentions of this fact at the beginning of the introduction to avoid confusion.

      We know that prior linguistic context (e.g. measured by surprisal) does affect processing. The point of the analysis was to use a non-language-based metric of visual context to understand how this affects conceptual representation in naturalistic settings. Therefore, it is not clear to us why replacing this with a language-based metric such as surprisal would be an adequate substitution. However, the reviewer is correct that we did not control for the influence of prior context. We obtained surprisal values for each of our words but could not find any significant differences between conditions and therefore did not include this factor in the analyses conducted. For considerations of differences in surprisal between each of the analysed sets of words, see the supplementary material. 

      The method would yield a high score of contextual situatedness for abstract concepts if there were objects in the scene whose GloVe embeddings are close in cosine distance to the GloVe embedding of that abstract word (e.g., “truth” and “book”). We believe this comment from the reviewer is rooted in a misconception of our method. They seem to think we compared GloVe vectors for the spoken word directly with vectors from a visual recognition model (in which case it is true that there would be a concern about how an abstract concept like “truth” could have a high situatedness). Apart from the fact that there would be concerns about the comparability of vectors derived from GloVe and a visual recognition model more generally, this concern is unwarranted in our case, as we compare GloVe embeddings with GloVe embeddings. 

      (4) It is a bit hard to see the overlapping regions in Figures 6A-C. Would it be possible to show pairs instead of triples? Like "abstract across context" vs. "abstract displaced"? Without that, and given (2) above, the results are not yet clear. Moreover, what happens in the "overlapping" regions of Figure 3? 

      To make this clearer, we introduced the contrasts (abstract situated vs displaced and concrete situated vs displaced) that were previously in the supplementary materials in the main text (now Figure 6, this was also requested by reviewer 2). We now show the overlap between the abstract situated (from the contrast in Figure 6) with concrete across context and the overlap between concrete displaced (from the contrast in Figure 6) with abstract across context separately in Figure 7. 

      The overlapping regions of Figure 3 indicate that both concrete and abstract concepts are processed in these regions (though at different time-points). We explain why this is a result of our deconvolution analysis on page 23:  

      “Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame. In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus.”

      Miscellaneous comments: 

      (1) In Figure 3, it is surprising that the "concrete-only" regions dominate the angular gyrus and we see an overrepresentation of this category over "abstract-only". Can the authors place their findings in the context of other studies? 

      The Angular Gyrus (AG) is hypothesised to be a general semantic hub; therefore it is not surprising that it should be active for general conceptual processing (and there is some overlap activation in posterior regions). We now situate our results in a wider range of previous findings in the results section under “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010, Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame (for a comparison of significant timing differences see figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      The finding that concrete concepts activate more brain voxels compared to abstract concepts is generally aligned with existing research, which often reports more extensive brain activation for concrete versus abstract words. This is primarily due to the richer sensory and perceptual associations tied to concrete concepts (see, for example, Binder et al., 2005, Figure 2 in that paper). Similarly, a recent meta-analysis by Bucur & Pagano (2021) consistently found wider activation networks for the “concrete > abstract” contrast compared to the “abstract > concrete” contrast. 

      (2) The following line (Pg 21) regarding the necessary differences in time for the two categories was not clear. How does this fall out from the analysis method? 

      - Both categories overlap **(though necessarily at different time points)** in regions typically associated with word processing - 

      This is answered in our response above to point (4) in the reviewer’s comments. We now also provide more information on the temporal differences in the supplementary material (Figure S9). 

      Reviewer #2 (Public Review):

      The critical contrasts needed to test the key hypothesis are not presented or not presented in full within the core text. To test whether abstract processing changes when in a situated context, the situated abstract condition would first need to be compared with the displaced abstract condition as in Supplementary Figure 6. Then to test whether this change makes the result closer to the processing of concrete words, this result should be compared to the concrete result. The correlations shown in Figure 6 in the main text are not focused on the differences in activity between the situated and displaced words or comparing the correlation of these two conditions with the other (concrete/abstract) condition. As such they cannot provide conclusive evidence as to whether the context is changing the processing of concrete/abstract words to be closer to the other condition. Additionally, it should be considered whether any effects reflect the current visual processing only or more general sensory processing. 

      The reviewer identifies the critical contrast as follows:

      “The situated abstract condition would first need to be contrasted with the displaced abstract condition. Then, these results should be compared to the concrete result.” 

      We can confirm that this is indeed what had been done and we believe the reviewer’s confusion stems from a lack of clarity on our behalf. We have now made various clarifications on this point in the manuscript, and we changed the figures to make clear that our results are indeed based on the contrasts identified by this reviewer as the essential ones.

      Figure 6 in the main text now reflects the contrast between situated and displaced abstract and concrete conditions (as requested by the reviewer, this was previously Figure S7 from the supplementary material). To compare the results from this contrast to conceptual processing across context, we use cosine similarity, and we mention these results in the text. We furthermore show the overlap between the conditions of interest (abstract situated x concrete across context; concrete displaced x abstract across context) in a new figure (Figure 7) to bring out the spatial distribution of overlap more clearly.

      We also discussed the extent to which these effects reflect current visual processing only or more general sensory processing in lines 863 – 875 (pg. 33 and 34).   

      “In considering the impact of visual context on the neural encoding of concepts generally, it is furthermore essential to recognize that the mechanisms observed may extend beyond visual processing to encompass more general sensory processing mechanisms. The human brain is adept at integrating information across sensory modalities to form coherent conceptual representations, a process that is critical for navigating the multimodal nature of real-world experiences (Barsalou, 2008; Smith & Kosslyn, 2007). While our findings highlight the role of visual context in modulating the neural representation of abstract and concrete words, similar effects may be observed in contexts that engage other sensory modalities. For instance, auditory contexts that provide relevant sound cues for certain concepts could potentially influence their neural representation in a manner akin to the visual contexts examined in this study. Future research could explore how different sensory contexts, individually or in combination, contribute to the dynamic neural encoding of concepts, further elucidating the multimodal foundation of semantic processing.”

      Overall, the study would benefit from being situated in the literature more, including a) a more general understanding of the areas involved in semantic processing (including areas proposed to be involved across different sensory modalities and for verbal and nonverbal stimuli), and b) other differences between abstract and concrete words and whether they can explain the current findings, including other psycholinguistic variables which could be included in the model and the concept of semantic diversity (Hoffman et al.,). It would also be useful to consider whether difficulty effects (or processing effort) could explain some of the regional differences between abstract and concrete words (e.g., the language areas may simply require more of the same processing not more linguistic processing due to their greater reliance on word co-occurrence). Similarly, the findings are not considered in relation to prior comparisons of abstract and concrete words at the level of specific brain regions. 

      We now present an overview of the areas involved in semantic processing (across different sensory modalities for verbal and nonverbal stimuli) when we first present our results (section: “Conceptual Processing Across Context”).

      We looked at surprisal as a potential confound and found no significant differences between any of the sets of words analysed. Mean surprisal of concrete words is 22.19 bits, and mean surprisal of abstract words is 21.86 bits. Mean surprisal is 21.98 bits for the situated concrete words, 22.02 bits for the displaced concrete words, 22.10 bits for the situated abstract words, and 22.25 bits for the displaced abstract words. We also calculated the semantic diversity of all sets of words and found no significant differences between the sets. The mean values for each condition are: abstract_high (2.02); abstract_low (1.95); concrete_high (1.88); concrete_low (2.19); abstract_original (1.96); concrete_original (1.92). Hence, processing effort related to different predictability (surprisal) or greater semantic diversity cannot explain our findings. 

      We submit that difficulty effects do not explain any aspects of the activation found for conceptual processing, because we included word frequency in our model as a nuisance regressor and found no significant differences associated with surprisal. Previous work shows that surprisal (Hale, 2001) and word frequency (Brysbaert & New, 2009) are good controls for processing difficulty.

      Finally, we added considerations of prior findings comparing abstract and concrete words at the level of specific brain regions to the discussion (section: Conceptual Processing Across Context). 

      The authors use multiple methods to provide a post hoc interpretation of the areas identified as more involved in concrete, abstract, or both (at different times) words. These are designed to reduce the interpretation bias and improve interpretation, yet they may not successfully do so. These methods do give some evidence that sensory areas are more involved in concrete word processing. However, they are still open to interpretation bias as it is not clear whether all the evidence is consistent with the hypotheses or if this is the best interpretation of individual regions' involvement. This is because the hypotheses are provided at the level of 'sensory' and 'language' areas without further clarification and areas and terms found are simply interpreted as fitting these definitions. For instance, the right IFG is interpreted as a motor area, and therefore sensory as predicted, and the term 'autobiographical memory' is argued to be interoceptive. Language is associated with the 'both' cluster, not the abstract cluster, when abstract > concrete is expected to engage language more. The areas identified for both vs. abstract > concrete are distinguished in the Discussion through the description as semantic vs. language areas, but it is not clear how these are different or defined. Auditory areas appear to be included in the sensory prediction at times and not at others. When they are excluded, the rationale for this is not given. Overall, it is not clear whether all these areas and terms are expected and support the hypotheses. It should be possible to specify specific sensory areas where concrete and abstract words are predicted to be different based on a) prior comparisons and/or b) the known locations of sensory areas. Similarly, language or semantic areas could be identified using masks from NeuroSynth or traditional meta-analyses. A language network is presented in Supplementary Figure 7 but not interpreted, and its source is not given. 

      “The language network” was extracted through NeuroSynth and projected onto the “overlap” activation map with AFNI. We now specify this in the figure caption. 

      Alternatively, there could be a greater interpretation of different possible explanations of the regions found with a more comprehensive assessment of the literature. The function of individual regions and the explanation of why many of these areas are interpreted as sensory or language areas are only considered in the Discussion when it could inform whether the hypotheses have been evidenced in the results section. 

      We added extended considerations of this to the results (as requested by the reviewer) in the section “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010, Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap arises because we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer timeframe (for a comparison of significant timing differences see Figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval, and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      Additionally, these methods attempt to interpret all the clusters found for each contrast in the same way when they may have different roles (e.g., relate to different senses). This is a particular issue for the peaks and valleys method which assesses whether a significantly larger number of clusters is associated with each sensory term for the abstract, concrete, or both conditions than the other conditions. The number of clusters does not seem to be the right measure to compare. Clusters differ in size so the number of clusters does not represent the area within the brain well. Nor is it clear that many brain regions should respond to each sensory term, and not just one per term (whether that is V1 or the entire occipital lobe, for instance). The number of clusters is therefore somewhat arbitrary. This is further complicated by the assessment across 20 time points and the inclusion of the 'both' categories. It would seem more appropriate to see whether each abstract and concrete cluster could be associated with each different sensory term and then summarise these findings rather than assess the number of abstract or concrete clusters found for each independent sensory term. In general, the rationale for the methods used should be provided (including the peak and valley method instead of other possible options e.g., linear regression). 

      We included an assessment of whether each abstract and concrete cluster could be associated with each different sensory term and then summarised these findings on a participant level in the supplementary material (Figures S3, S4, and S5). 

      Rationales for the Amplitude Modulated Deconvolution are now provided on page 10 (specifically the first paragraph under “Deconvolution Analysis” in the Methods section) and for the P&V on pages 13, 14 and 15 (under “Peaks and Valley” (particularly the first paragraph) in the Methods section). 

      The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach that allows parametric variation within naturalistic stimuli, which is a potential strength of the study. This measure appears to vary little between objects that are present (e.g., animal or room), and those that are strongly (e.g., monitor) or weakly related (e.g., science). Additional information validating this measure may be useful, as would consideration of the range of values and whether the split between situated (c > 0.6) and displaced words (c < 0.4) is sufficient.  

      The main validation of our measure of contextual situatedness derives from the high accuracy and reliability of CNNs in object detection and recognition tasks, as demonstrated in numerous benchmarks and real-world applications. 

      One reason for low variability in our measure of contextual situatedness is the fact that we compared the GloVe vector of each word of interest with an average GloVe vector of all object-words referring to objects present in 56 frames (~300 objects on average). This means that a lot of variability in similarity measures between individual object-words and the word of interest is averaged out. Notwithstanding the resulting low variability of our measure, we thought that this would be the more conservative approach, as even small differences between individual measures (e.g. 0.4 vs 0.6) would constitute a strong difference on average (across the 300 objects per context window).  Therefore, this split ensures a sufficient distinction between words that are strongly related to their visual context and those that are not – which in turn allows us to properly investigate the impact of contextual relevance on conceptual processing.
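
The averaging step described above can be illustrated with a minimal sketch. It assumes that contextual situatedness is the cosine similarity between a word vector and the mean vector of the object-words in the context window; the three-dimensional toy vectors, the function names, and the "intermediate" label for values between the two cut-offs are illustrative assumptions, not the authors' code or real GloVe embeddings.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def situatedness(word_vec, object_vecs):
    """Similarity of a word to the *average* object vector of its context."""
    return cosine_similarity(word_vec, mean_vector(object_vecs))


def label(c, high=0.6, low=0.4):
    """Split used in the response: situated (c > 0.6), displaced (c < 0.4)."""
    if c > high:
        return "situated"
    if c < low:
        return "displaced"
    return "intermediate"  # assumption: how in-between values are handled is not stated


# Toy vectors standing in for GloVe embeddings (hypothetical values).
objects = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.9, 0.1, 0.1]]
related = [1.0, 0.1, 0.0]
unrelated = [0.0, 0.0, 1.0]

for name, vec in [("related", related), ("unrelated", unrelated)]:
    c = situatedness(vec, objects)
    print(name, round(c, 3), label(c))
```

Because the comparison is made against a single averaged vector, the spread of the resulting values is narrower than the spread of word-to-object similarities taken one object at a time, which is the effect the paragraph above describes.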

      Finally, the study assessed the relation of spoken concrete or abstract words to brain activity at different time points. The visual scene was always assessed using the 2 seconds before the word, while the neural effects of the word were assessed every second after the presentation for 20 seconds. This could be a strength of the study, however almost no temporal information was provided. The clusters shown have different timings, but this information is not presented in any way. Giving more temporal information in the results could help to both validate this approach and show when these areas are involved in abstract or concrete word processing. 

      We provide more information on the temporal differences of when clusters are involved in processing concrete and abstract concepts in the supplementary material (Figure S9) and refer to this information where relevant in the Methods and Results sections. 

      Additionally, no rationale was given for this long timeframe which is far greater than the time needed to process the word, and long after the presence of the visual context assessed (and therefore ignores the present visual context). 

      The 20-second timeframe for our deconvolution analysis is justified by several considerations. Firstly, the hemodynamic response function (HRF) is known to vary both across individuals and within different regions of the brain. To accommodate this variability and capture the full breadth of the HRF, including its rise, peak, and return to baseline, a longer timeframe is often necessary. The 20-second window ensures that we do not prematurely truncate the HRF, which could lead to inaccurate estimations of neural activity related to the processing of words. Secondly and related to this point, unlike model-based approaches that assume a canonical HRF shape, our deconvolution analysis does not impose a predefined form on the HRF, instead reconstructing the HRF from the data itself – for this, a longer time-frame is advantageous to get a better estimation of the true HRF. Finally, and related to this point, the use of the 'Csplin' function in our analysis provides a flexible set of basis functions for deconvolution, allowing for a more fine-grained and precise estimation of the HRF across this extended timeframe. The 'Csplin' function offers more interpolation between time points, which is particularly advantageous for capturing the nuances of the HRF as it unfolds over a longer time-frame. 

      Although we use a 20-second timeframe for the deconvolution analysis to capture the full HRF, the analysis is still time-locked to the onset of each visual stimulus. This ensures that the initial stages of the HRF are directly tied to the moment the word is presented, thus incorporating the immediate visual context. We furthermore include variables that represent aspects of the visual context at the time of word presentation in our models (e.g., luminance) and control for motion (optical flow), colour saturation and spatial frequency of immediate visual context. 

      Reviewer #3 (Public Review):

      The context measure is interesting, but I'm not convinced that it's capturing what the authors intended. In analysing the neural response to a single word, the authors are presuming that they have isolated the window in which that concept is processed and the observed activation corresponds to the neural representation of that word given the prior context. I question to what extent this assumption holds true in a narrative when co-articulation blurs the boundaries between words and when rapid context integration is occurring. 

      We appreciate the reviewer's critical perspective on the contextual measure employed in our study. We agree that the dynamic and continuous nature of narrative comprehension poses challenges for isolating the neural response to individual words. However, the use of an amplitude modulated deconvolution analysis, particularly with the CSPLIN function, is a methodological choice to specifically address these challenges. Deconvolution allows us to estimate the hemodynamic response function (HRF) without assuming its canonical shape, capturing nuances in the BOLD signal that may reflect the integration of rapid contextual shifts (only beyond the average modulation of the BOLD signal). The CSPLIN function further refines this approach by offering a flexible basis set for modelling the HRF and by providing a detailed temporal resolution that can adapt to the variance in individual responses. 

      Our choice of a 20-second window is informed by the need to encompass not just the immediate response to a word but also the extended integration of the contextual information. This is consistent with evidence indicating that the brain integrates information over longer timescales when processing language in context (Hasson et al., 2015). The neural representation of a word is not a static snapshot but a dynamic process that evolves with the unfolding narrative. 

      Further, the authors define context based on the preceding visual information. I'm not sure that this is a strong manipulation of the narrative context, although I agree that it captures some of the local context. It is maybe not surprising that if a word, abstract or concrete, has a strong association with the preceding visual information then activation in the occipital cortex is observed. I also wonder if the effects being captured have less to do with concrete and abstract concepts and more to do with the specific context the displaced condition captures during a multimodal viewing paradigm. If the visual information is less related to the verbal content, the viewer might process those narrative moments differently regardless of whether the subsequent word is concrete or abstract. I think the claims could be tailored to focus less generally on context and more specifically on how visually presented objects, which contribute to the ongoing context of a multimodal narrative, influence the subsequent processing of abstract and concrete concepts.

      The context measure, though admittedly a simplification, is designed to capture the local visual context preceding word presentation. By using high-confidence visual recognition models, we ensure that the visual information is reliably extracted and reflects objects that have a strong likelihood of influencing the processing of subsequent words. We acknowledge that this does not capture the full richness of narrative context; however, it provides a quantifiable and consistent measure of the immediate visual environment, which is an important aspect of context in naturalistic language comprehension.

      With regards to the effects observed in the occipital cortex, we posit that while some activation might be attributable to the visual features of the narrative, our findings also reflect the influence of these features on conceptual processing. This is especially because our analysis only looks at the modulation of the HRF amplitude beyond the average response (so also beyond the average visual response) when contrasting between conditions of high and low visual-contextual association with important (audio-visual) control variables included in the model. 

      Lastly, we concur that both concrete and abstract words are processed within a multimodal narrative, which could influence their neural representation. We believe our approach captures a meaningful aspect of this processing, and we have refined our claims to specify the influence of visually presented objects on the processing of abstract and concrete concepts, rather than making broader assertions about multimodal context. We also highlight several other signals (e.g. auditory) that could influence processing. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The approach taken here requires a lot of manual variable selection and seems a bit roundabout. Why not build an encoding model that can predict the BOLD time course of each voxel in a participant from the feature-of-interest like valence etc. and then analyze if (1) certain features better predict activity in a specific region (2) the predicted responses/regression parameters are more positive (peaks) or more negative (valleys) for certain features in a specific brain region (3) maybe even use contextual features use a large language model and then per word (like "truth") analyze where the predicted responses diverge based on the associated context. This seems like a simpler approach than having multiple stages of analysis. 

      It is not clear to us why an encoding model would be more suitable for answering the question at hand (especially given that we tried to clarify concerns about non-linear relationships between variables). On the contrary, fitting a regression model to each individual voxel has several drawbacks. First, encoding models are prone to over-estimate effect sizes (Naselaris et al., 2011). Second, encoding models are not good at explaining group-level effects due to high variability between individual participants (Turner et al., 2018). We would also like to point out that an encoding model using features of a text-based LLM would not address the visual context question - unless the LLM was multimodal. Multimodal LLMs are a very recent research development in Artificial Intelligence, however, and models like LLaMA (adapter), Google’s Gemini, etc. are not truly multimodal in the sense that would be useful for this study, because they are first trained on text and later injected with visual data. This relates to our concern that the reviewer may have misunderstood that we are interested in purely visual context of words (not linguistic context).

      (2) In multiple analyses, a subset of the selected words is sampled to create a balanced set between the abstract and concrete categories. Do the authors show standard deviation across these sets? 

      For the subset of words used in the context-based analyses, we give mean ratings of concreteness, log frequency and length and conduct a t-test to show that these variables are not significantly different between the sets. We also included the psycholinguistic control variables surprisal and semantic diversity, as well as the visual variables motion (optical flow), colour saturation and spatial frequency.  

      Reviewer #2 (Recommendations For The Authors):

      Figures S3-5 are central to the argument and should be in the main text (potentially combined).  

      These have been added to the main text.

      S5 says the top 3 terms are DMN (and not semantic control), but the text suggests the r value is higher for 'semantic control' than 'DMN'? 

      Fixed this in the text, the caption now reads: 

      “This was confirmed by using the neurosynth decoder on the unthresholded brain image - top keywords were “Semantic Control” and “DMN”.”

      Fig. S7 is very hard to see due to the use of grey on grey. Not used for great effect in the final sentence, but should be used to help interpret areas in the results section (if useful). It has not been specified how the 'language network' has been identified/defined here. 

      We altered the contrast in the figure to make boundaries more visible and specified how the language network was identified in the figure caption. 

      In the Results 'This showed that concrete produced more modulation than abstract modulation in the frontal lobes,' should be parts of /some of the frontal lobes as this isn't true overall. 

      Fixed this in the text.  

      There are some grammatical errors and lack of clarity in the context comparison section of the results. 

      Fixed these in the text.

      Reviewer #3 (Recommendations For The Authors):

      •  The analysis code should be shared on the github page prior to peer review.  

      The code is now shared under: https://github.com/ViktorKewenig/Naturalistic_Encoding_Concepts

      •  At several points throughout the methods section, information was referred to that had not yet been described. Reordering the presentation of this information would greatly improve interpretability. A couple of examples of this are provided below. 

      Deconvolution Analysis: the use of amplitude modulation regression was introduced prior to a discussion of using the TENT function to estimate the shape of the HRF. This was then followed by a discussion of the general benefits of amplitude modulation. Only after these paragraphs are the modulators/model structure described. Moving this information to the beginning of the section would make the analysis clearer from the onset. 

      Fixed this in the text.

      Peak and Valley Analysis: the hypotheses regarding the sensory-motor features and experiential features are provided prior to describing how these features were extracted from the data (e.g., using the Lancaster norms). 

      Fixed this in the text.

      •  The justification for and description of the IRF approach seems overdone considering the timing differences are not analyzed further or discussed. 

      We now present a further discussion of timing differences in the supplementary material.

      •  The need and suitability of the cluster simulation method as implemented were not clear. The resulting maps were thresholded at 9 different p values and then combined, and an arbitrary cluster threshold of 20 voxels was then applied. Why not use the standard approach of selecting the significance threshold and corresponding cluster size threshold from the ClustSim table? 

      We extracted the original clusters at 9 different p values with the corresponding cluster size from the ClustSim table, then only included clusters that were bigger than 20 voxels.  

      •  Why was the center of mass used instead of the peak voxel? 

      Peak voxel analysis can be sensitive to noise and may not reliably represent the region's activation pattern, especially in naturalistic imaging data where signal fluctuations are more variable and outliers more frequent. The centre of mass provides a more stable and representative measure of the underlying neural activity. Another reason for using the center of mass is that it better represents the anatomical distribution of the data, especially in large clusters with more than 100 voxels where peak voxels are often located at the periphery. 
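
The difference between the two summary coordinates can be made concrete with a small sketch. It assumes a cluster is given as a mapping from voxel coordinates to positive statistic values and that the centre of mass is weighted by those values; the data structure, function names, and toy values are illustrative assumptions rather than the analysis pipeline itself.

```python
def peak_voxel(cluster):
    """Coordinate of the single voxel with the largest statistic."""
    return max(cluster, key=cluster.get)


def centre_of_mass(cluster):
    """Statistic-weighted mean coordinate over all voxels in the cluster.

    Assumes all statistic values are positive so the weights are valid.
    """
    total = sum(cluster.values())
    return tuple(
        sum(coord[axis] * weight for coord, weight in cluster.items()) / total
        for axis in range(3)
    )


# Toy cluster along the x-axis: one extreme value sits at the edge (hypothetical data).
cluster = {
    (0, 0, 0): 1.0,
    (1, 0, 0): 1.0,
    (2, 0, 0): 1.0,
    (3, 0, 0): 1.0,
    (4, 0, 0): 4.0,
}

print("peak voxel:", peak_voxel(cluster))
print("centre of mass:", centre_of_mass(cluster))
```

In this toy case a single high value at the edge fixes the peak voxel at the periphery, while the centre of mass stays closer to the middle of the cluster because every voxel contributes.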

      • Figure 1 seems to reference a different Figure 1 that shows the abstract, concrete, and overlap clusters of activity (currently Figure 3). 

      Fixed this in the text.

      • Table S1 seems to have the "Touch" dimension repeated twice with different statistics reported. 

      Fixed this in the text, the second mention of the dimension “touch” was wrong.  

      • It appears from the supplemental files that the Peaks and Valley analysis produces different results at different lag times. This might be expected but it's not clear why the results presented in the main text were chosen over those in the supplemental materials. 

      The results in the main text were chosen over those in the supplementary material because the HRF is typically reported to peak about 5 s after stimulus onset. We added a specification of this rationale to the “2. Peak and Valley Analysis” subsection in the Methods section.  

      References (in order of appearance) 

      (1) Neumann J, Lohmann G, Zysset S, von Cramon DY. Within-subject variability of BOLD response dynamics. Neuroimage. 2003 Jul;19(3):784-96. doi: 10.1016/s1053-8119(03)00177-0. PMID: 12880807.

      (2) Handwerker DA, Ollinger JM, D'Esposito M. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage. 2004 Apr;21(4):1639-51. doi: 10.1016/j.neuroimage.2003.11.029. PMID: 15050587.

      (3) Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci. 2005 Jun;17(6):905-17. doi: 10.1162/0898929054021102. PMID: 16021798.

      (4) Bucur, M., Papagno, C. An ALE meta-analytical review of the neural correlates of abstract and concrete words. Sci Rep 11, 15727 (2021). https://doi.org/10.1038/s41598-021-94506-9 

      (5) Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies (NAACL '01). Association for Computational Linguistics, USA, 1–8. https://doi.org/10.3115/1073336.1073357

      (6) Brysbaert, M., New, B. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41, 977–990 (2009). https://doi.org/10.3758/BRM.41.4.977 

      (7) Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical Activity During Natural Vision. Science, 303(5664), 6.

      (8) Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. Neuroimage. 2011 May 15;56(2):400-10. doi: 10.1016/j.neuroimage.2010.07.073. Epub 2010 Aug 4. PMID: 20691790; PMCID: PMC3037423.

      (9) Turner BO, Paul EJ, Miller MB, Barbey AK. Small sample sizes reduce the replicability of task-based fMRI studies. Commun Biol. 2018 Jun 7;1:62. doi: 10.1038/s42003-018-0073-z. PMID: 30271944; PMCID: PMC6123695.

      (10) He, K., Zhang, Y., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv (Tech Report). https://doi.org/10.48550/arXiv.1512.03385

      (11) Hasson, U., & Egidi, G. (2015). What are naturalistic comprehension paradigms teaching us about language? In R. M. Willems (Ed.), Cognitive neuroscience of natural language use (pp. 228–255). Cambridge University Press. https://doi.org/10.1017/CBO9781107323667.011

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, they showed these findings on dynamics states and their transitions were reproducible over two nights of sleep. These novel findings offered unique information in the field of sleep neuroimaging.

      Weaknesses:

      The only weakness of this study has been acknowledged by the authors: limited sample size.

      We thank the reviewer for the overall enthusiasm for this study.

      Reviewer #1 (Recommendations For The Authors):

      (1) Were there differences in the extent of head motion during sleep among sleep stages? How was the potential motion parameter differences handled during the statistical analyses?

      If there were large head motions that continued for a long time (e.g., longer than 1 minute), how did the authors deal with that scanning session? For an extremely long scanning session (3 hours), how was motion correction conducted? It would be great if the authors could provide more details.

      We found that the N3 sleep stage had the lowest head motion, followed by REM, N2, N1, and lastly Wake. In other words, participants showed lower head motion during sleep than during wakefulness. We added this information to the Supplemental Results, copied below.

      We performed standardized motion correction during preprocessing using AFNI regardless of the duration of the scans. We did not include motion parameters in the HMM model. Time frames with excessive head motion (any of the 6 head motion parameters exceeding 0.3 mm or degrees) were censored. Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019).
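
The censoring rule can be sketched as follows. The example assumes that each time frame carries six motion values (three translations in mm, three rotations in degrees) and that a frame is dropped when the absolute value of any one of them exceeds 0.3; whether the threshold applies to raw or frame-to-frame values is not stated in the response, so the function names and the toy numbers below are illustrative assumptions only.

```python
def censor_mask(motion, threshold=0.3):
    """Return one boolean per time frame: True means the frame is retained.

    `motion` is a list of frames, each a list of six motion values
    (assumed here to be already expressed in mm or degrees).
    """
    return [all(abs(value) <= threshold for value in frame) for frame in motion]


def percent_retained(mask):
    """Percentage of time frames that survive censoring."""
    return 100.0 * sum(mask) / len(mask)


# Four toy frames (hypothetical values); the third exceeds 0.3 on one parameter.
motion = [
    [0.01, 0.02, 0.00, 0.05, 0.01, 0.02],
    [0.10, 0.05, 0.02, 0.00, 0.03, 0.01],
    [0.02, 0.45, 0.01, 0.02, 0.00, 0.01],
    [0.00, 0.01, 0.02, 0.29, 0.01, 0.00],
]

mask = censor_mask(motion)
print(mask, percent_retained(mask))
```

Applied per sleep stage, the same percentage gives the retained-timepoint figures of the kind reported in the Supplemental Results.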

      In Supplemental Results, “Motion parameters with sleep stages.

      Averaged motion across six motion parameters decreased from wake to light sleep to deep sleep at night 2. For example, mean (standard deviation) motion for each sleep stage is as follows, N1: 0.043 (0.37); N2: 0.039 (0.033); N3: 0.035 (0.031); REM: 0.035 (0.032); Wake: 0.057 (0.052).

      Similarly, the percentage of timepoints retained after censoring increased from wake to light sleep to deep sleep at night 2. N1: 91%; N2: 93%; N3: 96%; REM: 89%; Wake 90%.”

      In the method section, “Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019). We also found that motion is lower during deep sleep compared to wake, see Supplemental Results.”

      (2) It is possible that the data input for the HMM analyses might vary among participants and between the two nights, how did the authors deal with this issue during statistical analyses?

      This is a great question. We standardized BOLD timecourses for each participant and each night to avoid differences among participants and between two nights. We revised the description in the method section to make this point clear.

      In the method section, “To prepare the data for analysis, we first standardized the participant-specific sets of 300 ROI timecourses (scaled to a mean of 0, and a standard deviation of 1), which were then concatenated across all participants. This standardization was performed separately for each night. ”
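
A minimal sketch of this preparation step is given below, assuming each participant's data for one night is a list of ROI timecourses and that concatenation runs along the time axis, ROI by ROI. The use of the population standard deviation, the function names, and the toy data (two participants, two ROIs instead of 300) are assumptions made for illustration.

```python
import statistics


def zscore(series):
    """Scale one timecourse to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption); a constant
    timecourse would raise ZeroDivisionError and needs separate handling.
    """
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(value - mean) / sd for value in series]


def standardize_and_concatenate(participants):
    """Z-score every ROI timecourse per participant, then concatenate in time.

    `participants` holds the data of ONE night: a list of participants,
    each a list of ROI timecourses. Call once per night so that the
    standardization is performed separately for each night.
    """
    n_rois = len(participants[0])
    concatenated = [[] for _ in range(n_rois)]
    for rois in participants:
        for index, series in enumerate(rois):
            concatenated[index].extend(zscore(series))
    return concatenated


# Toy night: 2 participants x 2 ROIs x 3 timepoints (hypothetical values).
night = [
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    [[5.0, 5.0, 8.0], [0.0, 1.0, 2.0]],
]

prepared = standardize_and_concatenate(night)
print(len(prepared), "ROIs,", len(prepared[0]), "timepoints each")
```

Standardizing before concatenation removes participant-specific and night-specific differences in signal scale, so those differences cannot drive the state estimates.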

      (3) Figures 2 and 4, the top part seems to be missing, e.g., "Night 2" in Figure 2, and "N2-related" in Figure 4.

      Thank you for pointing out these errors. We fixed them.

      (4) Figure 3 seems to be more stretched vertically than horizontally.

      We revised the figure to ensure it appears balanced on both sides.

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      The study's weaknesses are its small sample size and the limited attempts at relating the identified fMRI brain states back to EEG.

      We thank the reviewer for the positive feedback and helpful suggestions for this study.

      General appraisal:

      The paper's conclusions are generally well-supported, but some additional analyses and discussions could improve the work.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A somewhat missed opportunity is the absence of more analyses relating the HMM states back to EEG. It would be very helpful to the sleep field to see how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG time course at the moment of an HMM class switch (particularly when the PSG stage remains stable). While the authors did look at slow wave density and various physiological signals in different HMM states, a characterization of the EEG itself in terms of spectral features is missing. Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed.

      We thank the reviewer for this great suggestion. We performed EEG spectral analysis on each HMM state. Results were added to the Supplementary Results and Supplementary Figures 10 and 11 (copied below). Specifically, we confirmed that N3-related states had the highest Delta power and that the Deep-N2 module showed different spectral profiles compared to the Light-N2 module.

      In Supplemental Results: “We conducted spectral analysis for each TR and calculated the average power spectrum for each common EEG brainwave—Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (30-100 Hz)—across the 21 HMM states. See Supplementary Figure 10 and 11 for night 2 and night 1 data, respectively. As expected, we found that N3-related states 8 and 10 had highest Delta power in both nights. In addition, the Deep-N2 module had higher power in Theta and Alpha bands compared to the Light-N2 module.”
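
The band-power computation can be sketched as below, using the band limits quoted above. The example assumes power is averaged over the discrete Fourier bins falling in each half-open band [low, high); the naive DFT, the 200 Hz sampling rate, the one-second synthetic segment, and all function names are illustrative assumptions, and a real analysis would normally use an optimized spectral estimator.

```python
import cmath
import math

# Band limits in Hz, as listed in the Supplemental Results.
BANDS = {
    "Delta": (0.5, 4.0),
    "Theta": (4.0, 8.0),
    "Alpha": (8.0, 13.0),
    "Beta": (13.0, 30.0),
    "Gamma": (30.0, 100.0),
}


def power_spectrum(signal, fs):
    """Naive discrete Fourier transform; returns (frequencies, power)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n ** 2)
    return freqs, power


def band_power(signal, fs):
    """Mean power over the frequency bins inside each band [low, high)."""
    freqs, power = power_spectrum(signal, fs)
    result = {}
    for name, (low, high) in BANDS.items():
        values = [p for f, p in zip(freqs, power) if low <= f < high]
        result[name] = sum(values) / len(values) if values else float("nan")
    return result


# Synthetic one-second EEG segment: strong 2 Hz plus weaker 10 Hz component.
fs = 200
segment = [
    2.0 * math.sin(2 * math.pi * 2 * t / fs) + math.sin(2 * math.pi * 10 * t / fs)
    for t in range(fs)
]

powers = band_power(segment, fs)
print(max(powers, key=powers.get))
```

Averaging such per-segment band powers over all segments assigned to a given HMM state yields one spectral profile per state, which is the quantity compared across states.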

      It is unclear how the presently identified HMM brain states relate to the previously identified NREM and wake states by Stevner et al. (2019), who used a roughly similar approach. This is important, as similar brain states across studies would suggest reproducibility, whereas large discrepancies could indicate a large dependence on particular methods and/or the sample (also see later point regarding generalizability).

      This is a great question. There are some similarities and differences between the current study and Stevner et al. (2019). We discussed this in the Supplementary Discussion. Copied below.

      In the Supplementary Discussion: “Both studies demonstrated that HMM states can be effectively divided into meaningful modules solely based on transition probabilities. Furthermore, both studies indicated that pre-sleep wakefulness differs from post-sleep wakefulness.

      However, despite the similar approaches used, key differences in data acquisition and analysis make it challenging to directly compare HMM states between these two studies. Firstly, Stevner et al. (2019) collected only 1-hour-long sleep data from 18 participants, whereas our current study includes 8-hour-long sleep data from 12 participants for two consecutive nights. As discussed in the main text, full sleep cycling cannot be obtained from 1-hour long sleep due to the lack of REM stage and incomplete sleep cycles. Secondly, in Stevner et al. (2019) (Figure 4e), the four wake-NREM stages had roughly the same duration. In contrast, in our current study (Night 2, Figure 2A), the N2 stage comprises 43% of total sleep, which aligns with the natural N2 composition of nocturnal sleep stages. This discrepancy might explain the different number of N2-related states found in the two studies, with 3 out of 19 in Stevner et al. (2019) versus 13 out of 21 in our current study.”

      More justice could be done to previous EEG-based efforts moving beyond conventional AASM-defined sleep/wake states. Various EEG studies performed data-driven clustering of brain states, typically indicating more than 5 traditional brain states (e.g., Koch et al. 2014, Christensen et al. 2019, Decat et al. 2022). Beyond that, countless subdivisions of classical sleep stages have been proposed (e.g., phasic/tonic REM, N2 with/without spindles, N3 with global/local slow waves, cyclic alternating patterns, and many more). While these aren't incorporated into standard sleep stage classification, the current manuscript could be misinterpreted to suggest that improved/data-driven classifications cannot be achieved from EEG, which is incorrect.

      We agree with the reviewer that previous EEG-based efforts should be mentioned. We now added this in the manuscript. Copied below.

      In the Discussion section, “Third, we chose to not include EEG features in our data-driven model. However, the current method is not limited to fMRI data and can be applied to EEG data. Given that previous data-driven studies based on EEG data have suggested that there might be more than five traditional sleep stages (Christensen et al., 2019; Decat et al., 2022; Koch et al., 2014), as well as subdivisions within these traditional sleep stages (Brandenberger et al., 2005; Decat et al., 2022; Simor et al., 2020), future studies may apply data-driven models on both fMRI and EEG data.”

      More discussion of the limitations of the current sample and generalizability would be helpful. A sample of N=12 is no doubt impressive for two nights of concurrent whole-night EEG-fMRI. Still, any data-driven approach can only capture the brain states that are present in the sample, and 12 individuals are unlikely to express all brain states present in the population of young healthy individuals. Add to that all the potentially different or altered brain states that come with healthy ageing, other demographic variables, and numerous clinical disorders. How do the authors expect their results to change with larger samples and/or varying these factors? Perhaps most importantly, I think it's important to mention that the particular number of identified brain states (here 21, and e.g. 19 in Stevner) is not set in stone and will likely vary as a function of many sample- and methods-related factors.

      We thank the reviewer for the great suggestions. We have now included these points when discussing limitations in the Discussion section. We think that an HMM trained on a larger sample might produce more fine-grained results, but this remains to be investigated when a more extensive dataset becomes available.

      In the Discussion section, “Secondly, while our study involved a relatively small number of participants (12), it included a large amount of fMRI data (~16 hours scan) per participant. Although the HMM trained on data from 12 participants was robust, the generalizability of the current results to different populations—such as healthy aging individuals and clinical populations—needs to be demonstrated in future studies, particularly with larger sample sizes and more diverse populations.”

      “Fourth, while we selected 21 HMM brain sleep states based on model evaluation parameters in the current study, the exact number of sleep states is not fixed and likely depends on various sample- and methods-related factors, such as sample size and model setups.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment #1: Insufficient Network Analysis for Explainability: The paper does not sufficiently delve into network analysis to determine whether the model's predictions are based on accurately identifying and matching the 18 items of the ROCF or if they rely on global, item-irrelevant features. This gap in analysis limits our understanding of the model's decision-making process and its clinical relevance.

      Response #1: Thank you for your comment. We acknowledge that understanding the decision-making process of AI models is crucial for their acceptance and utility in clinical settings. However, we believe that our current approach, which focuses on providing individual scores for each of the 18 items of the Rey-Osterrieth Complex Figure (ROCF), inherently offers a higher level of explainability and practical utility for clinicians than a network analysis could. Our multi-head convolutional neural network is designed with a dedicated output head for each of the 18 items in the ROCF and thus provides a separate score for each item. This architecture helps ensure that the model focuses on individual elements rather than relying on global, item-irrelevant features.

      This item-specific approach directly aligns with the traditional clinical assessment method, thereby making the results more interpretable and actionable for clinicians. The individual scores for each item provide detailed insights into a patient's performance. Clinicians can use these scores to identify specific areas of strength and weakness in a patient's visuospatial memory and drawing abilities.

      Furthermore, we evaluated the model's performance on each of the 18 items separately, providing detailed metrics that show consistent accuracy across all items. This item-level performance analysis offers clear evidence that the model is not relying on irrelevant global features but is indeed making decisions based on the specific characteristics of each item. We believe that our approach provides a level of explainability that is directly useful and relevant to clinical practitioners.

      Comment #2: Generative Model Consideration: The critique suggests exploring generative models to model the joint distribution of images and scores, which could offer deeper insights into the relationship between scores and specific visual-spatial disabilities. The absence of this consideration in the study is seen as a missed opportunity to enhance the model's explainability and clinical utility.

      Response #2: Thank you for your thoughtful comment and the suggestion to explore generative models. We appreciate the potential benefits of using generative models to model the joint distribution of images and scores. However, we chose not to pursue this approach in our study for several reasons. First, our primary goal was to develop a model that provides accurate and interpretable scores for each of the 18 individual items in the ROCF figure. Second, generative models, while powerful, would add a layer of complexity that might diminish the clarity and immediate clinical applicability of our results. Generative models (particularly deep learning-based ones) can be challenging to interpret in terms of how they make decisions or why they produce specific outputs. This lack of interpretability can be a concern in critical applications involving neurological and psychiatric disorders. Clinicians require tools that provide clear insights without the need for additional layers of analysis. Our current model provides detailed, item-specific scores that clinicians can directly use to assess visuospatial memory and drawing abilities. Initially, we explored using generative models (i.e. GANs) for data augmentation to address the scarcity of low-score images compared to high-score images. Moreover, for the low-score images, the same score can be achieved by numerous combinations of figure elements. However, due to our extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, and test them on the data from the current prospective study to compare their performance with our results.

      In the revised manuscript, we have included additional sentences discussing the potential use of generative models and their implications for future research.

      “The data augmentation did not include generative models. Initially, we explored using generative models, specifically GANs, for data augmentation to address the scarcity of low-score images compared to high-score images. However, due to the extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, which can then be tested on the data from the current prospective study and compared with our results.”

      Comment #3: Lack of Detailed Model Performance Analysis Across Subject Conditions: The study does not provide a detailed analysis of the model's performance across different ages, health conditions, etc. This omission raises questions about the model's applicability to diverse patient populations and whether separate models are needed for different subject types.

      Response #3: Thank you for this important comment. Although the initial version of our manuscript already provided detailed “item-specific” and “across total scores” performance metrics, we recognize the importance of including detailed analyses across different patient demographics to enhance the robustness and applicability of our findings. In response to your comment, we have conducted additional analyses that provide a comprehensive evaluation of model performance across various patient demographics, such as age groups, gender, and different neurological and psychiatric conditions. This additional analysis demonstrates the generalizability and reliability of our model across diverse populations. We have included these analyses in the revised manuscript.

      “In addition, we have conducted a comprehensive model performance analysis to evaluate our model's performance across different ROCF conditions (copy and recall), demographics (age, gender), and clinical statuses (healthy individuals and patients) (Figure 4A). These results have been confirmed in the prospective validation study (Supplementary Figure S6). Furthermore, we included an additional analysis focusing on specific diagnoses to assess the model's performance in diverse patient populations (Figure 4B). Our findings demonstrate that the model maintains high accuracy and generalizes well across various demographics and clinical conditions.”

      Comment #4: Data Augmentation: While the data augmentation procedure is noted as clever, it does not fully encompass all affine transformations, potentially limiting the model's robustness.

      Response #4: We appreciate your feedback on our data augmentation strategy. We acknowledge that while our current approach significantly improves robustness against certain semantic transformations, it may not fully cover all possible affine transformations.

      Here, we provide further clarification and justification for our chosen methods and their impact on the model's performance. In our study, we implemented a data augmentation pipeline to enhance the robustness of our model against common and realistic geometric and semantic-preserving transformations. This pipeline included rotations, perspective changes, and Gaussian blur, which we found to be particularly effective in improving the model's resilience to variations in input data. These transformations are particularly relevant for the present application since users in real life are likely to take pictures of drawings that might be slightly rotated or have a slightly tilted perspective. With these intuitions in mind, we randomly transformed drawings during training. Each transformation was a combination of Gaussian blur, a random perspective change, and a rotation with angles chosen randomly between -10° and 10°. These transformations are representative of realistic scenarios where images might be slightly tilted or photographed from different angles. We intentionally did not address all affine transformations, such as shearing or more complex geometric transformations, because these transformations could alter the score of individual items of the ROCF and would be disruptive to the model.
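
      The rotation component of this augmentation can be illustrated with a short sketch. It only covers the one parameter stated in the response (angles drawn at random between -10° and 10°) and applies the rotation to 2D points rather than to images. The blur and perspective parameters are not given, so they are left out, and the function names are hypothetical rather than taken from the authors' code.

```python
import math
import random


def sample_rotation_deg(rng, max_abs_deg=10.0):
    """Draw a rotation angle uniformly from [-max_abs_deg, +max_abs_deg].

    Assumption: a uniform distribution; the response only says the angles
    were chosen randomly within this range.
    """
    return rng.uniform(-max_abs_deg, max_abs_deg)


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points counter-clockwise around `center`."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [
        ((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
        for x, y in points
    ]


# Example: a fresh random angle is drawn each time a drawing is used in training.
rng = random.Random(0)
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
augmented = rotate_points(corners, sample_rotation_deg(rng), center=(2.0, 1.5))
```

      Because a rotation preserves distances and angles, the relative geometry of the figure's items is unchanged, which is consistent with the stated reason for excluding shearing.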

      As noted in our manuscript and demonstrated in supplementary Figure S1, the data augmentation pipeline significantly improved the model's robustness against rotations and changes in perspective. Importantly, our tablet-based scoring application can further ensure that the photos taken do not exhibit excessive semantic transformations. By leveraging the gyroscope built into the tablet, the application can help users align the images properly, minimizing issues such as excessive rotation or skew. This built-in functionality helps maintain the quality and consistency of the images, reducing the likelihood of significant semantic transformations that could affect model performance.

      Comment #5: Additionally, the rationale for using median crowdsourced scores as ground truth, despite evidence of potential bias compared to clinician scores, is not adequately justified.

      Response #5: Thank you for this valuable comment. Clarifying the rationale behind using the median crowdsourced score as the ground truth is indeed important. To reach high accuracy in predicting individual sample scores of the ROCFs, it is imperative that the scores of the training set are based on a systematic scheme with as little human bias as possible influencing the score. However, our analysis (see results section) and previous work (Canham et al., 2000) suggested that the scoring conducted by clinicians may not be consistent, because clinicians may be unwittingly influenced by the interaction with the patient/participant or by clinician-related factors (e.g. motivation and fatigue). For this reason, and because clinician scores were not available for all figures (i.e. for 19% of the 20’225 figures), we did not use the clinicians' scores as ground truth scores. Instead, we trained a large pool (5000 workers) of human internet workers (crowdsourcing) to score ROCF drawings guided by our self-developed interactive web application. Each element of the figure was scored by several human workers (13 workers on average per figure). We obtained the ground truth for each drawing by computing the median for each item in the figure, and then summing the medians to get the total score for the drawing in question. To further ensure high-quality data annotation, we identified and excluded crowdsourcing participants who had a high level of disagreement (>20% disagreement) with the ratings from trained clinicians, who carefully scored a subset of the data manually in the same interactive web application.

      We chose the median score for several reasons: (1) the median score is less influenced by outliers compared to the mean. Given the variability of scoring between different clinicians and human workers (see human MSE and clinician MSE), using the median ensures that the ground truth is not skewed by extreme values, leading to more stable and reliable scores. (2) Crowdsource data do not always follow a normal distribution. In cases where the distribution is skewed or not symmetric, the median can be a more representative measure of the center. (3) The original scoring system involves ordinal scales (0,0.5,1,2). For ordinal scales, the median is often more appropriate than the mean. Finally, by aggregating multiple scores from a large pool of crowdsourced raters, the median provides a consensus that reflects the most common assessment. This approach mitigates the variability introduced by individual rater biases and ensures a more consistent ground truth. In clinical settings, the consensus of multiple expert opinions often serves as the benchmark for assessments. The use of median scores mirrors this practice, providing a ground truth that is representative of collective human judgment.
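
      The aggregation described in this response (a per-item median, summed to a total score, after excluding raters with more than 20% disagreement against clinician ratings) can be sketched as follows. The data layout and function names are hypothetical, and how disagreement was measured in practice is an assumption: here it is the fraction of jointly rated items on which a rater's score differs from the clinician's.

```python
from statistics import median


def ground_truth(ratings_by_item):
    """Return (per-item medians, total score) for one drawing.

    ratings_by_item: dict mapping item id -> list of crowdsourced scores.
    """
    item_medians = {item: median(scores) for item, scores in ratings_by_item.items()}
    return item_medians, sum(item_medians.values())


def keep_reliable_raters(rater_scores, reference, max_disagreement=0.20):
    """Keep raters whose disagreement with the clinician reference is <= 20%.

    rater_scores: dict rater id -> {item key: score}
    reference:    dict item key -> clinician score
    Assumption: raters with no overlap with the reference cannot be
    assessed and are kept.
    """
    kept = {}
    for rater, scores in rater_scores.items():
        shared = [key for key in scores if key in reference]
        if shared:
            disagreement = sum(scores[k] != reference[k] for k in shared) / len(shared)
            if disagreement > max_disagreement:
                continue
        kept[rater] = scores
    return kept


# Three raters scoring three items of one drawing on the 0 / 0.5 / 1 / 2 scale.
medians, total = ground_truth({1: [2, 2, 1], 2: [0, 0.5, 0.5], 3: [1, 1, 1]})
```

      Note that `statistics.median` averages the two middle values when the number of ratings is even, which can yield a value outside the ordinal scale; the response does not say how such ties were handled.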

      Canham, R. O., S. L. Smith, and A. M. Tyrrell. 2000. “Automated Scoring of a Neuropsychological Test: The Rey Osterrieth Complex Figure.” Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future. https://doi.org/10.1109/eurmic.2000.874519.

      Reviewer #2:

      Comment #1: There is no detail on how the final scoring app can be accessed and whether it is medical device-regulated.

      Response #1: We appreciate the opportunity to provide more information about the current status and plans for our scoring application. At this stage, the final scoring app is not publicly accessible as it is currently undergoing rigorous beta testing with a select group of clinicians in real-world settings. The feedback from these clinicians is instrumental in refining the app’s features, interface, and overall functionality to improve its usability and user experience. This ensures that the app meets the high standards required for clinical tools. Following the successful completion of the beta testing phase, we aim to seek FDA approval for the scoring app. Achieving this regulatory milestone will guarantee that the app meets the stringent requirements for medical devices, providing an additional layer of confidence in its safety and efficacy for clinical use. Once FDA approval is obtained, we plan to make the app publicly accessible to clinicians and healthcare institutions worldwide. Detailed instructions on how to access and use the app will be provided at that time on our website (https://www.psychology.uzh.ch/en/areas/nec/plafor/research/rfp.html).

      Comment #2: No discussion on the difference in sample sizes between the pre-registration of the prospective study and the results (e.g., aimed for 500 neurological patients but reported data from 288). Demographics for the assessment of the representation of healthy and non-healthy participants were not present.

      Response #2: Thank you for your comment. We believe there might have been a misunderstanding regarding our pre-registration details. In the pre-registration, we planned to prospectively acquire ROCF drawings from 1000 healthy subjects. Each subject was to produce two ROCF drawings (copy and memory condition), so 2000 samples were to be collected. In addition, we aimed to collect 500 drawings from patients (i.e. 250 patients), not 500 patients as the reviewer suggested (https://osf.io/82796). Thus, in total, the goal was to obtain 2500 ROCF figures. The final prospective data set, which contained 2498 ROCF images from 961 healthy adults and 288 patients, very closely matches the sample size we aimed for in the pre-registration. We do not see a necessity to discuss this negligible discrepancy in the main manuscript. The prospective data set remains substantial and sufficient to test our model on independent data. Importantly, we want to highlight that the test set in the retrospective data set (4045 figures) was also never seen by the model. Both the retrospective and prospective data sets demonstrate substantial global diversity, as the data were collected in 90 different countries. Please note that Supplementary Figures S2 & S3 provide detailed demographics of the participants in the prospectively collected data, present their performance in the copy and (immediate) recall condition across the lifespan, and show the worldwide distribution of the origin of the data.
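
      The sample-size comparison in this response reduces to a few lines of arithmetic, written out below using only the figures quoted above. The variable names are illustrative.

```python
# Planned sample, as described in the pre-registration summary above.
planned_healthy_subjects = 1000
drawings_per_subject = 2          # copy and memory condition
planned_patient_drawings = 500    # i.e. 250 patients with two drawings each
planned_total = planned_healthy_subjects * drawings_per_subject + planned_patient_drawings

# Obtained sample, as reported in the response.
obtained_drawings = 2498
obtained_participants = 961 + 288  # healthy adults + patients

shortfall = planned_total - obtained_drawings
shortfall_pct = 100 * shortfall / planned_total
```

      The totals are consistent with two drawings per participant (1249 × 2 = 2498), and the shortfall against the planned 2500 figures is 2 drawings.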

      Comment #3: Supplementary Figure S1 & S4 is poor quality, please increase resolution.

      Response #3: We apologize for the poor quality of Supplementary Figures S1 and S4 in the initial submission. In the revised version of our submission, we have increased the resolution of both Supplementary Figure S1 and Supplementary Figure S4 to ensure that all details are clearly visible and the figures are of high quality.

      Comment #4: Regarding medical device regulation; if the app is to be used in clinical practice (as it generates a score and classification of performance), I believe such regulation is necessary - but there are ways around it. This should be detailed.

      Response #4: We agree that regulation is essential for any application intended for use in clinical practice, particularly one that generates scores and classifications of performance. As discussed in response #1, the final scoring application is currently undergoing intensive beta testing in real-world settings with a limited group of clinicians and is therefore not publicly accessible at this time. We are fully committed to obtaining the necessary regulatory approvals before the app is made publicly accessible for clinical use. Once the beta testing phase is complete and the app has been refined based on clinician feedback, we will prepare and submit a comprehensive regulatory dossier. This submission will include all necessary data on the app's development, testing, validation, and clinical utility. We are adhering to relevant regulatory standards and guidelines, such as ISO 13485 for medical devices and the FDA's guidance on software as a medical device (SaMD).

      Comment #7: Need to clarify that work was already done and pre-printed in 2022 for the main part of this study, and that this paper contributes to an additional prospective study.

      Response #7: We would like to clarify that the pre-print the reviewer is referring to is indeed the current paper submitted to eLife. The submitted paper includes both the work that was pre-printed in 2022 and the additional prospective study, as you correctly identified.

      Reviewer #3:

      Comment #1: The considerable effort and cost to make the model only for an existing neuropsychological test.

      Response #1: We acknowledge that significant effort and resources were dedicated to developing our model for the Rey-Osterrieth Complex Figure (ROCF) test. Below, we provide a detailed rationale for this investment and the broader implications of our work. The ROCF test is one of the most widely used neuropsychological assessments worldwide, providing critical insights into visuospatial memory and executive function. While the initial effort and cost are substantial, the long-term benefits of an automated, reliable, objective, fast and widely applicable neuropsychological assessment tool justify the investment. The scoring application will significantly reduce the time for scoring the test and thus provide more efficient use of clinical resources, and the potential for broader applications makes this a worthwhile endeavor. The methods and infrastructure developed for this model can be adapted and scaled to other neuropsychological tests and assessments (e.g. Taylor Figure).

      Comment #2: I was truly impressed by the authors' establishment of a system that organizes the methods and fields of diverse specialties in such a remarkable way. I know the primary purpose of the ROCFT. However, beyond the score, specialists observe many neuropsychological features during the ROCFT, and this is part of the test's appeal: the order and direction of each stroke (e.g., from right to left, from the main structure to the margin or small structures), the process of completing the whole figure (e.g., confident speed and concentration), the boldness of strokes, unnatural fragmentation of strokes, whether the drawing deviates from its expected place on the paper, turning of the figure itself (before the scanning level), the total size, and the level relative to the patient's age, education, and experience. These reflect the disease, visuospatial intelligence, executive function, and ability to concentrate. Scores are crucial, but by observing the drawing process, we can obtain diverse facts or partial symptoms that reflect the complexity of human behavior.

      Response #2: Thank you for your insightful comments and observations regarding our system for organizing diverse specialties within the ROCFT methodology. We agree that beyond the numerical scores, the detailed observation of the drawing process provides invaluable neuropsychological insights. How strokes are executed, from their direction and placement to the overall completion process, offers a nuanced understanding of factors like spatial orientation, concentration, and executive function. In fact, we are working on a ROCF pen tracking application, which enables the patient to draw the ROCF with a digital pen on a tablet. The tablet can 1) assess the sequence order of drawing the items and the number of strokes, 2) record the exact coordinate of each drawn pixel at each time point of the assessment, 3) measure the duration for each pen stroke as well as total drawing time, and 4) assess the pen stroke pressure. Through this, we aim to extract additional information on processing speed, concentration, and other cognitive domains. However, this development is outside the scope of the current manuscript.

    1. Author response:

      We would like to thank the editors and reviewers for their constructive feedback, and we look forward to addressing their comments in the revised manuscript. We also appreciate the acknowledgment that the use of laminar electrodes in awake-behaving animals is an important advancement for the TBI community, and that our results provide a potential physiological link between coalescing TBI pathologies and cognitive deficits. We believe that integrating the reviewer comments will help to make our analyses even more rigorous and will improve the overall manuscript. Please find comments related to specific concerns raised in the public review below:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion… It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments.

      Previous mechanistic and physiological studies suggested interneuronal dysfunction following TBI that we hypothesized would disrupt oscillatory dynamics underlying temporal coding (single unit entrainment to theta/gamma, phase precession, and phase-amplitude coupling). These are known to support hippocampal-dependent learning and memory tasks such as the Morris Water Maze. While we did not record during a goal-directed behavioral task, the goal of recording in a familiar and novel environment was to assess remapping across these environments. Unfortunately, occupancy in the two environments was not high enough to rigorously characterize place cell specificity and phase precession or to investigate remapping, although putative place cells were identified. Despite this shortcoming, we were still able to confirm that the spike timing of interneurons relative to hippocampal oscillations was disrupted, which we believe underlies the massive reduction in theta-gamma phase-amplitude coupling reported. This opens the door to more strongly hypothesis-driven, mechanistic studies (i.e. closed-loop stimulation) to alter the spike timing of interneurons relative to theta phase and potentially rescue these effects on phase-amplitude coupling and behavior.

      The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported… There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units.

      The number of rats used for the spatial working memory task was reported in the text and Figure legend where the statistics were reported, but we will ensure that the statistics are more completely reported by including relevant statistical results and parameters outside of the test used and p-value. Additionally, we will report the number of units recorded per animal.

      Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

      The spatial working memory deficit that we report in the Morris Water Maze is not a novel finding and has been demonstrated numerous times in this TBI model. Our goal in including this was to increase the rigor of the study by verifying this deficit in our hands at the injury level used for these physiology experiments. The dissociation between spatial working memory deficits and other motor, motivational, or sensory deficits from TBI in the Morris Water Maze (e.g. swim speed and escape latency with visible platforms) has been well characterized in this TBI model at many injury levels including more severe injuries than those used in this study. We will address this in the Discussion as it is an important point.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      We agree that there is a broadband downward shift in power following TBI especially in the pyramidal cell layer. We will include a normalization of the power spectra in order to specifically compare the theta and gamma bands between sham and injured rats and include discussion about the broadband decrease.
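
      One common way to separate a band-specific change from a broadband shift is to express each band's power as a fraction of total power. The sketch below illustrates that idea only: the response does not state which normalization the authors will use, and the band edges, frequency range, and function name are assumptions made for the example.

```python
def relative_band_power(freqs, power, band, total_range=(1.0, 100.0)):
    """Power in `band` divided by power across `total_range`.

    Assumption: a bin belongs to a range when lo <= f < hi, and
    `total_range` is an illustrative choice of broadband limits.
    """
    lo, hi = band
    t_lo, t_hi = total_range
    in_band = sum(p for f, p in zip(freqs, power) if lo <= f < hi)
    total = sum(p for f, p in zip(freqs, power) if t_lo <= f < t_hi)
    if total == 0:
        raise ValueError("total power is zero; cannot normalize")
    return in_band / total


# Toy spectra: the "injured" spectrum is the "sham" spectrum halved at every
# frequency, i.e. a purely broadband decrease with no band-specific change.
freqs = [2.0, 6.0, 20.0, 40.0]
sham = [10.0, 10.0, 10.0, 10.0]
injured = [5.0, 5.0, 5.0, 5.0]
theta = (4.0, 8.0)  # illustrative band edges

sham_theta = relative_band_power(freqs, sham, theta)
injured_theta = relative_band_power(freqs, injured, theta)
```

      In this toy case the relative theta power is identical in both groups, so a difference that survives such a normalization would point to a band-specific effect rather than a uniform loss of power.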

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensory-motor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated to the entrainment of spikes with the theta rhythm and also in some cases with the gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the author's claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spikes synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We agree that changes in hippocampal physiology that we report could arise due to disrupted inputs from TBI, and this study is inherently limited due to recording exclusively from CA1. We chose to record from the hippocampus due to its importance for learning and memory, and its vulnerability in TBI. Future studies will investigate how hippocampal afferents are affected by injury, and we hope that the layer-specific changes we report will help to inform which inputs may be preferentially disrupted. Importantly, these inputs along with local processing within the hippocampus change drastically depending on the behavior of the animal. We will more rigorously assess movement and the behavioral state of the rats when comparing physiological properties, especially the firing rates reported in Figure 3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) In Figure 1, it is curious that the authors only chose E. coli and Staphylococcus sciuri to test the induction of Chi3l1. What about other bacteria? Why does only E. coli but not Staphylococcus sciuri induce Chi3l1 production? It does not prove that the gut microbiome induces the expression of Chi3l1. If it is the effect of LPS, does it trigger a cell death response or inflammatory responses that are known to induce Chi3l1 production? What is the role of peptidoglycan in this experiment? Also, it is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in this figure.

      Thank you for your valuable feedback and insightful suggestions. In our study, we tried to identify bacteria from murine gut contents and feces using 16S sequencing. However, only E. coli and Staphylococcus sciuri were identified (Figure 1D). Consequently, our experiments were limited to these two bacterial strains. While we have not tested other bacteria, our data suggest that not all bacteria can induce the expression of Chi3l1. Given that E. coli is Gram-negative and Staphylococcus sciuri is Gram-positive, we hypothesized that the difference in their ability to induce Chi3l1 expression might be due to variations between Gram-negative and Gram-positive bacteria, such as the presence of lipopolysaccharides (LPS).

      To test this hypothesis, we used LPS to induce Chi3l1 expression. Consistent with our hypothesis, LPS successfully induced Chi3l1 expression (Figure 1F&G). Additionally, we observed that Chi3l1 expression is significantly upregulated in specific pathogen-free (SPF) mice compared to germ-free mice (Figure 1A), demonstrating that the gut microbiome induces the expression of Chi3l1.

      Although we have not examined cell death or inflammatory responses, the protective role of Chi3l1 shown in Figure 5 suggests that any such responses would be mild and negligible. Regarding the role of peptidoglycan in the induction of Chi3l1 expression in DLD-1 cells, we have not yet explored this aspect. However, we agree with your suggestion that it would be worthwhile to investigate this in future experiments.

      We have also made the suggested modifications to the labeling (Figure 1A) and the clarification in the revised manuscript accordingly (page 3, Line 95-96; Line 102-106).

      Thank you again for your constructive feedback.

      (2) In Figure 2, the binding between Chi3l1 and PGN needs better characterization, regarding the affinity and how it compares with the binding between Chi3l1 and chitin. More importantly, it is unclear how this interaction could facilitate the colonization of gram-positive bacteria.

      Thank you for your insightful suggestions. We have performed the suggested experiments and included the results in the revised manuscript (Figure 2E-G, page 3-4, Line 132-146).

      Our results indicate that Chi3l1 interacts with PGN in a dose-dependent manner (Figure 2E). In contrast, the binding between Chi3l1 and chitin did not exhibit dose dependency (Figure 2E). These findings suggest a specific and distinct binding mechanism for Chi3l1 with PGN compared to chitin.

      We conducted DLD-1 cell-bacteria adhesion experiments, using GlmM mutant (PGN synthesis mutant) and K12 (wild-type) bacteria to test their adhesion capabilities. The results showed that the adhesion ability of the GlmM mutant to cells significantly decreased (Figure 2F). Additionally, after knocking down Chi3l1 in DLD-1 cells, we observed decreased bacterial adhesion (Figure 2G). These findings suggest that the interaction between Chi3l1 and PGN plays a crucial role in bacterial adhesion.

      (3) In Figure 3, the abundance of Firmicutes and other gram-positive species is lower in the knockout mice. What is the rationale for choosing Lactobacillus in the following transfer experiments?

      We appreciate your thorough review. Among the Gram-positive bacteria that we have sequenced and analyzed, Lactobacillus occupies the largest proportion. Given the significant presence and established benefits of Lactobacillus, we chose it for the subsequent transfer experiments to leverage its known properties and availability, thereby ensuring the robustness and reproducibility of our findings. This is supported by the study referenced below.

      Lamas B, Richard ML, Leducq V, Pham HP, Michel ML, Da Costa G, Bridonneau C, Jegou S, Hoffmann TW, Natividad JM, Brot L, Taleb S, Couturier-Maillard A, Nion-Larmurier I, Merabtene F, Seksik P, Bourrier A, Cosnes J, Ryffel B, Beaugerie L, Launay JM, Langella P, Xavier RJ, Sokol H. CARD9 impacts colitis by altering gut microbiota metabolism of tryptophan into aryl hydrocarbon receptor ligands. Nat Med. 2016 Jun;22(6):598-605. doi: 10.1038/nm.4102. Epub 2016 May 9. PMID: 27158904; PMCID: PMC5087285.

      (4) FDAA-labeled E. faecalis colonization is decreased in the knockouts. Is it specific for E. faecalis, or it is generally true for all gram-positive bacteria? What about the colonization of gram-negative bacteria?

      Thank you for your insightful suggestions. We have investigated the colonization of a gram-negative bacterium, OP50-mCherry (a strain of E. coli that expresses mCherry), and included the results in the updated manuscript (Supplementary Figure 3B, page 5, Line 197-200). We performed rectal injection of both wildtype and Chil1-/- mice with mCherry-OP50, and found that Chil1-/- mice had much higher colonization of E. coli compared to wildtype mice.

      (5) In Figure 5, the fact that FMT did not completely rescue the phenotype may point to the role of host cells in the processes. The reason that Lactobacillus transfer did completely rescue the phenotypes could be due to the overwhelming protective role of Lactobacillus itself, as the experiments were missing Villin-cre mice transferred with Lactobacillus.

      Thank you for your valuable feedback and thorough review. In our study, pretreatment with antibiotics in mice to eliminate gut microbiota demonstrated that IEC∆Chil1 mice exhibited a milder colitis phenotype (Supplementary Figure 4). This suggests that Chi3l1-expressing host cells are likely to play a detrimental role in colitis. Consequently, the failure of FMT to completely rescue the phenotype is likely due to the incomplete preservation of bacteria in the feces during the transfer experiment.

      We agree with your assessment of the protective role of Lactobacillus. This also explains the significant difference in colitis phenotype between Villin-cre and IEC∆Chil1 mice (Figure 5B-E), as Lactobacillus levels are significantly lower in IEC∆Chil1 mice (Figure 4F). Given the severity of colitis in Villin-cre mice at 7 days post-DSS, even if Lactobacillus were transferred back to these mice, it would be unlikely to result in a significant improvement.

      (6) Conflicting literature demonstrating the detrimental roles of Chi3l1 in mouse IBD model needs to be acknowledged and discussed.

      Thank you for your insightful suggestions. We have included additional discussion in the revised manuscript (page 6-7, Line 258-274).

      Reviewer #2 (Public Review):

      (1) Images are of great quality but lack proper quantification and statistical analysis. Statements such as "substantial increase of Chi3l1 expression in SPF mice" (Fig.1A), "reduced levels of Firmicutes in the colon lumen of IEC ∆ Chil1" (Fig.3F), "Chil1-/- had much lower colonization of E.faecalis" (Fig.4G), or "deletion of Chi3l1 significantly reduced mucus layer thickness" (Supplemental Figure 3A-B) are subjective. Since many conclusions were based on imaging data, the authors must provide reliable measures for comparison between conditions wherever possible, such as fluorescence intensity, area, density, etc, as well as plots and statistical analysis.

      Thank you for your insightful suggestions. We have performed the suggested statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 4G. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent between the two groups.

      (2) In the fecal/Lactobacillus transplantation experiments, oral gavage of Lactobacillus to IEC∆Chil1 mice ameliorated the colitis phenotype, by preventing colon length reduction, weight loss, and colon inflammation. These findings seem to go against the notion that Chi3l1 is necessary for the colonization of Lactobacillus in the intestinal mucosa. The authors could speculate on how Lactobacillus administration is still beneficial in the absence of Chi3l1. Perhaps, additional data showing the localization of the orally administered bacteria in the gut of Chi3l1 deficient mice would clarify whether Lactobacillus are more successfully colonizing other regions of the gut, but not the mucus layer. Alternatively, later time points of 2% DSS challenge, after Lactobacillus transplantation, would suggest whether the gut colonization by Lactobacillus and therefore the milder colitis phenotype, is sustained for longer periods in the absence of Chi3l1.

      Thank you for your thorough review and insightful suggestions. Since we pretreated mice with antibiotics, the intestinal mucus layer is likely damaged according to a previous study (PMID: 37097253). Therefore, gavaged Lactobacillus cannot colonize the mucus layer. Moreover, existing studies have shown that the protective effect of Lactobacillus is mainly derived from its metabolites or cell components, rather than the living bacteria themselves (PMID: 36419205, PMID: 27516254).

      Zhan M, Liang X, Chen J, Yang X, Han Y, Zhao C, Xiao J, Cao Y, Xiao H, Song M. Dietary 5-demethylnobiletin prevents antibiotic-associated dysbiosis of gut microbiota and damage to the colonic barrier. Food Funct. 2023 May 11;14(9):4414-4429. doi: 10.1039/d3fo00516j. PMID: 37097253.

      Montgomery TL, Eckstrom K, Lile KH, Caldwell S, Heney ER, Lahue KG, D'Alessandro A, Wargo MJ, Krementsov DN. Lactobacillus reuteri tryptophan metabolism promotes host susceptibility to CNS autoimmunity. Microbiome. 2022 Nov 23;10(1):198. doi: 10.1186/s40168-022-01408-7. PMID: 36419205.

      Piermaría J, Bengoechea C, Abraham AG, Guerrero A. Shear and extensional properties of kefiran. Carbohydr Polym. 2016 Nov 5;152:97-104. doi: 10.1016/j.carbpol.2016.06.067. Epub 2016 Jun 23. PMID: 27516254.

      Reviewer #3 (Public Review):

      The claim that mucus-associated Ch3l1 controls colonization of beneficial Gram-positive species within the mucus is not conclusive. The study should take into account recent discoveries on the nature of mucus in the colon, namely its mobile fecal association and complex structure based on two distinct mucus barrier layers coming from proximal and distal parts of the colon (PMID: ). This impacts the interpretation of how and where Ch3l1 is expressed and gets into the mucus to promote colonization. It also impacts their conclusions because the authors compare fecal vs. tissue mucus, but most of the mucus would be attached to the feces. Of the mucus that was claimed to be isolated from the WT and IEC Ch3l1 KO, this was not biochemically verified. Such verification (e.g. through Western blot) would increase confidence in the data presented. Further, the study relies upon relative microbial profiling, which can mask absolute numbers, making the claim of reduced overall Gram-positive species in mice lacking Ch3l1 unproven. It would be beneficial to show more quantitative approaches (e.g. Quantitative Microbial Profiling, QMP) to provide more definitive conclusions on the impact of Ch3l1 loss on Gram+ microbes.

      You raise an excellent point about the data interpretation, and we appreciate your insightful suggestions. We have included the discussion regarding the recent discoveries in the revised manuscript (page 7-8, Line 304-312). According to the recent discovery, the mucus in the proximal colon forms a primary encapsulation barrier around fecal material, while the mucus in the distal colon forms a secondary barrier. Our findings indicate that Chi3l1 is expressed throughout the entire colon, including the proximal, middle, and distal sections (see Author response image 1 below; note that the Chi3l1 detection in the colon presented in the manuscript is from the middle section). This suggests that Chi3l1 likely promotes bacterial colonization across the entire colon. Despite most mucus being expelled with feces, the constant production of mucus and the minimal presence of Chi3l1 in feces (Figure 4C) indicate that Chi3l1 continuously plays a role in promoting the colonization of microbiota.

      Author response image 1.

      Chi3l1 is expressed in the proximal and distal colon. Immunofluorescence staining of proximal and distal colon sections to detect Chi3l1 (red) expression. Nuclei were detected with DAPI (blue). Scale bars, 50 μm.

      To isolate the mucus layer, we followed the method in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Although we did not find a suitable marker representative of the mucus layer for western blotting, we performed protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in established studies (see Author response image 2 below).

      Author response image 2.

      Comparison of mucus layer proteins identified by mass spectrometry between our team and the Hansson team. Mucus layer proteins identified by mass spectrometry by our team and by the Hansson team (PMID: 19432394) are compared.

      Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). Therefore, our data are reliable to some extent.

      Other weaknesses lie in the execution of the aims, leaving many claims incompletely substantiated. For example, much of the imaging data is challenging for the reader to interpret due to it being unfocused, too low of magnification, not including the correct control, and not comparing the same regions of tissues among different in vivo study groups. Statistical rigor could be better demonstrated, particularly when making claims based on imaging data. These are often presented as single images without any statistics (i.e. analysis of multiple images and biological replicates). These images include the LTA signal differences, FISH images, Enterococcus colonization, and mucus thickness.

      Thank you for your thorough review and insightful suggestions. We have performed the recommended statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). We have also added arrows in Figure 2B to make the figure easier to understand. Additionally, we repeated some key experiments to show the same regions of tissues among different groups. We will upload higher resolution figures during the revision. Thank you again for your constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in Figure 1.

      Thank you for your insightful suggestion. We have made the suggested modifications to the labeling (revised manuscript, Figure 1A).

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written, but it would benefit from a critical reading to correct some typos and small grammar issues. Histological and IF images would be more informative if they contained arrows and labels guiding the reader's attention to what the authors want to show. More details about the structures shown in the figures should be included in the legends.

      Thank you for your thorough review and insightful suggestions. We have revised the manuscript to correct noticeable typos and grammar issues. Arrows have been added to Figure 2A&B to make the figures easier to understand. Additionally, we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, line 730-733).

      Minor points:

      • Page 1, line 36: Please correct "mice models" to "mouse models".

      Thank you for your insightful suggestion and we have made the suggested correction in the revised manuscript (page 1, line 41).

      • Page 3, line 110: "by comparing the structure of chitin with that of peptidoglycan (PGN), a component of bacterial cells walls, we observed that they have similar structures (Fig.2A)". Although both structures are shown side-by-side, no similarities are mentioned or highlighted in the text, figure, or legend.

      Thank you for your insightful suggestion and we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, line 730-733).

      • Fig.5C and Fig.5G: y axis brings "weight (%)". I believe the authors mean "weight change (%)"?

      We agree with your suggestion and have corrected the labeling accordingly (revised manuscript, Figure 5C and G).

      • Page 8: Genotyping method is described as a protocol. Please modify it.

      Thank you for your constructive suggestion and we have modified the genotyping method in the revised manuscript (page 8, line 339-349)

      • Please expand on the term "scaffold model" used in the abstract and discussion.

      Thank you for your thorough review. In this model, Chi3l1 acts as a key component of the scaffold. By binding to bacterial cell wall components like peptidoglycan, Chi3l1 helps anchor and organize bacteria within the mucus layer. This interaction facilitates the colonization of beneficial bacteria such as Lactobacillus, which are important for gut health. We have included more description of the scaffold model in the revised manuscript (page 6, line 248-250).

      • Discussion section often recapitulates results description, which makes the text repetitive.

      Thank you for your constructive suggestion. We have removed unnecessary description of results from the discussion section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Major comments

      (1) Figure 1A. The staining is very faint, and hard to see. The reader cannot be certain those are Ch311-positive cells. Higher Mag is needed.

      Thank you for your insightful suggestion and we have included the higher resolution figures in the revised manuscript Figure 1A.

      (2) The mucus is produced largely by the proximal colon, is adherent to the feces, and mobile with the feces (PMID: 33093110). Therefore it is important to determine where the Ch311 is being expressed to be released into the lumen. Further Ch3l1 expression studies are needed to be done in both proximal and distal colon.

      Thank you for your thorough review and insightful suggestions. We have addressed this part in our public review. Additionally, we agree with your suggestions and will conduct further studies on Chi3l1 expression in both the proximal and distal colon.

      (3) Figure 1B. The image is out of focus for the Ileum, and the DAPI signal needs to be brought up for the colon. Which part of the colon is this? The UEA1+ cells do not really look like goblet cells. A better image with clearer goblet cells is needed.

      Thank you for your constructive suggestions. In the revised manuscript, we have included higher-resolution images (Figure 1B). The middle colon (approximately 3 to 4 cm distal from the cecum) was harvested for staining. In addition to UEA-1, we utilized anti-MUC2 antibody to label goblet cells in this colon segment (see Author response image 3 below). The patterns of goblet cells identified by UEA-1 or MUC2 antibodies are similar. The UEA-1-positive cells shown in Figure 1B are presumed to be goblet cells.

      Author response image 3.

      Goblet cell distribution in the middle colon. Goblet cells in the middle segment of the colon (approximately 3 to 4 cm distal from the cecum) were detected by fluorescence staining with UEA-1 (green) and an anti-MUC2 antibody (red). Scale bar = 50 μm. Representative images are shown from three mice individually stained for each marker.

      (4) Figure 1G. There needs to be some counterstain or contrast imaging to show evidence that cells are present in the untreated sample.

      Thank you for your insightful suggestions. We have annotated the cells present in the untreated sample based on the overexposure in the revised manuscript (Figure 1G).

      (5) Figure 3B. Is this absolute quantification? How were the data normalized to allow comparison of microbial loads?

      Thank you for your thorough review. Figure 3B presents absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we amplified a short segment (179 bp) of the 16S rRNA gene using conserved 16S rRNA-specific primers and OP50 (a strain of E. coli) as the template. After gel extraction and concentration measurement, the PCR products were diluted to gradient concentrations (0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48 pg/µl). These gradient concentrations were used as templates for qPCR to generate a standard curve relating Ct values to bacterial concentration. The standard curve was then used to calculate bacterial concentration in the samples. The data presented in Figure 3B represent the weight of bacteria per milligram of sample, calculated as (bacterial concentration × bacterial volume) / (weight of feces or gut content).
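
      The standard-curve procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the dilution series is the one listed in the response, but the Ct values are synthetic (generated from an assumed slope and intercept), and the function names, the suspension volume, and the sample weight are hypothetical.

```python
import math

def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * ln(concentration) + intercept."""
    xs = [math.log(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve: concentration (pg/ul) for a measured Ct."""
    return math.exp((ct - intercept) / slope)

def load_per_mg(ct, slope, intercept, volume_ul, sample_mg):
    """(concentration x volume) / sample weight, i.e. pg of bacteria per mg."""
    return concentration_from_ct(ct, slope, intercept) * volume_ul / sample_mg

# Dilution series from the response (pg/ul).
DILUTIONS = [0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48]

# ASSUMPTION: synthetic Ct values from an assumed line, for illustration only.
ASSUMED_SLOPE, ASSUMED_INTERCEPT = -1.5, 14.0
SYNTHETIC_CT = [ASSUMED_SLOPE * math.log(c) + ASSUMED_INTERCEPT for c in DILUTIONS]

slope, intercept = fit_standard_curve(DILUTIONS, SYNTHETIC_CT)

# ASSUMPTION: hypothetical sample with Ct 14.0, 100 ul suspension, 50 mg feces.
example_load = load_per_mg(14.0, slope, intercept, volume_ul=100.0, sample_mg=50.0)
```

      With real data, the measured Ct values of the standards would replace the synthetic list, and the fit quality would need to be checked before any sample is quantified.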

      (6) Figure 3D. The major case is made for a dramatic reduction in Gram+ species, but Figure 1D does not show a dramatic change. Is this difference significant?

      Thank you for your thorough review. We are not sure that we fully understand your question. There was no significant difference in Figure 3D. The conclusion of a dramatic reduction in Gram+ species is based on the LTA staining, the Firmicutes FISH, the comparison of individual species between WT and KO mice, and the bacterial qPCR results taken together (Figure 3E-H).

      (7) Figures 3E and 3F. These stainings are alone not convincing of reduced Gram+ in the KOs. Some stats are required for these images. An independent complementary method is also needed to quantify these with statistics since this data is so central to the study's conclusions.

      Thank you for your constructive suggestions. We have included statistical analysis in the revised manuscript (Figure 3E and F). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 3E. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent with the Firmicutes FISH results. Complementary methods such as bacterial qPCR have been employed to quantify these (Figure 4E, F). Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments.

      (8) Figure 3G. To make quantitative conclusions, the authors need to do quantitative microbial profiling (QMP) of the microbiota. Relative abundance masks absolute numbers, which could be increased. There are qPCR-based QMP platforms the authors could use (PMIDs: 31940382, 33763385).

      Thank you for your constructive suggestions. Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). In addition to the original bacterial qPCR data presented in the manuscript, we included another bacterial genus, Turicibacter. Consistent with the 16S rRNA sequencing analysis data, qPCR results showed that Turicibacter was more abundant in IECΔChil1 mice than in Villin-cre mice (revised manuscript, Supplementary Figure 3A, page 4, line 171-173). Therefore, our data are reliable to some extent.
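
      The reviewer's point that relative abundance can mask absolute numbers can be illustrated with a small numerical sketch. All values below are hypothetical and are not taken from the manuscript; the group names, taxon labels, and total loads are assumptions chosen only to show the arithmetic of scaling relative abundances by a total bacterial load.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def absolute_abundance(relative, total_load):
    """Scale relative abundances by a measured total load (e.g. from qPCR)."""
    return {taxon: frac * total_load for taxon, frac in relative.items()}

# ASSUMPTION: hypothetical relative profiles for two groups.
control_rel = {"gram_positive": 0.50, "gram_negative": 0.50}
knockout_rel = {"gram_positive": 0.25, "gram_negative": 0.75}

# ASSUMPTION: hypothetical total loads (arbitrary units); the knockout
# group carries twice the total load of the control group.
control_abs = absolute_abundance(control_rel, total_load=100.0)
knockout_abs = absolute_abundance(knockout_rel, total_load=200.0)

# The Gram-positive fraction is halved in the knockout profile, yet the
# absolute Gram-positive load is identical in both groups.
same_absolute_load = control_abs["gram_positive"] == knockout_abs["gram_positive"]
```

      In this toy case the drop in the Gram-positive fraction reflects an expansion of Gram-negative bacteria rather than a loss of Gram-positive bacteria, which is the ambiguity that absolute quantification is meant to resolve.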

      (9) Figure 4B. The data nicely shows Ch3l1 in mucus. However, no data supports the authors' main claim that Ch3l1 binds Gram-positive bacteria in situ. Dual staining of Ch3l1 with Firmicutes probe would be supportive to show this interaction is happening in vivo.

      You raise an excellent point, and we agree with your suggestion that we should confirm Chi3l1 binding to Gram-positive bacteria in situ. During the study, we attempted dual staining of Chi3l1 with a universal bacterial 16S FISH probe several times, but we were unsuccessful. Despite various optimizations of the protocol, we were only able to detect bacteria, not Chi3l1. It appears that the antibody is not suitable for this method.

      (10) Figures 4D - F. Because mucus is associated with feces (PMID: ), the data with feces likely contains both Muc2/mucus and Feces. Therefore, it is unclear what the "mucus" is referring to in these figures. To support the authors' conclusions, there needs to be some validation that mucus was purified in the assays. This must be confirmed at a minimum by PAS staining on SDS PAGE gel (should be very high molecular weight) or Western blot with UEA lectin.

      Thank you for your insightful suggestions. As mentioned in the public review, the mucus layer was isolated following the protocol described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, after harvesting the middle colon from the mice, we cut open the colon longitudinally. After removing the gut contents, the lumen was vigorously rinsed in PBS while holding one end with forceps. The pellet obtained after centrifuging the rinsate was used as our mucus sample. Fresh feces were collected immediately after the mice defecated in a new, empty cage. We performed Western blot analysis to detect UEA lectin but were unsuccessful.

      However, as noted in the public review, we conducted protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in these established studies.

      (11) Figure 4E/F: The units of measurement are in pg/cm2, implying picogram per area. Can the authors please explain what this unit is referring to?

      We are grateful for your thorough review. The unit pg/cm² represents picograms per square centimeter. Figures 4E and 4F present absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we harvested a 3 × 0.5 cm section of colon and a 9 × 0.4 cm section of ileum. We then collected the mucus layer as previously described (response to question 10). We measured bacterial concentration as described in the response to question 5 using the equation y = -1.53 ln(x) + 13.581, where x represents the bacterial concentration and y represents the Ct value. After obtaining the bacterial concentration, we multiplied it by the volume of the rinsate and divided it by the area to obtain the values in pg/cm² used in the figures.
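
      The calculation described in this response can be written out as a short Python sketch. The equation coefficients and the tissue dimensions are those stated above; the unit of x is assumed to be pg/µl (as in the dilution series of the response to question 5), and the function names, the example Ct value, and the rinsate volume are hypothetical.

```python
import math

# Standard-curve coefficients stated in the response: y = -1.53 ln(x) + 13.581
SLOPE = -1.53
INTERCEPT = 13.581

def concentration_from_ct(ct):
    """Solve y = SLOPE * ln(x) + INTERCEPT for x (assumed unit: pg/ul)."""
    return math.exp((ct - INTERCEPT) / SLOPE)

def load_per_area(ct, rinsate_volume_ul, length_cm, width_cm):
    """Bacterial load in pg/cm^2: concentration x rinsate volume / tissue area."""
    area_cm2 = length_cm * width_cm
    return concentration_from_ct(ct) * rinsate_volume_ul / area_cm2

# Tissue sections as described in the response.
COLON_CM = (3.0, 0.5)   # 1.5 cm^2
ILEUM_CM = (9.0, 0.4)   # 3.6 cm^2

# ASSUMPTION: hypothetical Ct (equal to the intercept, so x = 1 pg/ul)
# and a hypothetical rinsate volume of 150 ul.
colon_load = load_per_area(13.581, 150.0, *COLON_CM)
ileum_load = load_per_area(13.581, 150.0, *ILEUM_CM)
```

      Because the curve is logarithmic, a one-cycle difference in Ct changes the estimated concentration by a factor of exp(1/1.53), roughly 1.9, so small Ct differences translate into large differences in pg/cm².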

      (12) Figure 5E. Normal tissues appear to be from different colon regions from colitis tissues: the "Normal" looks like the proximal colon, while "Colitis" looks like the Distal colon. They cannot be directly compared.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5E to compare the same region of the colons.

      (13) Similarly, in Figure 5I it appears different colon regions are being compared between groups: Proximal colon in the bottom panels, and distal in the top panels. Since the proximal colon is less damaged by DSS, this data could be misleading.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5I to compare the same region of the colons.

      (14) In the DSS studies, are the VillinCre and IEC Chit3l1 mice co-housed littermates?

      Thank you for your insightful suggestion. In the DSS studies, the Villin-Cre and IECΔChil1 mice are not co-housed littermates. However, they are derived from the same lineage and are housed in the same rack within the same room of the animal facility.

      (15) Supplementary Figure 3: Mucus thickness images; are they representative? Stats are needed on multiple mice to support the claim that the mucus is thinner.

      Thank you for your insightful suggestion. The images are representative of 4 mice in each group. We have now included the statistical analysis in the revised manuscript (Supplementary Figure 3C&D).

      Minor

      (1) Introduction: Reference to "mucosal layer": "Mucosal" and "Mucus" are different things. "Mucosal" refers to the epithelium, lamina propria, and muscularis mucosa. "Mucus" refers to the secreted mucus gel, the focus of the authors' study. Therefore, the statement "mucosal layer" is not proper. "Mucosal layer" should be changed to "mucus layer."

      Thank you for your constructive suggestions, from which we have learned a lot. We have replaced “mucosal layer” with “mucus layer” in the revised manuscript.

      (2) Line 366 and related lines: Feces cannot be "dissolved". "Resuspended" is a better term.

      Thank you for your constructive suggestion. We have changed “dissolved” to “resuspended” in the revised manuscript.

      (3) Lines 36-37 and 43-44 are redundant to each other.

      Thank you for your constructive suggestion. We have removed lines 36-37 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors investigate the contributions of the long noncoding RNA Snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator, SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and as a general rule the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

      Weaknesses:

      Despite efforts at generating a liver-specific knockout, the phenotypic characterization is not focused on the key readouts. Notably missing are rigorous lipid flux studies and targeted gene expression/protein measurement that would underpin why loss of Snhg3 protects from lipid accumulation. Along those lines, claims linking Snhg3 to MAFLD would be better supported with careful interrogation of markers of fibrosis and advanced liver disease. In other areas, significance is limited since the presented data is either not clear or not rigorous enough. Finally, there is an important conceptual limitation to the work since PPARG is not established to play a major role in the liver.

      We thank the reviewer for the comment. As the reviewer notes, the manuscript still has some shortcomings. We have described several of them in the Discussion section (third paragraph on p17 and first paragraph on p18).

      We agree with the reviewer that there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section (first paragraph on p13).

      Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al. found that the lncRNA Snhg3 was one of the Snhgs most down-regulated by high fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on a HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3-interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on the PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how lncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study. There are, however, several potential issues to consider before jumping to the conclusion.

      (1) First of all, it's important to ensure the robustness and rigor of each study. The manuscript was not carefully put together. The image qualities for several figures were poor, making it difficult for the readers to evaluate the results with confidence. The biological replicates and numbers of experimental repeats for cell-based assays were not described. When possible, the entire immunoblot imaging used for quantification should be presented (rather than showing n=1 representative). There were multiple mis-labels in figure panels or figure legends (e.g., Fig. 2I, Fig. 2K and Fig. 3K). The b-actin immunoblot image was reused in Fig. 4J, Fig. 5G and Fig. 7B with different exposure times. These might be from the same cohort of mice. If the immunoblots were run at different times, the loading control should be included on the same blot as well.

      We thank the reviewer for the detailed comment. We have provided clearer figures in the revised manuscript.

      The numbers of biological replicates and experimental repeats for the cell-based assays have been updated in the manuscript.

      The entire immunoblot images used for quantification have been provided in the primary data.

      The original Figure 2I, Figure 2K, and Figure 3K have been revised and replaced with the new Figure 2F, 2H, and 3H, and their corresponding figure legends have also been corrected in the revised manuscript.

      The protein levels of CD36, PPARγ and β-ACTIN were examined at the same time, and we have revised the manuscript accordingly (see revised Figure 7B and C).

      (2) The authors can do a better job in explaining the logic for how they came up with the potential function of each component of the signaling cascade. Snhg3 is down-regulated by HFD. However, the evidence presented indicates its involvement in promoting steatosis. In Fig. 1C, one would expect PPARg expression to be up-regulated (when Snhg3 was down-regulated). If so, the physiological observation conflicts with the proposed mechanism. In addition, SND1 is known to regulate RNA/miRNA processing. How do the authors rule out this potential mechanism? How about the hosted snoRNA, Snord17? Is it involved in the progression of MASLD?

      We thank the reviewer for the detailed comment. In this study, although the expression of Snhg3 was decreased in DIO mice, Snhg3 deficiency decreased the expression of hepatic PPARγ and alleviated hepatic steatosis in DIO mice, and Snhg3 overexpression induced the opposite effect. This led us to speculate that the downregulation of Snhg3 in DIO mice might be a protective stress response to a high nutritional state, although the specific details need to be clarified. This is probably similar to FGF21 and GDF15, whose endogenous expression and circulating levels are elevated in obese humans and mice despite their beneficial effects on obesity and related metabolic complications (Keipert and Ost, 2021). We have added this content to the Discussion section (second paragraph on p12).

      SND1 has multiple roles through associating with different types of RNA molecules, including mRNA, miRNA, circRNA, dsRNA and lncRNA. We agree with the reviewer that the potential mechanism by which SND1/lncRNA-Snhg3 is involved in hepatic lipid metabolism needs to be further investigated. We have also discussed this limitation in the manuscript (Discussion section, third paragraph on p17).

      Snhg3 serves as the host gene for producing intronic U17 snoRNAs, which are H/ACA snoRNAs. A previous study found that a cholesterol trafficking phenotype was not due to reduced Snhg3 expression, but rather to haploinsufficiency of U17 snoRNA (Jinn et al., 2015). Additionally, knockdown of U17 snoRNA in vivo protected against hepatic steatosis and lipid-induced oxidative stress and inflammation (Sletten et al., 2021). In this study, the expression of U17 snoRNA decreased in the liver of DIO Snhg3-HKO mice and remained unchanged in the liver of DIO Snhg3-HKI mice, but overexpression of U17 snoRNA had no effect on the expression of SND1 and PPARγ (figure supplement 5A-C), indicating that Snhg3-induced hepatic steatosis was independent of U17 snoRNA. We have discussed this in the revised manuscript (p15 of the Discussion section).

      References

      JINN, S., BRANDIS, K. A., REN, A., CHACKO, A., DUDLEY-RUCKER, N., GALE, S. E., SIDHU, R., FUJIWARA, H., JIANG, H., OLSEN, B. N., SCHAFFER, J. E. & ORY, D. S. 2015. snoRNA U17 regulates cellular cholesterol trafficking. Cell Metab, 21, 855-67. DOI:10.1016/j.cmet.2015.04.010, PMID:25980348

      KEIPERT, S. & OST, M. 2021. Stress-induced FGF21 and GDF15 in obesity and obesity resistance. Trends Endocrinol Metab, 32, 904-915. DOI:10.1016/j.tem.2021.08.008, PMID:34526227

      SLETTEN, A. C., DAVIDSON, J. W., YAGABASAN, B., MOORES, S., SCHWAIGER-HABER, M., FUJIWARA, H., GALE, S., JIANG, X., SIDHU, R., GELMAN, S. J., ZHAO, S., PATTI, G. J., ORY, D. S. & SCHAFFER, J. E. 2021. Loss of SNORA73 reprograms cellular metabolism and protects against steatohepatitis. Nat Commun, 12, 5214. DOI:10.1038/s41467-021-25457-y, PMID:34471131

      (3) The role of PPARg in fatty liver diseases might be a rodent-specific phenomenon. PPARg agonist treatment in humans may actually reduce ectopic fat deposition by increasing fat storage in adipose tissues. The relevance of the finding to human diseases should be discussed.

      We thank the reviewer for the detailed comment. We agree with the reviewer that there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section (first paragraph on p13).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not have further recommendations beyond what I mentioned in the original review. The authors have not adequately addressed all the issues but the manuscript has improved and the overall strength of evidence is now solid from incomplete.

      We appreciate the positive feedback from the reviewer. While the updated manuscript has improved significantly, we recognize that it remains incomplete and that additional details regarding Snhg3 will need to be addressed in our future studies. Moreover, we have discussed these potential weaknesses in the Discussion section (third paragraph on p17 and first paragraph on p18).

      Reviewer #2 (Recommendations For The Authors):

      The authors have provided explanations and some new data to clarify the comments from the first submission. They have also included the original immunoblots for all the experimental repeats. The CHX protein stability results shown in Fig. 5J were not consistent between experiments, perhaps because the difference was subtle. The results on PPARg protein expression were not clearcut. The inclusion of a PPARg knockdown control would be helpful to validate the specificity of the antibody. Of note, the immunoblots used for Fig. 5I (PA treated) repeats 2, 4 and 1 were identical to those of Fig. 7F repeats 3, 1 and 5. The authors should provide an explanation for the potential issue.

      We thank the reviewer for the further comments and suggestions. We agree with the reviewer's comment about Snhg3-mediated SND1 protein stability. In this study, Snhg3 increased the protein level, but not the mRNA level, of SND1, whereas it only subtly increased SND1 protein stability. We revised the description in the manuscript: “Meanwhile, Snhg3 regulated the protein, not mRNA, expression of SND1 in vivo and in vitro by mildly promoting the stability of SND1 protein (Figures 5G-I).” This revision can be found in the second paragraph on p9. While our findings indicate that Snhg3 can influence SND1 expression at the protein level, we acknowledge the possibility of additional mechanisms contributing to this complex regulatory network. Therefore, further investigation is necessary to clarify whether Snhg3 regulates SND1 protein expression through other potential mechanisms. We have added this point to the Discussion section (second paragraph on p16).

      In this study, the protein level of PPARγ (molecular weight ~57 kDa) was detected using an anti-PPARγ antibody (ABclonal, Cat. A11183), which has been used to determine PPARγ protein expression in 13 published papers, as listed by ABclonal Technology Co., Ltd. (https://abclonal.com.cn/catalog/A11183). The specificity of this antibody has been validated by PPARγ knockdown in Zhang’s study (Zhang et al., 2019). In our study, hepatic PPARγ protein sometimes showed two bands (~57 kDa and >75 kDa) using this antibody. It is well established that the PPARγ gene encodes two protein isoforms (PPARγ1, a 477 amino acid protein, and PPARγ2, a 505 amino acid protein) via differential promoter usage and alternative splicing (Ensembl genome browser 112, Mus musculus gene Pparg, ENSMUSG00000000440, transcript comparison; Hernandez-Quiles et al., 2021). The molecular weight difference between PPARγ1 and PPARγ2 is about 3 kDa. Therefore, we consider that the band larger than 75 kDa in our study is likely nonspecific. In line with the reviewer’s suggestion, the antibody’s specificity could be further validated by knockdown or knockout of PPARγ in the future.
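
      The isoform arithmetic above can be checked with a short sketch. It assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the manuscript, and the function and variable names are illustrative only. The residue counts (477 and 505) and the approximate band sizes (57 and 75 kDa) are the ones quoted in the response.

```python
# Rough protein mass estimate from residue count.
# ASSUMPTION: ~110 Da per residue is a generic rule of thumb, not a value
# from the manuscript; real masses depend on sequence and modifications.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Return an approximate protein mass in kDa for a given residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Residue counts quoted in the response for the two isoforms.
isoform_lengths = {"PPARγ1": 477, "PPARγ2": 505}
masses_kda = {name: approx_mass_kda(n) for name, n in isoform_lengths.items()}

# Predicted mass gap between the isoforms.
isoform_gap_kda = masses_kda["PPARγ2"] - masses_kda["PPARγ1"]

# Gap between the two observed bands (~57 kDa and >75 kDa), lower bound.
observed_band_gap_kda = 75.0 - 57.0

for name, mass in masses_kda.items():
    print(f"{name}: ~{mass:.1f} kDa")
print(f"isoform gap: ~{isoform_gap_kda:.1f} kDa")
print(f"observed band gap: at least {observed_band_gap_kda:.0f} kDa")
```

      Under this approximation the 28-residue difference between the isoforms accounts for roughly 3 kDa, far less than the spacing of at least 18 kDa between the two observed bands, which is consistent with the upper band not being PPARγ2.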

      We thank the reviewer for the detailed comment. In this study, we tested the effect of Snhg3 overexpression on the SND1 protein level, and the effect of Snhg3 or SND1 overexpression on the PPARγ protein level, in Hepa1-6 cells transfected with Snhg3, SND1, or the control, respectively. The results indicated that overexpression of Snhg3 increased the protein levels of SND1 and PPARγ, and that overexpression of SND1 also increased the protein level of PPARγ. In organizing the manuscript, we first showed that overexpression of Snhg3 increased the protein level of SND1 in Figure 5I, and then demonstrated that overexpression of Snhg3 or SND1 elicited PPARγ expression in Figure 7F. However, we acknowledge that this order of presentation may cause confusion. These experiments were repeated multiple times, and we have provided the new original western blot data and analysis for Figure 5I (PA treatment) for further clarification.

      References

      HERNANDEZ-QUILES, M., BROEKEMA, M. F. & KALKHOVEN, E. 2021. PPARgamma in Metabolism, Immunity, and Cancer: Unified and Diverse Mechanisms of Action. Front Endocrinol (Lausanne), 12, 624112. DOI:10.3389/fendo.2021.624112, PMID:33716977

      ZHANG, Z., ZHAO, G., LIU, L., HE, J., DARWAZEH, R., LIU, H., CHEN, H., ZHOU, C., GUO, Z. & SUN, X. 2019. Bexarotene Exerts Protective Effects Through Modulation of the Cerebral Vascular Smooth Muscle Cell Phenotypic Transformation by Regulating PPARgamma/FLAP/LTB(4) After Subarachnoid Hemorrhage in Rats. Cell Transplant, 28, 1161-1172. DOI:10.1177/0963689719842161, PMID:31010302

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy produced by the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a future framework for addressing mechanical force transmission between the cytoplasm and the cell surface.

      Strengths: 

      (1) Convincing evidence shows AgmT functions as an LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant. 

      (2) Authors show agmT mutants develop morphological changes in response to treatment with a β-lactam antibiotic, mecillinam.

      (3) The use of single-molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds. 

      (4) The authors understand the limitations of their work and do not overinterpret their data. 

      Weaknesses: 

      (1) A clear model of AgmT's role in gliding motility or interactions with other A-motility proteins is not provided. Instead, speculative roles for how AgmT enzymatic activity could facilitate bFAC function in A-motility are discussed. 

      We thank the reviewer for this comment. We have added a new figure, Fig. 6, and updated the Discussion to propose a mechanism, “rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends [40-44]. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans [25]. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (2) Although agmT mutants do not swarm, in-depth phenotypic analysis is lacking. In particular, do individual agmT mutant cells move, as found with other swarming defective mutants, or are agmT mutants completely nonmotile, as are motor mutants? 

      We thank the reviewer for raising this important question. Prompted by it, we analyzed the gliding phenotype of the ΔagmT pilA- mutant at the single-cell level. We found that the ΔagmT pilA- cells are not completely static. Instead, they move less than half a cell length before pausing and reversing. We went on to quantify velocity and gliding persistency and found that the gliding phenotype of the ΔagmT pilA- cells matches the prediction for bFACs that have lost the connection between the inner subcomplexes and PG. The revised text reads:

      We then imaged individual ∆agmT pilA- cells on a 1.5% agar surface at 10-s intervals using bright-field microscopy. To our surprise, instead of being static, individual ∆agmT pilA- cells displayed slow movements, with frequent pauses and reversals (Video 1). To quantify the effects of AgmT, we measured the velocity and gliding persistency (the distances cells traveled before pauses and reversals) of individual cells. Compared to the pilA- cells that moved at 2.30 ± 1.33 μm/min (n = 46) with high persistency (Video 2 and Fig. 2C, D), ∆agmT pilA- cells moved significantly slower (0.88 ± 0.62 μm/min, n = 59) and less persistently (Video 1 and Fig. 2C, D). Such aberrant gliding motility is distinct from the “hyper reversal” phenotype. Although hyper-reversing cells constitutively switch their moving directions due to the polarity regulators, they usually maintain gliding velocity at the wild-type level [27]. Instead, the slow and “slippery” gliding of the ∆agmT pilA- cells matches the prediction that when the inner complexes of bFACs lose connection with PG, bFACs can only generate short and inefficient movements [19]. Our data indicate that AgmT is not an essential component of the bFACs. Thus, AgmT is likely to regulate the assembly and stability of bFACs, especially their connection with PG.
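
      As an illustration of how large the reported velocity difference is relative to its spread, the summary values above can be fed into Welch's unequal-variance t statistic. This is only a sketch: it assumes the "±" values are standard deviations, which the response does not state, and Welch's test is used here for illustration and is not necessarily the test the authors applied. The function name is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Values quoted in the response (velocities in μm/min).
# ASSUMPTION: the "±" values are standard deviations, not standard errors.
t_stat, dof = welch_t_from_summary(
    mean1=2.30, sd1=1.33, n1=46,   # pilA- cells
    mean2=0.88, sd2=0.62, n2=59,   # ΔagmT pilA- cells
)
print(f"Welch t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

      If the "±" values were instead standard errors, the spread of each group would be much larger and the statistic correspondingly smaller, so the interpretation depends on that assumption.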

      (3) The bioinformatic and comparative genomics analysis of agmT is incomplete. For example, the sequence relationships between AgmT, MltG, and the 13 other LTG proteins in M. xanthus are not clear. Is E. coli MltG the closest homology to AgmT? Their relationships could be addressed with a phylogenetic tree and/or sequence alignments. Furthermore, are there other A-motility genes in proximity to agmT? Similarly, does agmT show specific co-occurrences with the other A-motility genes across genera/species?  

      We answered the first question in the Discussion (it was in the first Results section in the previous version), “Both M. xanthus AgmT and E. coli MltG belong to the YceG/MltG family, which is the first identified LTG family that is conserved in both Gram-negative and positive bacteria [25,41]. About 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains [25]. The unique inner membrane localization of this family and the fact that AgmT is the only M. xanthus LTG that belongs to this family (Table S2) could partially explain why it is the only LTG that contributes to gliding motility”.

      For the second, we added one sentence in the Results, “No other motility-related genes are found in the vicinity of agmT”.

      For the third question, we do not believe a co-occurrence analysis is necessary. Because M. xanthus gliding is unique, whereas “about 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains [25]”, gliding should show no co-occurrence with the YceG/MltG family LTGs.

      (4) Related to (3), what about the functional relationship of the endogenous 13 LTG genes? Although knockout mutants were shown to be motile, presumably because AgmT is present, can overexpression of them, similar to E. coli MltG, complement an agmT mutant? In other words, why does MltG complement and the endogenous LTG proteins appear not to be relevant?

      We thank the reviewer for this question, which prompted us to think about the uniqueness of AgmT more carefully. AgmT is unique for its inner-membrane localization, rather than its activity. We answered this question in the Discussion, “LTGs usually break glycan strands and produce unique anhydro caps on their ends [40-44]. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands”. We then went on to propose a possible mechanism, “E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans [25]. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (5) Based on Figure 2B, overexpression of MltG enhances A-motility compared to the parent strain and the agmT-PAmCh complemented strain, is this actually true? Showing expanded swarming colony phenotypes would help address this question. 

      We thank the reviewer for raising this important question. Prompted by it, we analyzed the effects of MltG expression at the single-cell level. We found that “Consistent with its LTG activity, the expression of MltGEc restored gliding motility of the ΔagmT pilA- cells on both the colony (Fig. 2B) and single-cell (Fig. 2C, D) levels. Interestingly, in the absence of sodium vanillate, the leaky expression of MltGEc from the vanillate-inducible promoter was sufficient to compensate for the loss of AgmT. A plausible explanation of this observation is that as E. coli grows much faster (generation time 20 - 30 min) than M. xanthus (generation time ~4 h), MltGEc could possess significantly higher LTG activity than AgmT. Induced by 200 μM sodium vanillate, the expression of MltGEc further, but not significantly, increased the velocity and gliding persistency (Fig. 2B-D). Importantly, the expression of MltGEc failed to restore gliding motility in the agmTEAEA pilA- cells, even in the presence of 200 μM sodium vanillate (Fig. 2B). Consistent with the mecillinam resistance assay (Fig. 3C), this result suggests that AgmTEAEA still binds to PG and that in the absence of its LTG activity, AgmT does not anchor bFACs to PG”. These results are shown in the new panels C and D in Figure 2.

      (6) Cell flexibility is correlated with gliding motility function in M. xanthus. Since AgmT has LTG activity, are agmT mutants less flexible than WT cells and is this the cause of their motility defect? 

      We thank the reviewer for raising this important question. Under the microscope, we frequently saw cells that lack AgmT making S-turns and U-turns. We used a GRABS assay to quantify cell stiffness and found that neither the absence of AgmT nor the expression of MltGEc affects cell stiffness. We added this result to the manuscript, “The assembly of bFACs produces wave-like deformation on the cell surface [6,37], suggesting that their assembly may require a flexible PG layer [2,6,11,12]. As a major contributor to cell stiffness, PG flexibility affects the overall stiffness of cells [38]. To test the possibility that AgmT and MltGEc facilitate bFAC assembly by reducing PG stiffness, we adopted the GRABS assay [38] to quantify whether the lack of AgmT and the expression of MltGEc affect cell stiffness. To quantify changes in cell stiffness, we simultaneously measured the growth of the pilA-, ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells in a 1% agarose gel infused with CYE and in liquid CYE, and calculated the GRABS scores of the ΔagmT pilA- and ΔagmT Pvan-MltGEc pilA- cells using the pilA- cells as the reference, where positive and negative GRABS scores indicate increased and decreased stiffness, respectively (see Materials and Methods and ref. [38]). The GRABS scores of the ΔagmT pilA- and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells were -0.06 ± 0.04 and -0.10 ± 0.07 (n = 4), respectively, indicating that neither AgmT nor MltGEc affects cell stiffness significantly. Whereas PG flexibility could still be essential for gliding, AgmT and MltGEc do not regulate bFAC assembly by modulating PG stiffness. Instead, these LTGs could connect bFACs to PG by generating structural features that are irrelevant to PG stiffness”.

      Reviewer #2 (Public Review): 

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in the failure of focal adhesion complexes to properly interact with the cell wall. 

      An interpretive weakness is the rather direct role attributed to AgmT in focal adhesion assembly. While their data clearly show that AgmT is important, it is unclear whether this is the direct consequence of AgmT somehow promoting bFAC binding to PG or just an indirect consequence of changed cell wall architecture without AgmT. In E. coli, an MltG mutant has increased PG strain length, suggesting that M. xanthus's PG architecture may likewise be compromised in a way that precludes AglR binding to the cell wall. However, this distinction would be very difficult to establish experimentally. MltG has been shown to associate with active cell wall synthesis in E. coli in the absence of protein-protein interactions, and one could envision a similar model in M. xanthus, where active cell wall synthesis is required for focal adhesion assembly, and MltG makes an important contribution to this process. 

      Based on the data that AgmT does not assemble into bFACs and that heterologous MltGEc substitutes for M. xanthus AgmT in gliding, we believe that AgmT facilitates the proper assembly of bFACs indirectly. At the end of the Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates proper bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue:

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes for AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends [40-44]. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans [25]. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The last sentence of the Discussion implies that anchoring LTG (AgmT) in the inner membrane is important. I did not see this mentioned about AgmT. Does it contain an inner membrane anchoring domain? Along these lines, the AgmT and MltG proteins appear to be of different sizes (Figure 1A). Please clarify, perhaps including full-length sequence alignment and/or domain architecture for these proteins. 

      We revised the first paragraph in the Results and clarified, “Among these genes, agmT (ORF K1515_0491023) was predicted to encode an inner membrane protein with a single N-terminal transmembrane helix (residues 4 – 25) and a large “periplasmic solute-binding” domain [22].”

      We thank the reviewer for spotting the mistake in Fig. 2A. The E. coli MltG sequence shown in the alignment starts from residue 158, instead of 88. We have corrected this mistake in the figure. M. xanthus AgmT and E. coli MltG are of similar sizes, with 239 and 240 amino acids, respectively.

      In Figure 3 legend, define D3. 

      The definition of D3 was added to the figure legend.

      Figure 4A shows 100-frame composite micrographs, but no time interval between frames is given. 

      The imaging frequency, 10 Hz, was stated in the text. We also added this information to the figure legend.

      Line 98, the term "Especially" does not flow well, change to "This includes the characteristic..." or similar. 

      We deleted “especially” from the sentence.

      Line 179, "not" is not accurate, replace with "rarely." 

      Changed.

      Line 188, add a qualifier, "proper" before "bFACs assembly." 

      Added.

      Lines 196 and 202, provide the sizes of each protein in these fusion constructs. 

      We added these numbers to the figure legend.

      In Figure 5A add arrows to identify each band. State in legend whether this is a denaturing gel, if so, why are AgmT-PAmCherry homodimers present?

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our own experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS [34], immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers in SDS-PAGE are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev 2002, Biophy J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

      Line 207, "near evenly along cell bodies" does not seem consistent with Figure 5B as there looks to be an enrichment of AgmT at cell poles. 

      We have replaced panel 5B with more typical images. Due to the shape difference between cell poles and the cylindrical nonpolar regions, many surface-associated proteins could appear “enriched” at cell poles. This effect was very obvious in Fig. 5B, possibly due to the unevenness of the agar surface. We examined our data carefully and did not find significant polar enrichment. Compared to AglZ that significantly enriches at poles and forms evenly-spaced clusters along the cell body, the localization of AgmT is completely different.  

      Lines 252 and 260, change "Fig. 5B" to "Fig. 5C." 

      We apologize for these mistakes. They have been corrected.

      Line 266, insert "the" before "cell envelope." 

      Added.

      Line 278, insert "presumably" between "AgmT generates (small openings)" 

      Corrected.

      Reviewer #2 (Recommendations For The Authors): 

      - Major comment: I would rephrase conclusions regarding a direct role of AgmT in focal adhesion assembly since these data are indirect (AglR binding to the cell wall is reduced in the absence of AgmT - this could also be interpreted as the absence of AgmT causing altered cell wall architecture that precludes AglR binding). Example: I don't think the data support line 222 "AgmT connects bFACs to PG", perhaps rephrased to accommodate more agnostic explanations. Likewise, line 308 states that MltG has been "adopted" by the gliding motility machinery. This conclusion cannot be drawn from the data presented. 

      We agree with the reviewer that the conclusions should be stated precisely. At the end of Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue: 

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTGs that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      However, we believe that the conclusion that “AgmT connects bFACs to PG" still stands true. Although AgmT is not likely to interact with the gliding machinery directly, its activity does increase the binding between bFACs and PG. 

      We agree with the reviewer that “adopt” may not be the best word to describe AgmT’s function in gliding. In the revised manuscript, we changed the phrase to “contributes to gliding motility”. 

      - Line 35: define "bFAC" at first use. 

      Fixed.

      - Figure 2: Mention in the caption why the pilA mutation is significant. Also, make more clear what one is supposed to see. You could include an arrow showing motile cells extruding from the colony edge, and mark + label the edge of the colony. 

      Following the reviewer’s recommendations, we described the motility phenotypes in detail in the main text, “On a 1.5% agar surface, the pilA- cells moved away from colony edges both as individuals and in “flare-like” cell groups, indicating that they were still motile with gliding motility. In contrast, the ∆aglR pilA- cells that lack an essential component in the gliding motor, were unable to move outward from the colony edge and thus formed sharp colony edges. Similarly, the ∆agmT pilA- cells also formed sharp colony edges, indicating that they could not move efficiently with gliding (Fig. 2B)”. 

      We also added a schematic block into panel B and two sentences into the legend, “To eliminate S-motility, we further knocked out the pilA gene that encodes pilin for type IV pilus. Cells that move by gliding are able to move away from colony edges.” 

      - Figure 3 caption. Mecillinam concentration should presumably be µg/mL, not g/mL?

      Also, remove the ".van,." in the second to last line. 

      We apologize for these mistakes. We have corrected them in the figure legend. 

      - Line 212 - at this point in the manuscript, the fact that AgmT likely does not assemble into bFACs is quite well established, so I would start this paragraph with something like "As an additional test, we...". 

      Revised as the reviewer recommended.

      - Figure 5C - this assay needs a protein loading control. How about whole-cell AglR before pelleting PG? 

      We do have a whole-cell loading control, which we have added into the revised figure.

      - Figure 5A - how are the dimers visible? Is this a native gel? If so, please add to the Methods section (I would find information on Western Blot there, but not on gel electrophoresis). 

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      As stated in the previous response to the reviewers, MBP and RpoA were indeed used in the western blot experiments as appropriate controls for periplasmic and cytoplasmic proteins, respectively. The open review process of eLife has enabled us to include additional data from experiments suggested by the reviewers. We think that this mode of publication is appropriate in the present case for the reporting of the requested analysis of OMVs. Indeed, these data are of interest only to a rather specialized audience.

      Reviewer #2 (Public Review):  

      Weaknesses:

      Figure 3 and 4 - why are the data shown here only two biological replicates, when there are 3-5 replicates shown in table S1 and S2? This makes it seem like you are cherry picking your favorite replicates. Please present the data as the mean of all the replicates performed, with error shown on the graph.

      We apologize for forgetting to update the legend to Figures 3 and 4. In the modified version, we have indicated that the values used for the plots are the average of three to five replicates. The full set of data together with the means and standard deviations appear in Tables S1 and S2. We would like to keep the current presentation of the data because introducing standard deviations in these figures would compromise the legibility of the data.

      This work will have a moderate impact on the field of research in which the connections between the OM and peptidoglycan are being studied in E. coli. Since lpp is not widely conserved in gram negatives, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

      We have already answered this comment in the first response to the reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. They found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data. Strengths of the study include the computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps.

      Previous models of replay or theta sequences focused on circuit plasticity and usually required a pre-existing place map input from the external environment via upstream structures. However, those models failed to explain how networks support rapid sequential coding of novel environments or simply transferred the question to the upstream structure. On the contrary, the current proposed model required minimal spatial inputs and was aimed at elucidating how a preconfigured structure gave rise to preplay, thereby facilitating the sequential encoding of future novel environments.

      In this model, the fundamental units for spatial representation were clusters within the network. Sequential representation was achieved through the balance of cluster isolation and their partial overlap. Isolation resulted in a self-reinforced assembly representation, ensuring stable spatial coding. On the other hand, overlap-induced activation transitions across clusters, enabling sequential coding.

      This study is important when considering that previous models mainly focused on plasticity and experience-related learning, while this model provided us with insights into how network architecture could support rapid sequential coding with large capacity, upon which learning could occur efficiently with modest modification via plasticity.

      I found this research very inspiring and, below, I provide some comments aimed at improving the manuscript. Some of these comments may extend beyond the scope of the current study, but I believe they raise important questions that should be addressed in this line of research.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail, given that in its current form it risks implying that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we have revised the text to clarify that the random clustering is random with respect to any future, novel environment (lines 111-114 and 710-712).

      Lines 111-114: “To reconcile these experimental results, we propose a model of intrinsic sequence generation based on randomly clustered recurrent connectivity, wherein place cells are connected within multiple overlapping clusters that are random with respect to any future, novel environment.”

      Lines 710-712: “Our results suggest that the preexisting hippocampal dynamics supporting preplay may reflect general properties arising from randomly clustered connectivity, where the randomness is with respect to any future, novel experience.”

      The cause of clustering could be prior experiences (e.g. Bourjaily and Miller, 2011) or developmental programming (e.g. Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022), and we have modified lines 116 and 714-718 to state this.

      Lines 116: Added citation of “Perin et al., 2011”

      Lines 714-718: “Synaptic plasticity in the recurrent connections of CA3 may primarily serve to reinforce and stabilize intrinsic dynamics, which could be established through a combination of developmental programming (Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022) and past experiences (Bourjaily and Miller, 2011), rather than creating spatial maps de novo.”

      We thank the reviewer for suggesting that the results of Liu et al., 2021 strengthen the support for our modeling motivations. We agree, and we now cite their finding that the hippocampal representations of novel environments emerged rapidly but were initially generic and showed greater discriminability from other environments with repeated experience in the environment (lines 130-134).

      Lines 130-134: “Further, such preexisting clusters may help explain the correlations that have been found in otherwise seemingly random remapping (Kinsky et al., 2018; Whittington et al., 2020) and support the rapid hippocampal representations of novel environments that are initially generic and become refined with experience (Liu et al., 2021).”

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Explanation 1 is correct: Our cluster-activation analyses (Figure 5) showed that the parameter values that generate preplay correspond to the parameter regions that support sustained cluster activity over multiple decoding time bins, which led us to the conclusion of the reviewer’s first proposed explanation.
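
      The first explanation can be illustrated with a toy sequence of per-bin cluster labels. This is a minimal sketch and is not taken from the model code: the labels and the helper names `n_switches` and `time_bin_shuffle` are hypothetical. It only shows that permuting time bins tends to break one sustained A-to-B transition into many alternations.

```python
import random

def n_switches(labels):
    """Count how often the active cluster changes between consecutive time bins."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

def time_bin_shuffle(labels, rng):
    """Return a copy of the per-bin labels with the time bins randomly permuted."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# Toy event (hypothetical): cluster A is active for four bins, then cluster B.
event = list("AAAABBBB")
rng = random.Random(0)
shuffled_switches = [n_switches(time_bin_shuffle(event, rng)) for _ in range(1000)]
mean_shuffled = sum(shuffled_switches) / len(shuffled_switches)

print("switches in original event:", n_switches(event))
print("mean switches after time-bin shuffle:", mean_shuffled)
```

      For four bins of each label, the expected number of switches after shuffling is 4, against a single switch in the sustained event, which is why a decoded trajectory built from sustained cluster activity looks more sequential than its time-bin shuffle.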

      We have now added additional analyses supporting the conclusion that cluster-wise activity is the main driver of preplay rather than individual cell-identity (Figures 6 and 7). In Figure 6 we show that cluster-identity alone is sufficient to produce significant preplay by performing decoding after shuffling cell identity within clusters, and in Figure 7 we show that this result holds true when considering the sequence of spiking activity within population bursts rather than the spatial decoding.

      Lines 495-515: The pattern of preplay significance across the parameter grid in Figure 4f shows that preplay only occurs with modest cluster overlap, and the results of Figure 5 show that this corresponds to the parameter region that supports transient, isolated cluster-activation. This raises the question of whether cluster-identity is sufficient to explain preplay. To test this, we took the sleep simulation population burst events from the fiducial parameter set and performed decoding after shuffling cell identity in three different ways. We found that when the identity of all cells within a network are randomly permuted the resulting median preplay correlation shift is centered about zero (t-test 95% confidence interval, -0.2018 to 0.0012) and preplay is not significant (distribution of p-values is consistent with a uniform distribution over 0 to 1, chi-square goodness-of-fit test p=0.4436, chi-square statistic=2.68; Figure 6a). However, performing decoding after randomly shuffling cell identity between cells that share membership in a cluster does result in statistically significant preplay for all shuffle replicates, although the magnitude of the median correlation shift is reduced for all shuffle replicates (Figure 6b). The shuffle in Figure 6b does not fully preserve cell’s cluster identity because a cell that is in multiple clusters may be shuffled with a cell in either a single cluster or with a cell in multiple clusters that are not identical. Performing decoding after doing within-cluster shuffling of only cells that are in a single cluster results in preplay statistics that are not statistically different from the unshuffled statistics (t-test relative to median shift of un-shuffled decoding, p=0.1724, 95% confidence interval of -0.0028 to 0.0150 relative to the reference value; Figure 6c). Together these results demonstrate that cluster-identity is sufficient to produce preplay.
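
      The three cell-identity shuffles described above can be sketched as permutations of cell indices. This is an illustrative sketch under stated assumptions, not the simulation code: cluster membership is represented as a list of sets, the function names are hypothetical, and a cell in several clusters is assigned to one of its clusters at random for the within-cluster shuffle, which is only one way to satisfy the "shared membership in at least one cluster" constraint.

```python
import random

def across_network_shuffle(n_cells, rng):
    """Permute cell identities across the whole network, ignoring clusters."""
    perm = list(range(n_cells))
    rng.shuffle(perm)
    return perm

def within_cluster_shuffle(membership, rng, single_only=False):
    """Permute identities only between cells that share a cluster.

    membership[i] is the set of cluster ids of cell i.
    single_only=True leaves cells that belong to several clusters untouched.
    Returns perm, where perm[i] is the identity assigned to cell i.
    """
    perm = list(range(len(membership)))
    groups = {}
    for i, clusters in enumerate(membership):
        if not clusters or (single_only and len(clusters) != 1):
            continue  # this cell keeps its own identity
        # Assumption: pick one "home" cluster at random for multi-cluster cells.
        home = rng.choice(sorted(clusters))
        groups.setdefault(home, []).append(i)
    for members in groups.values():
        labels = list(members)
        rng.shuffle(labels)
        for i, label in zip(members, labels):
            perm[i] = label
    return perm

# Hypothetical example: six cells, two clusters, cell 2 belongs to both.
membership = [{0}, {0}, {0, 1}, {1}, {1}, {1}]
rng = random.Random(0)
perm_all = across_network_shuffle(len(membership), rng)
perm_within = within_cluster_shuffle(membership, rng)
perm_single = within_cluster_shuffle(membership, rng, single_only=True)
print(perm_all, perm_within, perm_single)
```

      Decoding would then be repeated with the spike train of cell `i` relabeled as cell `perm[i]`; only the single-cluster variant preserves each cell's full cluster identity.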

      Lines 531-551: While cluster-identity is sufficient to produce preplay (Figure 6b), the shuffle of Figure 6c is incomplete in that cells belonging to more than one cluster are not shuffled. Together, these two shuffles leave room for the possibility that individual cell-identity may contribute to the production of preplay. It might be the case that some cells fire earlier than others, both on the track and within events. To test the contribution of individual cells to preplay, we calculated for all cells in all networks of the fiducial parameter point their mean relative spike rank and tested if this is correlated with the location of their mean place field density on the track (Figure 7). We find that there is no relationship between a cell’s mean relative within-event spike rank and its mean place field density on the track (Figure 7a). This is the case when the relative rank is calculated over the entire network (Figure 7, “Within-network”) and when the relative rank is calculated only with respect to cells with the same cluster membership (Figure 7, “Within-cluster”). However, because preplay events can proceed in either track direction, averaging over all events would average out the sequence order of these two opposite directions. We performed the same correlation but after reversing the spike order for events with a negative slope in the decoded trajectory (Figure 7b). To test the significance of this correlation, we performed a bootstrap significance test by comparing the slope of the linear regression to the slope that results when performing the same analysis after shuffling cell identities in the same manner as in Figure 6. We found that the linear regression slope is greater than expected relative to all three shuffling methods for both the within-network mean relative rank correlation (Figure 7c) and the within-cluster mean relative rank correlation (Figure 7d).

      Lines 980-1000:

      “Cell identity shuffled decoding

      We performed Bayesian decoding on the fiducial parameter set after shuffling cell identities in three different manners (Figures 6 and 7). To shuffle cells in a cluster-independent manner (“Across-network shuffle”), we randomly shuffled the identity of cells during the sleep simulations. To shuffle cells within clusters (“Within-cluster shuffle”), we randomly shuffled cell identity only between cells that shared membership in at least one cluster. To shuffle cells within only single clusters (“Within-single-cluster shuffle”), we shuffled cells in the same manner as the within-cluster shuffle but excluded any cells from the shuffle that were in multiple clusters.

      To test for a correlation between spike rank during sleep PBEs and the order of place fields on the track (Figure 7), we calculated for each excitatory cell in each network of the fiducial parameter set its mean relative spike rank and correlated that with the location of its mean place field density on the track (Figure 7a). To account for event directionality, we calculated the mean relative rank after inverting the rank within events that had a negatively sloped decoded trajectory (Figure 7b). We calculated mean relative rank for each cell relative to all cells in the network (“Within-network mean relative rank”) and relative to only cells that shared cluster membership with the cell (“Within-cluster mean relative rank”). We then compared the slope of the linear regression between mean relative rank and place field location against the slope that results when applying the same analysis to each of the three methods of cell identity shuffles for both the within-network regression (Figure 7c) and the within-cluster regression (Figure 7d).”
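
      The mean relative rank calculation and the effect of reversing negatively sloped events can be sketched as follows. This is a hedged illustration rather than the analysis code: the function names are hypothetical, each event is assumed to be given as cell ids ordered by spike time, and the rank is normalized to the range 0 to 1 by dividing by the number of active cells minus one, which is an assumption.

```python
def mean_relative_rank(events, n_cells, reverse=None):
    """events[k]: cell ids ordered by spike time within population burst k.
    reverse[k]: True if event k has a negatively sloped decoded trajectory.
    Returns {cell: mean relative rank in [0, 1]} over events in which it spiked."""
    totals = [0.0] * n_cells
    counts = [0] * n_cells
    for k, order in enumerate(events):
        if len(order) < 2:
            continue  # relative rank is undefined for a single active cell
        if reverse is not None and reverse[k]:
            order = order[::-1]  # invert the rank for reverse-direction events
        for rank, cell in enumerate(order):
            totals[cell] += rank / (len(order) - 1)
            counts[cell] += 1
    return {c: totals[c] / counts[c] for c in range(n_cells) if counts[c] > 0}

def regression_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical toy data: one forward and one reverse event over three cells.
events = [[0, 1, 2], [2, 1, 0]]
field_location = [0.1, 0.5, 0.9]  # place field position of cells 0, 1, 2

plain = mean_relative_rank(events, 3)
corrected = mean_relative_rank(events, 3, reverse=[False, True])
slope_plain = regression_slope(field_location, [plain[c] for c in range(3)])
slope_corrected = regression_slope(field_location, [corrected[c] for c in range(3)])
print(slope_plain, slope_corrected)
```

      In this toy case the uncorrected ranks average out to 0.5 for every cell, so the slope vanishes, while the direction-corrected ranks recover the track order. The observed slope would then be compared with the slopes obtained after each cell-identity shuffle.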

      We also now show that the sequence of cluster-activation in events with 3 active clusters does not match the sequence of cluster biases on the track above chance levels and that events with fewer active clusters have the largest increase in median weighted decode correlation (Figure 5—figure supplement 1), showing that the reviewer’s second explanation is not the case.

      Lines 466-477: “The results of Figure 5 suggest that cluster-wise activation may be crucial to preplay. One possibility is that the random overlap of clusters in the network spontaneously produces biases in sequences of cluster activation which can be mapped onto any given environment. To test this, we looked at the pattern of cluster activations within events. We found that sequences of three active clusters were not more likely to match the track sequence than chance (Figure 5—figure supplement 1a). This suggests that preplay is not dependent on a particular biased pattern in the sequence of cluster activation. We then asked if the number of clusters that were active influenced preplay quality. We split the preplay events by the number of clusters that were active during each event and found that the median preplay shift relative to shuffled events with the same number of active clusters decreased with the number of active clusters (Spearman’s rank correlation, p=0.0019, ρ=-0.13; Figure 5—figure supplement 1b).”
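
      The relationship between the number of active clusters and normalized preplay quality can be sketched with a rank correlation on z-scored values. This is a minimal sketch with made-up numbers: the helper names are hypothetical, the z-score uses the sample standard deviation (an assumption), and no p-value is computed.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def zscore_against_shuffles(value, shuffle_values):
    """z-score an event's |weighted correlation| against time-bin shuffled events
    that have the same number of active clusters."""
    mean = sum(shuffle_values) / len(shuffle_values)
    var = sum((v - mean) ** 2 for v in shuffle_values) / (len(shuffle_values) - 1)
    return (value - mean) / math.sqrt(var)

# Hypothetical per-event values, for illustration only.
n_active = [1, 1, 2, 2, 3, 3]
z_quality = [2.0, 1.5, 1.2, 0.9, 0.4, 0.1]
rho = spearman(n_active, z_quality)
print("Spearman rho:", rho)
```

      A negative coefficient, as in this toy input, corresponds to events with fewer active clusters showing the larger normalized preplay correlations.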

      Lines 1025-1044:

      “Active cluster analysis

      To quantify cluster activation (Figure 5), we calculated the population rate for each cluster individually as the mean firing rate of all excitatory cells belonging to the cluster smoothed with a Gaussian kernel (15 ms standard deviation). A cluster was defined as ‘active’ if at any point its population rate exceeded twice that of any other cluster during a PBE. The active clusters’ duration of activation was defined as the duration for which it was the most active cluster.

      To test whether the sequence of activation in events with three active clusters matched the sequence of place fields on the track, we performed a bootstrap significance test (Figure 5—figure supplement 1). For all events from the fiducial parameter set that had three active clusters, we calculated the fraction in which the sequence of the active clusters matched the sequence of the clusters’ left vs right bias on the track in either direction. We then compared this fraction to the distribution expected from randomly sampling sequences of three clusters without replacement.

      To determine if there was a relationship between the number of active clusters within an event and its preplay quality we performed a Spearman’s rank correlation between the number of active clusters and the normalized absolute weighted correlation across all events at the fiducial parameter set. The absolute weighted correlations were z-scored based on the absolute weighted correlations of the time-bin shuffled events that had the same number of active clusters.”
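
      The active-cluster criterion and the three-cluster sequence test can be sketched as below. This is an illustration under assumptions, not the authors' code: function names are hypothetical, the Gaussian kernel is truncated at four standard deviations and renormalized at the edges, active clusters are ordered by the first bin at which they meet the criterion, and cluster biases are assumed to be distinct.

```python
import math
from itertools import permutations

def gaussian_smooth(x, sigma_bins):
    """Smooth a rate trace with a truncated, edge-renormalized Gaussian kernel.
    With 1 ms bins, a 15 ms standard deviation corresponds to sigma_bins=15."""
    half = int(math.ceil(4 * sigma_bins))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    out = []
    for t in range(len(x)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= t + k < len(x):
                num += w * x[t + k]
                den += w
        out.append(num / den)
    return out

def active_cluster_sequence(rates):
    """rates[c][t]: smoothed population rate of cluster c at bin t of one PBE.
    A cluster is 'active' once its rate exceeds twice that of every other cluster.
    Returns the active clusters ordered by when they first met the criterion."""
    first_active = {}
    for t in range(len(rates[0])):
        for c, trace in enumerate(rates):
            others = max(r[t] for j, r in enumerate(rates) if j != c)
            if trace[t] > 2 * others and c not in first_active:
                first_active[c] = t
    return sorted(first_active, key=first_active.get)

def matches_track(sequence, bias):
    """True if the clusters' track biases are monotonic along the sequence,
    i.e. the sequence matches the track order in either direction."""
    b = [bias[c] for c in sequence]
    increasing = all(u < v for u, v in zip(b, b[1:]))
    decreasing = all(u > v for u, v in zip(b, b[1:]))
    return increasing or decreasing

# Hypothetical event with three clusters that activate one after another.
rates = [[5, 5, 1, 1, 1, 1],
         [1, 1, 5, 5, 1, 1],
         [1, 1, 1, 1, 5, 5]]
bias = {0: 0.2, 1: 0.5, 2: 0.9}  # left-vs-right bias of each cluster on the track
sequence = active_cluster_sequence(rates)
n_matching = sum(matches_track(p, bias) for p in permutations(range(3)))
print(sequence, matches_track(sequence, bias), n_matching / 6)
```

      Because two of the six orderings of three distinct clusters are monotonic in bias, a random sequence matches the track in either direction one third of the time; the observed fraction of matching events is compared against the distribution around that chance level.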

      We also now add control simulations showing that without the cluster-dependent bias the population burst events no longer significantly decode as preplay (Figure 4—figure supplement 4e).

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate: a network similar to a ring lattice can still have the balance of cluster isolation and cluster overlap, while it will have a small SWI due to long paths between some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why the long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We have added a figure illustrating different types of networks and their corresponding SWI (Figure 1—figure supplement 1) and a corresponding description in the main text (lines 228-234).

      Lines 228-234: “A ring lattice network (Figure 1—figure supplement 1a) exhibits high clustering but long path lengths between nodes on opposite sides of the ring. In contrast, a randomly connected network (Figure 1—figure supplement 1c) has short path lengths but lacks local clustered structure. A network with small world structure, such as a Watts-Strogatz network (Watts and Strogatz, 1998) or our randomly clustered model (Figure 1—figure supplement 1b), combines both clustered connectivity and short path lengths. In our clustered networks, for a fixed connection probability the SWI increases with more clusters and lower cluster participation…”
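
      The contrast between a ring lattice and a randomly connected network can be checked numerically with the two ingredients of small-worldness, the clustering coefficient and the mean path length. This is a sketch under assumptions: the graphs are small and undirected, the function names and sizes are hypothetical, and the SWI itself is not computed here.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def random_graph(n, n_edges, rng):
    """Undirected graph with n_edges distinct edges chosen uniformly at random."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    adj = {i: set() for i in range(n)}
    for i, j in rng.sample(pairs, n_edges):
        adj[i].add(j)
        adj[j].add(i)
    return adj

def mean_clustering(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute zero
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adj[nbrs[a]])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def mean_path_length(adj):
    """Mean shortest-path length over reachable node pairs (breadth-first search)."""
    total = count = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

ring = ring_lattice(60, 6)
rand = random_graph(60, 180, random.Random(1))  # same number of edges as the ring
for name, g in [("ring lattice", ring), ("random", rand)]:
    print(name, "clustering:", mean_clustering(g), "path length:", mean_path_length(g))
```

      The ring lattice shows the higher clustering and the longer paths, and the random graph the reverse; a small-world network sits between the two by keeping clustering high while shortening paths.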

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We have modified lines 690-692 to clarify that that statement is specific to our model.

      Lines 690-692: “In our clustered network structure, such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index (SWI, Figure 1g; Neal, 2015; Neal, 2017).”

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      We have added a brief explanation of this in the text in lines 281-284.

      Lines 281-284: “During simulated sleep, sparse, stochastic spiking spontaneously generates sufficient excitement within the recurrent network to produce population burst events resembling preplay (Figure 2d-f)”

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Our clusters correspond to functional assemblies in that cells that share a cluster membership have more-similar place fields and are more likely to reactivate together during population burst events. In the figure to the right, we show for an example network at the fiducial parameter set the Pearson correlation between all pairs of place fields split by whether the cells share membership in a cluster (blue) or do not (red).

      Author response image 1.

      We expect an assembly analysis would identify assemblies similarly to the experimental data, but we see this additional analysis as a future direction. We have added a description of this correspondence in the text at lines 134-137.

      Lines 134-137: “Such clustered connectivity likely underlies the functional assemblies that have been observed in hippocampus, wherein groups of recorded cells have correlated activity that can be identified through independent component analysis (Peyrache et al., 2010; Farooq et al., 2019).”

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally. We report here preliminary results supporting this as an interesting future direction.

      Author response image 2.

      We performed a similar analysis to that reported in Figure 3C of Dragoi and Tonegawa, 2013. We determined the statistical significance of each event individually for each of the two environments by testing whether the decoded event’s absolute weighted correlation exceeded the 99th percentile of the corresponding shuffled events. We then fit a linear regression to the fraction of events that were significant for each of the two tracks and that were significant for either of the two tracks (left panel of above figure). We then estimated the track capacity as the number of tracks at the point where the linear regression reached 100% of the network capacity. We find that applying this analysis to our fiducial parameter set returns an estimate of ~8.6 tracks (Dragoi and Tonegawa, 2013, found ~15 tracks).
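
      The extrapolation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code itself: the function name is hypothetical, the per-event significance flags are made-up toy values, and the regression is an ordinary least-squares fit through the two single-track fractions (at one track) and the either-track fraction (at two tracks).

```python
def estimate_track_capacity(sig_track_a, sig_track_b):
    """Extrapolate the number of tracks at which 100% of events would be significant.

    sig_track_a / sig_track_b: per-event booleans, True if the event is a
    significant preplay of that track (hypothetical inputs).
    """
    n_events = len(sig_track_a)
    frac_a = sum(sig_track_a) / n_events
    frac_b = sum(sig_track_b) / n_events
    frac_either = sum(a or b for a, b in zip(sig_track_a, sig_track_b)) / n_events

    # Ordinary least squares on (number of tracks, fraction of significant events)
    xs = [1, 1, 2]
    ys = [frac_a, frac_b, frac_either]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    # Number of tracks at which the fitted line reaches a fraction of 1.0
    return (1.0 - intercept) / slope


# Toy flags (not simulation results): 100 events, 10 significant per track,
# 2 of which are significant for both tracks.
events = range(100)
sig_a = [i < 10 for i in events]
sig_b = [8 <= i < 18 for i in events]
capacity = estimate_track_capacity(sig_a, sig_b)
print(round(capacity, 2))
```

      With only two tracks the fitted line is determined by the mean single-track fraction and the either-track fraction, which is why the estimate is sensitive to small changes in event overlap.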

      We performed this same analysis for each parameter point in our main parameter grid (right panel of above figure). The parameter region that produces significant preplay (Figure 4f) corresponds to the region that has a track capacity of approximately 8-25 tracks. In the parameter grid region that does not produce preplay, the estimated track capacity approaches the high values that this analysis would produce when applied to events that are significant only at the false-positive rate. This analysis is based on the assumption that each preplay event would significantly correspond to at least one future event. Interesting interpretation issues arise when applying this analysis to parameter regions that do not produce statistically significant preplay, which we leave to future directions to address.

      We note two differences between our analysis here and that in Dragoi and Tonegawa, 2013. First, their track capacity analysis was performed on spike sequences rather than decoded spatial sequences, which is the focus of our manuscript. Second, they recorded rats exploring three novel tracks, while in our manuscript we only simulated two novel tracks, which reduces the accuracy of our linear extrapolation of track capacity.

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a spiking network model with clustered neurons produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We thank the reviewer for pointing out this important limitation. We see extensive testing of the time vs space coding properties of this network as a future direction, but we have performed simulations that demonstrate the robustness of place field coding to variations in traversal speeds and added the results as a supplemental figure (Figure 3—figure supplement 1).

      Lines 332-336: “To verify that our simulated place cells were more strongly coding for spatial location than for elapsed time, we performed simulations with additional track traversals at different speeds and compared the resulting place fields and time fields in the same cells. We find that there is significantly greater place information than time information (Figure 3—figure supplement 1).”

      Lines 835-841: “To compare coding for place vs time, we performed repeated simulations for the same networks at the fiducial parameter point with 1.0x and 2.0x of the original track traversal speed. We then combined all trials for both speed conditions to calculate both place fields and time fields for each cell from the same linear track traversal simulations. The place fields were calculated as described below (average firing rate within each of the fifty 2-cm long spatial bins across the track) and the time fields were similarly calculated but for fifty 40-ms time bins across the initial two seconds of all track traversals.”
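
      The place-field versus time-field comparison can be illustrated with a toy cell. The sketch below is a hedged example, not the manuscript's code: the 100-cm track length follows from fifty 2-cm bins, but the two speeds (50 and 100 cm/s), the spike positions, and the use of a Skaggs-style information measure in bits per spike are assumptions made for illustration.

```python
import math

N_BINS = 50
POS_BIN_CM = 2.0    # fifty 2-cm spatial bins
TIME_BIN_S = 0.04   # fifty 40-ms time bins over the first two seconds
TRACK_CM = N_BINS * POS_BIN_CM


def binned_rate(coords, bin_width, occupancy_s):
    """Spike count per bin divided by the time spent in that bin."""
    counts = [0] * len(occupancy_s)
    for c in coords:
        b = int(c / bin_width)
        if 0 <= b < len(counts):
            counts[b] += 1
    return [n / occ if occ > 0 else 0.0 for n, occ in zip(counts, occupancy_s)]


def information_bits_per_spike(rates, occupancy_s):
    """Skaggs-style information: sum_i p_i (r_i / r) log2(r_i / r)."""
    total = sum(occupancy_s)
    p = [occ / total for occ in occupancy_s]
    mean_rate = sum(pi * ri for pi, ri in zip(p, rates))
    return sum(
        pi * (ri / mean_rate) * math.log2(ri / mean_rate)
        for pi, ri in zip(p, rates)
        if ri > 0
    )


# Hypothetical cell that fires at the same positions on every traversal,
# recorded at two constant speeds (cm/s).
speeds = [50.0, 100.0]
spike_positions_cm = [50.5, 51.0, 51.5]

positions, times = [], []
for v in speeds:
    for x in spike_positions_cm:
        positions.append(x)
        times.append(x / v)  # time since traversal onset

# Occupancy: time spent per spatial bin, and time covered per time bin
pos_occupancy = [sum(POS_BIN_CM / v for v in speeds)] * N_BINS
time_occupancy = [0.0] * N_BINS
for v in speeds:
    n_covered = min(N_BINS, round((TRACK_CM / v) / TIME_BIN_S))
    for i in range(n_covered):
        time_occupancy[i] += TIME_BIN_S

place_field = binned_rate(positions, POS_BIN_CM, pos_occupancy)
time_field = binned_rate(times, TIME_BIN_S, time_occupancy)
place_info = information_bits_per_spike(place_field, pos_occupancy)
time_info = information_bits_per_spike(time_field, time_occupancy)
print("place info: %.2f bits/spike, time info: %.2f bits/spike" % (place_info, time_info))
```

      For a purely position-coding cell, combining trials at different speeds keeps the place field in one bin but splits the time field across bins, so the place information exceeds the time information.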

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to variations in feedforward input is important. We have added new simulation results (Figure 4—figure supplement 4) showing that the existence of preplay in our model is robust to variations in the form of input.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Lines 413-420: “To test the robustness of our results to variations in input types, we simulated alternative forms of spatially modulated feedforward inputs. We found that with no parameter tuning or further modifications to the network, the model generates robust preplay with variations on the spatial inputs, including inputs of three linearly varying cues (Figure 4—figure supplement 4a) and two stepped cues (Figure 4—figure supplement 4b-c). The network is impaired in its ability to produce preplay with binary step location cues (Figure 4—figure supplement 4d), when there is no cluster bias (Figure 4—figure supplement 4e), and at greater values of cluster participation (Figure 4—figure supplement 4f).”

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 9b (original Figure 7b) the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson et al., 1996; Pavlides et al., 2019).

      Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The level of correlation in the model is low because cluster membership is combinatorial: cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated.

      We have added text at lines 627-630 clarifying these points.

      Lines 628-631: “Cells that share membership in a cluster will have some amount of correlation in their remapping due to the cluster-dependent cue bias, which is consistent with experimental results (Hampson et al., 1996; Pavlides et al., 2019), but the combinatorial nature of cluster membership renders the overall place field map correlations low (Figure 9b).”

      Reviewer #3 (Public Review):

      Summary:

      This work offers a novel perspective on the question of how hippocampal networks can adaptively generate different spatial maps and replays/preplays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. Unlike previous modeling attempts, the authors do not pre-tune their model neurons to any particular place fields. Instead, they build a random, moderately-clustered network of excitatory (and some inhibitory) cells, similar to CA3 architecture. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that weakly connected cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with real ones. In summary, the model demonstrates that spontaneous activity within a small-world structured network can generate place cells and replays without the need for pre-configured maps.

      Strengths:

      This work addresses an important question in hippocampal dynamics. Namely, how can hippocampal networks quickly generate new place cells when a novel environment is introduced? And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Other models depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. This modeling work proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity and border-cell inputs, avoiding the need for pre-set spatial maps or plasticity rules. The proposal that small-world architecture is key for place cells and preplays to adapt to new spatial environments is novel and of potential interest to the computational and experimental community.

      The authors do a good job of thoroughly examining some of the features of their model, with a strong focus on excitatory cell connectivity. Perhaps the most valuable conclusion is that replays require the successive activation of different cell clusters. Small-world architecture is the optimal regime for such a controlled succession of activated clusters.

      The use of pre-existing electrophysiological data adds particular value to the model. The authors convincingly show that the simulated place cells and preplay events share many important features with those recorded in CA1 (though CA3 ones are similar).

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input. We show in Figure 4—figure supplement 4 that our preplay results are robust to several variations in the location-cue inputs. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      We apologize for a lack of clarity if we have caused confusion about the type of inputs and if we implied an absence of spatially-tuned information in the network. In order for place fields to appear the network must receive spatial information, which we model as linearly-varying cues and illustrate in Figure 1b and describe in the caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683). Such input is not place-field like, as the small bias to any cell linearly decreases from one boundary of the track or the other.

      The cluster-dependent bias, which is also described in the same lines (Figure 1 caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683)), only affects the strength of the spatial cues that are present during simulated run periods. Crucially, this cluster-dependent bias is absent during sleep simulations when preplay occurs, which is why preplay can equally correlate with place field sequences in any context.

      We have modified the text (lines 207-210, 218, and 824-827) to clarify these points. We have also added results from a control simulation (Figure 4—figure supplement 4e) showing that preplay is not generated in the absence of the cluster-dependent bias.

      Lines 207-210: “This bias causes cells that share cluster memberships to have more similar place fields during the simulated run period, but, crucially, this bias is not present during sleep simulations so that there is no environment-specific information present when the network generates preplay.”

      Lines 218: “Second, to incorporate cluster-dependent correlations in place fields, a small…”

      Lines 824-827: “The addition of this bias produced correlations in cells’ spatial tunings based on cluster membership, but, importantly, this bias was not present during the sleep simulations, and it did not lead to high correlations of place-field maps between environments (Figure 9b).”

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells. We have added a brief discussion of this to the manuscript.

      Lines 733-739: “Additionally, the in vivo microcircuitry of CA3 is complex and includes aspects such as nonlinear dendritic computations and a variety of inhibitory cell types (Rebola et al., 2017). This microcircuitry is crucial for explaining certain aspects of hippocampal function, such as ripple and gamma oscillogenesis (Ramirez-Villegas et al., 2017), but here we have focused on a minimal model that is sufficient to produce place cell spiking activity that is consistent with experimentally measured place field and preplay statistics.”

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we have modified lines 114-116 to address the clustered connectivity reported in CA3.

      Lines 114-116: “Such clustering is a common motif across the brain, including the CA3 region of the hippocampus (Guzman et al., 2016) as well as cortex (Song et al., 2005), …”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Based on Figure 3, the place fields are not uniformly distributed in the maze. Meanwhile, based on Figure 1b and Methods, the total input seems to be uniform across the maze. Why does the uniform total external input lead to nonuniform network activities?

      While the total input to the network is constant across the maze, the input to any individual cell can peak only at either end of the track. All excitatory cells receive input from both the left-cue and the right-cue with different input strengths. By chance and due to the cluster-dependent bias some cells will have stronger input from one cue than the other and will therefore be more likely to have a place field toward that side of the track. However, no cell receives a peak of input in the center of the track. We have modified lines 141-143 to clarify this.

      Lines 141-143: “While the total input to the network is constant as a function of position, each cell only receives a peak in its spatially linearly varying feedforward input at one end of the track.”

      (2) I find these sentences confusing: "...we expected that the set of spiking events that significantly decode to linear trajectories in one environment (Figure 4) should decode with a similar fidelity in another environment..." (Lines 513-515) and "As expected... but not with the place fields of trajectories from different environments (Figure 7c)" (Line 517-520). What is the expectation for cross-environment decoding? Should they be similar or different? Also, in Figure 7c, the example is not fully convincing. In the figure caption, it states that decoding is significant in the top row but not in the bottom row, but they look similar across rows.

      Original lines 513-515 refer to the entire set of events, while original lines 517-520 refer to one example event. The sleep events are simulated without any track-specific information present, so the degree to which preplay occurs when decoding based on the place fields of a specific future track should be independent of any particular track when considering the entire set of decoded PBEs, as shown in Figure 9d (original Figure 7). However, because there is strong remapping across tracks (Figure 9b), an individual event that shows a strong decoded trajectory based on the place fields of one track (Figure 9c, top row) should show chance levels of a decoded trajectory when decoded with the place fields of an alternative track (Figure 9c, bottom row).

      We have revised lines 643-650 for clarity, and we have added statistics for the events shown in Figure 9c.

      Lines 644-651: “Since the place field map correlations are high for trajectories on the same track and near zero for trajectories on different tracks, any individual event would be expected to have similar decoded trajectories when decoding based on the place fields from different trajectories in the same environment and dissimilar decoded trajectories when decoding based on place fields from different environments. A given event with a strong decoded trajectory based on the place fields of one environment would then be expected to have a weaker decoded trajectory when decoded with place fields from an alternative environment (Figure 9c).”

      Lines 604-608: “(c) An example event with a statistically significant trajectory when decoded with place fields from Env. 1 left (absolute correlation at the 99th percentile of time-bin shuffles) but not when decoded with place fields of the other trajectories (78th, 45th, and 63rd percentiles, for Env. 1 right, Env. 2 left, and Env. 2 right, respectively).”

      (3) In Methods, the equation at line 610, E in the last term should be E_ext.

      We modeled the feedforward inputs as excitatory connections with the same reversal potential as the recurrent excitatory connections, so  is the proper value.

      (4) Equation line 617 states that conductances follow exponential decay, but the initial conductances of g_I.g_E and g_SRA are not specified.

      We have added a description of the initial values in lines 760-764.

      Lines 760-764: “Initial feed-forward input conductances were set to values approximating their steady-state values by randomly selecting values from a Gaussian with a mean of   and a standard deviation of . Initial values of the recurrent conductances and the SRA conductance were set to zero.”

      (5) In the parameter table below line 647, W_E-E, W_E-I, and W_I-E are not described in the text.

      We have clarified in lines 757-760 that the step increase in conductance corresponds to these parameter values.

      Lines 757-760: “A step increase in conductance occurs at the time of each spike by an amount corresponding to the connection strength for each synapse ( for E-to-E connections, for E-to-I connections, and  for I-to-E connections), or by  for .”

      (6) On line 660, "...Each environment and the sleep session had unique context cue input weights...". Does that mean that within a sleep session, the network received the same context input? How strongly are the sleep dynamics driven by that context input rather than by intrinsic dynamics? Usually, sleep activity is high dimensional, what would happen if the input during sleep is more stochastic?

      Yes, within a sleep session each network receives a single set of context inputs, which are implemented as independent Poisson spike trains (so being independent, in small time-windows the dimensionality is equal to the number of neurons). The effects of any particular set of sleep context cue inputs should be minor, since the standard deviation of the input weights, , is small. Further, because the preplay analysis is performed across many networks at each parameter point, the observation of preplay is independent of any particular realization of either the recurrent network or the sleep context inputs.

      Further exploring the effects of more biophysically realistic neural dynamics during simulated sleep is an interesting future direction.

      (7) One bracket is missing in the denominator in line 831.

      We have fixed this error.

      Line 1005: “)” -> “()”

      Reviewer #2 (Recommendations For The Authors):

      - I would suggest the authors cite Chenkov et al 2017, PLOS Comp Bio, in which "replay" sequences were produced in clustered networks, and discuss how their work differs.

      We have included a contrast of our model to that of Chenkov et al., 2017 in lines 73-78.

      Lines 73-78: “Related to replay models based on place-field distance-dependent connectivity is the broader class of synfire-chain-like models. In these models, neurons (or clusters of neurons) are connected in a 1-dimensional feed-forward manner (Diesmann et al., 1999; Chenkov et al., 2017). The classic idea of a synfire-chain has been extended to include recurrent connections, such as by Chenkov et al., 2017; however, such models still rely on an underlying 1-dimensional sequence of activity propagation.”

      - Figure legend 2e says "replay", should be "preplay".

      We have fixed this error.

      Line 255: “(e) Example preplay event…”

      - How much does the context cue affect the result? e.g. Is sleep notably different with different sleep context cues?

      As discussed above in our response to Reviewer 1, the context cue weights have a small standard deviation, , which means that differences in the effects of different realizations of the context inputs are small. Different sets of context cues will cause cells to have slightly higher or lower spiking rates during sleep simulations, but because there is no correlation between the sleep context cue and the place field simulations there should be no effect on preplay quality.

      - Figure 4 should include a control with a single cluster.

      We thank the reviewer for this suggestion and have added additional control simulations.

      In our model, the recurrent structure of a network with a single cluster is equivalent to a cluster-less random network. Additionally, any network where cluster participation equals the number of clusters is equivalent to a cluster-less random network, since all neurons belong to all clusters and can therefore potentially connect to any other neuron. Such a condition corresponds to a diagonal boundary where the number of clusters equals the cluster participation, which occurs at higher values of cluster participation than we had shown in our primary parameter grid.

      We now include simulation results that extend to this boundary, corresponding to cluster-less networks (Figure 4—figure supplement 4f). Networks at these parameter points do not show preplay. See our earlier response for the new text associated with Figure 4—figure supplement 4.

      - The results of Figure 4 are very noisy. I would recommend increasing the sampling, both in terms of the number of population events in each condition and the number of conditions.

      We have run simulations for longer durations (300 seconds) and with more networks (20) to produce more accurate empirical values for the statistics calculated across the parameter grids in Figures 3 and 4. Our additional simulations (Figure 4—figure supplement 4) provide support that the parameter region of preplay significance is reliable.

      Lines 831-833: “For the parameter grids in Figures 3 and 4 we simulated 20 networks with 300 s long sleep sessions in order to get more precise empirical estimates of the simulation statistics.”

      - It's not entirely clear what's different between the analysis described in lines 334-353, and the preplay analysis in Figure 2. In general, the description of this result was difficult to follow, as it included a lot of text that would be better served in the methods.

      In Figure 2 we first introduce the Bayesian decoding method, but it is not until Figure 4 that the shuffle-based significance testing is first introduced. We have simplified the description of the shuffle comparison in lines 371-375 and now refer the reader to the methods for details.

      Lines 371-375: “We find significant preplay in both our reference experimental data set (Shin et al., 2019; Figure 4a, b; see Figure 4—figure supplement 1 for example events) and our model (Figure 4c, d) when analyzed by the same methods as Farooq et al., 2019, wherein the significance of preplay is determined relative to time-bin shuffled events (see Methods). For each detected event we calculated its absolute weighted correlation. We then generated 100 time-bin shuffles of each event, and for each shuffle recalculated the absolute weighted correlation to generate a null distribution of absolute weighted correlations.”
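
      The shuffle comparison can be sketched as below. This is an illustrative, self-contained example rather than the analysis code: the function names are hypothetical, the decoded posterior is a synthetic bump moving linearly across position bins, and the weighted correlation is the standard probability-weighted Pearson correlation between time bin and position bin. The significance criterion itself is as defined in the Methods.

```python
import math
import random


def weighted_correlation(posterior):
    """Probability-weighted correlation between time bin and position bin.

    posterior[t][x] is the decoded probability of position bin x in time bin t.
    """
    total = sum(sum(row) for row in posterior)
    mean_t = sum(t * p for t, row in enumerate(posterior) for p in row) / total
    mean_x = sum(x * p for row in posterior for x, p in enumerate(row)) / total
    cov_tx = var_t = var_x = 0.0
    for t, row in enumerate(posterior):
        for x, p in enumerate(row):
            cov_tx += p * (t - mean_t) * (x - mean_x)
            var_t += p * (t - mean_t) ** 2
            var_x += p * (x - mean_x) ** 2
    return cov_tx / math.sqrt(var_t * var_x)


def shuffle_percentile(posterior, n_shuffles=100, rng=None):
    """Percentile of the event's |weighted correlation| within its time-bin shuffles."""
    rng = rng or random.Random()
    observed = abs(weighted_correlation(posterior))
    null = []
    for _ in range(n_shuffles):
        shuffled = posterior[:]      # copy the list of time-bin rows
        rng.shuffle(shuffled)        # permute the order of time bins
        null.append(abs(weighted_correlation(shuffled)))
    percentile = 100.0 * sum(v < observed for v in null) / n_shuffles
    return observed, percentile


def synthetic_event(n_time=10, n_pos=20, sigma=1.5):
    """Toy decoded event: a Gaussian bump sweeping linearly along the track."""
    event = []
    for t in range(n_time):
        centre = 2 * t + 1
        row = [math.exp(-((x - centre) ** 2) / (2 * sigma ** 2)) for x in range(n_pos)]
        norm = sum(row)
        event.append([p / norm for p in row])
    return event


observed, percentile = shuffle_percentile(synthetic_event(), rng=random.Random(1))
print("abs weighted correlation: %.3f, shuffle percentile: %.0f" % (observed, percentile))
```

      Because each shuffle permutes whole time bins, the null distribution preserves the per-bin decoded positions and removes only their temporal order, which is the structure being tested.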

      - Many of the figures have low text resolution (e.g. Figure 6).

      We have now fixed this.

      - How does the clustered small world network compare to e.g. a small world ring network as used in Watts and Strogatz 1998?

      As described in our above response to Reviewer 1's fourth point, we have added a supplementary figure (Figure 1—figure supplement 1, with corresponding text) comparing our model with the Watts-Strogatz model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 5 would benefit from a plot of the overlap of activated clusters per event.

      In our cluster activation analysis in Figure 5, we defined a cluster as “active” if at any point in the event its population rate was twice that of every other cluster. We used this definition (which permits no overlap of activated clusters) rather than a definition based on z-scoring the rate, because we determined that preplay required periods of spiking dominated by individual clusters.
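
The "active cluster" definition above can be written down as a small sketch. The function name `active_clusters`, the array layout (clusters by time bins) and the use of "at least twice" rather than "strictly more than twice" are assumptions made for illustration; at least two clusters are required.

```python
import numpy as np


def active_clusters(rates, ratio=2.0):
    """Return indices of clusters that dominate at some time bin within an event.

    rates: array of shape (n_clusters, n_time_bins) with population rates.
    A cluster counts as active if, in at least one time bin, its rate is
    at least `ratio` times the rate of every other cluster.
    """
    rates = np.asarray(rates, dtype=float)
    ordered = np.sort(rates, axis=0)
    top = ordered[-1]      # highest rate in each time bin
    second = ordered[-2]   # second-highest rate in each time bin
    winner = rates.argmax(axis=0)
    dominated = (top > 0) & (top >= ratio * second)
    return np.unique(winner[dominated])
```

Because only one cluster can exceed twice the rate of all others in a given time bin, this definition cannot label two clusters as active at the same moment, which matches the "no overlap" property described in the response.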

      Author response image 3.

      The choice of such a definition is supported by our observation that most spiking activity within an event is dominated by whichever cluster is most active at each point in time. In the left panel of the above figure we show the distribution of the average fraction of spikes within each event that came from the most active cluster at each point in time. The right panel shows the distribution of the average across time within each event of the ratio of the population activity rate of the most active cluster to the second most active cluster. The data for both panels comes from all events at the fiducial parameter set.

      Author response image 4.

      Rather than overlapping at a given moment in time, clusters might have overlap in their probability of being active at some point within an event. We do find that there is a small but significant correlation in cluster co-activation. For each network we calculated the activation correlation across events for each pair of clusters (example network shown in the left panel). We compared the distribution of resulting absolute correlations against the values that result after shuffling the correlations between cluster activations (right panel, all correlations for all networks from the fiducial parameter point).
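
One way to picture the co-activation comparison is the sketch below. It is an assumption-laden illustration, not the authors' procedure: the response does not specify the shuffle in detail, so here each cluster's activations are permuted independently across events, and the names `pairwise_abs_corr` and `shuffled_abs_corr` are hypothetical.

```python
import numpy as np


def pairwise_abs_corr(active):
    """Absolute activation correlation for every pair of clusters.

    active: array of shape (n_events, n_clusters), 1 if the cluster was
    active in that event and 0 otherwise. Every cluster must be active in
    some but not all events, otherwise its correlation is undefined.
    """
    corr = np.corrcoef(active, rowvar=False)
    upper = np.triu_indices(active.shape[1], k=1)
    return np.abs(corr[upper])


def shuffled_abs_corr(active, rng=None):
    """Same statistic after independently permuting each cluster across events."""
    rng = np.random.default_rng(rng)
    shuffled = np.column_stack([rng.permutation(col) for col in active.T])
    return pairwise_abs_corr(shuffled)
```

The observed distribution from `pairwise_abs_corr` would then be compared with the distribution obtained from repeated calls to `shuffled_abs_corr`.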

      Figures 4e/f are referred to as 4c/d in the text (pg 14).

      We have fixed this error.

      Lines 400-412: “4c” -> “4e” and “4d” -> “4f”

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment: I find that the eLife assessment mentions “statistical analyses are yet to be carried out to support statements of statistical significance” while the reviewers mention that the data are compelling and results are technically solid. Besides, all observations in the manuscript are presented with robust and established norms of statistical analysis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The use of data from before COVID-19 is both a strength and a weakness. Because COVID had effects on vascular health and had higher death rates for groups with the comorbidities of interest here, it has likely shifted the demographics in ways that would shift the results in unpredictable ways if the analysis were repeated with current data. This can be a strength in providing a reference point for studying those changes as well as allowing researchers to study differences between regions without the complication of different public health responses adding extra variation to the data. On the other hand, it limits the usefulness of the data in research concerned with the current status of the various populations.

      We completely agree with the observation, but were restricted as the purpose was to use the most robust and technically qualified data from GBD. The post COVID19 GBD data has not yet been released, but I am sure the observations made in the study can help in guiding the issues in the post COVID era too, because genetics is not going to change in these population groups.

      However, we did highlight this aspect of COVID19 even in our original version and also in the revised version.

      Reviewer #2 (Public Review):

      Weaknesses:

      The presentation is not focused. It is important to include p-values for all comparisons and focus the presentation on the main effects from the dataset analysis.

      The significant p-values were restricted to public health data only to identify and distinguish differences in incidence, prevalence and mortality and how they differ across world populations. These differences have often been interpreted from socio-economic point of view, while our manuscript presents the reasons for differences for main condition (Stroke) and its comorbid condition among different ethnicities from a genetic perspective. This genetic perspective was further explored to identify unique ethnic specific variants and their patterns of linkage disequilibrium in distinguishing the phenotypic variations. Considering the quantum and diversity of data, both for public health and GWAS data, there can be several directions but for presentation we focused only on the most distinguishing and established phenotypic differences. I am sure this will open up avenues for several future investigations including COVID, as has been highlighted by the reviewers too. All observations were presented with robust and established norms of statistical analysis.


      The following is the authors’ response to the original reviews.

      Thanks for the constructive observations on the strengths and weaknesses of our manuscript. Interestingly, some of the weaknesses mentioned here also turn out to be strengths of the article. For example, COVID-19 has been mentioned by the reviewer as a driver of increased mortality in some comorbid conditions and stroke. Firstly, I must clarify that our data are from the pre-COVID era, and we indeed mention that in the COVID era, COVID-19 might differentially impact the risk of stroke. This differential influence on the comorbidities of stroke is likely to be shaped by the underlying genetics of stroke and its comorbidities.

      I have tried to address the concerns raised by the reviewers, which ideally doesn’t impact the original manuscript. A statistical limitation pertaining to P-values was raised, which has been clarified here. Certain minor concerns, such as abbreviations, have been resolved in the revised manuscript. My responses to the weaknesses and the reviewers’ comments are given below.

      Reviewer #1 (Public Review):

      Strengths:

      The data provided here will provide a foundation for a lot of future research into the causes of the observed correlations as well as whether the observed differences in comorbidities across regions have clinically relevant effects on risk management.

      Weaknesses:

      • As with any cross-national analysis of rates, the data is vulnerable to differences in classification and reporting across jurisdictions.

      GBD data is the most robust and most comprehensive data resource which has been used and accepted globally in predicting the health metrics statistics.

      GBD data indeed considers normalisations, regarding classification and reporting.

      To the best of our knowledge this is the best available resource to consider all health metrics analysis.

      • Furthermore, given the increased death rate from COVID-19 associated with many of these comorbid conditions and the long-term effects of COVID-19 infection on vascular health, it is expected that many of the correlations observed in this dataset will shift along with the shifting health of the underlying populations.

      I must clarify that we have used data prior to COVID-19.

      But yes, the patterns after COVID-19 will shift due to the impact of COVID. This makes the study even more relevant, as the comorbid conditions of stroke are also the risk drivers for COVID-19 and mortality. This shift has been reported by some authors, as covered in the discussion.

      Therefore, understanding the genetic factors underlying stroke and its comorbid conditions might help in resolving how COVID-19 might differentially impact health outcomes.

      We did highlight this aspect of COVID19 even in our original version.

      Introduction 1st para:

      “It is the accumulated risk of comorbid conditions that enhances the risk of stroke further. Are these comorbid conditions differentially impacted by socio-economic factors and ethnogeographic factors. This was clearly evident in COVID era, when COVID-19 differentially impacted the risk of stroke, possibly due to its differential influence on the comorbidities of stroke.”

      Discussion 3rd para:

      “Studies reported reduction in life expectancy in 31 of 37 high-income countries, deduced to be due to COVID-19 1. However, it would be unfair to ignore the comorbid conditions which could also be the critical determinants for reduced life expectancy in these countries.”

      Recommendations For The Authors:

      On page 5, the authors make a note about Africa and the Middle East having the highest ASMR for high SBP and comment about the relative populations of these regions. The populations of the regions are irrelevant to the rate.

      Since the study concerns the comorbid factors of stroke and their impact on mortality, relative burden seems critical. This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Results section 2:

      “Ethno-regional differences in mortality and prevalence of stroke and its major comorbid conditions

      We observed interesting patterns of ASMRs of stroke, its subtypes and its major comorbidities across different regions over the years as shown in figure 1a, table 1 and supplementary files S2 & S3. When assessed in terms of ranks, high SBP is the most fatal condition followed by IHD in all regions, except Oceania (OCE) where IHD and high SBP swap ranks. Africa (AFR; 206.2/100000, 95%UI 177.4-234.2) and Middle East (MDE; 198.6/100000, 95%UI 162.8-234.4) have the highest ASMR for high SBP, even though they rank as only the third and sixth most populous continents (fig. S2), respectively.”

      On page 17, the authors are alarmed by a large ratio between prevalence rates and mortality rates for certain conditions. This is confusing since this indicates that these conditions are not as dangerous as the other conditions.

      This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Discussion para 1:

      “While the global stroke prevalence is nearly 15 times its mortality rate, prevalence of comorbid conditions such as high SBP, high BMI, CKD, T2D are alarmingly 150- to 500-fold higher than their mortality rates. These comorbid conditions can drastically affect the outcome of stroke.”

      In Figure 4, the colors are not defined.

      In a STRUCTURE plot, colours are assigned per K and do not directly refer to any population; the plot instead distinguishes how the populations are stratified at each K. See Ramasamy, R.K., Ramasamy, S., Bindroo, B.B. et al. STRUCTURE PLOT: a program for drawing elegant STRUCTURE bar plots in user friendly interface. SpringerPlus 3, 431 (2014). https://doi.org/10.1186/2193-1801-3-431

      Reviewer #2 (Public Review):

      Strengths:

      The idea is interesting and the data are compelling. The results are technically solid.

      The authors identify specific genetic loci that increase the risk of a stroke and how they differ by region.

      Weaknesses:

      The presentation is not focused. It would be better to include p-values and focus presentation on the main effects of the dataset analysis.

      I presume the comment is made with reference to results with significant p-values.

      P-values are reported in the main text wherever we refer to a significant decrease or increase with respect to global rates or over time. For example, P-values for the year 2019 compare regional rates with the global rates of 2019 (Supplementary tables S2a for mortality and S3a for prevalence); P-values for comparisons between years compare 2019 rates with 2009 rates (Supplementary tables S2b for mortality and S3b for prevalence); and P-values for proportional mortality and proportional prevalence (Supplementary tables S4 and S5) are likewise based on global rates.

      Recommendations For The Authors:

      It would be better to minimize the use of acronyms. Often one has to go back to decipher what the acronym stands for. It is fine to have acronyms in figure legends, if necessary. However, at least for regions, please do not use acronyms.

      In the revised version we have tried to minimise the use of acronyms.

      We removed the acronyms for regions and elsewhere wherever possible; however, the disease acronyms have been retained as per the GBD terms.

      Please focus the presentation on the main results. Currently, the presentation wanders and repeats itself a lot.

      Since the manuscript tries to address the global and regional rates of prevalence, mortality and its relationship to genetic correlates, it is difficult not to repeat the same to stress the significant observations coming out of different analysis methods. This might reflect on some amount of repetitiveness but the intention was to stress the significant observations.

      I would also recommend acknowledging and discussing socioeconomic factors earlier in the manuscript.

      Current mention happens in 3rd para of Discussion

      “The changing dynamics of stroke or its comorbid conditions can be attributed to multitude of factors. Often global burden of stroke has been discussed from the point of view of socio-economic parameters. Studies indicate that half of the stroke-related deaths are attributable to poor management of modifiable risk factors 8,9. However, we observe that different socio-economic regions are driven by different risk factors.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      ⍺-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

      We thank the reviewer for appreciating the importance of our work and his/her positive comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors employed DLS to quantify the percentage of SV clustering in Fig. 1c and d. As DLS usually measures particle size distribution, I am not sure how the data was plotted in Fig. 1c and d. It would be great to show a representative raw dataset here.

      We thank the reviewer for the comment. To address this, we have put four representative DLS datasets of different α-Syn variants mediating SV clustering for clarification (Author response image 1). Rather than presenting the particle distribution based on the light scattering intensity, DLS can also convert the intensity to present the data as particle size distribution based on the particle number counts. In our analysis, particle diameters around 50 nm are considered to represent single SV species, whereas diameters larger than 120 nm indicate SV clusters. Specifically, as shown in Author response image 1, adding Ac-α-syn to a homogeneous SV sample altered the distribution from one single SV particle species (Author response image 1d) to three distinct species (Author response image 1a); this resulted in 68.5% of the particles being single SVs and 31.5% being SV clusters.

      Author response image 1.

      Representative raw dataset of α-Syn-mediated synaptic vesicle (SV) clustering monitored by dynamic light scattering (DLS). The grey-colored rows represent small particles (< 5 nm) that contributed zero to the particle number count.
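
The size-based classification described in this response can be illustrated with a short sketch. Everything in it is an assumption for illustration: the function name `sv_cluster_fractions` is hypothetical, the input is assumed to be (diameter in nm, number percent) rows from a DLS number distribution, and any species between 5 nm and 120 nm is treated as single SVs, which the response does not state explicitly.

```python
def sv_cluster_fractions(species, min_nm=5.0, cluster_nm=120.0):
    """Split a DLS number distribution into single-SV and SV-cluster fractions.

    species: iterable of (diameter_nm, number_percent) rows.
    Rows below `min_nm` are ignored; rows above `cluster_nm` count as clusters;
    all remaining rows count as single vesicles (an assumption of this sketch).
    """
    single = 0.0
    cluster = 0.0
    for diameter_nm, number_percent in species:
        if diameter_nm < min_nm:
            continue  # small particles are excluded from the count
        if diameter_nm > cluster_nm:
            cluster += number_percent
        else:
            single += number_percent
    total = single + cluster
    if total == 0:
        raise ValueError("no particles at or above the minimum diameter")
    return single / total, cluster / total


# Hypothetical rows chosen only to reproduce the 68.5% / 31.5% split quoted above.
example_rows = [(3.0, 0.0), (50.0, 68.5), (140.0, 20.0), (400.0, 11.5)]
single_fraction, cluster_fraction = sv_cluster_fractions(example_rows)
```

The diameters and percentages in `example_rows` are not taken from the authors' dataset; only the two thresholds and the final split come from the response.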

      (2) Syn-lipid interactions are known to be altered by mutations involved in neurodegenerative diseases. I am wondering how those mutations will affect SV clustering mediated by the interaction of LPC with N-acetylated syn.

      We thank the reviewer for the insightful comment. Our data indicate that N-acetylation enhances the binding of the N-terminal region of α-syn to LPC, thereby facilitating SV clustering. This enhancement benefits from the fact that N-acetylation effectively neutralizes the positive charge of α-syn’s N-terminal region, promoting its insertion into LPC-rich membranes through hydrophobic interactions. Therefore, we envision that any mutation that weakens membrane binding capability of the N-terminal unmodified α-Syn may decrease SV clustering mediated by the interaction between the Ac-α-syn and LPC.

      In separate work (doi: 10.1093/nsr/nwae182, Fig. S8), we compared the binding affinity of LPC for wild-type N-terminal unmodified α-syn and six Parkinson’s disease (PD) familial mutants (A30P, E46K, H50Q, G51D, A53E, and A53T). Among these, only the A30P mutation showed a significant decrease in binding to LPC. Furthermore, using the same single-vesicle assay setup, in another paper (doi: 10.1073/pnas.2310174120, Fig. 4C), we demonstrated that A30P-mutated α-Syn lost its ability to facilitate SV clustering. Therefore, among the six PD mutations, the A30P mutation may significantly impact the SV clustering mediated by the Ac-α-syn–LPC interaction.

      (3) The crosslinking data in Fig. 4 was obtained using LPC or PS liposomes. I am wondering if these results truly mimic physiological conditions. Could the authors use SVs for these experiments?

      We thank the reviewer for the suggestion. To elucidate the mechanistic differences between N-terminal unmodified α-syn and N-acetylated α-syn, we utilized pure LPC and PS liposomes for clarity. Using natural-source SVs, which contain many synaptic proteins, could complicate or obscure the interaction patterns of Ac-α-syn due to potential crosstalk with other SV proteins. Additionally, the complex lipid environment of SV membranes would not help us decipher the specific molecular mechanism by which Ac-α-Syn facilitates SV clustering through LPC.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It is known for many years that synuclein binds to synaptic vesicles but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes being low, questioning the significance of the findings. Oddly, no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included. It remains unclear to which extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I that were reported to cause vesicle clustering.

      We appreciate the reviewer’s insightful comments. Indeed, in another paper (doi: 10.1093/nsr/nwae182), employing a conventional α-Syn pull-down assay and LC-MS lipidomics, we found that α-Syn preferentially binds lysophospholipids in both in vivo and in vitro systems. Additionally, by comparing the lipid compositions of mouse brains, SVs and SV lipid-raft membranes, we found LPC levels to be twice as high in SVs as in brain homogenates, and twice as high in lipid-raft membranes as in non-lipid-raft membranes. Altogether, these findings emphasize the physiological relevance of understanding the mechanism by which Ac-α-syn mediates SV clustering through LPC.

      Liquid-liquid phase separation has been implicated in the assembly and maintenance of SV clusters, and we believe that the SV cluster liquid phase is interconnected by highly abundant proteins with multivalent low-affinity interactions. Besides the previously discovered protein-protein interactions between α-Syn and synapsin (doi: 10.1016/j.jmb.2021.166961) or VAMP2 (doi: 10.1038/s41556-024-01456-1) that contribute to SV condensates, protein-lipid interactions between α-Syn and acidic phospholipids or LPC may also play a role. Furthermore, post-translational modifications, such as N-acetylation of α-Syn, may also contribute to SV condensates.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 2, the authors indicate that for the binding assay both vesicle populations, the immobilized "acceptor" and the superfused "donor" population were labeled with different fluorescent dyes whereas in the text it is stated that the immobilized acceptor liposomes were unlabeled. Please clarify. Moreover, a control is missing showing that binding indeed depends on the immobilised liposome fraction and does not occur in their absence. This control is important because due to the long incubation times non-specific adsorption may occur which may be enhanced by adding destabilizing LPC or charged PS to the membrane.

      We thank the reviewer for pointing out this inconsistency. To avoid signal leakage from a high concentration of DiD vesicles upon green laser irradiation, we immobilized unlabeled vesicles. We have revised Figure 2a as well as the figure caption.

      Regarding the control mentioned by the reviewer, we agree that non-specific binding could occur with the long incubation. In fact, the layer of highly dense liposomes (100 μM) immobilized on the imaging surface also serves to reduce non-specific interactions. In the absence of this layer of immobilized liposomes, we did see a high level of non-specific binding that significantly impacted our experiments. Therefore, we need to perform clustering experiments in the presence of immobilized liposomes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Gonzalez Alam et al. report a series of functional MRI results about the neural processing from the visual cortex to high-order regions in the default-mode network (DMN), compiling evidence from task-based functional MRI, resting-state connectivity, and diffusion-weighted imaging. Their participants were first trained to learn the association between objects and rooms/buildings in a virtual reality experiment; after the training was completed, in the task-based MRI experiment, participants viewed the objects from the earlier training session and judged if the objects were in the semantic category (semantic task) or if they were previously shown in the same spatial context (spatial context task). Based on the task data, the authors utilised resting-state data from their previous studies, visual localiser data also from previous studies, as well as structural connectivity data from the Human Connectome Project, to perform various seed-based connectivity analyses. They found that the semantic task causes more activation of various regions involved in object perception while the spatial context task causes more activation in various regions for place perception, respectively. They further showed that those object perception regions are more connected with the frontotemporal subnetwork of the DMN while those place perception regions are more connected with the medial-temporal subnetwork of the DMN. Based on these results, the authors argue that there are two main pathways connecting the visual system to high-level regions in the DMN, one linking object perception regions (e.g., LOC) leading to semantic regions (e.g., IFG, pMTG), the other linking place perception regions (e.g., parahippocampal gyri) to the entorhinal cortex and hippocampus.

      Below I provide my takes on (1) the significance of the findings and the strength of evidence, (2) my guidance for readers regarding how to interpret the data, as well as several caveats that apply to their results, and finally (3) my suggestions for the authors.

      (1) Significance of the results and strength of the evidence

      I would like to praise the authors for, first of all, trying to associate visual processing with high-order regions in the DMN. While many vision scientists focus specifically on the macroscale organisation of the visual cortex, relatively few efforts are made to unravel how neural processing in the visual system goes on to engage representations in regions higher up in the hierarchy (a nice precedent study that looks at this issue is by Konkle and Caramazza, 2017). We all know that visual processing goes beyond the visual cortex, potentially further into the DMN, but there's no direct evidence. So, in this regard, the authors made a nice try to look at this issue.

      We thank the reviewer for their positive feedback and for their very thoughtful and thorough comments, which have helped us to improve the quality of the paper.

      Having said this, the authors' characterisation of the organisation of the visual cortex (object perception/semantics vs. place perception/spatial contexts) does not go beyond what has been known for many decades by vision neuroscience. Specifically, over the past two decades, numerous proposals have been put forward to explain the macroscale organisation of the visual system, particularly the ventrolateral occipitotemporal cortex. A lateral-medial division has been reliably found in numerous studies. For example, some researchers found that the visual cortex is organised along the separation of foveal vision (lateral) vs. peripheral vision (medial), while others found that it is structured according to faces (lateral) vs. places (medial). Such a bipartite division is also found in animate (lateral) vs. inanimate (medial), small objects (lateral) vs. big objects (medial), as well as various cytoarchitectonic and connectomic differences between the medial side and the lateral side of the visual cortex. Some more recent studies even demonstrate a tripartite division (small objects, animals, big objects; see Konkle and Caramazza, 2013). So, in terms of their characterisation of the visual cortex, I think Gonzalez Alam et al. do not add any novel evidence to what the community of neuroscience has already known.

      The aim of our study was not to provide novel evidence about visual organisation, but rather to understand how these well-known visual subdivisions are related to functional divisions in memory-related systems, like the DMN. We agree that our study confirms the pattern observed by numerous other studies in visual neuroscience.  

      However, the authors' effort to link visual processing with various regions of the DMN is certainly novel, and their attempt to gather converging evidence with different methodologies is commendable. The authors are able to show that, in an independent sample of resting-state data, object-related regions are more connected with semantic regions in the DMN while place-related regions are more connected with navigation-related regions in the DMN, respectively. Such patterns reveal a consistent spatial overlap with their Kanwisher-type face/house localiser data and also concur with the HCP white-matter tractography data. Overall, I think the two pathways explanation that the authors seek to argue is backed by converging evidence. The lack of travelling wave type of analysis to show the spatiotemporal dynamics across the cortex from the visual cortex to high-level regions is disappointing though because I was expecting this type of analysis would provide the most convincing evidence of a 'pathway' going from one point to another. Dynamic causal modelling or Granger causality may also buttress the authors' claim of pathway because many readers, like me, would feel that there is not enough evidence to convincingly prove the existence of a 'pathway'.

      By ‘pathway’ we are referring to a pattern of differential connectivity between subregions of visual cortex and subregions of DMN, suggesting there are at least two distinct routes between visual and heteromodal regions. However, these routes don’t have to reflect a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. We have now clarified this in the discussion section of the manuscript. We agree it would be interesting to characterise the spatiotemporal dynamics of neural propagation along our pathways, and we have incorporated this proposal into the future directions section.

      “One important caveat is that we have not investigated the spatiotemporal dynamics of neural propagation along the pathways we identified between visual cortex and DMN. The dissociations we found in task responses, intrinsic functional connectivity and white matter connections all support the view that there are at least two distinct routes between visual and heteromodal DMN regions, yet this does not necessarily imply that there is a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. It would be interesting in future work to characterise the spatiotemporal dynamics of neural propagation along visual-DMN pathways using methods optimised for studying the dynamics of information transmission, like Granger causality or travelling wave analysis.”

      We have also edited the wording of sentences in the introduction and discussion that we thought might imply directionality or transmission of information along these pathways, or to clarify the nature of the pathways (please see a couple of examples below):

      In the Introduction:

      “We identified dissociable pathways of connectivity between different parts of visual cortex and DMN subsystems”

      In the Discussion:

      “…pathways from visual cortex to DMN” -> “…pathways between visual cortex and DMN”.

      (2) Guidance to the readers about interpretation of the data

      The organisation of the visual cortex and the organisation of the DMN historically have been studied in parallel with little crosstalk between different communities of researchers. Thus, the work by Gonzalez Alam et al. has made a nice attempt to look at how visual processing goes beyond the realm of the visual cortex and continues into different subregions of the DMN.

      While the authors of this study have utilised multiple methods to obtain converging evidence, there are several important caveats in the interpretation of their results:

      (1) While the authors choose to use the term 'pathway' to call the inter-dependence between a set of visual regions and default-mode regions, their results have not convincingly demonstrated a definitive route of neural processing or travelling. Instead, the findings reveal a set of DMN regions are functionally more connected with object-related regions compared to place-related regions. The results are very much dependent on masking and thresholding, and the patterns can change drastically if different masks or thresholds are used.

      We would like to clarify that our findings do not only reveal a set of “DMN regions that are functionally more connected with object-related regions compared to place-related regions”. Instead, we show a double dissociation based on our functional task responses: DMN regions that were more responsive to semantic decisions about objects are more functionally and structurally connected to visual regions more activated by perceiving objects, while DMN regions that were more responsive to spatial decisions are more connected to visual regions activated by the contrast of scene over object perception.

      We do not believe that the thresholding or masking involved in generating seeds strongly affected our results. We are reassured of this by two facts:

      (1) We re-analysed the resting-state data using a stricter clustering threshold and this did not change the pattern of results (see response below).

      (2) In response to a point by reviewer #2, we re-analysed the data eroding the masks of the MT-DMN, and this also didn’t change the pattern of results (please see response to reviewer 2).

      In this way, our results are robust to variations in mask shape/size and thresholding.
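The threshold-robustness check mentioned in this response (comparing maps produced at the lax T=1.65 and the strict T=3.1 cluster-forming thresholds) can be illustrated with a minimal sketch. This is not the CONN pipeline: the T-map below is synthetic, the helper names `threshold_map` and `spatial_correlation` are hypothetical, thresholding is one-sided, and no cluster correction is applied. It only shows how a voxelwise spatial correlation between two thresholded versions of the same map might be computed.

```python
import numpy as np

def threshold_map(t_map, t_thresh):
    """Zero out voxels whose T value falls below the threshold (one-sided)."""
    return np.where(t_map >= t_thresh, t_map, 0.0)

def spatial_correlation(map_a, map_b):
    """Pearson correlation across voxels of two maps with the same shape."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])

# Hypothetical T-map: a block of strong connectivity on a noisy background.
rng = np.random.default_rng(0)
t_map = rng.normal(0.0, 1.0, size=(20, 20, 20))
t_map[5:15, 5:15, 5:15] += 6.0

lax = threshold_map(t_map, 1.65)    # lax threshold reported in the response
strict = threshold_map(t_map, 3.1)  # stricter threshold used in the re-analysis
r = spatial_correlation(lax, strict)
print(f"spatial correlation between lax and strict maps: r = {r:.3f}")
```

With real data, the two inputs would be the whole-brain seed-based maps exported at each threshold; a correlation close to 1 would indicate that raising the threshold removed few voxels that carried the pattern.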

      (2) Ideally, if the authors could demonstrate the dynamics between the visual cortex and DMN in the primary task data, it would be very convincing evidence for characterising the journey from the visual cortex to DMN. Instead, the current connectivity results are derived from a separate set of resting state data. While the advantage of the authors' approach is that they are able to verify certain visual regions are more connected with certain DMN regions even under a task-free situation, it falls short of explaining how these regions dynamically interact to convert vision into semantic/spatial decision.

      We agree that a valuable future direction would be to collect evidence of spatiotemporal dynamics of propagation of information along these pathways. This could be the focus of future studies designed with this aim, and we have suggested this in the manuscript based on the reviewer’s suggestion. Furthermore, as stated above, we have now qualified our use of the term ‘pathway’ in the manuscript to avoid confusion.

      “These pathways refer to regions that are coupled, functionally or structurally, together, providing the potential for communication between them.”

      (3) There are several results that are difficult to interpret, such as their psychophysiological interactions (PPI), representational similarity analysis, and gradient analysis. For example, typically for PPI analysis, researchers interrogate the whole brain to look for PPI connectivity. Their use of targeted ROI is unusual, and their use of spatially extensive clusters that encompass fairly large cortical zones in both occipital and temporal lobes as the PPI seeds is also an unusual approach. As for the gradient analysis, the argument that the semantic task is higher on Gradient 1 than the spatial task based on the statistics of p-value = 0.027 is not a very convincing claim (unhelpfully, the figure on the top just shows quite a few blue 'spatial dots' on the hetero-modal end which can make readers wonder if the spatial context task is really closer to the unimodal end or it is simply the authors' statistical luck that they get a p-value under 0.05). While it is statistically significant, it is weak evidence (and it is not pertinent to the main points the authors try to make).

      To streamline the manuscript, we have moved the PPI and RSA results to the Supplementary Materials. However, we believe the gradient analysis is highly pertinent to understanding the functional separation of these pathways. In the revision, we show that not only was the contrast between the Semantic and Spatial tasks significant, but the majority of participants also exhibited a pattern consistent with the result we report. To show the results more clearly, we have added a supplementary figure (Figure S8) focussed on comparisons at the participant level.

      This figure shows the position in the gradient for each peak per participant per task. The peaks for each participant across tasks are linked with a line. Cases where the pattern was reversed are highlighted with dashed lines (7/27 participants in each gradient). This allows the reader and reviewer to verify in how many cases, at the individual level, the pattern of results reported in the text held (see “Supplementary Analysis: Individual Location of pathways in whole-brain gradients”).  

      (3) My suggestion for the authors

      There are several conceptual-level suggestions that I would like to offer to the authors:

      (1)  If the pathway explanation is the key argument that you wish to convey to the readers, an effective connectivity type of analysis, such as Granger causality or dynamic causal modelling, would be helpful in revealing there is a starting point and end point in the pathway as well as revealing the directionality of neural processing. While both of these methods have their issues (e.g., Granger causality is not suitable for haemodynamic data, DCM's selection of seeds is susceptible to bias, etc), they can help you get started to test if the path during task performance does exist. Alternatively, travelling wave type of analysis (such as the results by Raut et al. 2021 published in Science Advances) can also be useful to support your claims of the pathway.

      As we have stated above, we agree with the reviewer that, given the pattern of results obtained in our work, analyses that characterise the spatiotemporal dynamics of transmission of information along the pathways would be of interest. However, we are concerned that our data is not well-optimised for these analyses.

      (2)  I think the thresholding for resting state data needs to be explained - by the look of Figure 2E and 3E, it looks like whole-brain un-thresholded results, and then you went on to compute the conjunction between these un-thresholded maps with network templates of the visual system and DMN. This does not seem statistically acceptable, and I wonder if the conjunction that you found would disappear and reappear if you used different thresholds. Thus, for example, if the left IFG cluster (which you have shown to be connected with the visual object regions) would disappear when you apply a conventional threshold, this means that you need to seriously consider the robustness of the pathway that you seek to claim... it may be just a wild goose that you are chasing.

      We believe the reviewer might be confused regarding the procedure we followed to generate the ROIs used in the pathways connectivity analysis. As stated in the last paragraph of the “Probe phase” and “Decision phase” results subsections, the maps the reviewer is referring to (Fig. 3E, for example) were generated by seeding the intersection of our thresholded univariate analysis (Fig. 3A) with network templates. In the case of Fig 3E, these are the Semantic>Spatial decision results after thresholding, intersected with Yeo DMN (MT, FT and Core, combined). These seeds were then entered into a whole-brain seed-based spatial correlation analysis, which was thresholded and cluster-corrected using the defaults of CONN. The same is true for Fig. 2E, but using the thresholded Probe phase Semantic>Context regions. Thus, we do not believe the objections to statistical rigour the reviewer is raising apply to our results.

      The thresholding of the resting-state data itself was explained in the Methods (Spatial Maps and Seed-to-ROI Analysis). As stated above, we thresholded using the default of the CONN software package we used (cluster-forming threshold of p=.05, equivalent to T=1.65). For increased rigour, we reproduced the thresholded maps from Figs 2E and 3E, raising the threshold from p=.05 (T=1.65) to p=.001 (T=3.1). The resulting maps were very similar, showing minimal change, with a spatial correlation of r > .99 between the strict and lax threshold versions of the maps for both the probe and decision seeds. This can be seen in Figure 2E and Figure 3E, which depict the maps produced with stricter thresholding. These maps can also be downloaded from the Neurovault collection, and the re-analysis is now reported in the Supplementary Materials (see section “Supplementary Analysis: Resting-state maps with stricter thresholding”).

      Probe phase (compare with Fig. 2E):

      (3) There are several analyses that are hard to interpret and you can consider only reporting them in the supplementary materials, such as the PPI results and representational similarity analysis, as none of these are convincing. These analyses do not seem to add much value to make your argument more convincing and may elicit more methodological critiques, such as statistical issues, the set-up of your representational theory matrix, and so on.

      We have moved the PPI and RSA results to the supplementary materials. We agree this will help us streamline the manuscript.  

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or multiple semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as two parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based dissociation.

      (2) The manuscript contains a large number of analyses across multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the overall pattern of results reported in this work.

      We thank the reviewer for their remarks on the strengths of our manuscript.

      Weaknesses:

      (1) Throughout the manuscript, a strong distinction is drawn between semantic and spatial processing. However, given that only objects and spatial contexts were employed in the primary experiment, it is not clear that a broader conceptual distinction is warranted between "semantic" and "spatial" cognition. There are multiple grounds for concern regarding this basic premise of the manuscript.

      a. One can have conceptual knowledge of different types of scenes or spatial contexts. A city street will consistently differ from a beach in predictable ways, and a kitchen context provides different expectations than a living room. Such distinctions reflect semantic knowledge of scene-related concepts, but in the present work spatial and "all other" semantic information are considered and discussed as distinct and separate.

      The “building” contexts we created were arbitrary, containing beds, desks and an assortment of furniture that did not reflect usual room distributions, i.e., a kitchen next to a dining room. We have made this aspect of our stimuli clearer in the Materials section of the task. 

      “The learning phase employed videos showing a walk-through for twelve different buildings (one per video), shot from a first-person perspective. The videos and buildings were created using an interior design program (Sweet Home 3D). Each building consisted of two rooms: a bedroom and a living room/office, with an ajar door connecting the two rooms. The order of the rooms (1st and 2nd) was counterbalanced across participants. Each room was distinctive, with different wallpaper/wall colour and furniture arrangements. The building contexts created by these rooms were arbitrary, containing furniture that did not reflect usual room distributions (i.e., a kitchen next to a dining room), to avoid engaging further conceptual knowledge about frequently-encountered spatial contexts in the real world.”

      To help the reviewer and readers to verify this and come to their own conclusions, we have also added the videos watched by the participants to the OSF collection.

      “A full list of pictures of the object and location stimuli employed in this task, as well as the videos watched by the participants, can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      We agree that scenes or spatial contexts have conceptual characteristics, and we actually manipulated conceptual information about the buildings in our task, in order to assess the neural underpinnings of this effect. In half of the buildings, the rooms/contexts were linked through the presence of items that shared a common semantic category (our “same category building” condition): this presented some conceptual scaffolding that enabled participants to link two rooms together. These buildings could then be contrasted with “mixed category buildings” where this conceptual link between rooms was not available. We found that right angular gyrus was important in the linking together of conceptual and spatial information, in the contrast of same versus mixed category buildings.

      b. As a related question, are scenes uniquely different from all other types of semantic/category information? If faces were used instead of scenes, could one expect to see different regions of the visual cortex coupling with task-defined face > object ROIs? The current data do not speak to this possibility, but as written the manuscript suggests that all (non-spatial) semantic knowledge should be processed by the FT-DMN.

      Thanks for raising this important point. Previous work suggests that the human visual system (and possibly the memory system, as suggested by Deen and Freiwald, 2021) is sensitive to perceptual categories important to human behaviour, including spatial, object and social information. Previous work (Silson et al., 2019; Steel et al., 2021) has shown domain-specific regions in visual regions (ventral temporal cortex; VTC) whose topological organisation is replicated in memory regions in medial parietal cortex (MPC) for faces and places. In these studies, adding objects to the analyses revealed regions sensitive to this category sandwiched between those responsive to people and places in VTC, but not in MPC. However, consistent with our work, the authors find regions sensitive to memory tasks for places and objects (as well as people) in the lateral surface of the brain. 

      Our study was not designed to probe every category in the human visual system, and therefore we cannot say what would happen if we contrasted social judgments about faces with semantic judgments about objects. We have added this point as a limitation and future direction for research:

      “Likewise, further research should be carried out on memory-visual interactions for alternative domains. Our study focused on spatial location and semantic object processing and therefore cannot address how other categories of stimuli, such as faces, are processed by the visual-to-memory pathways that we have identified. Previous work has suggested some overlap in the neurobiological mechanisms for semantic and social processing (Andrews-Hanna et al., 2014; Andrews-Hanna & Grilli, 2021; Chiou et al., 2020), suggesting that the FT-DMN pathway may be highlighted when contrasting both social faces and semantic objects with spatial scenes. On the other hand, some researchers have argued for a ‘third pathway’ for aspects of social visual cognition (Pitcher & Ungerleider, 2021; Pitcher, 2023). Future studies that probe other categories will be able to confirm the generality (or specificity) of the pathways we described.”

      c. Recent precision fMRI studies characterizing networks corresponding to the FT-DMN and MTL-DMN have associated the former with social cognition and the latter with scene construction/spatial processing (DiNicola et al., 2020; 2021; 2023). This is only briefly mentioned by the authors in the current manuscript (p. 28), and when discussed, the authors draw a distinction between semantic and social or emotional "codes" when noting that future work is necessary to support the generality of the current claims. However, if generality is a concern, then emphasizing the distinction between object-centric and spatial cognition, rather than semantic and spatial cognition, would represent a more conservative and better-supported theoretical point in the current manuscript.

      We appreciate this comment and we have spent quite a bit of time considering what the most appropriate terminology would be. The distinction between object and spatial cognition is largely appropriate to our probe phase, although we feel this label is still misleading for two reasons:

      First, we used a range of items from different semantic categories, not just “objects”, although we have used that term as a shorthand to refer to the picture stimuli we presented. The stimuli include both animals (land animals, marine animals and birds) and man-made objects (tools, musical instruments and sports equipment). This category information is now more prominent in the rationale (Introduction) and the Methods to avoid confusion.

      Interested readers can also review our “object” stimuli in the OSF collection associated with this manuscript:

      Introduction: “…participants learned about virtual environments (buildings) populated with objects belonging to different, heterogeneous, semantic categories, both man-made (tools, musical instruments, sports equipment) and natural (land animals, marine animals, birds).”

      Methods:

      “A full list of pictures of the object and location stimuli employed in this task can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      Secondly, we manipulated the task demands so that participants were making semantic judgments about whether two items were in the same category, or spatial judgments about whether two rooms had been presented in the same building. Our use of the terms “semantic” and “spatial” was largely guided by the tasks that participants were asked to perform.

      We have revised the terminology used in the discussion to reflect this more conservative term. However, since the task performed was semantic in nature (participants had to judge whether items belonged to semantic categories), we have modified the term proposed by the reviewer to “object-centric semantics”, which we hope will avoid confusion.  

      (2) Both the retrosplenial/parieto-occipital sulcus and parahippocampal regions are adjacent to the visual network as defined using the Yeo et al. atlas, and spatial smoothness of the data could be impacting connectivity metrics here in a way that qualitatively differs from the (non-adjacent) FT-DMN ROIs. Although this proximity is a basic property of network locations on the cortical surface, the authors have several tools at their disposal that could be employed to help rule out this possibility. They might, for instance, reduce the smoothing in their multi-echo data, as the current 5 mm kernel is larger than the kernel used in Experiment 2's single-echo resting-state data. Spatial smoothing is less necessary in multi-echo data, as thermal noise can be attenuated by averaging over time (echoes) instead of space (see Gonzalez-Castillo et al., 2016 for discussion). Some multi-echo users have eschewed explicit spatial smoothing entirely (e.g., Ramot et al., 2021), just as the authors of the current paper did for their RSA analysis. Less smoothing of E1 data, combined with a local erosion of either the MTL-DMN and VIS masks (or both) near their points of overlap in the RSFC data, would improve confidence that the current results are not driven, at least in part, by spatial mixing of otherwise distinct network signals.

      The proximity of visual peripheral and DMN-C networks is a property of these networks’ organisation (Silson et al., 2019; Steel et al., 2021), and we agree the potential for spatial mixing of the signal due to this adjacency is a valid concern. Altering the smoothing kernel of the multi-echo data would not address this issue though, since no connectivity analyses were performed in task data. The reviewer is right about the kernel size for task data (5mm), but not about the single-echo RS data, which were actually smoothed with a larger kernel (6mm). 

      Since this objection is largely about the connectivity analysis, we re-analysed the RS data by shrinking the size of the visual probe and DMN decision ROIs for the context task using fslmaths. We eroded the masks until the smallest gap between them exceeded the size of our 6mm FWHM smoothing kernel, which eliminates the potential for spatial mixing of signals due to ROI adjacency. The eroded ROIs can be consulted in the OSF collection associated with this project (see component “ROI Analysis/Revision_ErodedMasks”). The results, presented in the supplementary materials as “Eroded masks replication analysis”, confirmed the pattern of findings reported in the manuscript (see SM analysis below). We did not erode the respective ROIs for the semantic task, given that adjacency is not an issue there. 
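The erosion procedure described above can be sketched in a few lines. The authors used fslmaths; the NumPy version below is only an illustration of the logic, not their implementation. The 2 mm voxel size, the toy box-shaped masks, the choice to erode both masks in lock-step with a 6-connectivity kernel, and the helper names (`erode`, `min_gap_mm`, `erode_until_separated`) are all assumptions made for this example. The gap is measured between voxel centres.

```python
import numpy as np

def erode(mask):
    """One step of binary erosion using face neighbours only (6-connectivity)."""
    padded = np.pad(mask, 1, constant_values=False)
    out = padded.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            out &= np.roll(padded, shift, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    return out[core]

def min_gap_mm(mask_a, mask_b, voxel_mm):
    """Smallest centre-to-centre distance (mm) between voxels of two masks."""
    a = np.argwhere(mask_a) * voxel_mm
    b = np.argwhere(mask_b) * voxel_mm
    diffs = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())

def erode_until_separated(mask_a, mask_b, voxel_mm, min_gap):
    """Erode both masks until their smallest gap exceeds min_gap (mm)."""
    a, b = mask_a.copy(), mask_b.copy()
    while a.any() and b.any() and min_gap_mm(a, b, voxel_mm) <= min_gap:
        a, b = erode(a), erode(b)
    return a, b

VOXEL_MM = 2.0        # assumed isotropic voxel size
KERNEL_FWHM_MM = 6.0  # smoothing kernel reported in the response

# Toy example: two directly adjacent box-shaped ROIs.
mask_a = np.zeros((20, 20, 20), dtype=bool)
mask_b = np.zeros((20, 20, 20), dtype=bool)
mask_a[2:10, 4:16, 4:16] = True
mask_b[10:18, 4:16, 4:16] = True

eroded_a, eroded_b = erode_until_separated(mask_a, mask_b, VOXEL_MM, KERNEL_FWHM_MM)
print("voxels kept:", int(eroded_a.sum()), int(eroded_b.sum()))
print("gap (mm):", min_gap_mm(eroded_a, eroded_b, VOXEL_MM))
```

The pairwise distance computation is quadratic in the number of mask voxels, so for full-resolution ROIs a distance transform would likely be a more practical way to measure the gap.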

      “Eroded masks replication analysis:

      The Visual-to-DMN ANOVA showed main effects of seed (F(1,190)=22.82, p<.001), ROI (F(1,190)=9.48, p=.002) and a seed by ROI interaction (F(1,190)=67.02, p<.001). Post-hoc contrasts confirmed there was stronger connectivity between object probe regions and semantic versus spatial context decision regions (t(190)=3.38, p<.001), and between scene probe regions and spatial context versus semantic decision regions (t(190)=-7.66, p<.001).

      The DMN-to-Visual ANOVA confirmed this pattern: again, there was a main effect of ROI (F(1,190)=4.3, p=.039) and a seed by ROI interaction (F(1,190)=57.59, p<.001), with post-hoc contrasts confirming stronger intrinsic connectivity between DMN regions implicated in semantic decisions and object probe regions (t(190)=5.06, p<.001), and between DMN regions engaged by spatial context decisions and scene probe regions (t(190)=3.25, p=.001).”

      (3) The authors identify a region of the right angular gyrus as demonstrating a "potential role in integrating the visual-to-DMN pathways." This would seem to imply that lesion damage to right AG should produce difficulties in integrating "semantic" and "spatial" knowledge. Are the authors aware of such a literature? If so, this would be an important point to make in the manuscript as it would tie in yet another independent source of information relevant to the framework being presented. The closest of which I am aware involves deficits in cued recall performance when associates consisted of auditory-visual pairings (Ben-Zvi et al., 2015), but that form of multi-modal pairing is distinct from the "spatial-semantic" integration forwarded in the current manuscript.

      This is a very interesting observation. There is a body of literature pointing to AG (more often left than right) as an integrator of multimodal information: It has been shown to integrate semantic and episodic memory, contextual information and cross-modality content.

      The Contextual Integration Model (Ramanan et al., 2017) proposes that AG plays a crucial role in multimodal integration to build context. Within this model, information that is essential for the representation of rich, detailed recollection and construction (like who, when, and, crucially for our findings, what and where) is processed elsewhere, but integrated and represented in the AG. In line with this view, Bonnici et al (2016) found AG engagement during retrieval of multimodal episodic memories, and that multivariate classifiers could differentiate multimodal memories in AG, while unimodal memories were represented in their respective sensory areas only. Recent work examining semantic processing in temporally-extended narratives using multivariate approaches concurs with a key role of left AG in context integration (Branzi et al., 2020).

      In addition to context integration, other lines of work suggest a role of AG as an integrator across modalities, more specifically. Recent perspectives suggest a role of AG as a dynamic buffer that allows combining distinct forms of information into multimodal representations (Humphreys et al., 2021), which is consistent with the result in our study of a region that brings together semantic and spatial representations in line with task demands. Others have proposed a role of the AG as a central connector hub that links three semantic subsystems, including multimodal experiential representation (Xu et al., 2017). Causal evidence of the role of AG in integrating multimodal features has been provided by Yazar et al (2017), who studied participants performing memory judgements of visual objects embedded in scenes, where the name of the object was presented auditorily. TMS to AG impaired participants’ ability to retrieve context features across multiple modalities. However, these studies do not single out specifically right AG.

      Some recent proposals suggest a causal role of right AG as a key region in the early definition of a context for the purpose of sensemaking, for which integrating semantic information with many other modalities, including vision, may be a crucial part (Seghier, 2023). TMS studies suggest a causal role for the right AG in visual attention across space (Olk et al. 2015, Petitet et al. 2015), including visual search and the binding of stimulus- and response-characteristics that can optimise it (Bocca et al. 2015). TMS over the right AG disrupts the ability to search for a target defined by a conjunction of features (Muggleton et al. 2008) and affects decision-making when visuospatial attention is required (Studer et al. 2014). This suggests that the AG might contribute to perceptual decision-making by guiding attention to relevant information in the visual environment (Studer et al. 2014). These, taken together, suggest a causal role of right AG in controlling attention across space and integrating content across modalities in order to search for relevant information. 

      Most of this body of research points to left, rather than right, AG as a key region for integration, but we found regions of right AG to be important when semantic and spatial information could be integrated. We might have observed involvement of the right AG in our study, as opposed to the more-often reported left, given that people have to integrate semantic information with spatial context, which relies heavily on visuospatial processes predominantly located in right hemisphere regions (cf. Sormaz et al., 2017), which might be more strongly connected to right than left AG. 

      Lastly, we are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. We have added as a recommendation that patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration. We have added the following to the discussion:

      “We found a region of the right AG that was potentially important for integrating semantic and spatial context information. Previous research has established a key role of the AG in context integration (Ramanan et al., 2017; Bonnici et al., 2016; Branzi et al., 2020) and specifically, in guiding multimodal decisions and behaviour (Humphreys et al., 2021; Xu et al., 2017; Yazar et al., 2017). Although some recent proposals suggest a causal role of right AG in the early establishment of meaningful contexts, allowing semantic integration across modalities (Seghier, 2023; Olk et al., 2015, Petitet et al., 2015; Bocca et al., 2015; Muggleton et al. 2008), the majority of this research points to left, rather than right, AG as a key region for integration. However, we might have observed involvement of the right AG in our study given that people were integrating semantic information with spatial context, and right-lateralised visuospatial processes (cf. Sormaz et al., 2017) might be more strongly connected to right than left AG. We are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. Patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) I mentioned the numerous converging analyses reported in this manuscript as a strength. However, in practice, it also results in numerous dense figures (routinely hitting 7-8 sub-panels) and results paragraphs which, as currently presented, are internally coherent but are not assembled into a "bigger picture" until the discussion. Readers may have an easier time with the paper if introductions to the different analyses ("probe phase", "decision phase", etc.) also include a bigger-picture summary of how the specific analysis is contributing to the larger argument that is being constructed throughout the manuscript. This may also help readers to understand why so many different analysis approaches and decisions were employed throughout the manuscript, why so many different masks were used, etc.

      Thank you for this suggestion. We agree that the range of analyses and their presentation can make digesting them difficult. To address this, we have outlined our analyses rationale at the beginning of the results as a sort of “big picture” summary that links all analyses together, and added introductory paragraphs to each analysis that needed them (namely, the probe, decision, and pathway connectivity analyses, as the gradient and integration analyses already had introductory paragraphs describing their rationale, and the PPI/RSA analyses were moved to supplementary materials), linking them to the summary, which we reproduce below:

      “To probe the organisation of streams of information between visual cortex and DMN, our neuroimaging analysis strategy consisted of a combination of task-based and connectivity approaches. We first delineated the regions in visual cortex that are engaged by the viewing of probes during our task (Figure 2), as well as the DMN regions that respond when making decisions about those probes (Figure 3): we characterised both by comparing the activation maps with well-established DMN and object/scene perception regions, analysed the pattern of activation within them, their functional connectivity and task associations. Having characterised the two ends of the stream, we proceeded to ask whether they are differentially linked: are the regions activated by object probe perception more strongly linked to DMN regions that are activated when making semantic decisions about object probes, relative to other DMN regions? Is the same true for the spatial context probe and decision regions? We answered this question through a series of connectivity analyses (Figure 4) that examined: 1) if the functional connectivity of visual to DMN regions (and DMN to visual regions) showed a dissociation, suggesting there are object semantic and spatial cognition processing ‘pathways’; 2) if this pattern was replicated in structural connectivity; 3) if it was present at the level of individual participants, and, 4) we characterised the spatial layout, network composition (using influential RS networks) and cognitive decoding of these pathways. Having found dissociable pathways for semantic (object) and spatial context (scene) processing, we then examined their position in a high-dimensional connectivity space (Figure 5) that allowed us to document that the semantic pathway is less reliant on unimodal regions (i.e., more abstract) while the spatial context pathway is more allied to the visual system. 
Finally, we used uni- and multivariate approaches to examine how integration between these pathways takes place when semantic and spatial information is aligned (Figure 6).”

      (2) At various points, figures are arranged out of sequence (e.g., panel d is referenced after panel g in Figure 2) or are missing descriptions of what certain colors mean (e.g., what yellow represents in Figure 6d). This is a minor issue, but one that's important and easy to address in future revisions.

      We thank the reviewer for bringing this issue to our attention. We have added descriptions for the yellow colour to the figure legends of Figures 6 and 7 (now in supplementary materials, Figure S9).

      We have also edited the text to follow a logical sequence with respect to referencing the panels in Figures 2 and 3, where panel d is now referenced after panel c. Lastly, we reorganised the layout of Figure 4 to follow the description of the results in the text.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep learning models to analyse the dynamics of epithelia. In this way they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strengths:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is compelling.

      The methods presented in this work should prove to be very helpful for quantifying cell proliferation in epithelial tissues.

      We thank the reviewer for the positive comments!

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      Comments on revised version:

      Regarding Reviewer 1's comment on the architecture details, I have now understood that the precise architecture (number/type of layers, activation functions, pooling operations, skip connections, upsampling choice...) might have remained relatively hidden to the authors themselves, as the U-net is built automatically by the fast.ai library from a given classical choice of encoder architecture (ResNet34 and ResNet101 here) to generate the decoder part and skip connections.

      Regarding the Major point 1, I raised the question of the generalisation potential of the method. I do not think, for instance, that the optimal number of frames to use, nor the optimal choice of their time-shift with respect to the division time (t-n, t+m) (not systematically studied here) may be generic hyperparameters that can be directly transferred to another setting. This implies that the method proposed will necessarily require re-labeling, re-training and re-optimizing the hyperparameters which directly influence the network architecture for each new dataset imaged differently. This limits the generalisation of the method to other datasets, and this may be seen as in contrast to other tools developed in the field for other tasks such as cellpose for segmentation, which has proven a true potential for generalisation on various data modalities. I was hoping that the authors would try themselves testing the robustness of their method by re-imaging the same tissue with slightly different acquisition rate for instance, to give more weight to their work.

      We thank the referee for the comments. Regarding this particular biological system, due to photobleaching over long imaging periods (and the availability of imaging systems during the project), we would have difficulty imaging at much higher rates than the 2-minute frame interval we currently use. These limitations are true for many such systems, and it is rarely possible to image rapidly for long periods of time in real experiments. Given this upper limit on frame rate, we could, in principle, sample this data at a lower frame rate by removing time points from the videos, but this typically leads to worse results. With some pilot data, we tried using fewer time points for our analysis, but this always gave worse results. We found that we need to feed the maximum amount of information available into the model to get the best results (i.e. the fastest frame rate possible, given the data available). Our goal is to teach the neural net to identify dynamic space-time localised events from time-lapse videos, in which the duration of an event is a key parameter. Our division events take 10 minutes or less to complete; therefore, we used 5 timepoints in the videos for the deep learning model. If we considered another system with dynamic events of duration T, then we would use T/t timepoints, where t is the minimum time interval (for our data t = 2 min). For example, if we could image every minute, we would use 10 timepoints. As discussed below, we do envision that other users with different imaging setups and requirements may need to retrain the model for their own data and, to help with this, we have now provided more detailed instructions on how to do this (see later).
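      The T/t rule stated above can be written out as a short calculation. The following is a minimal illustrative sketch, not code from the plugin; the function name is hypothetical, and rounding up when T is not a multiple of t is an assumption added here.

```python
import math


def n_timepoints(event_duration_min: float, frame_interval_min: float) -> int:
    """Number of frames needed to span one event of duration T at interval t (T/t)."""
    if event_duration_min <= 0 or frame_interval_min <= 0:
        raise ValueError("duration and interval must be positive")
    # Assumption: round up so the whole event is covered when T is not a multiple of t.
    return math.ceil(event_duration_min / frame_interval_min)


print(n_timepoints(10, 2))  # 5 timepoints for 10-minute events imaged every 2 minutes
print(n_timepoints(10, 1))  # 10 timepoints if imaging every minute
```

      Under this rule, the number of input frames is fixed by the imaging interval, which is why a model trained at one frame rate would need retraining for another.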

      In this regard, and because the authors claimed to provide clear instructions on how to reuse their method or adapt it to a different context, I delved deeper into the code and, to my surprise, felt that we are far from the coding practice of what a well-documented and accessible tool should be.

      To start with, one has to be relatively accustomed to Napari to understand how the plugin must be installed, as the only thing given is a pip install command (that could be typed in any terminal without installing the plugin for Napari, but has to be typed inside the Napari terminal, which is mentioned nowhere). Surprisingly, the plugin was not uploaded on Napari hub, nor on PyPI by the authors, so it is not searchable/findable directly; one has to go to the Github repository and install it manually. In that regard, no description was provided in the copy-pasted templated files associated to the napari hub, so exporting it to the hub would actually leave it undocumented.

      We thank the referee for suggesting the example of DeXtrusion (Villars et al. 2023). We have endeavoured to produce similarly detailed documentation for our tools. We now have clear instructions for installation requiring only minimal coding knowledge, and we have provided a user manual for the napari plug-in. This includes information on each of the options for using the model and the outputs they will produce. The plugin has been tested by several colleagues using both Windows and Mac operating systems.

      Author response image 1.

      Regarding now the python notebooks, one can fairly say that the "clear instructions" that were supposed to enlighten the code are really minimal. Only one notebook "trainingUNetCellDivision10.ipynb" actually has some comments; the others have (almost) none, nor a title to help the unskilled programmer delving into the script to guess what it should do. I doubt that a biologist who does not have a strong computational background will manage to adapt the method to their own dataset (which seems to me unavoidable for the reasons mentioned above).

      Within the README file, we have now included information on how to retrain the models with helpful links to deep learning tutorials (which, indeed, some of us have learnt from) for those new to deep learning. All Jupyter notebooks now include more comments explaining the models.

      Finally regarding the data, none is shared publicly along with this manuscript/code, such that if one doesn't have a similar type of dataset - that must be first annotated in a similar manner - one cannot even test the networks/plugin for its own information. A common and necessary practice in the field - and possibly a longer lasting contribution of this work - could have been to provide the complete and annotated dataset that was used to train and test the artificial neural network. The basic reason is that a more performant, or more generalisable deep-learning model may be developed very soon after this one and for its performance to be fairly compared, it requires to be compared on the same dataset. Benchmarking and comparison of methods performance is at the core of computer vision and deep-learning.

      We thank the referee for these comments. We have now uploaded all the data used to train the models and to test them, as well as all the data used in the analyses for the paper. This includes many videos that were not used for training but were analysed to generate the paper’s results. The link to these data sets is provided in our GitHub page (https://github.com/turleyjm/cell-division-dl-plugin/tree/main). In the folder for the data sets and in the GitHub repository, we have included the Jupyter notebooks used to train the models and these can be used for retraining. We have made our data publicly available as a Zenodo dataset, https://zenodo.org/records/10846684 (added to last paragraph of discussion). We have also included scripts that can be used to compare the model output with ground truth, including outputs highlighting false positives and false negatives. Together with these scripts, models can be compared and contrasted, both in general and in individual videos. Overall, we very much appreciate the reviewer’s advice, which has made the plugin much more user-friendly and, hopefully, easier for other groups to train their own models. Our contact details are provided, and we would be happy to advise any groups that would like to use our tools.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep-learning models to analyse the dynamics of epithelia. In this way, they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after the healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strength:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is solid.

      Weakness:

      Some aspects of the deep-learning models remained unclear, and the authors might want to think about adding details. First of all, for readers not being familiar with deep-learning models, I would like to see more information about ResNet and U-Net, which are at the base of the new deep-learning models developed here. What is the structure of these networks?

      We agree with the Reviewer and have included additional information on page 8 of the manuscript, outlining some background information about the architecture of ResNet and U-Net models.

      How many parameters do you use?

      We apologise for this omission and have now included the number of parameters and layers in each model in the methods section on page 25.

      What is the difference between validating and testing the model? Do the corresponding data sets differ fundamentally?

      The difference between ‘validating’ and ‘testing’ the model is that the validation data are used during training to determine whether the model is overfitting. If the model performs well on the training data but not on the validation data, this is a key signal that the model is overfitting, and changes will need to be made to the network or training method to prevent this. The testing data are used only after all training has been completed, to test the performance of the model on fresh data it has not been trained on. We have removed reference to the validating data in the main text to make it simpler and have added this explanation to the Methods. There is no fundamental (or experimental) difference between the labelled data sets; rather, they are collected from different biological samples. We have now included this information in the Methods text on page 24.
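      The distinction between the three data sets can be illustrated with a small sketch. This is not the training code from the notebooks: the function names, the 70/15/15 split fractions and the `patience` rule are all assumptions chosen for illustration. It keeps whole biological samples (e.g. videos) together in one set and flags the overfitting signal described above, where training loss keeps falling while validation loss stops improving.

```python
import random


def split_by_sample(sample_ids, frac_train=0.7, frac_valid=0.15, seed=0):
    """Assign whole biological samples to train/valid/test so no sample is shared."""
    ids = sorted(set(sample_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * frac_train)
    n_valid = int(len(ids) * frac_valid)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],  # held back until training is finished
    }


def looks_overfit(train_losses, valid_losses, patience=3):
    """True if validation loss stalled for `patience` epochs while training loss fell."""
    if len(valid_losses) <= patience:
        return False
    best_before = min(valid_losses[:-patience])
    valid_stalled = min(valid_losses[-patience:]) >= best_before
    train_improving = train_losses[-1] < train_losses[-patience - 1]
    return valid_stalled and train_improving


splits = split_by_sample([f"video_{i}" for i in range(20)])
print({name: len(ids) for name, ids in splits.items()})
```

      The test set plays no part in either function during training; it is only evaluated once, at the end.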

      How did you assess the quality of the training data classification?

      These data were generated and hand-labelled by an expert with many years of experience in identifying cell divisions in imaging data, to give the ground truth for the deep learning model.

      Reviewer #1 (Recommendations For The Authors):

      You repeatedly use 'new', 'novel' as well as 'surprising' and 'unexpected'. The latter are rather subjective and it is not clear based on what prior knowledge you make these statements. Unless indicated otherwise, it is understood that the results and methods are new, so you can delete these terms.

      We have deleted these words, as suggested, for almost all cases.

      p.4 "as expected" add a reference or explain why it is expected.

      A reference has now been included in this section, as suggested.

      p.4 "cell divisions decrease linearly with time" Only later (p.10) it turns out that you think about the density of cell divisions.

      This has been changed to "cell division density decreases linearly with time".

      p.5 "imagine is largely in one plane" while below "we generated a 3D z-stack" and above "our in vivo 3D image data" (p.4). Although these statements are not strictly contradictory, I still find them confusing. Eventually, you analyse a 2D image, so I would suggest that you refer to your in vivo data as being 2D.

      We apologise for the confusion here; the imaging data was initially generated using 3D z-stacks but this 3D data is later converted to a 2D focused image, on which the deep learning analysis is performed. We are now more careful with the language in the text.
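      For readers unfamiliar with reducing a z-stack to a single image, a maximum-intensity projection is one simple way to do it. The sketch below is illustrative only: the response does not state which focusing method was used to produce the 2D focused image, so the projection here is an assumption and the function name is hypothetical.

```python
import numpy as np


def project_z_stack(stack: np.ndarray) -> np.ndarray:
    """Collapse a (z, y, x) stack to a (y, x) image by keeping the brightest voxel along z."""
    if stack.ndim != 3:
        raise ValueError("expected a (z, y, x) array")
    return stack.max(axis=0)


# Toy stack: three z-slices of 4x4 pixels with one bright voxel in the middle slice.
stack = np.zeros((3, 4, 4), dtype=np.uint8)
stack[1, 2, 2] = 200
image = project_z_stack(stack)
print(image.shape)  # (4, 4)
```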

      p.7 "We have overcome (...) the standard U-Net model" This paragraph remains rather cryptic to me. Maybe you can explain in two sentences what a U-Net is or state its main characteristics. Is it important to state which class you have used at this point? Similarly, what is the exact role of the ResNet model? What are its characteristics?

      We have included more details on both the ResNet and U-Net models and how our model incorporates properties from them on Page 8.

      p.8 Table 1 Where do I find it? Similarly, I could not find Table 2.

      These were originally located in the supplemental information document, but have been moved to the main manuscript.

      p.9 "developing tissue in normal homeostatic conditions" Aren't homeostatic and developing contradictory? In one case you maintain a state, in the other, it changes.

      We agree with the Reviewer and have removed the word ‘homeostatic’.

      p.9 "Develop additional models" I think 'models' refers to deep learning models, not to physical models of epithelial tissue development. Maybe you can clarify this?

      Yes, this is correct; we have phrased this better in the text.

      p.12 "median error" median difference to the manually acquired data?

      Yes, and we have made this clearer in the text, too.

      p.12 "we expected to observe a bias of division orientation along this axis" Can you justify the expectation? Elongated cells are not necessarily aligned with the direction of a uniaxially applied stress.

      Although this is not always the case, we have now included additional references to previous work from other groups which demonstrated that wing epithelial cells do become elongated along the P/D axis in response to tension.

      p.14 "a rather random orientation" Please, quantify.

      The division orientations are quantified in Fig. 4F,G; we have now changed our description from ‘random’ to ‘unbiased’.

      p.17 "The theories that must be developed will be statistical mechanical (stochastic) in nature" I do not understand. Statistical mechanics refers to systems at thermodynamic equilibrium, stochastic to processes that depend on, well, stochastic input.

      We have clarified that we are referring to non-equilibrium statistical mechanics (the study of macroscopic systems far from equilibrium, a rich field of research with many open problems and applications in biology).

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      In general, novelty over previous work does not seem particularly important. From a methodological point of view, the models are based on generic architectures of convolutional neural networks, with minimal changes, and on ideas already explored in general. The authors seem to have missed much (most?) of the literature on the specific topic of detecting mitotic events in 2D timelapse images, which has been published in more specialized journals or proceedings (TPAMI, CCVPR, etc.; see references below). Even though the image modality or biological structure may be different (non-fluorescent images sometimes), I don't believe it makes a big difference. How the authors' approach compares to this previously published work is not discussed, which prevents me from objectively assessing the true contribution of this article from a methodological perspective.

      On the contrary, some competing works have proposed methods based on newer - and generally more efficient - architectures specifically designed to model temporal sequences (Phan 2018, Kitrungrotsakul 2019, 2021, Mao 2019, Shi 2020). These natural candidates (recurrent networks, long short-term memory (LSTM), gated recurrent units (GRU), or even more recently transformers), coupled to CNNs, are not even mentioned in the manuscript, although they have proved their generic superiority for inference tasks involving time series (Major point 2). Even though the original idea/trick of exploiting the different channels of RGB images to address the temporal aspect might seem smart in the first place - as it reduces the task of changing/testing a new architecture to a minimum - I guess that CNNs trained this way may not generalize very well to videos where the temporal resolution is changed slightly (Major point 1). This could be quite problematic as each new dataset acquired with a different temporal resolution or temperature may require manual relabeling and retraining of the network. In this perspective, recent alternatives (Phan 2018, Gilad 2019) have proposed unsupervised approaches, which could largely reduce the need for manual labeling of datasets.

      We thank the reviewer for their constructive comments. Our goal is to develop a cell detection method that has a very high accuracy, which is critical for practical and effective application to biological problems. The algorithms need to be robust enough to cope with the difficult experimental systems we are interested in studying, which involve densely packed epithelial cells within in vivo tissues that are continuously developing, as well as repairing. In response to the above comments of the reviewer, we apologise for not including these important papers from the division detection and deep learning literature, which are now discussed in the Introduction (on page 4).

      A key novelty of our approach is the use of multiple fluorescent channels to increase information for the model. As the referee points out, our method benefits from using and adapting existing highly effective architectures. Hence, we have been able to incorporate deeper models than some others have previously used. An additional novelty is using this same model architecture (retrained) to detect cell division orientation. For future practical use by us and other biologists, the models can easily be adapted and retrained to suit experimental conditions, including different fluorescent channels or numbers of time points. Unsupervised approaches are very appealing due to the potential time saved compared to manual hand labelling of data. However, the accuracy of unsupervised models is currently much lower than that of supervised models (as shown in Phan 2018) and, most importantly, well below the levels needed for practical use in analysing inherently variable (and challenging) in vivo experimental data.

      Regarding the other convolutional neural networks described in the manuscript:

      (1) The one proposed to predict the orientation of mitosis performs a regression task, predicting a probability for the division angle. The architecture, which must be different from a simple Unet, is not detailed anywhere, so the way it was designed is difficult to assess. It is unclear if it also performs mitosis detection, or if it is instead used to infer orientation once the timing and location of the division have been inferred by the previous network.

      The neural network used for U-NetOrientation has the same architecture as U-NetCellDivision10 but has been retrained to complete a different task: finding division orientation. Our workflow is as follows: firstly, U-NetCellDivision10 is used to find cell divisions; secondly, U-NetOrientation is applied locally to determine the division orientation. These points have now been clarified in the main text on Page 14.

      (2) The one proposed to improve the quality of cell boundary images before segmentation is nothing new, it has now become a classic step in segmentation, see for example Wolny et al. eLife 2020.

      We have cited similar segmentation models in our paper and thank the referee for this additional one. We had made an improvement to the segmentation models, using GFP-tagged E-cadherin, a protein localised in a thin layer at the apical boundary of cells. So, while this is primarily a 2D segmentation problem, some additional information is available in the z-axis as the protein is visible in 2-3 separate z-slices. Hence, we supplied this 3-focal plane input to take advantage of the 3D nature of this signal. This approach has been made more explicit in the text (Pages 14, 15) and Figure (Fig. 2D).

      As a side note, I found it a bit frustrating to realise that all the analysis was done in 2D while the original images are 3D z-stacks, so a lot of the 3D information had to be compressed and has not been used. A novelty, in my opinion, could have resided in the generalisation to 3D of the deep-learning approaches previously proposed in that context, which are exclusively 2D, in particular, to predict the orientation of the division.

      Our experimental system is a relatively flat 2D tissue with the orientation of the cell divisions consistently in the xy-plane. Hence, a 2D analysis is most appropriate for this system. With the successful application of the 2D methods already achieving high accuracy, we envision that extension to 3D would only offer a slight increase in effectiveness as these measurements have little room for improvement. Therefore, we did not extend the method to 3D here. However, of course, this is the next natural step in our research as 3D models would be essential for studying 3D tissues; such 3D models will be computationally more expensive to analyse and more challenging to hand label.

      Concerning the biological application of the proposed methods, I found the results interesting, showing the potential of such a method to automatise mitosis quantification for a particular biological question of interest, here wound healing. However, the deep learning methods/applications that are put forward as the central point of the manuscript are not particularly original.

      We thank the referee for their constructive comments. Our aim was not only to show the accuracy of our models but also to show how they might be useful to biologists for automated analysis of large datasets, which is a—if not the—bottleneck for many imaging experiments. The ability to process large datasets will improve robustness of results, as well as allow additional hypotheses to be tested. Our study also demonstrated that these models can cope with real in vivo experiments where additional complications such as progressive development, tissue wounding and inflammation must be accounted for.

      Major point 1: generalisation potential of the proposed method.

      The neural network model proposed for mitosis detection relies on a 2D convolutional neural network (CNN), more specifically on the Unet architecture, which has become widespread for the analysis of biology and medical images. The strategy proposed here exploits the fact that the input of such an architecture is natively composed of several channels (originally 3 to handle the 3 RGB channels, which is actually a holdover from computer vision, since most medical/biological images are gray images with a single channel), to directly feed the network with 3 successive images of a timelapse at a time. This idea is, in itself, interesting because no modification of the original architecture had to be carried out. The latest 10-channel model (U-NetCellDivision10), which includes more channels for better performance, required minimal modification to the original U-Net architecture but also simultaneous imaging of cadherin in addition to histone markers, which may not be a generic solution.

      We believe we have provided a general approach for practical use by biologists that can be applied to a range of experimental data, whether that is based on varying numbers of fluorescent channels and/or timepoints. We envisioned that experimental biologists are likely to have several different parameters permissible for measurement based on their specific experimental conditions e.g., different fluorescently labelled proteins (e.g. tubulin) and/or time frames. To accommodate this, we have made it easy and clear in the code on GitHub how these changes can be made. While the model may need some alterations and retraining, the method itself is a generic solution as the same principles apply to very widely used fluorescent imaging techniques.
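      The idea discussed in this exchange, feeding successive frames and fluorescent markers to a 2D network as separate input channels, can be sketched as follows. This is an illustration rather than the authors' implementation: the function name and array layout are assumptions, and the combination of five timepoints with two markers giving ten channels is inferred from the figures quoted in this response, not taken from the code.

```python
import numpy as np


def stack_frames_as_channels(video: np.ndarray, t: int, n_frames: int) -> np.ndarray:
    """Build one (n_frames * c, y, x) network input from a (time, c, y, x) video.

    Frames t .. t + n_frames - 1 are laid out along the channel axis, frame by frame.
    """
    if video.ndim != 4:
        raise ValueError("expected a (time, channel, y, x) array")
    if t < 0 or t + n_frames > video.shape[0]:
        raise IndexError("window runs past the ends of the video")
    window = video[t:t + n_frames]  # shape (n_frames, c, y, x)
    n, c, y, x = window.shape
    return window.reshape(n * c, y, x)


# Toy video: 20 timepoints, 2 fluorescent markers, 64x64 pixels.
video = np.zeros((20, 2, 64, 64), dtype=np.float32)
x10 = stack_frames_as_channels(video, t=0, n_frames=5)
print(x10.shape)  # (10, 64, 64)
```

      Because the channel count is part of the network's input layer, changing the number of frames or markers changes the input shape, which is why such changes imply retraining.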

      Since CNN-based methods accept only fixed-size vectors (fixed image size and fixed channel number) as input (and output), the length or time resolution of the extracted sequences should not vary from one experiment to another. As such, the method proposed here may lack generalization capabilities, as it would have to be retrained for each experiment with a slightly different temporal resolution. The paper should have compared results with slightly different temporal resolutions to assess its inference robustness toward fluctuations in division speed.

      If multiple temporal resolutions are required for a set of experiments, we envision that the model could be trained over a range of these different temporal resolutions. Of course, the temporal resolution that requires the largest vector would be chosen to set the model's fixed number of input channels. Given the depth of the models used, and the potential to easily increase this by replacing resnet34 with resnet50 or resnet101, the model would likely be able to cope with this, although we have not specifically tested this (page 27).

      Another approach (not discussed) consists in directly convolving several temporal frames using a 3D CNN (2D+time) instead of a 2D, in order to detect a temporal event. Such an idea shares some similarities with the proposed approach, although in this previous work (Ji et al. TPAMI 2012 and for split detection Nie et al. CCVPR 2016) convolution is performed spatio-temporally, which may present advantages. How does the authors' method compare to such an (also very simple) approach?

      We thank the Reviewer for this insightful comment. The text now discusses this (on Pages 8 and 17). Key differences between the models include our incorporation of multiple light channels and the use of much deeper models. We suggest that our method allows for an easy and natural extension to use deeper models for even more demanding tasks e.g. distinguishing between healthy and defective divisions. We also tested our method with ‘difficult conditions’ such as when a wound is present; despite the challenges imposed by the wound (including the discussed reduction in fluorescent intensities near the wound edge), we achieved higher accuracy compared to Nie et al. (accuracy of 78.5% compared to our F1 score of 0.964) using a low-density in vitro system.

      Major point 2: innovatory nature of the proposed method.

      The authors' idea of exploiting existing channels in the input vector to feed successive frames is interesting, but the natural choice in deep learning for manipulating time series is to use recurrent networks or their newer and more stable variants (LSTM, GRU, attention networks, or transformers). Several papers exploiting such approaches have been proposed for the mitotic division detection task, but they are not mentioned or discussed in this manuscript: Phan et al. 2018, Mao et al. 2019, Kitrungrotsakul et al. 2019, Shi et al. 2020.

      An obvious advantage of an LSTM architecture combined with CNN is that it is able to address variable length inputs, therefore time sequences of different lengths, whereas a CNN alone can only be fed with an input of fixed size.

      LSTM architectures may produce similar accuracy to the models we employ in our study. However, given the high degree of accuracy we already achieve with our methods, it is hard to see how they would improve the understanding of the biology of wound healing that we have uncovered. Hence, they may provide an alternative way to achieve similar results from analyses of our data. It would also be interesting to see how LSTM architectures would cope with the noisy and difficult wounded data that we have analysed. We agree with the referee that these alternative models could allow an easier inclusion of differences in division time (see discussion on Page 20). Nevertheless, we imagine that after selecting a sufficiently large time/fluorescent channel input, biologists could likely train our model to cope with a range of division lengths.

      Another advantage of some of these approaches is that they rely on unsupervised learning, which can avoid the tedious relabeling of data (Phan et al. 2018, Gilad et al. 2019).

      While these are very interesting ideas, we believe these unsupervised methods would struggle under the challenging conditions found in our and others' experimental imaging data. The epithelial tissue examined in the present study possesses a particularly high density of cells with overlapping nuclei compared to the other experimental systems these unsupervised methods have been tested on. Another potential problem with these unsupervised methods is the difficulty in distinguishing dynamic debris and immune cells from mitotic cells. Once again, despite our experimental data being more complex and difficult, our methods perform better than other methods designed for simpler systems, as in Phan et al. 2018 and Gilad et al. 2019; for example, in analysis performed on lower density in vitro and unwounded tissues, the best F1 scores for a single video were 0.768 and 0.829 for unsupervised and supervised methods, respectively (Phan et al. 2018). We envision that an F1 score above 0.9 (and preferably above 0.95) would be crucial for practical use by biologists; hence, we believe supervision is currently still required. We expect that retraining our models for use in other experimental contexts will require smaller hand-labelled datasets, as they will be able to take advantage of transfer learning (see discussion on Page 4).
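      For readers less familiar with the metric compared above, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from detection counts; the function name and the example counts are illustrative only and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

      Because the F1 score ignores true negatives, it suits detection tasks such as this one, where most of each video contains no division event.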

      References :

      We have included these additional references in the revised version of our Manuscript.

      Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1), 221-231.

      Nie, W. Z., Li, W. H., Liu, A. A., Hao, T., & Su, Y. T. (2016). 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 55-62).

      Phan, H. T. H., Kumar, A., Feng, D., Fulham, M., & Kim, J. (2018). Unsupervised two-path neural network for cell event detection and classification using spatiotemporal patterns. IEEE Transactions on Medical Imaging, 38(6), 1477-1487.

      Gilad, T., Reyes, J., Chen, J. Y., Lahav, G., & Riklin Raviv, T. (2019). Fully unsupervised symmetry-based mitosis detection in time-lapse cell microscopy. Bioinformatics, 35(15), 2644-2653.

      Mao, Y., Han, L., & Yin, Z. (2019). Cell mitosis event analysis in phase contrast microscopy images using deep learning. Medical image analysis, 57, 32-43.

      Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Takemoto, S., Yokota, H., Ipponjima, S., ... & Chen, Y. W. (2019). A cascade of 2.5 D CNN and bidirectional CLSTM network for mitotic cell detection in 4D microscopy image. IEEE/ACM transactions on computational biology and bioinformatics, 18(2), 396-404.

      Shi, J., Xin, Y., Xu, B., Lu, M., & Cong, J. (2020, November). A Deep Framework for Cell Mitosis Detection in Microscopy Images. In 2020 16th International Conference on Computational Intelligence and Security (CIS) (pp. 100-103). IEEE.

      Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., ... & Kreshuk, A. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. Elife, 9, e57613.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems (1).

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      We demonstrated that the efficiency of Msi2 depletion was ~83% (Figures 4A and 4C) via qPCR analysis for our in vitro and in vivo experiments, respectively, and verified the knockdown via bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab (2).

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate this. In previous publications, loss of Msi2 in leukemia cells has been shown to inhibit growth via arrested cell cycle progression by increasing the expression of p21 (3). Further, loss of Msi2 was also shown to promote apoptosis, in part by upregulating Bax (3). These data suggest that Msi2 can have an impact via multiple distinct mechanisms, including by mediating cell cycle arrest and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets (since Msi2 is in first place, translational and not transcriptional regulator)- . RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies available) to test four targets in Fig 5J by Western blot, to see any impact of musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/ RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

      Previously published work from the lab showed that expression of Msi2 in the context of myeloid leukemia (1) can repress not only NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that, as an RNA binding protein, Msi2 can also bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important for developing a complete picture of the impact of Musashi and for determining which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse are readily able to become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, Msi2+ cells formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of GFP- transplanted mice formed tumors. One of the rare GFP- derived tumors was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. The small frequency of GFP expression could be from contaminating cells or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure, and that although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be studied further in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and subsequent p53 loss upregulates Msi2 further in the context of pancreatic cells (Lytle and Reya, unpublished results); therefore, it is possible that the same is true for the lung. Specifically, we have observed that Msi2 increased from normal acinar cells to Kras-mutated acinar cells (e.g. pancreatic intraepithelial neoplasia (PanIN)).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that the stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A (Msi2-Myc) mouse model (4). More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs and not just the lung (4).

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) has reduced total lung cell numbers or altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It will be helpful that such data are shown and the effect of KO on overall lung mass or cellularity is clarified. Second, the phenotype may also be due to a difference in the efficiencies of cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence staining for CC10 (BASC) and SPC (AT2) to determine if these cell populations were reduced after Msi2 loss alone. Below are representative images showing that the Msi2 KO mice did not have lower numbers of either BASC or AT2 cell populations.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab (2). Additionally, in this work we did develop and use a Msi2 genetic knockout mouse model that validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to put the specifics in the legend, but the details are in the methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel, and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37ºC to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation, and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundancy in relevant cell types. It will be also interesting to know whether Msi2 is expressed or not in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provide strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (4). These data excitingly demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

      In the future we hope to collect more patient samples to further validate the data presented with the first 2 patients shown here. We are not certain about the reason behind the large impact of Msi2 inhibition, but as cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is deleted, Msi2- cells no longer form tumors and their ability to build the stromal microenvironment is also impacted. This possibility needs to be further tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C.M., Ferguson, L.P., Fox, R., Schürch, C.M., Wang, J., Nakamura, M., Lytle, N.K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K.W., Tsai, S., Goggins, M.G., Lowy, A.M., Wechsler-Reya, R.J., Von Hoff, D.D., Newman, A.M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41(11):1989-2005.e9 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One concern is regarding the experimental task design. Currently, only subjective reports of interoceptive intensity are taken into account, the addition of objective behavioural measures would have given additional value to the study and its impact. 

      To address this comment, we calculated interoceptive accuracy during the cardiorespiratory perturbation (isoproterenol) task according to our previous methods (e.g., Khalsa et al 2009 Int J Psychophys, Khalsa et al, 2015 IJED, Khalsa et al 2020 Psychophys, Hassanpour et al, 2018 NPP, Teed et al 2022 JAMA Psych). Thus, we quantified interoceptive accuracy as the cross-correlation between heart rate and real-time cardiorespiratory perception; specifically, the zero-lag cross-correlation between the heart rate and dial rating time series, and the maximum cross-correlation between these time series while allowing for different temporal delays (or lags). As expected, we found a dose-related increase in interoceptive accuracy from the 0.5mcg moderate perturbation dose (for which neuroimaging maps were not included in the current study) to the 2.0mcg high perturbation dose: zero-lag cross-correlations of 0.25 and 0.61, and maximum cross-correlations of 0.41 and 0.73, for the 0.5mcg and 2.0mcg doses, respectively, when averaged across all participants in the current study. Examining just the 2.0mcg dose more closely, there were no group differences in zero-lag cross-correlation (t(89) = -0.68, p = 0.50) or maximum cross-correlation (t(87) = -1.0, p = 0.32) (depicted below, panel A). Furthermore, there were no associations between either of these interoceptive accuracy measures and the magnitude of activation within bilateral dysgranular convergent regions (F(1) = 0.27 and 0.01, p = 0.61 and 0.91, for the main effect of percent signal change on max and zero-lag cross-correlations, respectively; depicted below, panel B). When considering the significant correlation between the right insula signal intensity and subjective dial ratings, this lack of association with interoceptive accuracy suggests that the right dysgranular convergent insula was preferentially tracking the magnitude estimation rather than the accuracy facet of interoceptive awareness during cardiorespiratory perturbation. Notably, during the saline placebo infusion, there were no systematic changes in heart rate and thus no systematic change in dial rating, precluding the calculation of the cross-correlation as a measure of interoceptive accuracy.
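      The two accuracy measures described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: it assumes the cross-correlation is a Pearson correlation between equally sampled series, and the function names and the symmetric lag window `max_lag` are hypothetical.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] (lag in samples)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return float(np.corrcoef(x, y)[0, 1])


def interoceptive_accuracy(heart_rate, dial_rating, max_lag):
    """Return (zero-lag, maximum) cross-correlation over lags in [-max_lag, max_lag].

    The symmetric lag window is an assumption; a study may restrict lags
    to delays of the rating relative to the heart rate.
    """
    zero_lag = lagged_correlation(heart_rate, dial_rating, 0)
    best = max(
        lagged_correlation(heart_rate, dial_rating, k)
        for k in range(-max_lag, max_lag + 1)
    )
    return zero_lag, best
```

      A flat heart rate series, as during the saline infusion, has zero variance, so the correlation is undefined; this is consistent with the statement that accuracy could not be computed for the placebo condition.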

      In reviewing these findings, we did not feel that the results add meaningful information to our interpretation of convergence, and accordingly we have chosen not to include it in the manuscript.

      Author response image 1.

      (A) Interoceptive accuracy during 2.0mcg isoproterenol perturbation, as measured by the maximum (left panel) and zero-lag (right panel) cross-correlation between the time series of heart rate and perceptual dial rating. There were no differences between groups. (B) There were no associations between interoceptive accuracy ratings and signal intensity within the convergence dysgranular insula during the Peak period of 2.0mcg perturbation. 

      This brings me to my second concern. The authors mostly refer to their own previous work, without highlighting other methods used in the field. Some tasks measure interoceptive accuracy or other behavioural outcomes, instead of merely subjective intensity. Expanding the scientific context would aid the understanding and integration of this study with the rest of the field. 

      Given our focus on the neural basis of bottom-up perturbations of interoception, we found it relevant to reference previous studies from our lab, as we built directly upon these previous findings to inform the hypotheses and design of the current experiment, but we appreciate the value of providing a broader view of the literature. To expand the contextual frame, we have cited two fMRI meta-analyses of cardiac and gastrointestinal interoception (line 101). There are few studies that have used comparable perturbation approaches during neuroimaging in clinical populations, although we have referenced an exemplar study from the respiratory domain by Harrison et al (2021) in the discussion (line 612). In considering this comment more carefully, we felt that expanding the context further to other task-based methods or behavioral outcomes would shift the focus beyond our emphasis on the insular cortex and top-down/bottom-up convergence, though we have previously discussed and integrated such approaches (e.g., Khalsa & Lapidus, 2016 Front Psych, Khalsa et al, 2018 Biol Psychiatry CNNI, Khalsa et al 2022, Curr Psych Rep).

      Lastly, the suggestions for future research lack substance compared to the richness of the discussion. I recommend a slight revision of the introduction/discussion. There is text in the discussion (explanatory or illuminating) which is better suited to the introduction. 

      When discussing our study limitations (beginning line 732), we offer numerous areas for future research, including different preprocessing pipelines and more sophisticated analysis techniques (such as multivariate pattern analysis) that would allow for individual-level inferences regarding convergent patterns of activation within the insula. In addition, we have revised the last sentence of our limitations paragraph (line 757) and have added more specificity regarding future approaches examining insular and whole-brain interoceptive signal flow.

      Reviewer 2:

      (1) The interpretation of the resting-state data is not quite as clear-cut as the task-based data - as presented currently, changes could potentially represent fluctuations over time rather than following interoception specifically. In contrast, much stronger conclusions can be drawn from the authors' task-based data. …I was also unsure about the interpretation of the resting state analysis (Figure 5), as there was no control condition without interoceptive tasks, meaning any change could represent a change over time that differed between groups and not necessarily a change from pre- to post-interoception. Relatedly I wondered if the authors had calculated the test-retest reliability of the resting state data (e.g. intraclass correlation coefficients for the whole-brain functional connectivity of convergent dysgranular insula subregions and left middle frontal gyrus before vs. after the tasks), as it would be generally useful for the field to know its stability. 

      We have acknowledged the lack of a control condition in the isoproterenol task (note that the VIA task contained an exteroceptive trial that was included in the brain image contrast analysis). We have also provided further justification for our approach in both the Methods (see the first paragraph “fMRI resting state analysis” subsection) and Results (see the last paragraph of the “Convergence analysis” subsection). We cannot estimate test-retest reliability from the current dataset, given that we do not have resting state scans separated by a similar time frame without the performance of the interoceptive tasks in between (this is now clarified in line 346).

      (2) The transdiagnostic sample could be better characterised in terms of diagnostic information, and was almost entirely female; it is also unclear what the effect of psychotropic medications may have been on the results given the effects of (e.g.) serotonergic medication on the BOLD signal. …Table 1 would be substantially improved by a fuller clinical characterisation of the specific sample included in the analysis - the diagnostic acronyms included in the table caption are not used in the table itself at present and would be an excellent addition, describing, for example, the demographics and symptom scores of patients meeting criteria for MDD, GAD, and AN (and perhaps those meeting criteria for more than 1). Similarly, additional information about the specific medications patients (or controls?) were taking in this study would be welcome (given the potential influences of common medications (e.g. antidepressants) on neurovascular coupling). 

      We have expanded Table 1 to include more specific diagnostic information for the transdiagnostic ADE group (GAD, MDD, and/or AN, as well as other psychiatric diagnoses). We have also included medication use.  

      Finally, Figures 7c and 7d would be greatly improved by showing individual data points if possible, and there may be a typo in the caption 'The cardiac group reported higher cardiac intensity ratings in the ADE group'.

      We have adjusted Figure 7c and 7d to include individual data points, as we agree that this provides greater transparency to the data itself. We have also fixed the typo in the figure caption.

      (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example, due to physiological correction in one but not both tasks. Although I note this is mentioned in the limitations, it was not clear to me why physiological noise was removed from the ISO task and whether it would be possible to do the same in the VIA task, which could be important for the most robust comparison of the two. 

      In this study, we intentionally chose different task-specific preprocessing pipelines so we could ensure that our results were not simply due to new ways of handling the data. This would allow us to evaluate evidence of replicating the previous group-level findings of insular activation that informed the current approach and hypotheses. We agree that a harmonized approach is also merited, and in a subsequent project using this dataset, we have matched preprocessing pipelines for a connectivity-based analysis, to best facilitate comparison across tasks. We look forward to sharing those results with the scientific community in due time.

      Reviewer 3:

      Maybe I missed it (and my apologies in case I did), but there were a few instances where it was not entirely clear whether differential effects (say between groups or conditions) were compared directly, as would be required. One example is l. 459 ff: The authors report the interesting lateralisation effect for the two interoception tasks and say it was absent in the exteroceptive VIA task. As a reader, it would be great to know whether that finding (effect in one condition but not in the other) is meaningful, i.e. whether the direct comparison becomes statistically significant. … The same applies to later comparisons, for example, the correlations reported in l. 465 ff (do these differ from one another?) as well as the FC patterns reported in l. 476 ff - again, there is a specific increase in the ADE group (but not in the HC), but is this between-group difference statistically meaningful? 

      Thank you for these questions. We have added greater detail in the Results section in order to increase clarity regarding which statistical comparisons support which conclusions. Generally, we limited our comparisons to the effect of group, as comparing ADE vs. HC individuals was of primary interest, and in some cases also the effect of hemisphere and epoch. However, we did not perform exhaustive comparisons for all measures, in the interest of keeping the focus of our multi-level multi-task analysis on the hypothesis-driven questions specifically related to convergence of top-down and bottom-up processing.

      Regarding the comment asking if we could compare the lateralization effect directly across task conditions (i.e., is there a greater difference between hemispheres in the ISO task compared to VIA?): unfortunately, directly comparing signal intensity across tasks is not possible because the isoproterenol infusion induces physiological changes that can cause some dose-related signal reduction (we have attempted to address this in the past, e.g., Hassanpour et al, 2018 HumBrMapp). Consequently, our conclusions about spatial localization of top-down and bottom-up convergence are limited to group-level comparisons based on binary activation.

      (2) A second 'major' relates to the intensity ratings (l. 530 ff). I found it very interesting that the ADE group reported higher cardiac, but lower exteroceptive intensity ratings during the VIA task. I understand the authors' approach to collapse within the ADE group, but it would be great to know which subgroup of patients drives this differential effect. It could be the case that the cardiac effect is predominantly present in the anxiety group, while the lower exteroceptive ratings are driven by the depression patients. Even if that were not the case, it would be highly instructive to understand the rating pattern within the anxiety group in greater detail. Do these patients 'just' selectively upregulate interoception, or is there even a perceived downregulation of exteroceptive signalling? 

      We have depicted these data below for reviewers’ reference, showing individual responses for each group (HC and ADE; panel A), as well as the ADE individuals separated by primary diagnosis (GAD = generalized anxiety disorder, n=24; AN = anorexia nervosa, n=16; MDD = major depressive disorder, n=6; panel B). When tested via linear regression, we found no differences in ratings across ADE subgroups (rating ~ subgroup * condition, F3=1.71, p=0.16 for main effect of subgroup). However, several factors should be considered in interpreting this result: first, all subgroups are small, particularly the MDD sample. Second, while these diagnostic labels refer to the most prominent symptom expression of each patient, every clinical participant in the study had a co-morbid disorder. Therefore, it is not possible to isolate disorder-specific pathology from our multi-diagnostic sample, and for this reason we refrained from including the subgroup-specific data in the manuscript.

      Author response image 2.

      (A) Post-trial ratings during the Visceral Interoceptive attention task, for reference. This is also shown in Figure 7D. (B) The same post-trial ratings in (A), but with the ADE group separated by primary diagnoses. Importantly, although assigned to one diagnostic category on the basis of most prominent symptom expression, most patients had one or more comorbidities across disorders. GAD = Generalized Anxiety Disorder. MDD = major depressive disorder. AN = anorexia nervosa. HC = healthy comparison.

      l. 86: 'Conscious experience' of what, precisely? During the first round of reading, I was wondering about the extent to which consciousness as a general concept will play a role, which could be misleading. 

      We have changed it to “conscious experience of the inner body” in the text. The current study is limited in scope to the neurobiology of conscious perceptions of the inner body, not consciousness as a general phenomenon. We hope this distinction is now clear.

      l.115: Particularly given the focus on predictive processing, I was wondering whether the (slightly outdated) spotlight metaphor is really needed here. 

      While not perfect, we believe it is still valid to metaphorically reference goal-directed attention towards the body as an “attentional spotlight”. Given the concern, we have minimized the focus on this metaphor, and the sentence now reads as follows:

      “Extending beyond these model-based influences are goal-directed activities (also described previously as the ‘attentional spotlight’ effect; Brefczynski and DeYoe 1999), whereby focusing voluntary attention towards certain environmental signals not only alters their conscious experience but selectively enhances neural activity in the responsive area of cortex.”

      l. 129 ff: The sentence has three instances of 'and' in it, most likely a typo. 

      We have fixed this in the text.

      l. 245: What do these ratings correspond to, i.e. what was the precise question/instruction? 

      The instructions for subjective ratings in each task are mentioned in the Methods (line 223 for ISO task, line 249 for the VIA task), and we have added more detail regarding the scale used to collect subjective intensity ratings.

      l. 322: Could you provide the equation of the LMEM in the main text? It would be interesting to know e.g. whether participants/patients were included as a random effect. 

      We have provided this equation in the Methods (line 326).

      l. 418 ff: I was confused about the statistical approach here. Why use separate t-tests instead of e.g. another LMEM which would adequately model task and condition factors? 

      We did not use t-tests, but instead used linear regression to look at differences in agranular PSC across groups, hemispheres, and epochs, as well as potential associations between PSC and trait measures. We have adjusted the wording in this Methods paragraph (line 418) to help clarity.

      l. 425: As a general comment, it would be great to provide the underlying scripts openly through GitHub, OSF, ... 

      We agree with this comment, and our main analysis scripts have been posted on our OSF as an addition to the original preregistration of this work (https://osf.io/6nxa3/).

      l. 443: For consistency, please report the degrees of freedom for the X² test.

      l. 454: ... and the F statistic would require two degrees of freedom (only the second is reported).

      l. 523: The t value is reported without degrees of freedom here (but has them in other instances).

      l. 540: Typo ('were showed').

      We have reported degrees of freedom for all statistics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for taking the time to review our manuscript. We are grateful to reviewer #1 for the positive evaluation of our work and for providing valuable comments that will significantly enhance the presentation of our results. We understand reviewer #2's negative assessment because we did not discuss an alternative model of dosage compensation in Drosophila. We will address this omission in the Introduction section of the revised manuscript and remove any controversial statements from other parts of the text. However, it is important to clarify that our study does not focus on the mechanisms of dosage compensation. The main goal of the manuscript was to investigate the assembly of the MSL complex and its specific binding to the Drosophila X chromosome. We utilized male survival data to demonstrate the efficacy of MSL complex binding to the X chromosome, a relationship that has been supported by numerous independent studies. We understand that Reviewer #2 agrees that disruption of the MSL complex binding results in male lethality. As far as we understand, Reviewer #2 suggests that the MSL complex does not activate transcription of X chromosome genes, but instead facilitates the recruitment of MOF protein and potentially other general transcription factors to the X chromosome. This could explain the decrease in autosomal gene expression due to a reduction in activating factors like MOF at autosomal promoters. In the upcoming revision, we aim to strike a balance between the two models that elucidate dosage compensation in Drosophila. We appreciate your feedback and look forward to enhancing the clarity and coherence of our manuscript based on your insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      We thank the reviewer for the positive assessment of the experimental work done.

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila.

      We completely agree with this reviewer's claim. In the Introduction section we attempted to make clear that there are two models for the functional role of specific recruitment of the MSL complex to the X chromosome in males.

      Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors use sex-specific lethality as a measure of disruption of dosage compensation, but other modulations of gene expression are the likely cause of these viability effects.

      Sun et al (2013, PNAS 110: E808-817) showed that recruitment of the MSL complex-specific subunit MSL2 or the MOF protein to the UAS promoter resulted in recruitment of the entire MSL complex in males but not transcriptional activation. This important result argues that the MSL complex does not activate transcription. However, it must be taken into account that the GAL4 DNA binding region used to recruit the chimeric MSL2 protein to the UAS promoter was directly fused to the MSL2 RING domain, which is critical for interaction of MSL2 with MSL1 and its ubiquitination activity (this activity could potentially be involved in transcription activation). It also remains poorly understood what happens to the MSL complex after recruitment to the promoters or HAS on the X chromosome. The MSL1/MSL3/MOF subcomplex can acetylate TF and H4K16 during RNA polymerase II elongation, resulting in increased transcription. Separate roles of MSL2 and MSL1 in activating transcription at gene promoters have also been shown. Sun et al. showed that in females, recruitment of MOF to the UAS promoter leads to a strong increase in transcription, which is associated with the inclusion of MOF in the non-specific lethal (NSL) complex, which is bound to promoters and is required for strong transcription activation. In males, MOF is preferentially recruited to the UAS promoter in the full MSL complex or perhaps in the MSL1/MSL3/MOF subcomplex, which stimulates transcription during RNA polymerase II elongation much less strongly than the NSL complex. The same result was obtained by Prestel et al. (2010, Mol Cell 38:815-26). In that study, the GAL4 binding sites were inserted upstream of the lacZ and mini-white genes. Activation of transcription after recruitment of GAL4-MOF to the GAL4 sites was studied in males and females. As in Sun et al. 2013, strong activation of the reporter was observed in females. In males, weak transcriptional activation of the reporter gene was shown, and the MOF protein was detected not only on the promoter, but also in the coding and 3’ regions of the reporter.

      We do not understand how the paper by Aleman et al (Cell Reports 35: 109236, 2021) is consistent with the hypothesis that the MSL complex is not involved in the transcriptional activation of X chromosomal genes. The main conclusions of this paper: 1) Inactivation of Mtor leads to selective activation of the male X chromosome. 2) Mtor-driven attenuation of male X occurs in broad domains linked by the MSL complex. 3) Mtor genetically interacts with MSL components and reduces male mortality; 4) Mtor restrains dose-compensated expression at the level of nascent transcription. Thus, the paper shows that the MSL complex has an activator activity that is partially inhibited by Mtor. Accordingly, inactivation of Mtor only partially restored the survival of males in which dosage compensation was not completely inactivated.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550).

      We agree that an alternative model of the dosage compensation mechanism is reasonable. We can assume that both mechanisms function jointly to provide effective dosage compensation in Drosophila males. Following the reviewer's suggestion to reconsider the entire context of the article, we will make many small changes throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Overall, I found the text well written and the figures logically organized (especially Figure 5, which had the potential to confuse). The authors especially excelled in bringing together the decades of literature in the Discussion.

      I offer several suggestions to improve the readability:

      Consider presenting the coiled-coil domain homology in Figure 1A as a contrast for the N-terminal region, which the authors claim is poorly conserved.

      We added the coiled-coil domain homology to Figure 1A in the new version of the manuscript.

      It is difficult to visualize the red MSL2 in Figure 2; the green and red panels should be presented separately in the main text, as they are in the Supplemental Figure 2.

      We prepared Figure 2 with separate green and red panels.

      The ChIP-seq experiments for MSL proteins are well presented, but in my opinion, add little to the overall conclusions:

      Figure 6 mostly recapitulates what has already been published and utilized by several groups, most recently the authors themselves (Tikhonova 2019): that MSL expressed in females targets the X/HAS, similar to in males. While these are nice supporting data for the female transgenic system, I do not believe this figure should be prominently featured as if this is a novelty of the current study.

      We fully agree with the reviewer's comment about the limitation of scientific novelty in Figure 6. It has an auxiliary meaning. Therefore, we transferred this figure to Supplementary material (as supplement for Figure 5).

      The ChIP experiments in Figure 7 agree with the conclusions in Figures 2 and 3 (polytene chromosome immunostaining) when it comes to X/autosome localization. I believe it would help with the flow of the paper if these experiments were combined or at least placed closer together in the narrative, rather than falling at the end.

      We moved Figure 7 (Figure 5 in the new version) closer to the polytene chromosome immunostaining. We agree with the reviewer that this placement makes the article as a whole easier to follow.

      I find Figure 8 difficult to understand, especially since the "clusters" are not annotated in the figure, but are described in the text. I struggled to follow the authors' conclusions based on these data. The authors could clarify the figure with annotations, although to be honest I do not currently see the value of this analysis/figure.

      In the new version of the article, we changed this part: we removed the autosomal clusters, which were difficult to understand and not important for this manuscript. We also tried to emphasize more clearly in the text clusters 1 and 2, which characterize HAS.

    1. Author response:

      We thank the reviewers for their time and thoughtful comments. We are encouraged that all reviewers found our work novel and clear. We will submit a full revision to address all the points the reviewers made. Below, we briefly highlight a few clarifications and planned analyses to address major concerns; all other concerns raised by the reviewers will also be addressed in the revision.

      Reviewers #1 and #3 asked whether the variability in grid properties emerged with experience/time in the environment. We agree that this is an interesting question, and we will re-analyze the data to explore whether between-cell variability increases with time within a session. However, we note that since the rats were already familiarized to the environment for 10-20 sessions prior to the recordings, there may be limited additional changes in between-cell variability between recording sessions. In one case, two sessions from the same rat were recorded on consecutive days (R11/R12 and R21/R22) - these sessions did not show any difference in variability. 

      Reviewer #2 noted that the variability in grid properties is known to experimentalists. We will tone down our discussion on the current assumptions in the field to accurately reflect this awareness in the community. However, we would like to emphasize that the lack of work carefully examining the robustness of this variability has prevented a firm understanding of whether this is an inherent property of grid cells or due to noise. The impact of this can be seen in theoretical neuroscience work where a considerable number of articles (including recent publications) start with the assumption that all grid cells within a module have identical properties, with the exception of phase shift and noise. In addition, since grid cells are assumed to be identical in the computational neuroscience community, there has been little work on quantifying how much variability a given model produces. This makes it challenging to understand how consistent different models are with our observations. We believe that making these limitations of previous work clear is important to properly conveying our work’s contribution. 

      Reviewer #3 asked whether the variability in grid properties could be driven by cells that were conjunctively tuned with head direction. We agree that this is an interesting hypothesis and will explore this by performing new analysis. We note that, as reported by Gardner et al. (2022), only 19 of the 168 cells in recording session R12 are conjunctive. Even if these cells are included in the same proportion as pure grid cells by our inclusion criteria (which appears unlikely, given that conjunctive cells may be less reliable across splits of the data), then approximately 9 out of the 82 cells we analyzed would be conjunctive. Therefore, we expect it to be unlikely that they are the main source of the variability we find. However, we will test this in our revised manuscript.
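      The estimate of approximately 9 conjunctive cells follows from applying the session-wide conjunctive fraction to the analyzed subset. A minimal sketch of that arithmetic is given here, assuming that conjunctive and pure grid cells pass the inclusion criteria at the same rate (the variable names are illustrative and not taken from our analysis scripts):

```python
# Counts quoted above for recording session R12 (Gardner et al., 2022).
n_cells_total = 168   # cells reported for the session
n_conjunctive = 19    # of which conjunctively tuned with head direction
n_analyzed = 82       # cells retained by the inclusion criteria

# Assumption: the inclusion criteria admit conjunctive and pure grid cells
# in the same proportion, so the session-wide fraction carries over.
conjunctive_fraction = n_conjunctive / n_cells_total
expected_conjunctive = n_analyzed * conjunctive_fraction

print(f"fraction conjunctive: {conjunctive_fraction:.3f}")
print(f"expected among analyzed cells: {expected_conjunctive:.1f}")
```

      Under this assumption, the expected count rounds to 9 of 82 cells; if conjunctive cells are less reliable across data splits, the true number would be lower.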

      Reviewer #3 asked whether the “price” paid in having grid property variability was too high for the modest gain in ability to encode local space. We agree that losing the continuous attractor network (CAN) structure, and the ability to path integrate, would be a very large loss. However, we do not believe that the variability we observe necessarily destroys either CAN or path integration. We argue this for two reasons. First, the data we analyzed [from Gardner et al. (2022)] is exactly the data set that was found to have toroidal topology and therefore viewed to be in line with a major prediction of CANs. Thus, the amount of variability in grid properties does not rule out the underlying presence of a continuous attractor. Second, path integration may still be possible with grid cells that have variable properties. To illustrate this, and to address another comment from Reviewer #3, we have begun to analyze the distribution of grid properties in a recurrent neural network (RNN) model trained to perform path integration (Sorscher et al., 2019). This RNN model, in addition to others (Banino et al., 2018; Cueva and Wei, 2018), has been found to develop grid cells and there is evidence that it develops CANs as the underlying circuit mechanism (Sorscher et al., 2023). We find that the grid cells that emerge from this model exhibit variability in their grid spacings and orientations. This illustrates that path integration (the very task the RNN was trained to perform) is possible using grid cells with variable properties.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript. Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript. Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the degradosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure. The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry, would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data have inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility supporting a high confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was primarily found in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and it is indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. P53 protein, which acts downstream of the double-strand break pathway, was reduced in Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and the secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice or moving this figure to later in the manuscript as support for their mouse investigations would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used GEO2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
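      To make the FDR step above concrete, the following is a minimal sketch of the Benjamini & Hochberg adjustment in plain Python. It is not GEO2R's or limma's code, and the p-values in the usage lines are hypothetical placeholders rather than values from GSE85217.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustration only).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

      Under this procedure a gene is called significant when its adjusted value is at or below the chosen cut-off (0.05 in the text above).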

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19 show distinct upregulation in the MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs did not exhibit differential expression levels in MBSHH compared to other MB subgroups, such as with FGF12 in Figure 1. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 specifically showed upregulation in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 specifically shows relative mRNA levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors), unlike 35% of MBSHHa (n=65) or 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that cerebellar foliation is severely disrupted in the central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature may be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D show data from n=3 mice per genotype.
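      The per-mouse averaging described above can be sketched as follows. This is an illustrative outline only: the mouse labels and cell counts are hypothetical, and the point is that each mouse contributes a single value (the mean of its sections), so sections from the same animal are not treated as independent replicates.

```python
from statistics import mean


def per_mouse_means(section_counts):
    """Collapse section-level cell counts to one mean value per mouse."""
    return {mouse: mean(counts) for mouse, counts in section_counts.items()}


# Hypothetical cell counts from 2-3 sections per mouse (illustration only).
counts = {
    "mouse_1": [10, 12],
    "mouse_2": [8, 9, 10],
    "mouse_3": [11, 13],
}
points = per_mouse_means(counts)  # one plotted point per mouse
n_mice = len(points)              # the n used for statistics is mice, not sections
```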

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in the Sufu-cKO cerebellum is clear, particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented at the same scale (scale bars = 500 µm).

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed in images obtained by confocal microscopy, enabling imaging of 1-2 µm optical slices, since Ki67 or pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and to assess the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 cerebellar sections (20 µm) from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) from each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are profound and require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided link region-specific aberrant FGF5 signaling with the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006)(Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.
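      For readers less familiar with the Holm-Sidak correction named in the statistical description above, the following is a minimal sketch of the step-down adjustment applied to a set of pairwise p-values. It is a generic textbook formulation and may differ in detail from Prism's implementation; the input p-values are hypothetical and are not taken from the study.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is corrected for m comparisons, the next for m - 1, ...
        candidate = 1.0 - (1.0 - p_values[i]) ** (m - rank)
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical p-values from three pairwise comparisons (illustration only).
pairwise = [0.01, 0.04, 0.03]
adjusted = holm_sidak(pairwise)
```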

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is unlikely that any hGFAP-Cre-induced double-strand breaks are sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, undetected abnormalities that could be present and that may require further analyses.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct in that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated when Cre expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects the development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing, the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices. 1. Testing the sensor in other cell lines 2. Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP-43.

      Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation models that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank Reviewer #1 for their detailed feedback. In response, we will investigate the function of CUTS in neuronal cells and evaluate how a modest reduction in TDP-43 levels affects the splicing of physiologically relevant TDP-43-regulated cryptic exons within these cells (e.g., STMN2 and UNC13A).

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with that is the sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the use of current TDP-43 activity assays; it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a clearer manner, possibly split into additional graphs since there are too many outputs. Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the need for flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      We thank Reviewer #2 for their constructive evaluation of our study. In response, we will assess CUTS in human neuronal cells, as also recommended by Reviewer #1. Additionally, we will incorporate an analysis of CUTS using flow cytometry to provide quantitative measurements of GFP signal. We agree that investigating how CUTS responds to stressors affecting TDP-43 function (e.g., MG132) would be a valuable addition, and we will include these data in the revisions to the study.

      We also appreciate the feedback on our figures and will work to enhance their clarity, incorporating the Reviewer’s suggestions. Specifically, we will split Figure 2D and 2G into multiple plots and ensure clearer labeling of the image panels in Supplementary Figure 2A and Figure 4B.

      Regarding the comment on the 5FL data, we believe this occurrence can be explained by existing literature, and we will address this directly in the discussion section of the manuscript.

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      We thank Reviewer #3 for their time and thoughtful assessment of our manuscript. We will address all their recommendations, including expanding the discussion on the CE sequences utilized in the CUTS sensor and exploring the potential utility of the CUTS sensor in alternative disease-relevant systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This preprint explores the involvement of cyclic di-GMP in genome stability and antibiotic persistence regulation in bacterial biofilms. The authors proposed a novel mechanism that, due to bacterial adhesion, increases c-di-GMP levels and influences persister formation through interaction with HipH. While the work may provide useful insights that could attract researchers in biofilm studies and persistence mechanisms, the main findings are inadequately supported and require further validation and refinement in experimental design.

      We sincerely thank eLife for the thorough assessment of our manuscript. We appreciate the constructive criticism and see it as an opportunity to strengthen our research. In response to the reviewers' comments and suggestions, we have refined our experimental design and conducted additional experiments to provide more robust evidence supporting our findings. We believe that, with these additions, our study provides robust evidence for this novel mechanism, contributing significantly to the fields of biofilm research and bacterial persistence.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors propose a UPEC TA system in which a metabolite, c-di-GMP, acts as the AT with the toxin HipH. The idea is novel, but several key ideas are missing in regard to the relevant literature, and the experimental design is flawed. Moreover, they are absolutely not studying persister cells as Figure 1b clearly shows they are merely studying dying cells since no plateau in killing (or anything close to a plateau) was reached. So in no way has persistence been linked to c-di-GMP. Moreover, I do not think the authors have shown how the c-di-GMP sensor works. Also, there is no evidence that c-di-GMP is an antitoxin as no binding to HipH has been shown. So at best, this is an indirect effect, not a new toxin/antitoxin system as for all 7 TAs, a direct link to the toxin has been demonstrated for antitoxins.

      Thank you for your constructive comments on our manuscript. Your insights have prompted us to revisit our data and experimental design, leading to significant improvements in our study.

      (1) Clarification on Persister Cell Detection: We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have implemented two key modifications: shortening the sampling interval and extending the antibiotic treatment duration. These adjustments have resulted in an updated killing curve that now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing, as illustrated in Response Figure 1. This refined pattern aligns with established characteristics of persister cell behavior in antibiotic tolerance studies, providing a more accurate representation of the persister population dynamics in our experimental system. We believe these methodological enhancements significantly improve the reliability and interpretability of our results, offering a clearer insight into the persister cell phenomenon under investigation.

      (2) Validation of c-di-GMP Sensor: We appreciate your point about the c-di-GMP sensor. The c-di-GMP sensor, developed by Howard C. Berg's team, is specifically designed to detect relative intracellular concentrations of c-di-GMP in Escherichia coli cells. This capability is crucial for understanding the dynamic regulation of c-di-GMP during bacterial responses to environmental stimuli. We have expanded our explanation of the sensor's detection mechanism in lines 138-146 of the manuscript, detailing how it accurately reflects changes in c-di-GMP levels within the cells. The mechanism operates through a series of signaling events that are initiated when c-di-GMP binds to the sensor, leading to measurable outputs that correlate with intracellular concentrations. Additionally, we have provided a schematic chart in Figure S1B to visually support our description of the sensor. This figure demonstrates the sensor's responsiveness and specificity in detecting fluctuations in c-di-GMP levels, effectively linking the signaling molecule to cellular behavior. We hope these additions clarify the role of the c-di-GMP sensor in our research and address your concerns regarding its functionality.

      (3) HipH and c-di-GMP Interaction: Our pull-down experiments presented in Figure 5A-E provide robust and compelling evidence for a direct physical interaction between HipH and c-di-GMP, and the effects of their interaction are reminiscent of toxin-antitoxin systems. Yet we acknowledge that c-di-GMP is not a traditional antitoxin, since it is not genetically linked to HipH. We have revised our terminology to "TA-like system" to reflect this difference more accurately.

      Weaknesses:

      (1) L 53: biofilm persisters are no different than any other persisters (there is no credible evidence of any different persister cells) so this reviewer suggests changing 'biofilm persisters' to 'persisters' throughout the text.

      Thank you for your thoughtful consideration. Upon careful consideration of the current scientific literature, we agree that there is no substantial evidence supporting a distinct category of persister cells specific to biofilms. We have systematically replaced 'biofilm persisters' with 'persisters' throughout the manuscript.

      (2) L 51: persister cells do not mutate and, once resuscitated, mutate like any other growing cell so this sentence should be deleted as it promotes an unnecessary myth about persistence.

      We sincerely appreciate your astute observation regarding the inaccuracy in line 51. We have removed the sentence in question from line 51. We have also thoroughly reviewed the entire manuscript to ensure that no similar misconceptions are present elsewhere in the text.

      (3) L 69: please include the only metabolic model for persister cell formation and resuscitation here based on single cells (e.g., doi.org/10.1016/j.bbrc.2020.01.102 , https://doi.org/10.1016/j.isci.2019.100792 ); otherwise, you write as if there are no molecular mechanisms for persistence/resuscitation.

      Thank you for your valuable suggestion. We appreciate the opportunity to enhance the scientific context of our manuscript. We have added a brief explanation of how ppGpp mediates ribosome dimerization, leading to persistence, and how its degradation triggers resuscitation [1-3] (lines 68-74). We have described the role of cAMP-CRP in regulating persistence through its effects on metabolism and stress responses [4, 5] (lines 74-78). We also explore potential interactions or synergies between our proposed mechanisms and these established metabolic models [6] (lines 383-409). We believe this revision significantly enhances our manuscript by providing a more accurate representation of the current state of knowledge in the field and demonstrating how our work builds upon and contributes to existing models of bacterial persistence.

      (4) The authors should cite in the Intro or Discussion that others have proposed similar novel TAs including a ppGpp metabolic toxin paired with an enzymatic antitoxin SpoT that hydrolyzes the toxin (http://dx.doi.org/10.1016/j.molcel.2013.04.002).

      We are grateful for your expertise in pointing out this crucial reference. We sincerely appreciate your suggestion to include the reference to previously proposed novel toxin-antitoxin (TA) systems, particularly the ppGpp-SpoT system [6]. In light of this reference, we have expanded our discussion to include: 1) A brief overview of the ppGpp-SpoT system as a novel TA-like mechanism. 2) Comparisons between the ppGpp-SpoT system and our findings on the HipH-c-di-GMP interaction. 3) Reflections on how these systems challenge and expand traditional definitions of TA systems (lines 383-409). We believe this addition significantly enhances the context and strengthens the rationale for considering the HipH-c-di-GMP interaction as a TA-like system. Thank you for your valuable input in helping us situate our research within the broader landscape of TA system biology.

      (5) Figure 1b: there are no results in this paper related to persister cells. Figure 1b simply shows dying cells were enumerated. Hence, the population of stressed cells increased, not 'persister cells' (Figure 1f), in the course of these experiments.

      We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have made two changes: 1) Shortened sampling interval: we have reduced the interval between measurements to one hour. 2) Extended sampling duration: the total duration of sampling has been increased to 6 hours (Response Figure 1). The updated killing curve now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing: 1) Initial rapid decline: from 0 to 1 hour, we observe a steep decrease in bacterial survival (slope ≈ -3 to -1.8); 2) Slower decline phase: from 4.5 to 6 hours, the rate of decline is markedly reduced (slope ≈ -0.17 to -0.06). This pattern aligns more closely with established characteristics of persister cell behavior in antibiotic tolerance studies.
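
      The two-phase comparison described above can be illustrated with a minimal sketch. It assumes the reported slopes are expressed as the change in log10 surviving fraction per hour, which the response does not state explicitly. The survival values, the function name `log10_slope`, and the 0.2 threshold used to flag a biphasic curve are all hypothetical and are not taken from the study.

```python
import math

def log10_slope(t0, s0, t1, s1):
    """Slope of log10(surviving fraction) per hour between two time points."""
    if t1 == t0:
        raise ValueError("time points must differ")
    return (math.log10(s1) - math.log10(s0)) / (t1 - t0)

# Hypothetical surviving fractions keyed by sampling time (hours).
# These are illustrative values, not measurements from the study.
survival = {0.0: 1.0, 1.0: 1e-2, 4.5: 1e-4, 6.0: 10 ** -4.15}

fast_slope = log10_slope(0.0, survival[0.0], 1.0, survival[1.0])
slow_slope = log10_slope(4.5, survival[4.5], 6.0, survival[6.0])

# Assumed criterion: the late slope is much shallower than the early slope.
# The factor 0.2 is an arbitrary illustrative threshold.
is_biphasic = abs(slow_slope) < 0.2 * abs(fast_slope)

print(f"early slope: {fast_slope:.2f} log10 units/h")
print(f"late slope:  {slow_slope:.2f} log10 units/h")
print(f"biphasic:    {is_biphasic}")
```

      With these illustrative numbers the early slope falls inside the reported range of -3 to -1.8 and the late slope inside -0.17 to -0.06, so the sketch flags the curve as biphasic.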

      (6) Figure S1: I see no evidence that the authors have shown this c-di-GMP detects different c-di-GMP levels since there appears to be no data related to varying c-di-GMP concentrations with a consistent decrease. Instead, there is a maximum. What are the concentration of c-di-GMP on the X-axis for panels C, D, and E? How were c-di-GMP levels varied such that you know the c-di-GMP concentration?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein comprises a fluorescent protein base and a c-di-GMP binding domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein that serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.
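
      The ratiometric readout described above can be sketched as follows. This is a minimal illustration, assuming background-corrected per-cell emission intensities in arbitrary units. The function names and all intensity values are hypothetical and do not come from the study or from the sensor's published analysis.

```python
def sensor_ratio(venus, scarlet):
    """R = mVenusNB emission / mScarletI emission for a single cell."""
    if venus <= 0 or scarlet <= 0:
        raise ValueError("intensities must be positive")
    return venus / scarlet

def inverse_ratio(venus, scarlet):
    """1/R, which rises as c-di-GMP quenches the mVenusNB signal."""
    return 1.0 / sensor_ratio(venus, scarlet)

# Hypothetical (mVenusNB, mScarletI) intensities in arbitrary units.
low_cdg_cell = (800.0, 400.0)       # little quenching of mVenusNB
high_cdg_cell = (400.0, 400.0)      # mVenusNB quenched by c-di-GMP binding
low_cdg_cell_2x = (1600.0, 800.0)   # same state, twice the sensor expression

r_low = sensor_ratio(*low_cdg_cell)
r_high = sensor_ratio(*high_cdg_cell)
r_low_2x = sensor_ratio(*low_cdg_cell_2x)

print("R at low c-di-GMP:   ", r_low)
print("R at high c-di-GMP:  ", r_high)
print("1/R at low c-di-GMP: ", inverse_ratio(*low_cdg_cell))
print("1/R at high c-di-GMP:", inverse_ratio(*high_cdg_cell))
print("R unchanged by expression level:", r_low == r_low_2x)
```

      Dividing by the mScarletI reference cancels any factor that scales both signals equally, such as sensor copy number, which is why the two low c-di-GMP cells give the same R despite different absolute intensities.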

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (7) The viable portion of the VBNC population are persister cells so there is no reason to use VBNC as a separate term. Please see the reported errors often made with nucleic acid staining dyes in regard to VBNCs.

      We appreciate the opportunity to clarify the distinction between VBNC cells and persister cells in our manuscript. It is essential to recognize that VBNC cells and persister cells represent two fundamentally different states of bacterial dormancy. While both may exhibit viability under certain conditions, persister cells are characterized by their ability to resuscitate and grow when environmental conditions become favorable [8]. In contrast, VBNC cells are in a deep dormant state where they cannot be revived through normal culture conditions [9, 10]. This distinction is critical for accurately representing bacterial survival strategies and population dynamics, which is why we maintain the use of the term VBNC separately from persister cells. We have added related references in line 259.

      Regarding the reported errors associated with nucleic acid staining dyes for identifying VBNC cells, we acknowledge that these methods can exhibit limitations. Specifically, nucleic acid stains may fail to reliably differentiate between metabolically active and inactive cells, leading to inaccuracies in quantifying the true VBNC population [11]. In our study, we have opted to utilize propidium iodide (PI) staining to assess cell viability more accurately, as it effectively distinguishes dead cells from viable cells based on membrane integrity [12]. By employing this methodology, we ensure a more precise estimation of the VBNC proportion without conflating it with persister cell dynamics.

      Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in the biofilms. First, they set up a model to identify the increased persisters in the biofilm status. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified the HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Thank you for your insightful summary of our research. We greatly appreciate your thoughtful consideration of our work.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Weaknesses:

      (1) Is HipH the only target identified by screening the E. coli Keio Knockout Collection?

      We appreciate your inquiry about our screening process and the identification of HipH. We did not screen the entire E. coli Keio Knockout Collection. Our approach was more targeted, focusing on mutants relevant to enzyme activity regulation. We selected specific mutants based on their potential involvement in c-di-GMP-mediated regulatory pathways. This focused approach allowed us to efficiently identify candidates likely to be involved in persister formation. Among the screened mutants, HipH emerged as a significant hit. Its identification was particularly noteworthy due to its known role in persister formation and its potential as a regulatory target of c-di-GMP. We acknowledge that our targeted approach may have overlooked other potential candidates. We are considering a more comprehensive screening approach in future studies to identify additional targets.

      (2) Since the story is complicated, a diagrammatic picture might be needed to illustrate the whole story. And the title does not accurately summarize the novelty of this study.

      Thank you for your valuable feedback. We fully agree with your assessment that a visual representation would greatly enhance the clarity of our complex findings. In response to your suggestion, we have added Response Figure 2 (Fig. 6 in revised manuscript, lines 976-981) to our manuscript. This new figure provides a comprehensive visual summary of the key processes and mechanisms uncovered in our study. This graphic summary provides a clear overview of the interconnected nature of surface adhesion, c-di-GMP signaling, and HipH regulation. It also highlights the complex role of c-di-GMP in persister formation and offers readers a visual aid to better understand the molecular mechanisms underlying our findings.

      We sincerely appreciate your thoughtful comment regarding the title and its reflection of the study's novelty. After careful consideration, we believe that our original title adequately captures the essence and significance of our research. We have strived to ensure that it accurately represents the scope and novelty of our work while maintaining clarity and conciseness. Nevertheless, we value your input and thank you for taking the time to provide this feedback, as it encourages us to critically evaluate our presentation.

      (3) The ratio of mVenusNB to mScarlet-I (R) negatively correlates with the concentration of c-di-GMP. Therefore, R⁻¹ demonstrates a positive correlation with the concentration of c-di-GMP. Is this method validated with other methods to quantify c-di-GMP, or used in other studies?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein comprises a fluorescent protein base and a c-di-GMP binding domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein that serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (4) References are missing throughout the manuscript. Please add enough references for every conclusion.

      We appreciate your feedback regarding the references in our manuscript. We acknowledge the importance of proper citation to support our conclusions and provide context for our work. In response to your comment, we have conducted a comprehensive review of our manuscript and have significantly enhanced our referencing throughout. We have added appropriate citations to support each key statement and conclusion presented in our study. These additional references provide a robust foundation for our findings and place our work within the broader context of the field. The complete list of all references, including the newly added ones, can be found at the end of this response letter as well as in the revised manuscript.

      (5) The novelty of this study should be clearly written and compared with previous references. For example, is it the first study to report the mechanism that the adhesion of bacteria to the surface leads to increased persister formation?

      We sincerely appreciate the opportunity to highlight and elaborate the novelty of our research. This study provides novel insights into the relationship between bacterial adhesion to surfaces and the subsequent increase in persister cell formation, which has not been explicitly detailed in previous literature. While existing research has established that biofilms typically harbor higher numbers of persister cells, this investigation not only corroborates that finding but also elucidates the mechanisms through which surface adhesion contributes to this phenomenon.

      Past studies have predominantly focused on the general characteristics of persister cells and their role in biofilm resilience and antibiotic tolerance without specifically addressing the mechanistic link between adhesion and persister formation [13, 14]. For instance, previous work has shown that surface attachment leads to changes in metabolic activity and signaling pathways within bacterial cells, which could promote persistence, but it has not definitively established a causal relationship between adhesion and increased persister formation. Our study highlights that the elevation of cyclic di-GMP levels after surface adhesion triggers a cascade of physiological changes that significantly enhance the formation of persister cells. In particular, we report that adhesion-induced signaling pathways promote dormancy and tolerance to antibiotics, marking an important advance over the previous understanding that persister cells might arise from random phenotypic variation during biofilm development. We have expanded our discussion in lines 366-381.

      In summary, we believe this study stands as one of the first to clearly delineate the mechanism by which bacterial adhesion leads to increased persister formation, providing a valuable contribution to the current understanding of bacterial persistence and biofilm ecology. Thus, we can assert that our findings are not only novel but also essential for informing future research and therapeutic strategies aimed at managing bacterial infections.

      (6) in vitro DNA cleavage assay. Why not use bacterial genomic DNA to test the cleaving of HipH on the bacterial genome?

      Thank you for your feedback regarding our experimental approach. The decision not to use genomic DNA directly in our experiments was made after careful consideration of three factors. First, the high molecular weight of genomic DNA presents significant challenges in handling and analysis. Second, intact genomic DNA is difficult to extract, which could compromise the integrity of our results. Third, electrophoretic separation of such large DNA molecules is challenging, which could limit our ability to accurately interpret the data.

      Instead, following established practices in molecular biology research and drawing from similar studies in the field [15-17], we opted to use plasmids as model DNA for our experiments. This approach offers several advantages: plasmids are smaller and more manageable, making them easier to manipulate in laboratory conditions; they can be more readily extracted in intact form, ensuring the quality of our experimental material; and plasmid DNA is more amenable to electrophoretic separation, allowing for clearer and more precise analysis. Despite their smaller size, plasmids retain many of the key characteristics of genomic DNA that are relevant to our study. We believe this approach provides a robust and reliable model for our research while overcoming the practical limitations associated with genomic DNA. It allows us to investigate the fundamental principles we're interested in, while maintaining experimental feasibility and data integrity. We have added related references in lines 314 and 599.

      (7) C-di-GMP -HipH is not a TA, it does not fit in the definition of the TA systems. You can say C-di-gmp is an antitoxin based on your study, but C-di-gmp -HipH is not a TA pair.

      We appreciate your insightful feedback regarding the classification of the c-di-GMP-HipH interaction. We acknowledge that while our study suggests c-di-GMP may function as an antitoxin to HipH, the c-di-GMP-HipH pair does not constitute a classical TA system due to the lack of genetic linkage. We have replaced the term "TA system" with "TA-like system" when referring to the c-di-GMP-HipH interaction. This more accurately reflects the nature of their relationship while acknowledging that it differs from traditional TA systems.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Either indent or skip a line to indicate a new paragraph; there is no need to do both.

      Thank you for your feedback regarding the formatting of our manuscript. We have revised the formatting throughout the main text by using a single blank line to separate paragraphs, without indentation.

      (2) L 77: need to define 'c-di-GMP' without using another abbreviation; please write '3,5-cyclic diguanylic acid', etc.

      Thank you for your valuable feedback regarding the proper introduction of abbreviations in our manuscript. We have revised line 86 to introduce the full name of c-di-GMP as "3,5-cyclic diguanylic acid". Following this initial introduction, we consistently use the abbreviation "c-di-GMP" throughout the rest of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      This is a fascinating story, but the title and the manuscript need careful revision to make it more clear. The novelty and logic are not very easy to follow.

      (1) Figure 1B, " h" is missing

      We sincerely thank you for your attentive review and for pointing out the missing "h" in Figure 1B. We have carefully reviewed and revised the figure legend in Figure 1B. The unit of time has been corrected to include "h" (hours) where appropriate, ensuring consistency and accuracy throughout the figure.

      (2) Line 222, the in vivo mice model should be cited with the reference.

      Thank you for the reminder. We have cited the following reference related to the mouse model (line 231).

      Pang Y, et al., (2022) Bladder epithelial cell phosphate transporter inhibition protects mice against uropathogenic Escherichia coli infection. Cell reports 39: 110698

      References

      (1) Wood, T.K. and S. Song, Forming and waking dormant cells: The ppGpp ribosome dimerization persister model. Biofilm, 2020. 2: p. 100018.

      (2) Song, S. and T.K. Wood, ppGpp ribosome dimerization model for bacterial persister formation and resuscitation. Biochem Biophys Res Commun, 2020. 523(2): p. 281-286.

      (3) Wood, T.K., S. Song, and R. Yamasaki, Ribosome dependence of persister cell formation and resuscitation. J Microbiol, 2019. 57(3): p. 213-219.

      (4) Niu, H., J. Gu, and Y. Zhang, Bacterial persisters: molecular mechanisms and therapeutic development. Signal Transduct Target Ther, 2024. 9(1): p. 174.

      (5) Mok, W.W., M.A. Orman, and M.P. Brynildsen, Impacts of global transcriptional regulators on persister metabolism. Antimicrob Agents Chemother, 2015. 59(5): p. 2713-9.

      (6) Amato, S.M., M.A. Orman, and M.P. Brynildsen, Metabolic control of persister formation in Escherichia coli. Mol Cell, 2013. 50(4): p. 475-87.

      (7) Vrabioiu, A.M. and H.C. Berg, Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proc Natl Acad Sci U S A, 2022. 119(6).

      (8) Liu, J., et al., Viable but nonculturable (VBNC) state, an underestimated and controversial microbial survival strategy. Trends Microbiol, 2023. 31(10): p. 1013-1023.

      (9) Pan, H. and Q. Ren, Wake Up! Resuscitation of Viable but Nonculturable Bacteria: Mechanism and Potential Application. Foods, 2022. 12(1).

      (10) Ayrapetyan, M., T. Williams, and J.D. Oliver, Relationship between the Viable but Nonculturable State and Antibiotic Persister Cells. J Bacteriol, 2018. 200(20).

      (11) Zhao, S., et al., Absolute Quantification of Viable but Nonculturable Vibrio cholerae Using Droplet Digital PCR with Oil-Enveloped Bacterial Cells. Microbiol Spectr, 2022. 10(4): p. e0070422.

      (12) Zhao, S., et al., Enumeration of Viable Non-Culturable Vibrio cholerae Using Droplet Digital PCR Combined With Propidium Monoazide Treatment. Front Cell Infect Microbiol, 2021. 11: p. 753078.

      (13) Pan, X., et al., Recent Advances in Bacterial Persistence Mechanisms. Int J Mol Sci, 2023. 24(18).

      (14) Patel, H., H. Buchad, and D. Gajjar, Pseudomonas aeruginosa persister cell formation upon antibiotic exposure in planktonic and biofilm state. Sci Rep, 2022. 12(1): p. 16151.

      (15) Maki, S., et al., Partner switching mechanisms in inactivation and rejuvenation of Escherichia coli DNA gyrase by F plasmid proteins LetD (CcdB) and LetA (CcdA). J Mol Biol, 1996. 256(3): p. 473-82.

      (16) Hockings, S.C. and A. Maxwell, Identification of four GyrA residues involved in the DNA breakage-reunion reaction of DNA gyrase. J Mol Biol, 2002. 318(2): p. 351-9.

      (17) Chan, P.F., et al., Structural basis of DNA gyrase inhibition by antibacterial QPT-1, anticancer drug etoposide and moxifloxacin. Nat Commun, 2015. 6: p. 10048.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study evaluates the outcomes of a single-institution pilot program designed to provide graduate students and postdoctoral fellows with internship opportunities in areas representing diverse career paths in the life sciences. The data convincingly show the benefit of internships to students and postdocs, their research advisors, and potential employers, without adverse impacts on scientific productivity. This work will be of interest to multiple stakeholders in graduate and postgraduate life sciences education and should stimulate further research into how such programs can best be broadly implemented.

      Thank you for your assessment. We agree that sharing our process for creating this internship program with the wider higher education community is important and we hope it will spur establishment of new programs at other institutions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study was to determine whether short (1 month) internships for biomedical science trainees (mostly graduate students but some post-docs) were beneficial for the trainees, their mentors, and internship hosts. Over a 5 year period, the outcomes of trainees who completed internships were compared with peers who did not. Both quantitative results in terms of survey responses and qualitative results obtained from discussion groups were provided. Overall, the data suggest that internships aid graduate students in multiple ways and do not harm progress on dissertation projects. 'Buy-in' from mentors and prospective mentors appeared to increase over time, and hosts also gained from the contributions of the interns even in a short time period. While the program also appeared valuable for post-doctoral trainees, it was less favorably considered by post-doc mentors.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      The internship program that was examined here appears to have been very well designed in terms of availability to students, range of internship offerings, length of time away from PhD lab, and assessments.

      Having a built-in peer control group of graduate students who did not do internships was valuable for much of the quantitative analyses. However, as the authors acknowledge, those who did opt for internships are a self-selected group who may have character traits that would help them overcome the potential negative impacts of the internship.

      The quantitative data is convincing and addresses important considerations for all stakeholders.

      The manuscript is well-constructed to individually address the impact of the program on each set of stakeholders, while also showcasing areas of mutual benefit.

      The discussion of challenges and limitations, from the perspectives of participating stakeholders, program leaders, and also institutions, is comprehensive and very thoughtful.

      Thank you for noting these strengths in experimental design, control group, and manuscript format.

      Weaknesses:

      The qualitative data that resulted from the 'focus groups' of faculty mentors was somewhat difficult to evaluate given the very limited number of participants (n=7).

      Thank you for pointing out the potential limitations of a small sample size. One reason we selected a qualitative approach to focus group data analysis in our experimental design was to supplement our larger quantitative analyses with faculty advisors. A benefit of relying on qualitative methods is that saturation of a representative set of themes can be reached even with a limited number of participants. This is particularly true when a homogenous sample is used, such as faculty members in the biomedical sciences (Guest, et al. 2006). We have added the following sentences at line 188 in the text to expand on the faculty focus groups:

      “A group of faculty advisors in a range of disciplines and demographics, all of whom were active mentors with extensive training experience were invited to participate in the focus groups. Seven faculty advisors participated in the Year 1 focus group and 5 of those same 7 participated in Year 5. Saturation can occur with as little as six interviews in homogeneous samples (Guest et al. 2006) such as our biomedical faculty research advisors at a single institution.”

      In the original analysis, we increased the generalizability of our findings by gathering faculty opinions and feedback using multiple methods. For example, faculty post-internship survey responses were returned by 75 faculty members over a 5-year period, which represents a 61% response rate. (Faculty post-internship survey results are shown in Figure 1, panels v-x and Figure 4, panels i-t.) In addition, the survey gauging general faculty advisor support for the program (Figure 3), which was administered twice, 4 years apart, gathered the opinions of 115 advisors in year 1 and 122 advisors in year 4. Thus, the faculty focus group surveys were only one of 3 ways that faculty input was gathered. In sum, while the small number of faculty mentors who participated in the focus groups has the potential to introduce bias, we made a conscious decision to use a mixed methods approach to expand beyond one sample to increase the generalizability of our results. However, to acknowledge the complexity of faculty advisor views on internships, we have noted the need to further study faculty advisor support for internships in broader samples as a future direction. This is the new wording we included at line 788:

      “Other future studies could probe faculty advisor support for internships at institutions beyond our own since training culture and faculty perspectives are influenced by many factors and vary from institution to institution.”

      Overall, the data support the authors' conclusions with respect to the utility of internship programs for all stakeholders. As the authors note, the data relate to a specific program where internship length was defined, costs were covered by a grant or institutional funding, and there were multiple off-site internship hosts available. Thus, the results here may not replicate for other programs with different criteria.

      Thank you for noting these advantages that contributed to the success of this program. We agree that other institutions will encounter unique challenges when implementing their own internship program and have addressed some of these limitations in our discussion section. In the Discussion section of the paper, we outline considerations and review lessons learned in an effort to help others know what aspects of the program might or might not work in distinct situations or locations. We also point the reader to distinct internship models at other institutions in the hope that any university hoping to provide their trainees with internship opportunities can benefit from the collective experience of the relatively few programs that have found sustainable ways to accomplish this.  

      This work provides a valuable assessment of how relatively short internships can impact graduate students, both in terms of their graduate tenure and in their decision-making for careers post-graduation. As more graduate programs are heeding calls from funding agencies and professional societies to increase knowledge about, and familiarity with, multiple career paths beyond academia for PhD students, there is a need to evaluate the best ways to accomplish that goal. Hands-on internships are valuable across many spheres so it makes sense that they would be for life science graduates too. However, the fear that time-to-degree and/or productivity would be negatively impacted is important to acknowledge. By providing clear data that this is not the case, these investigators have increased the likelihood that internships could be considered by more institutions. The one big drawback, and one that the authors discuss at some length, is the funding model that could enable internship programs to be used more widely.

      Thank you for providing suggestions to improve the generalizability of our results. We agree that finding a sustainable source of funding for internship programs, and the staff who direct them, is a primary obstacle to implementing these programs more widely. We provide some ideas and funding models for other institutions to consider, and future directions could examine internships that are un-funded or funded primarily by fellowships from supportive granting agencies. Accordingly, we have added the following text to future directions at Line 755:

      “We acknowledge the need for future studies to evaluate the feasibility and outcomes of internship programs funded via different models to see if faculty support and student outcomes would be comparable under different models.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe five-year outcomes of an internship program for graduate students and postdoctoral fellows at their institution spurred by pilot funding from an NIH BEST grant. They hypothesized that such a program would be beneficial to interns, internship hosts, and research advisors. The mixed methods study used surveys and focus groups to gather qualitative and quantitative data from the stakeholder groups, and the authors acknowledge the limitation that the study subjects were self-selected and also had research advisors who agreed to allow them to participate. Thus the generally favorable outcomes may not be applicable to students such as those who are struggling in the lab and/or lack career focus or supportive research advisors. Nonetheless, the overall findings support the hypothesis and also suggest additional benefits, including in some cases positive impact for the lab, improved communication between the intern and their research advisor, and an advantage for recruitment of students to the institution. The data refute one of the principal concerns of research advisors: that by taking students out of the lab, internships reduce individual and overall lab productivity. Students who did internships were significantly less likely to pursue postdoctoral fellowships before entering the biomedical workforce and were more likely to have science-related careers versus research careers than control students who did not do internships, although the study design cannot determine whether this was due to selection bias or to the internship.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      (1) The sample size is good (123 internships).

      (2) The internship program is well described. Outcomes are clearly defined.

      (3) Methods and statistical analyses appear to be appropriate (although I am not an expert in mixed methods).

      (4) "Take-home" lessons for institutions considering implementing internship programs are clearly stated.

      Thank you for enumerating these strengths. We also hope that the sample size, positive outcomes, and take-home lessons will be of benefit to other institutions.

      Weaknesses:

      (1) It is possible that interns, hosts, and research advisers with positive experiences were more likely to respond to surveys than those with negative experiences. The response rate and potential bias in responses should be discussed in the Results, not just given in a table legend in Methods.

      Thank you for noting this oversight. We were pleased that throughout our study, the majority of interns, faculty advisors and internship hosts responded to the surveys. As suggested, we have included the following text at line 132 in the first paragraph of the results section:

      “The response rate for the 123 survey invitations sent to interns and their current research advisors and internship hosts ranged from 61% for research advisors to 73% for hosts, and about 66% for interns (averaging pre and post survey responses). In addition to quantitative surveys, qualitative themes and exemplars were collected from focus groups.”

      (2) With regard to the biased selection of participants, do the authors know how many subjects requested but were not permitted to do internships?

      We too were concerned about trainees who would not be able to secure their PI’s support to participate in an internship.  Accordingly, as part of our program design and evaluation, in the inaugural year of the program our external evaluator, Strategic Evaluations, Inc., administered a survey to graduate students and postdocs who registered for an internship information session or who started, but did not complete the application. Registrants were asked about their decision to complete an application, their experience completing the application if they chose to do so, and the likelihood that they would apply to the program next year. Of the respondents, only 9% indicated that lack of PI support prevented them from participating (n=53 respondents). Hence while we cannot completely rule out PI support as a barrier, only a small percentage of trainees reported this as a barrier despite a robust response rate (43%).  A second line of evidence that there was not a large number of students who were prevented from doing an internship by their research advisor is the high faculty approval rating of the program which was gathered in both year 1 and year 4 of the program (see figure 3). These two independent lines of evidence diminish our concern that faculty advisor resistance was a significant barrier to participation.

      (3) While the authors mention internships in professional degree programs in fields such as law and business, some mention of internship practices in non-biomedical STEM PhD programs such as engineering or computer science would be helpful. Is biomedical science rediscovering lessons learned when it comes to internships?

      Excellent point. We noted that internships are common in non-biomedical STEM masters and PhD programs, but we did not list experiential rotations and internships that are common in nursing, engineering, computer science and other such programs. We agree that many lessons learned from internships in all fields are transferable to the biomedical fields, and we also strongly believe that findings there need to be replicated in the biomedical sciences because of the unique funding model, incentive structure, and apprentice structure of the biomedical training. In response to this critique, we added the following text to the manuscript at line 724:

      “Internships are ubiquitous in many other professional training programs such as law, business, nursing, computer science, and engineering programs (Van Wart, O’Brien et al, 2020).”

      (4) Figure 1 k, l - internships did not appear to change career goals, but are the 76% who agreed pre-internship the same individuals as the 75% who agreed post-internship? What percentage gave discordant responses?

      While our data as collected cannot directly address this question, we were not surprised to see that the internship doesn’t radically alter many trainees’ career plans: internships in this program usually occur in the final 12-18 months of training, and the emphasis is on the internship being a skill-building rather than a career exploration initiative. One limitation of our study is that career goals were defined by pre-surveys at different timepoints, depending on what stage of training an individual (whether control or internship participant) happened to be at when the baseline survey was administered. We know from previous work that career goals often shift during training (see Roach and Sauermann, 2017, PLOS One, https://doi.org/10.1371/journal.pone.0184130, and Gibbs et al., 2014, PLOS One, https://doi.org/10.1371/journal.pone.0114736), so the point at which career interests are gathered makes a difference in this kind of analysis. Hence, we have expanded our discussion of this limitation to better acknowledge this critique beginning at Line 319.

      “Because of the variable timing between pre-internship career interest surveys among interns and control trainees and securing the first job, future studies could more rigorously evaluate changes in career preferences between pre and post internship with an analysis that considers the time that has elapsed between career interest noted pre-internship vs post internship career placement.”

      Appraisal:

      Overall the authors achieve their aims of describing outcomes of an internship program for graduate career development and offering lessons learned for other institutions seeking to create their own internship programs.

      We thank you for your thorough reading and review of the manuscript.

      Impact:

      The paper will be very useful for other institutions to dispel some of the concerns of research advisers about internships for PhD students (although not necessarily for postdoctoral fellows). In the long run, wider adoption of internships as part of PhD training will depend not only on faculty buy-in but also on the availability of resources and changes to the graduate school funding model so that such programs are not viewed as another "unfunded mandate" in graduate education. Perhaps the industry will be motivated to support internships by the positive outcomes for hosts reported in this paper. Additionally, NIH could allow a certain amount of F, T, or even RPG funds to be used to support internships for purposes of career development. 

      Thank you. We share your hope that the information and data resulting from this study will be valuable to other institutions. Your point about NIH (and other funders, for that matter) allowing trainees to participate in internship experiences while funded by the granting agency is an excellent one. We have found that communication with program officers often garners their support for the intern remaining on a fellowship or training grant during the internship. This allows the internship program to fund additional interns, especially those that are supported by the faculty advisor’s grants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Two minor points about the comments used from focus groups.

      (i) In figure 5, there is a specific quote about being a reward that is used twice;

      (ii) It seems that there should be some consistency in how these quotes are relayed with respect to gender identification of the trainee. In some cases 's/he' is used, in others 'he' or 'she' is used, and in others 'they' is used.

      We appreciate this suggestion and agree that a non-gendered convention would be clearer; accordingly, we have revised all quotes to use “they” for consistency. In addition, we have removed the duplicated quote from figure 5, which was originally inserted in two sections because of its applicability to both the “Persisting Challenges” and “Trainees’ abilities and skills were primary drivers of the success of the internship” sections.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is somewhat lengthy. Some redundant material can be eliminated - Lines 366-371 simply restate the data in Table 5. Lines 393-396 restate the data in Figure 3. The text should be reserved for interpreting rather than restating the data in tables and figures.

      Thank you for this feedback; we agree that these sections can be condensed. We have removed some of the redundancy while retaining enough for the figures and the text to each stand alone, for accessibility to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Burgman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      We now discuss the parallel with the Hong 2022 study (Discussion, lines 307-310).

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Agreed, see below.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength...

      Thank you for the suggestion, we have now repeated the analysis of Figure 5 for EP pairs instead of EEP triplets, and included it as new Supplementary Figure S7. Despite the lower statistical power, the trends are very similar. 

      ...Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) have small effects.

      We have now added colour coding for each of the promoters in Figure S4. We agree that this clarifies the contribution of each promoter to the selectivity of each enhancer and that it further confirms the responsiveness trends observed in Figure 5.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Unfortunately, even with the number of enhancers that we tested, we lack statistical power to identify sequence motifs that may favour supra-additivity.

      Reviewer #2 (Public Review):

      We thank the reviewer for the supportive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      Summary

      This work investigates how multiple regulatory elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. The main finding is that enhancer pairs generally combine additively on reporter output. The authors also find that the extent to which enhancers increase reporter output is inversely related to the intrinsic strength of the promoter.

      This manuscript presents a compact experiment that investigates an important open question in gene regulation. The results and data will be of interest to researchers studying enhancers. Given that my expertise is in modeling and computation, I will take the experimental results at face value and focus my review on the interpretation of the results and the computational methodology. I find the result of additivity between enhancers to be well supported. The findings on differential responsiveness between promoters are very interesting but the interpretation of such responses as 'non-linear' or 'following a power-law' may be misleading. More broadly, I think a more rigorous description of the mathematical methodology would increase the clarity and accessibility of this manuscript. A major unanswered question is whether the findings in this study apply to enhancers in their native genomic context. Regardless, investigating such questions in an episomal reporter assay is valuable.

      Main comments

      Applicability to native genomic context: The applicability of the results in this paper to enhancers in their native genomic context is unclear. As the authors state in the discussion section, the reporter gene is not integrated into the genome, the spacing between enhancers does not match their native context etc. It is thus unclear whether this experimental design is able to detect the non-additivity between enhancers which is known to be present in the genome. This could be investigated by testing the enhancer-enhancer-promoter tuples for which non-additivity has been observed in the genome (references are given in the introduction) in this assay.

      We appreciate the suggestion, but we chose not to go back to the lab to generate additional data to address this point. Of the cited previous studies, two are comparable to our study because they also used mESCs and included loci that we also studied:  Thomas et al. (2021) and Brosh et al. (2023). We now discuss how the findings of these two studies relate to our observations in the Discussion, lines 336-345.

      Interpretation of promoter responses as non-linear and following a power-law: In Fig 5, the authors demonstrate that enhancer-enhancer pairs boost reporter output more for weak promoters as opposed to strong promoters. I agree the data supports this finding, but I find the interpretation of such data as promoters scaling enhancers according to a power-law (as stated in the abstract) to be misleading. As mentioned on line 297, it is not possible to define an intrinsic measure of enhancer strength, thus the authors assign the base of the power-law to be the average boost index of the enhancer-enhancer pair across the 8 promoters. But this measure incorporates some aspect of a promoter and is not solely a property of enhancers...

      We agree that the power-law conclusion in the abstract was too strong; we have rephrased it as "non-linear".

      ...It would also be useful to know whether the results in Fig 5 apply to only enhancer-enhancer-promoter triples or also to enhancer-promoter pairs.

      We have now added this EP analysis as new Supplemental Figure S7. Although the statistical power is much lower, this shows very similar trends as the EEP analysis. We briefly report this, lines 275-278.

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. The authors mention that both studies use the same statistical methodology and the data in this study is consistent with the data from the 2022 paper. However, I think the statistical methodology in both studies needs further exposition. This section of the review is thus meant to ensure that I understand the author's methodology, to guide the reader in understanding how the authors define 'selectivity', and to probe certain assumptions underlying the methodology.

      My understanding of the approach is as follows: The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity.
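The boost index described above is a simple ratio, which can be sketched as follows. This is only an illustration of the definition given in the review: the function name, the promoter labels, and all reporter values are hypothetical, and the log2 transform is an added convenience rather than something stated in the text.

```python
from math import log2


def boost_index(output_with_enhancer: float, output_promoter_only: float) -> float:
    """Reporter output with enhancer + promoter divided by promoter-only output."""
    return output_with_enhancer / output_promoter_only


# Hypothetical reporter outputs (arbitrary units) for ONE enhancer on three promoters.
promoter_only = {"weak": 1.0, "medium": 4.0, "strong": 16.0}
with_enhancer = {"weak": 8.0, "medium": 16.0, "strong": 32.0}

boosts = {name: boost_index(with_enhancer[name], promoter_only[name])
          for name in promoter_only}
# Assumption: a log2 scale is used here only to make fold changes easy to compare.
log2_boosts = {name: log2(value) for name, value in boosts.items()}

print(boosts)
print(log2_boosts)
```

In this made-up example the enhancer would count as selective under the definition above, because its boost index differs between promoters even though it raises the output of all three.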

      The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4 of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test. This is a parametric test that assumes that the observations within each vertical distribution in Fig 2C are normally distributed (this test does allow for heteroskedasticity - which means that the variance may differ within each vertical distribution). Does the normality assumption hold? This analysis should be reported. If this assumption does not hold, is the Welch test well calibrated?

      We have tested the normality of all of the single enhancer + promoter combinations that were tested using the Welch F-test. 94.1% of the 439 single enhancer + promoter combinations show normal distributions (at a 1% FDR). We have added this to the Methods section of the revised manuscript. In addition, non-normality has little to no influence on Welch F-test performance (https://rips-irsp.com/articles/10.5334/irsp.198). Therefore, the use of the Welch F-test to score enhancer selectivity on these data is valid. That said, we agree that a simple binary classification of selective vs non-selective is not descriptive enough for these kinds of data. We addressed this in our previous publication by exploring the relationship between selectivity and enhancer strength. However, the objective in this publication was solely to show that this new dataset follows selectivity patterns similar to those in our previous publication. Furthermore, our analysis of the non-linearity of promoter response is a more quantitative continuation of the analysis of selectivity, as this non-linearity is probably one of the major contributors to enhancer selectivity. It was probably present in our previous paper but could not be analyzed as there were fewer combinations per promoter.
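For readers unfamiliar with the Welch F-test discussed here, a minimal sketch of the textbook statistic is given below. It is not the authors' analysis code: the function name and the example values are hypothetical, and only the test statistic and its degrees of freedom are computed. The p-value would be the upper tail of an F distribution with (df1, df2) degrees of freedom, which needs a statistics library.

```python
from statistics import mean, variance


def welch_f(groups):
    """Welch's heteroskedastic one-way test; returns (F, df1, df2)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisy groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    weight_sum = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / weight_sum
    lam = sum((1 - w / weight_sum) ** 2 / (n - 1)
              for w, n in zip(weights, sizes))
    numerator = sum(w * (m - weighted_mean) ** 2
                    for w, m in zip(weights, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical log2 boost indices of one enhancer on three promoters
# (four replicate measurements each); these are not data from the study.
example = [
    [3.1, 2.8, 3.3, 2.9],
    [2.0, 2.2, 1.9, 2.1],
    [0.6, 1.5, 1.1, 0.8],
]
f_stat, df1, df2 = welch_f(example)
print(f_stat, df1, df2)
```

With two groups the statistic reduces to the square of Welch's t, with the Welch-Satterthwaite degrees of freedom, which gives a convenient sanity check on the implementation.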

      For further clarity, we have now highlighted the individual promoters in Figure S4 by colors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I found this to be an interesting manuscript and am glad this experiment was conducted. As I wrote in my public review, I think that clarifying the computational methods/ideas would really help. I also think it would be helpful to properly define the terms that are being used. For example, this manuscript uses the terminology cooperativity and synergy. Are these meant to be synonymous with supra-additivity?

      Thank you for this point. The revised manuscript no longer uses the word “cooperativity”. We now use “supra-additivity” when describing our data, and “synergy” as biological interpretation. In the Introduction we now clarify this distinction.

      Comments on enhancer selectivity:

      In the public review, I have given comments on the statistical methodology employed to assess enhancer selectivity. On a more subjective note, I'm not convinced that a frequentist approach to a binary classification of 'selective' vs 'not selective' is that useful here. I think it would be more useful to report an 'effect size' of the extent to which an enhancer is selective and to study the sources of this effect size. I think you've tried to do this in lines 329-339 of the discussion but I think the exposition could be clearer.

      Figure S4B may suggest how to do this. It appears that the distribution of boost indices for a given enhancer is trimodal (this is most obvious for the stronger enhancers on the top of the plot). Is it the case that each mode (for each enhancer) consists of the same set of promoters? I think what is implied by Figure 5B is that the stronger promoters are not boosted as much as the weaker promoters. So does the leftmost mode consist of Ap1m1, the middle mode consist of Klf2/Otx2/Nanog, and the rightmost mode of Sox2/Fgf5/Lefty1/Tbx3? If so, I would recommend emphasizing this in the text/figure and clarifying how this relates to selectivity. It seems that the chain of logic is as follows: (1) We define an enhancer to be selective if its boost indices across a set of promoters are not the same. (2) We generally observe that stronger promoters get boosted less than weaker promoters. (3) Thus selectivity arises due to differences in intrinsic strengths of the promoter. I think this is what is being implied in lines 329-339 of the discussion, but it took me multiple readings to understand this and I'm not convinced the power-law explanation is justified (see public review).

      We have modified this paragraph of the Discussion (now lines 350-359).

      Regarding the power-law: in the Results we state “roughly a power-law function”. We have removed the power-law claim from the abstract, that conclusion as phrased was indeed too firm.

      Reference to Zuin et al

      Lines 323 - 325: A reference is made to the data from Zuin et al "following approximately a power-law". What data in Zuin et al does this statement refer to? I do not believe the authors in Zuin et al claim that the relationship between GFP intensity and enhancer-promoter distance (Figure 1h,i from Zuin et al) follows a power law. It is certainly non-linear, but I have taken a look at this data myself and do not find it follows a power-law. Please either explain this further and rigorously justify the claim or adjust the wording accordingly.

      Good point. In the discussion of Zuin et al. we have replaced “power law” with “non-linear decay function”.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      The authors have addressed the majority of my comments effectively. The new Sis1 experiment provides a clear illustration of a distinctive response to ethanol and heat. This work offers a comprehensive perspective on Hsf1 in stress response from multiple angles. I have two additional comments to improve the paper without re-review:

      (Original point #3) Could the authors clarify the differences between DPY1561 and the original strain used? There appears to be missing statistical analysis for Figure 1E at the bottom.

      DPY1561 is a haploid version of the original heterozygous diploid strain (LRY033). We opted for this strain in the analysis depicted in Figures 1D and 1E since 100% of Hsp104 is BFP-tagged; thus, the signal above background is stronger and the scoring of Hsp104 foci cleaner. The statistical analysis (Mann-Whitney test) for the lower graphs in Fig. 1E has been added. We thank the reviewer for pointing this out.

      (Original point #4) In the new Figure 7F, '% transcription' and '% coalescence' are presented. My understanding is that Figures 7D and 7E aim to demonstrate the correlation between HSP104 transcription (a continuous variable) and HSP104-HSP12 coalescence (a binary variable) at the single-cell level. However, averaging the data across cells masks individual variations and potential anti-correlations. The authors could explore statistical methods that handle correlations between a continuous variable and a binary variable. Alternatively, consider converting 'HSP104 transcription' to a binary variable and then performing a chi-square test to assess the association.

      We thank the reviewer for this suggestion. In response, we have made the following changes:

      (1)  Clarified that the data used in this analysis were derived from Fig. 7 – figure supplement 1 in which ‘HSP104 transcription’ was converted to a binary variable.

      (2)  Indicated that the theoretical ceiling for coalescence of these tagged alleles is 25% given their heterozygous state (Figure 7–figure supplement 1D legend).  In the other 75% of cells scored, HSP104-HSP12 coalescence might also be taking place but is not detectable using this strategy. Therefore, it is not possible to elucidate any anti-correlation between HSR transcription and HSR coalescence in this experiment.

      In addition, we attempted to buttress the argument suggested by the Pearson correlation coefficient analysis (Fig. 7F) that a stronger association exists between transcription and gene coalescence in heat-shocked (HS) vs. ethanol stressed (ES) cells. To do so, we used the chi-square test as suggested by the reviewer. However, the results of this test were ambiguous, and we therefore did not include it in the manuscript.
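The chi-square test of association mentioned above can be sketched for a 2x2 table of two binary variables (for example, transcription scored yes/no against coalescence scored yes/no). This is a generic illustration and not the authors' analysis: the function name and the cell counts are hypothetical, and the sketch uses the Pearson statistic without continuity correction.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p is the upper tail of a chi-square
    distribution with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    chi2 = 0.0
    for i, row_sum in enumerate(row_sums):
        for j, col_sum in enumerate(col_sums):
            expected = row_sum * col_sum / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical cell counts (rows: transcribing yes/no; columns: coalesced yes/no).
counts = [[20, 80], [5, 95]]
chi2, p_value = chi_square_2x2(counts)
print(chi2, p_value)
```

When expected counts in any cell are small, an exact test such as Fisher's is usually preferred over the chi-square approximation.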

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco-containing bodies (RCBs), which contain chloroplast stroma material but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some included results are not of particular relevance to the study's aim. For example, it is unclear what the importance is of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remained elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion, perhaps suggesting the mechanism involved. Finally, some conclusions are not fully supported by the data: the suggested timing of events aligns poorly between and even within experiments, mostly due to high variation and a low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy, does not propose the significance of piecemeal vs complete organelle sequestration into the vacuole under the conditions used, and does not dwell on the early localization of ATG8 to the future budding place on the chloroplast. 

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends.

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time of one observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Figure 1 for Response to reviewers) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter does not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection occurs more frequently during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report.
      To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCBs number in the drp5b mutant background

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the condition under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different because the leaf ages and conditions are largely different.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein– labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1-figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. This additional experiment is described in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked for the trafficking analysis to raise the conclusion that the vesicle was released into the vacuole around 73.8s? 

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time scales of each still frame represent the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. Therefore, we expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we therefore turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We could therefore not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP-expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al, 2023, Elife; Wan et al, 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here. 

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legend, the details of the experiments will be better provided, such as leaves stages (Figure 1, Figure 5...), the number of chloroplasts analyzed (Figure 7...). This can help the readers to follow. 

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. This additional experiment is described in lines 163–174.
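
      The percentages quoted above can be turned into a rough partition of the stroma volume loss. The short Python sketch below is only an illustration of that arithmetic. It assumes, purely for the sake of the example, that the loss seen in atg7 approximates the autophagy-independent component and that the two components are additive; neither assumption is established by the data, and the variable names are hypothetical.

```python
# Stroma volume remaining after 1 d of darkness, as a percentage of the
# pre-treatment volume (values quoted in the response above).
remaining_pct = {"WT": 70, "atg7": 86}


def percent_loss(remaining):
    """Percentage of stroma volume lost during the dark treatment."""
    return 100 - remaining


loss = {genotype: percent_loss(r) for genotype, r in remaining_pct.items()}

# Illustrative assumption: the loss in atg7 approximates the
# autophagy-independent component, and the components are additive.
autophagy_independent = loss["atg7"]
autophagy_dependent = loss["WT"] - loss["atg7"]

print("Total loss in WT (percentage points):", loss["WT"])
print("Loss also seen in atg7:", autophagy_independent)
print("Loss absent in atg7:", autophagy_dependent)
```

      Under these assumptions, the WT loss of 30 percentage points splits into 14 points that also occur in atg7 and 16 points that do not, which matches the qualitative statement that both autophagy-dependent and autophagy-independent systems contribute.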

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescence intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures produced TOC64-mRFP and KEA1-mRFP signals without ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes.

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During time-lapse imaging, puncta sometimes move from the plane of focus toward the deeper side (abaxial side) or near side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Alternatively, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants. 

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa.

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. These experiments are described in lines 284–300 in the Results section and in lines 426–444 in the Discussion section.

      - Which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association (Fig 6). 

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in NahG transgenic lines and sid2 mutants increases seed yield in the annual plant Arabidopsis thaliana. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in Arabidopsis. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript provides an interesting observation of the modes of endosomal fusion and roles of actin dynamics in this process and the conclusions of the study are justified by the data, there are concerns regarding the lack of important descriptions or quantification in some of the analyses and additional analyses are needed to strengthen this study. The major issues are outlined below:

      (1) The authors indicate that Zone 1 is within approximately 1 μm of the apical surface. What are the distances of Zone 2 and Zone 3 from this surface? It would be better if the authors could provide an explanation or hypothesis that explains the early endosomes, late endosomes, and lysosomes are not intermixed but separated along the z-axis.

      Thank you for pointing out this important issue. Following the comments, we have added an explanation about the depth of early endosomes, late endosomes, and lysosomes to the text (lines 123-124, 127-128, and 130-131). We have also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1B).

      Because endosomes go deeper and mature with repeated fusion and enlargement after endocytosis, early endosomes, late endosomes, and lysosomes are aligned along the z-axis, though the separation is not complete. In confocal microscopic observation, endolysosomal vesicles in VE cells are largely separated into different layers because they are huge and occupy a large space, and as a result, do not exist with much overlap. We have added the explanation to the text (lines 121-122).

      (2) The authors compared the size distribution of the late endosomes that underwent fusion with that of the total late endosomes in the observed area 5 min after labeling (Figure 2C). A similar quantification analysis should also be analyzed 15 min after labeling (Figure 3G).

      Thank you for the appropriate request. We have added the data showing the size distribution of the late endosomes that underwent fusion at 15 min after labeling, to Figure 3G.

      (3) While 3D reconstructions of actin filament patterns under normal conditions are presented (Figures 4 E-F), comparable analyses using cells treated with Cytochalasin D, Jasplakinolide, or S3 peptide needs to be performed.

      As requested by the referee, we have performed additional experiments to show 3D reconstructions of actin filaments on late endosomes after pretreatment with cytochalasin D, jasplakinolide, and S3 peptide. We show the data in new figures: Figure 7–figure supplement 1A, Figure 7–figure supplement 2, and Figure 9–figure supplement 1.

      (4) The authors should provide a clear description of how they quantified the fusion frequency. Why does the fusion frequency appear very low? Why do Cytochalasin D and jasplakinolide show different effects on heterotypic fusion?

      Thank you for pointing out this important issue. We have added the description of how the fusion frequency was quantified to the Materials and Methods (lines 643-645). Briefly, we counted the number of membrane fusion events and the number of late endosomes in the 400-s time-lapse images, and then calculated how many times a single late endosome underwent fusion per minute. The apparent fusion frequency is low because it is expressed in terms of frequency per vesicle per minute.
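
      The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name and the example counts are hypothetical, and only the 400-s observation window and the per-vesicle, per-minute normalization come from the response.

```python
def fusion_frequency(n_fusion_events: int,
                     n_late_endosomes: int,
                     duration_s: float = 400.0) -> float:
    """Return fusion events per late endosome per minute.

    duration_s defaults to the 400-s time-lapse window mentioned in the
    response; the function name and signature are illustrative assumptions.
    """
    if n_late_endosomes <= 0 or duration_s <= 0:
        raise ValueError("need at least one endosome and a positive duration")
    duration_min = duration_s / 60.0
    # Normalize first by the number of vesicles, then by observation time.
    return n_fusion_events / n_late_endosomes / duration_min


# Hypothetical counts, for illustration only (not data from the study).
freq = fusion_frequency(n_fusion_events=4, n_late_endosomes=30)
print(f"{freq:.3f} fusions per vesicle per minute")
```

      Because the count is divided by both the number of vesicles and the observation time, even several events in one field of view yield a small per-vesicle, per-minute value.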

      As for the different effects of cytochalasin D and jasplakinolide on heterotypic fusion, we already discussed this in the manuscript (lines 537-558). In short, actin filaments extending in the apical-to-basal direction are relatively static and late endosomes receive sliding forces along the apical-basal axis by means of myosins (e.g., myosin V and myosin II) in heterotypic fusion. Thus, depolymerization of actin filaments by cytochalasin D treatment reduces heterotypic fusion, and conversely stabilization of actin filaments by jasplakinolide increases heterotypic fusion.

      (5) The authors need to analyze the distribution of actin filaments during homotypic fusion post-Cytochalasin D treatment.

      As requested by the referee, we have performed additional experiments to show the distribution of actin filaments during homotypic fusion of late endosomes after pretreatment with cytochalasin D. We show the data in a new figure: Figure 7–figure supplement 3.

      (6) Clarification is needed on whether overexpressing YFP-Cofilin led to the deterioration of cell functions.

      Thank you for the comments. As the reviewer pointed out, overexpression of cofilin can change cellular functions and actin architectures in cells (Aizawa et al., 1997; Popow-Wozniak et al., Histochem. Cell Biol., 2012, (138) 725-36). Although we did not observe apparent morphological changes of VE cells after YFP-cofilin expression, we cannot exclude the possibility that YFP-cofilin overexpression affected the distribution of actin filaments. Therefore, we have described this possibility in the text (lines 425-426).

      (7) Although the authors report that the S3 peptide does not affect heterotypic fusion, a reduction in average heterotypic fusion frequency post-treatment was detected (Figure 9G). The authors need to perform a statistical analysis of the quantification performed in Figure 9G.

      We apologize for this misleading graph representation. Because S3 peptide treatment did not change the fusion frequency significantly, we simply did not mark statistical significance in the previous graph. To clarify this point, we have added the label “n.s.” (not significant) to Figure 9G.

      (8) The authors need to provide the potential functional significance of apically extended actin filaments in positioning late endosomes in the discussion.

      We observed 3 different types of actin filaments in the apical region of VE cells (Figure 5). First, the actin mesh in zone 1, which does not interact directly with late endosomes, may function as a barrier preventing enlarged late endosomes from flowing backward from zone 2 to zone 1. Second, actin filaments extending from the apical to the basal direction on the surface of late endosomes are necessary for the movement of late endosomes toward lysosomes in a myosin-dependent manner. Third, the radial branched filaments on the surface of late endosomes temporarily polymerize in an Arp2/3-dependent manner and regulate the lateral movement of late endosomes. This actin organization coordinately regulates the position of late endosomes. We have added this explanation to the Discussion (lines 483-491).

      Reviewer #2 (Recommendations For The Authors):

      (1) What is the effect or physiological significance of the transition in fusion models?

      In material transport in cells, explosive fusion that completes membrane fusion quickly is more efficient and physiologically advantageous than slow bridge fusion. On the other hand, larger vesicle size is more effective in membrane trafficking than smaller size because large vesicles can transport a large amount of cargo molecules. However, as our mathematical modeling predicts, an increase in vesicle size leads to bridge fusion and decreases the transportation rate. Actin forces can resolve these conflicting effects because they convert the fusion mode from bridge to explosive in late endosomes in VE cells.

      (2) I am confused about how to study heterotypic fusion between late endosomes and lysosomes using only transferrin labeling.

      We are sorry for any confusion this may have caused. Indeed, at first, we discovered that late endosomes shrank and disappeared after labeling of endocytic vesicles with transferrin only (Figure 3A). However, subsequently, we speculated that this disappearance was the result of heterotypic fusion with lysosomes, and to prove this possibility, we developed a double-labeling method in which late endosomes and lysosomes were labeled with 2 different colors (Figure 3B). In short, VE cells were incubated with dextran rhodamine for 20 min and subsequently pulse-labeled with Alexa Fluor 488-labeled transferrin for 5 min: when VE cells were observed, dextran rhodamine was already transported to lysosomes, whereas Alexa Fluor 488-labeled transferrin was still present in late endosomes, enabling the two vesicles to be observed separately.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is concerning that there are several points that are not fully explained regarding microscopic image analysis.

      (a) How were zones 1, 2, and 3 defined and how were the zones determined at each observation? Did the authors determine the zones subjectively based on the approximate size of the vesicles and the passage of time, or statistically by measuring endosomes from images of whole cells? The authors should describe this and also provide the approximate z-directional thickness of each of zones 1, 2, and 3.

      Thank you for pointing out this important issue, which is also raised by Reviewer #1. We initially analyzed the distribution and size of early endosomes, late endosomes, and lysosomes in VE cells by use of vesicle-specific markers (Figure 1B). Thereafter, at each observation, we determined the zones based on the characteristic size of the vesicles and time after labeling of endocytic vesicles. Especially, images of zone 2 and zone 3 were taken by focusing on the z-axis where late endosomes and lysosomes occupied the largest area in the optical slice images, respectively (lines 636-639). As for the z-directional thickness of each zone, we have added a description to the text (lines 123-124, 127-128, and 130-131) and also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1A).

      (b) Regarding "vesicle size" measured from confocal microscopy images: Does "vesicle size" mean surface area or maximum cross-sectional area? In any case, the authors should describe how and what area of the vesicles was measured from the images. The mathematical model used the surface area of the vesicle as a parameter. Better to be consistent.

      Thank you for the important questions. As the reviewer pointed out, the cross-sectional area of endosomes varies depending on the focal plane. To ensure uniformity of the focal plane across different images, we took the images by focusing on the z-axis where late endosomes (zone 2) or lysosomes (zone 3) occupied the largest area in the image. In the focal plane, we measured the size of all intact, unfragmented endosomes. We have now added this explanation to the Method section (lines 636-639).

      (c) The authors showed several time-lapse imaging data without a description of what "0 s" is the starting time of. For example, "0 s" in Figures 2A, B, 3A, and B, may have different meanings. Other data should be carefully examined and described.

      We apologize for the inadequate description. As the reviewer pointed out, "0 s" has a different meaning in each panel. Therefore, we have added an explanation of the meaning of "0 s" to the relevant figure legends (Figure 2A, B; Figure 3A, B; Figure 6A, F; Figure 7A, E, F; Figure 8A; Figure 6–figure supplement 1C; Figure 7–figure supplement 1B; Figure 7–figure supplement 3; Figure 7–figure supplement 4).

      (d) The meaning of "fusion time" in Figures 2D and 3F is unclear. Although it was speculated that the authors estimated it from the change in shape of the vesicles, how it was measured should be described.

      We apologize for the inadequate description. To indicate more clearly, we have added an explanation of the "fusion time" to the legend of Figures 2D and 3F (lines 898-899 and line 923, respectively).

      (2) The structure of the paragraph starting on line 158 is inappropriate. The authors state in line 159 that "this disappearance appeared to result from fusion of late endosomes with the underlying lysosomes". However, no hetero-fusion was observed here, only the disappearance of vesicles. The authors should mention that hetero-fusion occurred only after analysis of Figure 3CD.

      This reviewer thinks it is natural to state in this order: first, the disappearance of transferrin-positive vesicles was observed (Figure 3A). Such vesicles became dextran-positive as the transferrin signal began to disappear (Figures 3B, C, D). Thus, this is thought to indicate that hetero-fusion has occurred.

      We agree with the reviewer's comment and have rewritten the text following the reviewer's suggestion (lines 163-165, 176-180).

      (3) The mathematical model estimated that a vesicle size of 0.22-1.0 μm² is the size at which the fusion mode switches. Since this is close to the size of endosomes in general cells, the authors may be able to discuss the generality of the fusion mode theory. It is up to the author to respond to this suggestion or not.

      Thank you for the comments. As our mathematical model depends on the assumption that the osmotic pressure is constant, late endosomes in VE cells, exhibiting a swollen morphology, may have higher osmotic pressure compared with endosomes in other cells and if so, the predicted vesicle size when the fusion mode switches may differ. Thus, we have decided not to mention the relationship between the vesicle size and fusion mode switching.

      (4) In Line 302 the authors mentioned "These results indicated that actin spots on the surface of late endosomes were dynamically regulated, especially in the apical area." However, the t-halves of 11.5s and 18.9s are only slightly different and of the same order, so it would be too much to say that dynamic regulation of actin occurs specifically in the apical region from a difference of this magnitude. The authors should weaken their arguments. It would be good to do a statistical test for significance between the FRAP data.

      Thank you for pointing out this important issue. To highlight the significant difference in the FRAP assay, we have added a new panel showing the statistical analysis of the halftime of recovery in each region of VE cells (Figure 6E). These data indicate a significant difference in the halftime of recovery (t1/2) between actin spots in the apical and basal regions of zone 2. However, following the reviewer’s comment, we have weakened the description of the FRAP assay results (lines 310-312).

      (5) The discussion section is rather redundant. It could be shortened to be more concise instead of repeating the results.

      Thank you for the comments. We have shortened the Discussion section.

      Minor comments

      In Figure 2C, the statistical test method was not described in the legend.

      Thank you for the comments. We have added the data of the statistical test to the figure legend of Figure 2C (lines 895-896).

      Figure 3G does not look like a normal distribution, so the t-test is inappropriate.

      Thank you for the comments. We have changed the statistical analysis method and used the Mann-Whitney U test. For the same reason, we have changed the analysis method shown in Figure 2C.
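
      For readers unfamiliar with this rank-based test, the sketch below shows how the Mann-Whitney U statistic can be computed for two samples. It is an illustration under stated assumptions, not the authors' analysis: the function name is hypothetical, ties receive midranks, and the two-sided p-value uses the tie-corrected normal approximation without continuity correction, which is crude for small samples (an exact test is preferable there).

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics.
    Illustrative sketch: midranks for ties, tie-corrected variance,
    no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        midrank = (i + 1 + j) / 2.0    # mean of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                  # all values identical: no evidence
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Hypothetical vesicle sizes (arbitrary units), for illustration only.
fused = [2.1, 3.4, 2.8, 4.0, 3.1]
all_vesicles = [1.2, 1.9, 2.0, 2.5, 1.7, 2.2]
u_stat, p_value = mann_whitney_u(fused, all_vesicles)
print(u_stat, p_value)
```

      Unlike the t-test, the statistic depends only on the ordering of the pooled values, so it does not assume that vesicle sizes are normally distributed.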

      Is Figure 5D the image of zone 1 because it is close to the apical plane? If so, are the IgG-positive structures early endosomes rather than late endosomes? This seems inconsistent with the data in Figure 1.

      Thank you for the comments. The round vesicles observed in this panel are the late endosomes in zone 2. Because most of the internalized fluorescence marker has moved to the late endosomes in zone 2 at this time point (5 min after chasing), early endosomes are not labeled in this image. We have added a dotted line to the x-z axis image (the second top panel) to indicate the depth of the x-y axis image (top panel) in Figure 5D.

      Figure 6B appears to have little or no fluorescence recovery. Is this a typical example? It is also unclear if this is an apical or basal example.

      Thank you for the comments. This image is a typical example. We focused on the dot structures on the surface of late endosomes rather than the fluorescence intensity over the entire photobleached area. To prevent misunderstanding, we have added arrowheads to highlight the actin dot structures that we were analyzing. The FRAP data shown in Figure 6B were obtained at the apical region of zone 2. We have also added this information to the figure legend.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision-making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetite and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings ask for a revision of theories regarding how catecholamines modulate the instantiation of Pavlovian biases of decision making. The reviewer rightly notices that we offer three neuroanatomical routes through which methylphenidate might have acted to elicit these effects. It is important to note, however, that the current study does not provide evidence that can disentangle these different hypotheses. Accordingly, these three neuroanatomical routes raise questions for future research.

      Briefly, these three routes are (i) catecholaminergic modulation of a striatal ‘origin’ of Pavlovian biases, (ii) catecholaminergic modulation of Pavlovian biases through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, in which catecholamines regulate the balance between these frontal and striatal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the final paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we will make this line of reasoning clearer.

      (2) Analytic clarity: what's c^2?

      C^2 appears to be a PDF conversion error: all chi-square symbols (χ²) were converted to C2. This will be corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret this as an effect on cognitive control of Pavlovian biasing of actions and decision-making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized placebo controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding neural implementation of Pavlovian bias-generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. Indeed, as we discuss, we agree with the reviewer that neuroimaging (in combination with for example brain stimulation) would be a valuable next step to identify the neural correlates to these pharmacological intervention effects, to dissociate between frontal and striatal drives of the effects. In our planned revisions, we will try to clarify this point, as per our reply to reviewer 1.

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, in our planned revisions, we will bring forward the statements that clarify the originality of the current experiment.

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We will also clarify that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision will follow up on the suggestion of the reviewer to include discussion about the effects of MPH on noradrenaline and behavioural flexibility (and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to shortly address the concerns here and will incorporate these in our planned revision of the manuscript.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and was therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study for which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement in BIC for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r(98) = -0.16, p = 0.88), which rules out multicollinearity.
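
      As a rough illustration of why a weak pairwise correlation limits collinearity, the sketch below computes a Pearson correlation and the corresponding variance inflation factor (VIF). The helper names `pearson_r` and `vif_two_predictors` are hypothetical, and the formula VIF = 1 / (1 - r^2) is an assumption that holds only when these two covariates are the only predictors involved; the full mixed model contains further terms and interactions, so this is a simplification rather than the diagnostic used in the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when only two predictors are considered."""
    return 1.0 / (1.0 - r ** 2)


# Using the correlation reported above (r = -0.16) as input.
vif = vif_two_predictors(-0.16)  # about 1.03, far below common thresholds
```

      A VIF close to 1 indicates that the standard error of one coefficient is barely inflated by the presence of the other covariate.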

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context × Valence × Drug × WM span: χ² = 9.5, p = 0.002). We will report these findings in the revised manuscript.

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [18F]-FDOPA PET imaging, lends support to the proposed hypothesis, which incorporates a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [18F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). In our planned revision we will indeed incorporate the findings from our group reported in van den Bosch et al. (2022).

    1. Author response:

      Reviewer #1:

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. 

      Several times in the text we use the expression “effect of dN/dS on…”, which might indeed suggest a causal relationship. The phrasing refers to dN/dS being used in the regression as an independent variable that may predict the variation of the dependent variables, genome size and TE content. We are going to rephrase these expressions so that correlation is not mistaken for causation.

      In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

      The analysis focuses on metazoans for two reasons: one practical and one fundamental. The practical reason is computational. Our analysis included TE annotation, phylogenetic estimation and dN/dS estimation, which would have been very difficult with the hundreds, if not thousands, of plant genomes available. If we had included plants, it would have been natural to include fungi as well, to have a complete set of multicellular eukaryotic genomes, adding to the computational burden. The second, fundamental reason is that plants show important genome size differences due to more frequent whole genome duplications (polyploidization) than in animals. It is therefore possible that the effect of selection on genome size is different in these two groups, which would have led us to treat them separately, decreasing the interest of this comparison. For these reasons we chose to focus on animals, which still provide very wide ranges of genome size and population size that are well suited to testing the impact of drift.

      Reviewer #2:

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      The MHH is arguably the most structured and influential theoretical framework proposed to date based on the null assumption (i); setting the paper up with the MHH is therefore somewhat inevitable. Because of this, in the manuscript, we mostly discuss the peculiarities of TE biology that can drive the genome away from the MHH expectations, focusing on the mutational aspect. We agree, however, that the hazard posed by extra DNA is not limited to the gain of function via the mutation process, but can be linked to many other molecular processes, as mentioned above. In a revised manuscript, we will make the concept of hazard more comprehensive and further stress that this applies not only to TEs but to any nearly-neutral mutation affecting non-coding DNA.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects by and large have large Ne, regardless of their genome size. Insects, many of which have large genomes, have large Ne regardless of their genome size, again suggesting that the coincidence of this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing these clear points out will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters and moreover explain what is wrong with the MHH's supporters' arguments.

      We agree that comparing the dispersion of the points in the non-phylogenetically corrected correlation with the results of the phylogenetic contrasts intuitively emphasizes the importance of accounting for species relatedness. Simply looking at the clade colors in Figure 2 makes it immediately apparent that a simple regression hides phylogenetic structure. We will stress this in the discussion to make the point clear.
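
      The intuition can be illustrated with a toy example. The sketch below is not the phylogenetic independent contrasts analysis used in the manuscript; it only contrasts a pooled correlation with within-clade correlations. The two clades, their names, and all values are invented for illustration: one clade has small Ne proxies and large genomes, the other has large Ne proxies and small genomes, and within each clade the two variables are unrelated by construction.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data (arbitrary units): x is an Ne proxy, y is genome size.
clade_a_ne, clade_a_size = [1, 2, 3, 4, 5], [11, 10, 12, 10, 11]      # "mammal-like"
clade_b_ne, clade_b_size = [11, 12, 13, 14, 15], [1, 0, 2, 0, 1]      # "insect-like"

# Within each clade there is no relationship between Ne and genome size.
r_within_a = pearson_r(clade_a_ne, clade_a_size)
r_within_b = pearson_r(clade_b_ne, clade_b_size)

# Pooling the clades and ignoring relatedness yields a strong negative
# correlation that is driven entirely by the difference between clades.
r_pooled = pearson_r(clade_a_ne + clade_b_ne, clade_a_size + clade_b_size)
```

      In this constructed case the pooled correlation is strongly negative although neither clade shows any trend on its own, which is the pattern that a phylogenetically corrected analysis is designed to expose.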

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

      We respectfully disagree with the reviewer that there is currently no evidence for an effect of Ne on genome size evolution. While it is accurate that our large dataset allows us to reject the universality of Ne as the major contributor to genome size variation, this does not exclude the possibility of such an effect in certain contexts. Notably, several studies support the idea that Ne determines genome size variation and entails nearly-neutral TE dynamics under certain circumstances, e.g. when Ne is particularly strongly contrasted and divergence times are moderate (Lefébure et al. 2017; Mérel et al. 2024; Tollis and Boissinot 2013; Ruggiero et al. 2017). The strength of such works is that they analyze the short-term dynamics of TEs in response to Ne within groups of species/populations, where the cost posed by extra DNA is likely to be similar. Indeed, the MHH predicts genome size to vary according to the combination of drift and mutation under the nearly-neutral theory of molecular evolution. Our work demonstrates that this is not true universally but does not exclude that it could hold locally. Moreover, defense mechanisms against TE proliferation are often complex molecular machineries that might or might not evolve according to different constraints among clades. We have detailed these points in the discussion.

      Reviewer #3:

      Summary

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      Strengths

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      Weaknesses

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18%->38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      There are indeed some discrepancies between our estimates of low complexity repeats and those from the literature, due to the approach used. Hence, occasional underestimates or overestimates of repeat content are possible. As noted, the contribution of “Other” repeats to the overall repeat content is generally very low in our estimates, which points to an underestimation bias. We thank the reviewer for providing this interesting review. We will emphasize this point in the discussion of our revised manuscript.

      Not being able to correctly estimate the quantity of satellites might pose a problem for quantifying the total content of junk DNA. However, the overall repeat content mostly composed of TEs correlates very well with genome size, both in the overall dataset and within clades (with the notable exception of birds) so we are confident that this limitation is not the explanation of our negative results. Moreover, while satellite information might be missing, this is not problematic to test our a priori hypothesis since we focus our attention on TEs, whose proliferation mechanism is very different from that of tandem repeats.

      Finally, divergence from the consensus can be estimated only for TEs. Therefore, recently active elements do not include simple and tandem repeats: yet the results based on recent TE content are very similar to those based on the overall repeat content.

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

      Indeed, dnaPipeTE is not good at detecting old TE copies because of its read-based approach, which biases the outcome towards new elements. We agree that TE content is underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. However, the sum of old TEs and recent TEs is extremely well correlated with genome size (Pearson’s correlation: r = 0.87, p-value < 2.2e-16; PIC: slope = 0.22, adj-R2 = 0.42, p-value < 2.2e-16). Our main result therefore does not rely on an accurate estimation of old TEs. In contrast, we hypothesized that recent TEs could be interesting if selection acted on TE insertion and dynamics rather than on non-coding DNA. Our results demonstrate that this is not the case. It should be noted that, in spite of its limits for old TEs, dnaPipeTE is especially well suited to this specific analysis, as it is not biased by very repetitive new TE families that are problematic to assemble. We will clearly emphasize the limitation of dnaPipeTE and discuss the consequences for our results in the discussion of the revised manuscript.

      Finally, in a preliminary analysis on the dipteran species, we show that the TE content estimated with dnaPipeTE is generally similar to that estimated from the assembly with EarlGrey (Baril et al. 2024) across a good range of genome sizes going from drosophilid-like to mosquito-like (Pearson’s correlation: r = 0.88, p-value = 3.22e-10; see also the corrected Supplementary Figure S2 below). While for these species the TE content is probably dominated by recent to moderately recent TEs, Aedes albopictus is an outlier for its genome size and the estimations with the two methods are largely consistent. However, the computation time required to estimate TE content using EarlGrey was significantly longer, with a ~300% increase in computation time, making it a very costly option (a similar issue applies to other assembly-based annotation pipelines). Given the rationale presented above, we decided to use dnaPipeTE instead of EarlGrey.

    1. Author response:

      Reviewer #1:

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Comment 1. Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Response 1: We thank the reviewer for the comment. We have performed an animal experiment with green fluorescent protein (GFP) pregnant mice, which was not shown in this manuscript. Wild-type (WT) female mice were mated either with transgenic male mice genetically modified to express GFP or with WT male mice, in order to generate either GFP-expressing pups (GFP-pups) or their genetically unmodified counterparts (WT-pups), respectively. Mice were euthanized on day 18.5 of gestation, and the uteri of the pregnant females and the placentas of the offspring were analyzed using flow cytometry. The majority of macrophages in the uterus and placenta are of maternal origin, as defined by the absence of GFP. In contrast, fetal-derived macrophages, distinguished by their expression of GFP, represent only a small fraction of the total macrophage population. We will add the GFP pregnant mice-related data to Figure 4-figure supplement 1 to explain the different macrophage populations in the uterine and placental cells.

      Comment 2. Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Response 2: We thank the reviewer for the comment. In our experiments, PLX3397 or Clodronate Liposomes were used to deplete the macrophages of pregnant mice, and we then injected RUPP-derived pro-inflammatory and anti-inflammatory macrophages back into the PLX3397- or Clodronate Liposomes-treated pregnant mice. We found that RUPP-derived F480+CD206- pro-inflammatory macrophages induced immune imbalance at the maternal-fetal interface and PE-like symptoms (Figure 4E-4H and Figure 4-figure supplement 1 A-C).

      Comment 3. Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Response 3: We thank the reviewer for the comment. PLX3397 is an inhibitor of CSF1R, which is required for macrophage development (Nature. 2023, PMID: 36890231; Cell Mol Immunol. 2022, PMID: 36220994); we have stated this on lines 189-191. However, PLX3397 is a small-molecule compound that has the potential to cross the placental barrier and affect fetal macrophages. We will address the impact of this factor on the experiment in the Discussion section.

      Comment 4. Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Response 4: We thank the reviewer for the comment. Upon analysis, we observed a notable elevation in the frequency of total macrophages within the CD45+ cell population. We subsequently performed macrophage clustering and uncovered a marked increase in the frequency of Cluster 0, implying a potential correlation between Cluster 0 and donor-derived cells. RNA sequencing revealed that the F480+CD206- pro-inflammatory donor macrophages exhibited a Folr2+Ccl7+Ccl8+C1qa+C1qb+C1qc+ phenotype, which is consistent with the phenotype of Cluster 0 among the macrophages observed in single-cell RNA sequencing (Figure 4D and Figure 5E). Therefore, we believe that the donor cells correspond to Cluster 0 of the macrophages.

      Comment 5. Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

      Response 5: We thank the reviewer for the comment. Firstly, the other animal experiments in this manuscript used the Reduction in Uterine Perfusion Pressure (RUPP) mouse model to simulate the pathology of PE. However, the RUPP model requires ligation of the uterine arteries in pregnant mice on day 12.5 of gestation, which hinders T cells injected via the tail vein from reaching the maternal-fetal interface. In addition, this experiment aims to show that CD4+ T cells differentiate into memory-like Th17 cells through IGF-1R signalling and thereby affect pregnancy; to this end, CD4+ T cells were cleared in vivo with an anti-CD4 antibody, followed by injection of IGF-1R inhibitor-treated CD4+ T cells. We also showed that injection of RUPP-derived memory-like CD4+ T cells into pregnant rats induces PE-like symptoms (Figure 6). In summary, the application of the LPS model in Figure 8 does not affect the conclusions.

      Reviewer #2:

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      Comment 1. Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      Response 1: We thank the reviewer for the comment. Following the reviewer's suggestion, we will make the necessary revisions. Firstly, we will modify the title of the article to be more specific. Secondly, we will introduce the RUPP mouse model when interpreting Figure 4. Thirdly, we plan to simplify or consolidate the images from Figure 5 to Figure 7 to make them easier to follow. Finally, we will carefully correct the grammatical and spelling errors in the article. As for the figure comparing pro- and anti-inflammatory macrophages, the Editor requested a more comprehensive description of the macrophage phenotype during the initial submission. As a result, we profiled the transcriptomes of both uterine-derived pro-inflammatory and anti-inflammatory macrophages and conducted a detailed analysis of macrophages in the single-cell data.

      Comment 2. The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      Response 2: We thank the reviewer for the comment. Placental villi, rather than fetal membranes and decidua, were used for CyTOF in this study. This detail about how the human placenta samples were processed will be added to the Materials and Methods section.

      Comment 3. Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      Response 3: We thank the reviewer for the comment. Details about the analysis of the CyTOF and scRNA-seq data will be added to the Materials and Methods section.

      Comment 4. There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      Response 4: We thank the reviewer for the comment. In Figure 1, CD45+ immune cells were clustered into 10 subpopulations, which included gdT cells. Figure 2, in contrast, displays the further clustering analysis of CD4+ T, CD8+ T, and gdT cells, with gdT cells being further subdivided into 22 clusters (Figure 2-figure supplement 1C). The number of biological replicates (samples) is consistent with Figure 1.

      Comment 5. The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      Response 5: We thank the reviewer for the comment. The t-SNE distributions of the 15 clusters of CD4+ T cells, 18 clusters of CD8+ T cells, and 22 clusters of gdT cells are shown separately in Figure 2A, F, and I. The heatmaps displaying the expression levels of markers in these clusters of CD4+ T cells, CD8+ T cells, and gdT cells are presented in Figure 2-figure supplement 1A, B, and C, respectively. The t-SNE distributions of the 29 clusters of CD11b+ cells are shown in Figure 3A, and the heatmap displaying the expression levels of markers in these clusters is presented in Figure 3B. As for scRNA-seq, the heatmap and UMAP distributions of the 15 clusters of macrophages are shown separately in Figure 5C and 5D. The UMAP distributions and heatmap of the 12 clusters of T/NK cells are shown in Figure 6A and 6B. The UMAP distributions and heatmap of the 9 clusters of T/NK cells are shown in Figure 7A and 7B.

      Comment 6. The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      Response 6: We thank the reviewer for the careful checking. During our verification, we found that one sample in the NP group had pregnancy complications other than PE and GDM. The data in Figure 2H-2K were not updated in a timely manner. We will promptly update these data and reanalyze them.

      Comment 7. There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      Response 7: We thank the reviewers for their comments. We have performed a Treg-related animal experiment that was not shown in this manuscript, and we will add the Treg-related data to Figure 6. The injection of CD4+ T cells derived from RUPP mice, which are characterized by a reduced frequency of Tregs, could induce PE-like symptoms in pregnant mice. Additionally, we will add the necessary discussion of Tregs.

      Comment 8. In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      Response 8: We thank the reviewers for their comments. Although we do not have additional tissues or cells available to conduct FACS or CyTOF staining, including for CD66b, we plan to use CD15 and CD66b antibodies for immunofluorescence staining of placental tissue. Suppressing effector T cells is a signature feature of MDSCs, and T cells may also influence the functions of MDSCs. We will refer to this review and discuss these points in the Discussion section of the article.

      Comment 9. Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      Response 9: We thank the reviewers for their comments. We already have additional data on the efficiency of macrophage depletion with PLX3397 and clodronate liposomes, which were not presented in this manuscript, and we will add them. The clodronate experiment is mentioned in the main text (lines 197-201), but only briefly, because the results we obtained using clodronate were similar to those obtained using PLX3397.

      Comment 10. There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      Response 10: We thank the reviewers for their comments. We plan to simplify or consolidate the images in Figures 5 to 7 to make them easier to follow.

      Comment 11. There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      Response 11: We thank the reviewers for their comments. We will search the literature further and reference additional studies that have conducted similar analyses.

      Comment 12. Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.

      Response 12: We thank the reviewers for their comments. As stated in the Statistical Analysis section (lines 601-604), the Kruskal-Wallis test was used to compare the results of experiments with multiple groups. Comparisons between the two groups in Figures 6-7 were conducted using Student's t-test. These statistical methods will be stated in the figure legends.
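
      To make the distinction raised in Comment 12 concrete: the Shapiro-Wilk test asks whether a single sample is consistent with a normal distribution, whereas the Kruskal-Wallis test compares the rank distributions of several independent groups. The following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python, not the authors' analysis code. The function names and the three groups of values are hypothetical, and the sketch omits the tie correction that statistical packages usually apply.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return (H, degrees of freedom) for two or more independent groups.

    No tie correction is applied, so H is slightly conservative when ties occur.
    """
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical cluster frequencies (%) for three groups; not data from the study.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
group_c = [7.0, 8.0, 9.0]
h_stat, dof = kruskal_wallis_h(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, df = {dof}")
```

      The p-value would then be read from a chi-square distribution with the returned degrees of freedom. In practice an established statistics package would be used, and the sketch is only meant to show which question the test answers.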

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9 generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

      We thank the reviewer for the constructive comments. To further elucidate the molecular mechanisms underlying the observed phenotypes, we will conduct the following experiments: (1) perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos and in id2b-deleted embryos; (2) use RNAscope to detect the expression of id2b in developing embryos; (3) validate the interaction between Id2b and Tcf3b in vivo; and (4) conduct CUT&Tag experiments in developing zebrafish embryos to further validate the Tcf3b binding sites upstream of nrg1.

      Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene that is regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      As suggested by the reviewer, we will perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos, as well as in id2b-deleted embryos.

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      We have examined the expression levels of id2b in both klf2a and klf2b mutants. The whole-mount in situ results clearly demonstrate a decrease in id2b signal in both mutants. As noted by the reviewer, klf2 is a transcriptional regulator, suggesting that the regulation of id2b may occur at the transcriptional level. However, dissecting the molecular mechanisms underlying the crosstalk between klf2 and id2b is beyond the scope of the present study.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

      We agree with the reviewer and will perform additional experiments to validate the interaction between Id2b and Tcf3b in vivo. Due to the lack of antibodies targeting these proteins, we will overexpress FLAG-Id2b and HA-Tcf3b in zebrafish embryos and conduct a co-IP analysis.

      Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

      We thank the reviewer for the constructive assessments. Our specific responses are as follows:

      (1) As all the morpholinos used in this study, including those targeting bmp ligands, the cilia gene ift88, and tcf3b, have been published and validated using genetic mutants in previous studies, we believe these loss-of-function analyses are sufficient to delineate their role in regulating id2b expression or function.

      (2) To assess the role of BMP versus blood flow in regulating endocardial id2b expression, we plan to perform live imaging in the id2b:GFP knock-in line prior to the initiation of the heartbeat, with or without BMP inhibitors.

      (3) We will revise the data presentation and use bar graphs with individual data points.

      (4) We plan to perform an additional co-IP experiment in zebrafish embryos to assess the interaction between Tcf3b and Id2b.

      (5) To further validate the tcf3b binding sites upstream of nrg1, we will conduct CUT&Tag experiments in developing zebrafish embryos.

    1. Author response:

      Reviewer #1 (Public Review):

      Weakness #1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As introduced in the Results section, we screened 57 driver strains selected from previous studies: each had been reported either to identify a single dopaminergic neuron (DAN), or a single pair, in larvae, or to identify only a few DANs in the adult brain, which suggested that it might identify a single DAN in larvae. In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae [4], while the strains in Figure 1e, f, and g were reported to identify only a few DANs in adult brains [5,6]. We examined these strains, and only some of them labeled single DANs in 3rd instar larval brains (Figure 1f, g, h, l, and m). Among them, only the strains in Figure 1f and h labeled a single DAN per brain hemisphere without labeling non-DANs. The other strains labeled non-DANs in addition to single DANs (Figure 1g, l, and m). Taking the ventral nerve cord (VNC) into consideration, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, among the strains we screened, the strain in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) labels only a single DAN in 3rd instar larval brains. We still describe the others (Figure 1g, h, l, and m) as strains labeling single DANs, but they also label one to several non-DANs. In Figure 1, we mainly showed the strains labeling single DANs. The labeling patterns of the other screened driver strains are summarized in Table 1. Since brain images of the remaining 47 strains are all available, we will state in Figure S1 that additional brain images can be provided upon request.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single DA neuron in each brain hemisphere. Additional GFP (+) signals were often observed, but they did not come from the cell bodies of DANs because they were not stained by a TH antibody. These additional GFP (+) signals were mainly neurites, including axonal terminals, but could also be false-positive signals or weakly stained non-neuronal cell bodies. This conclusion is based on analysis of a total of 22 larval brains. We will add this to the text or the Figure S1 caption. An enlarged inset of the GFP (+) signals will also be added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for the suggestion. MB320C mainly labels PPL1-γ1pedc in the adult brain, with one or two other weakly labeled cells. It will be interesting to investigate the pattern of this driver in 3rd instar larval brains. If it covers only DAN-c1, we can knock down D2R in this strain to check whether it reproduces our results. This will be an interesting fly strain to test, but we believe that it is not necessary for our current manuscript because the DAN-c1 driver is very specific (for details, refer to our response to Reviewer #3). However, this line will be very useful for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images of DANs in brains of the SS02160 driver cross. However, Extended Data Figure 1 of Eschbach et al. (2020) shows four strongly labeled neurons in each brain hemisphere [9], indicating that this driver does not label DAN-c1 alone.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree that the words ‘necessary’ and ‘sufficient’ are too exclusive of other neurons. As mentioned in the Discussion, we do think that other dopaminergic neurons may also be involved in larval aversive learning. We will rephrase these claims using more logically appropriate words, such as ‘important’, ‘essential’, or ‘mediating’.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is a great point! Yes, we cannot rule out the possibility that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine. The experimental results with TRPA1 could be caused by depletion of dopamine, or DA inactivation due to prolonged depolarization or adaptation. However, we still think that our hypothesis on the over-excitation of DAN-c1 is more consistent with our experimental results and other published data. Our justification is as follows:

      (1) Associative learning occurs only when the CS and US are paired. In wild-type larvae, a specific odor (conditioned stimulus, CS, such as pentyl acetate) depolarizes a subset of Kenyon cells in the mushroom body, while the gustatory unconditioned stimulus (US, quinine) induces dopamine release from DAN-c1 to the lower peduncle (LP) compartment of the mushroom body (Figure 7a). Only when the CS and US are paired will calcium influx caused by the CS and Gαs activated by dopamine binding to D1R turn on rutabaga, a mushroom body-specific adenylyl cyclase that acts as the coincidence detector in associative learning (Figure 7d).

      (2) Rutabaga converts ATP into cAMP, activating the PKA signaling pathway and modifying the synaptic strength from mushroom body neurons (MBNs, also called Kenyon cells) to mushroom body output neurons (MBONs, Figure 7d). This change in synaptic strength leads to learned responses when the same odor appears again.

      (3) In our work, we found that D2R is expressed in DAN-c1 and that knockdown of D2R in DAN-c1 impairs larval aversive learning. As D2R reduces cAMP levels and neuronal excitability [3], we hypothesized that knockdown of D2R in DAN-c1 would remove the inhibition by the D2R autoreceptor and lead to more dopamine (DA) release when the US (quinine) was delivered, compared with wild-type larvae. The elevated DA release, along with calcium influx caused by the CS, increases the cAMP level in MBNs, which leads to the learning deficit (over-excitation, Figure 7b). Mutant larvae with excessive cAMP, dunce, showed an aversive learning deficiency, supporting our hypothesis [2].

      (4) Our TRPA1 results can be explained by this over-excitation hypothesis. When DAN-c1 was activated (34 °C) in the distilled water group, the artificial activation mimicked the gustatory activation by quinine, and the larvae showed aversive learning responses towards the odor (Figure 2k, DW group). When DAN-c1 was activated (34 °C) in the sucrose group, the artificial activation again mimicked the gustatory activation by quinine, so the larvae showed a learning response combining both appetitive and aversive learning (Figure 2k, SUC group).

      (5) When DAN-c1 was activated (34 °C) in the quinine group, the artificial activation together with the gustatory activation by quinine led to elevated DA release from DAN-c1. During training, this elevated DA caused over-excitation of MBNs, leading to failure of aversive learning (Figure 2k, QUI group), a phenotype similar to that of larvae with D2R knockdown in DAN-c1.

      (6) Similarly, optogenetic activation of DAN-c1 during aversive training leads to elevated DA release from DAN-c1 (through both gustatory activation by quinine and artificial activation). This would also cause over-excitation of MBNs and lead to failure of aversive learning. Artificial activation in other stages (resting or testing) would not cause elevated DA release during training, so aversive learning was not affected (Figure 5b).

      (7) However, when optogenetic activation was applied during training, we did not observe aversive learning responses in the distilled water group or a reduction in the sucrose group (Figure 5c, Figure 5d). Our explanation is that the optogenetic stimulus we applied was too strong, so DAN-c1 had already released elevated DA in both groups. Aversive learning in these groups was therefore already impaired, and the larvae showed only the learning responses corresponding to distilled water or sucrose.

      (8) We also applied this over-excitation approach to MBNs. Because MBNs mediate both appetitive and aversive learning, over-excitation of MBNs led to deficits in both types of learning, consistent with our hypothesis (Figure 6).

      In summary, we hypothesized that DAN-c1 restricts DA release via activation of D2R, which is important for larval aversive learning. D2R knockdown or artificial activation of DAN-c1 during training would induce elevated DA release, leading to over-excitation of MBNs and failure of aversive learning.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      Just like the example of TH-GAL4, it is possible that the D2R driver strains may partially reflect the expression pattern of endogenous D2R in larval brains. When we crossed the D2R driver strains with the GFP-tagged D2R strain, however, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3 c to h). In addition, D2R knockdown with D2R-miR directly supported that the GFP-tagged D2R strain reflected the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains supported the expression pattern we observed from the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody from Draper et al. [10]. We tried the same antibody, but we were not able to observe clear signals after staining. The antibody may not be specific for neurons in the fly larval brain, or our staining protocol may not have been suited to it.

      Unfortunately, we were not able to find the Lam et al. (1999) paper.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We also think that other DANs may be involved in aversive learning. When we re-analyzed the learning assay data, D2R knockdown in DAN-g1 with miR appeared to partially affect aversive learning when larvae were trained with pentyl acetate (Figure S4e). We are going to build single statistic panels for DAN-g1 and DAN-d1. However, neither larvae with miR-mediated D2R knockdown in DAN-g1 trained with propionic acid (Figure S5a) nor larvae with RNAi-mediated D2R knockdown in DAN-g1 trained with pentyl acetate (Figure S5b) showed an aversive learning deficit. We will add paragraphs about this in both the Results and Discussion sections.

      Reviewer #2 (Public Review):

      Weakness #1: It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2K). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Public Reviewer #1.

      Reviewer #3 (Public Review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and the suggestions. For the strain R76F02AD; R55C10DBD, we examined 22 third instar larval brains expressing GFP or Syt-GFP and Den-mCherry, and all of them clearly labeled DAN-c1. Half of them labeled only DAN-c1; the rest had 1 to 5 weakly labeled somata without neurites. Only rarely did 1 or 2 strongly labeled cells appear. These non-DAN-c1 cells were seldom dopaminergic neurons. In the VNC, 8 out of 12 preparations showed no labeled cells, and 3 had 2-4 strongly labeled cells. These data support the conclusion that R76F02AD;R55C10DBD exclusively labels DAN-c1 in 3rd instar larval brains.

      The pattern of R76F02AD; R55C10DBD and the expression pattern of D2R throughout the larval body is an interesting question. However, our main focus was on the central nervous system and learning behaviors in fruit fly larvae; we may investigate this question in the future.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens). As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted this single-odor larval learning paradigm from Honjo et al. [1,2], who first designed and performed it for larval olfactory associative learning. To address the reviewer’s question about potential non-associative effects of the 30-min quinine or sucrose exposure, we rely primarily on the results of Honjo et al. (2005 and 2009). They applied the odorant to the larvae after training, and only the larvae that had received paired training with both odor and unconditioned stimulus (quinine or sucrose) showed learning responses. Larvae exposed for 30 min to only the odorant or only the unconditioned stimulus did not respond to the odor differently from the naïve group [1,2]. To validate that this paradigm induces associative learning responses, they also tested it from three aspects:

      (1) The odor responses are associative. Honjo et al. showed that only pairing the odorant with the unconditioned stimulus induced the corresponding attraction or repulsion of larvae to the odor. Neither odorant alone, unconditioned stimulus alone, nor temporal dissociation of odorant and unconditioned stimulus induced learning responses.

      (2) The odor responses are odor-specific. When a second odorant that had not been used for training was applied, larvae showed learning responses only to the odor paired with the unconditioned stimulus. This result rules out a general olfactory suppression and indicates that larvae can discriminate odors and specifically alter their responses to the odor paired with the unconditioned stimulus. Although two-odor reciprocal training was not used, these results demonstrate the association between the unconditioned stimulus and the paired odor.

      (3) Well-known learning-deficit mutants did not show learned responses in this paradigm. Honjo et al. tested mutants (e.g., rut and dnc) that show learning deficits at the adult stage in the two-odor reciprocal learning paradigm. These mutant larvae also failed to show learning responses when tested with the single-odor larval learning paradigm.

      (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid), as well as two D2R knockdown strains (UAS-miR and UAS-RNAi for D2R), and obtained similar results for larvae with D2R knockdown in DAN-c1. In addition, our naïve olfactory, naïve gustatory, and locomotion data rule out the possibility that the responses were caused by impaired sensory or motor functions. Comparison with the control group (odor paired with distilled water) rules out potential effects of habituation, if any. All these results support the conclusion that this single-odor learning paradigm is reliable for assessing the learning abilities of Drosophila larvae, and that the failure of the R.I. to decrease when larvae with D2R knockdown in DAN-c1 were trained with quinine paired with the odorant is caused by a deficit in aversive learning ability. We will add a paragraph to address this in the Discussion.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      This is a good question. We allowed the larvae 5 min during the testing stage to wander on the testing plate. Under most conditions, more than half of the larvae (>50%) explore the plate, and the rest may stay in the middle zone (and are not counted). We used 25-50 larvae in each learning assay, so around 10-30 larvae end up in the two semicircular areas. Indeed, in our raw data an R.I. of 1 seldom appears; most R.I. values fall between -0.2 and 0.8. We admit that the R.I. equation is not linear, so the index changes more steeply as it approaches -1 and 1. However, as most values fall between -0.2 and 0.8, we think such ‘border effects’ can be neglected provided that enough larvae (10-30) enter the calculation.
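The reviewer's concern about the denominator can be made concrete with a small sketch. It assumes the response index has the form R.I. = (N_odor - N_blank) / (N_odor + N_blank), which is what the reviewer's 1-of-30 example implies; the manuscript's exact equation is not reproduced here. The function names and the second set of larval counts are hypothetical, chosen only for illustration.

```python
def response_index(n_odor, n_blank):
    """R.I. with only side-choosing larvae in the denominator (assumed form)."""
    chosen = n_odor + n_blank
    if chosen == 0:
        raise ValueError("no larvae left the middle zone")
    return (n_odor - n_blank) / chosen


def response_index_all(n_odor, n_blank, n_middle):
    """Alternative index that also counts middle-zone larvae in the denominator."""
    total = n_odor + n_blank + n_middle
    if total == 0:
        raise ValueError("no larvae on the plate")
    return (n_odor - n_blank) / total


# Reviewer's extreme case: 1 larva on the odor side, 29 in the middle zone.
extreme = response_index(1, 0)
extreme_all = response_index_all(1, 0, 29)

# Hypothetical, more typical plate: 12 on the odor side, 6 on the blank side,
# 12 in the middle zone (30 larvae in total).
typical = response_index(12, 6)
typical_all = response_index_all(12, 6, 12)

print(extreme, extreme_all)
print(typical, typical_all)
```

Under these assumptions the two indices diverge most when few larvae leave the middle zone, which is the situation the reviewer describes; when most larvae choose a side, as the authors report, the difference is smaller.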

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibirets1 gene encodes a thermosensitive mutant of dynamin. Expressing this mutant in target neurons blocks neurotransmitter release at ambient temperatures above 30 °C, as it represses vesicle recycling (1). It is a widely used tool for examining whether a target neuron is involved in a specific physiological function. We cannot rule out the existence of Shibirets1-insensitive routes of dopamine release. However, blocking dopamine release from DAN-c1 with Shibirets1 already changed the learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      The second question, about potential co-transmitter release, is a great one. Yamazaki et al. recently reported that co-transmitters in the dopaminergic system modulate adult olfactory memories in Drosophila (11), and we cannot rule out roles for co-released neurotransmitters/neuropeptides in larval learning. Ideally, observing real-time changes in dopamine release from DAN-c1 in wild-type and TH knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support the importance of dopamine for Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both appetitive and aversive learning in adults and larvae (12, 13). In our work, D2R in the mushroom body played important roles in both larval appetitive and aversive learning (Figure 6a). All this evidence reveals the importance of dopamine in Drosophila olfactory associative learning. In addition, too much remains unknown about the co-released neurotransmitters/neuropeptides and their potentially complex ‘interaction/crosstalk’ relations. We believe that investigating co-released neurotransmitters/neuropeptides is beyond the scope of this study at this time.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all the controls we used were homozygous parental strains. They did not show abnormal behavior in the learning, naïve sensory, or locomotion assays. The only exception is the control for DAN-c1: larvae from the homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed from affecting learning ability, we used heterozygous R76F02AD; R55C10DBD/wildtype as the control, which showed normal learning, naïve sensory, and locomotion abilities (Figure 4e to i).

      For Figure 4d, it is a column graph to quantify the efficiency of D2R knockdown with miR. Because we need to induce and quantify the knockdown effect in specific DANs (DM1), only TH-GAL4 can be used as the control group, rather than UAS-D2R-miR.

      For the missing control groups in Figure S4e and S5c, we have shown them in other Figures (Figure 4e). We will re-organize the figures to make them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

      We will read through this paper and try to incorporate it as a possible explanation for the learning mechanisms. As we noted in the Discussion section, the learning mechanism is quite complex, combining non-linear neuronal circuits and multiple signaling pathways in response to complex environmental learning contexts. We will try to develop a hypothesis that best reconciles our results with published data.

      References

      (1) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (2) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (3) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (4) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (5) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (6) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (7) Hartenstein, V., Cruz, L., Lovick, J. K. & Guo, M. Developmental analysis of the dopamine-containing neurons of the Drosophila brain. J Comp Neurol 525, 363-379 (2017). https://doi.org/10.1002/cne.24069

      (8) Aso, Y. et al. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014). https://doi.org/10.7554/eLife.04577

      (9) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (10) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (11) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (12) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (13) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

    1. Author Response:

      Thank you very much for your consideration and assessment. We really appreciate the generous comments from the reviewers on our manuscript entitled “BCAS2 promotes primitive hematopoiesis by sequestering β-catenin within the nucleus”. The comments are very helpful for the improvement of our work. We would like to provide the following provisional revision plan to address the public reviews:

      1. To clarify if Bcas2 also promotes primitive myelopoiesis by enhancing nuclear accumulation of β-catenin, bcas2 morpholino will be injected into the Tg(coro1a:EGFP) zebrafish embryos at 1-cell stage, and subsequently the β-catenin distribution in the myeloid cells will be examined. Tg(coro1a:EGFP) is commonly used to track both macrophages and neutrophils.

      2. According to the reviewers’ comments, we will quantify the fluorescence intensity in the cell nucleus and cytoplasm in Figure 3H. Meanwhile, we will adjust the exposure of Figure 5C and Figure 7E, or replace the figures with high-resolution ones.

      3. Previous studies have reported that β-catenin can bind directly to CRM1 through its central armadillo (ARM) repeats region. The β-catenin region containing ARM repeat 10 and the C terminus is essential for its nuclear export (Koike M, et al., The Journal of Biological Chemistry, 2004). In our research, BCAS2 has been demonstrated to bind to ARM repeats 9-12 of β-catenin. Therefore, it is highly likely that Bcas2 competes with CRM1 for binding to the nuclear export signal peptide on β-catenin. To further test this possibility, we will transfect HEK293T cells with constructs expressing full-length or truncated forms of β-catenin, and then examine their nuclear distribution.

      4. To validate if BCAS2 affects CRM1-dependent nuclear export of other classical factors, we plan to knock down or overexpress BCAS2 in HeLa cells, and detect the distribution of ATG1 and CDC37L, which have been identified as CRM1 cargoes.

      5. Considering that the ARM repeats bound by Bcas2 (repeats 9-12) and Tcf (repeats 2-9) might not be mutually exclusive, it is indeed appealing to investigate whether β-catenin can simultaneously interact with Tcf and Bcas2. We will follow the reviewer’s suggestion and perform a three-way co-immunoprecipitation assay. Plasmids encoding these three proteins will be co-transfected into cells. Cell lysates will be immunoprecipitated using an antibody specific to the bait protein (e.g., β-catenin), and eluted proteins will be analyzed using antibodies specific to the other two proteins.

      6. To clarify whether canonical Wnt signaling regulates hematopoietic development by activating the expression of cdx1a/cdx4 and their downstream targets hoxb5a and hoxa9a, as previously reported, we intend to examine the expression of cdx4 and hoxa9a in bcas2+/- embryos at 10 ss by in situ hybridization.

      7. To further validate whether Wnt signaling is required during endothelial differentiation from angioblasts, wild-type embryos will be treated with the Wnt inhibitor CCT036477, and the expression of the hemangioblast markers npas4l, scl, and gata2 and the endothelial marker fli1 will be analyzed using in situ hybridization.

      8. In order to clarify whether coiled-coil (CC) domain 1-2 of Bcas2 is sufficient to interact with β-catenin and restore the primitive hematopoietic defect, we will overexpress CC1-2 in Tg(gata1:GFP) embryos injected with bcas2 morpholino, and then investigate the distribution of β-catenin, as well as gata1 expression at 10 ss in these embryos.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both the dorsal and ventral anatomy of a potential new taxon of artiopodans closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with the phylogenetic results, leads the authors to conclude that cephalic sutures originated multiple times within the Artiopoda.

      Strengths:

      The new specimens are of high quality and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, providing a wealth of information for taxonomic and phylogenetic analyses.

      Weaknesses:

      The weaknesses of this work are obvious in a number of aspects. Technically, the ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might help to resolve the taxonomic and ontogenetic uncertainties. I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. the hypostome and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed, either morphologically or functionally. Finally, the authors claimed that the phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can be explained by a deep homology of the cephalic suture at the molecular level and multiple co-options within the Artiopoda.

      Comments on the revised version:

      I have seen the extensive revision of the manuscript. The main point "Multiple origins of dorsal ecdysial sutures in artiopodans" is now partially supported by results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors in making further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in only a short sentence.

      Some more details of the changing article shape and overall length of the antennae have been added to the description.

      (2) There are also imprecise descriptions of features.

      Measurements, dimensions and multiple figures are provided for many features in the text and the supplement includes more figures. In total, 11 figures are provided with details (photographs or measurements) of the material.

      (3) Ontogeny of the cephalon was not described.

      A sentence has been added to the description to note the changing width:length of the cephalon during ontogeny, with a reference to Figure 6.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      As noted in the diagnosis for the genus, these notches are interpreted to accommodate the eye stalks. The homology of the ventral plates is discussed at length in the manuscript, and is the focus of the three sets of phylogenetic analyses performed.

      Reviewer #3 (Public Review):

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are proposed to be associated with ventral plates that the authors homologise with the free cheeks of trilobites (although also testing alternative homologies). An update of a published phylogenetic dataset permits reconsideration of whether dorsal ecdysial sutures have a single or multiple origins in trilobites and their relatives.

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variation within a single species. New microtomographic data shed light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites.

      I think the revision does a satisfactory job of reconciling the data and analyses with the conclusions drawn from them. Referee 1's valid concerns about whether a synonymy of Acanthomeridion anacanthus is justified have been addressed by the addition of a length/width scatterplot in Figure 6. Referee 2's doubts about homology between the librigenae of trilobites and ventral plates of Acanthomeridion have been taken on board by re-running the phylogenetic analyses with a coding for possible homology between the ventral plates and the doublure of olenelloid trilobites. The authors sensibly added more trilobite terminals to the matrix (including Olenellus) and did analyses with and without constraints for olenelloids being a grade at the base of Trilobita. My concerns about counting how many times dorsal sutures evolved on a consensus tree have been addressed (the authors now play it safe and say "multiple" rather than attempting to count them on a bushy topology). The treespace visualisation (Figure 9) is a really good addition to the revised paper.

      Weaknesses:

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion and Trilobita. The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). This paper is not a game-changer because these questions have been asked several times over the past seven years, but there are solid, worthy advances made here.

      I'd like to see some of the most significant figures from the Supplementary Information included in the main paper so they will be maximally accessed. The "stick-like" exopods are not best illustrated in the main paper; their best imagery is in Figure S1. Why not move that figure (or at least its non-redundant panels) as well as the reconstruction (Figure S7) to the main paper? The latter summarises the authors' interpretation that a large axe-shaped hypostome appears to be contiguous with ventral plates.

      We have moved these figures from the supplementary information to the main text, and renumbered figures accordingly. Fig S1 has now been split – panels a and b are in the main text (new Fig. 4), with the remainder staying as Fig S1. Fig S7 is now Fig. 8 in the main text.

      The specimens depict evidence for three pairs of post-antennal cephalic appendages but it's a bit hard to picture how they functioned if there's no room between the hypostome and ventral plates. Also, a comment is required on the reconstruction involving all cephalic appendages originating against/under the hypostome rather than the first pair being paroral near the posterior end of the hypostome and the rest being post-hypostomal as in trilobites.

      A short comment has been added to the caption.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have seen the extensive revision of the manuscript. The main point "Multiple origins of dorsal ecdysial sutures in artiopodans" is now partially supported by results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors in making further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in only a short sentence.

      (2) There are also imprecise descriptions of features (see my annotations in submitted ms).

      (3) Ontogeny of the cephalon was not described.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      The references swap back and forth between journal titles being abbreviated or written out in full. Please standardise this to journal format rather than alternating between two different styles.

      Line 145: Perez-Peris et al. (2021) should be cited as the source for the Anacheirurus appendages.

      Added, thank you.

      Line 310: The El Albani et al (2024) paper on ellipsocephaloid appendages should be noted in connection with an A+4 (rather than A+3) head in trilobites.

      Added.

      Minor or trivial corrections:

      Line 51: move the three citations to follow "arthropods" rather than following "artiopodans", as none of these papers are specifically about Artiopoda.

      Changed, thank you.

      Caption to Figure 1 and line 100: Acanthomeridion appears in Figure 1 and in the text with no context. Please weave it into the text appropriately.

      Line 136: The data were...

      Corrected

      Line 164: upper case for Morphobank.

      Corrected

      Line 183: spelling of "Village" (not "Vallige").

      Corrected

      Line 197: I suggest using "articles" rather than "podomeres" for the antenna (as you did in line 232).

      Changed, thank you.

      Line 269: "gnathobasal spine" (rather than "spin").

      Changed, thank you.

      Line 272: "Exopods" is used here but elsewhere "exopodites" is used.

      Exopodites is now used throughout

      Line 359: "can been seen" is awkward and, as evolutionary patterns are inferred rather than "seen", could be reworded as "... loss of the eye slit has been inferred...".

      Reworded as suggested

      Line 422 and 423: As two referees asked in the first round of review, delete "iconic" and "symbolic".

      Deleted as suggested

      Line 467: "librigena-like".

      Corrected

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      I appreciate the revisions made by the author which address all of my concerns.

      Nevertheless, I have some new questions when I read the paper again. These questions are not necessarily criticisms of the paper, which may reflect the gap in my understanding. Meanwhile, it also reflects the writing might be improved further.

      - Fig. 1:

      I understand that a critical assumption for generating the required result is that the oblique orientation has lower "energy" than the cardinal orientation (Fig. 1G). Meanwhile, I always have a concept that typically the energy is defined as the negative of log probability. If we take the log probability plotted in Fig. 1A, that will generate an energy landscape that is upside down compared with current Fig. 1G. How should I understand this discrepancy?

      As the reviewer pointed out, a higher prior distribution near cardinal orientations causes cardinal attraction in typical Bayesian models, which can correspond to lower energy around these orientations. Additionally, in the context of learning natural statistics, Hebbian plasticity in excitatory connections strengthens recurrent connections and drives attraction toward more prevalent stimuli within neural circuits.

      However, as demonstrated by Wei and Stocker (2015), a Bayesian inference model can also produce cardinal repulsion when optimizing encoding efficiency. In our network, this efficient encoding is achieved through heterogeneous lateral connections and inhibitory Hebbian plasticity in the sensory module, resulting in lower energy near oblique orientations. Thus, the shape of the prior distribution does not have a direct one-to-one correspondence with the bias pattern or the dynamic energy landscape.

      - Fig. 3 and its corresponding text.

      I understand and agree with Fig. 3B&C that neurons near cardinal orientations are sharper and denser. But why is the stimulus representation around cardinal orientations sparser than around oblique orientations? Doesn't having more neurons around cardinal orientations imply a less sparse representation?

      Indeed, with sharper tuning curves, having more neurons can result in a sparser representation. Consider an extreme case where each orientation, discretized by 1°, is represented by only one active neuron with a tuning width of 1°. While this would require more neurons to represent overall stimuli compared to cases with wider tuning curves, each stimulus would be represented by fewer neurons, aligning with the traditional concept of sparse coding.
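The extreme case described above can be checked with a toy count of active neurons per stimulus. This is only an illustrative sketch: it assumes idealized box-shaped tuning curves and evenly spaced preferred orientations, which is not the network model used in the paper, and all names and the broad-tuning parameters are hypothetical.

```python
def circular_distance(a_deg, b_deg, period=180.0):
    """Distance between two orientations on a circular 180-degree space."""
    diff = abs(a_deg - b_deg) % period
    return min(diff, period - diff)


def count_active(stimulus_deg, preferred_degs, tuning_width_deg):
    """Count neurons whose box-shaped tuning curve covers the stimulus."""
    half_width = tuning_width_deg / 2.0
    return sum(
        1
        for pref in preferred_degs
        if circular_distance(stimulus_deg, pref) < half_width
    )


# Many sharply tuned neurons: one per degree, each 1 degree wide.
narrow_population = [float(d) for d in range(0, 180, 1)]
# Few broadly tuned neurons: one per 10 degrees, each 30 degrees wide.
broad_population = [float(d) for d in range(0, 180, 10)]

narrow_active = count_active(40.0, narrow_population, 1.0)
broad_active = count_active(40.0, broad_population, 30.0)

print(len(narrow_population), narrow_active)
print(len(broad_population), broad_active)
```

In this toy setting the sharply tuned population has ten times as many neurons, yet fewer of them respond to any single stimulus, which is the sense in which more neurons can coexist with a sparser code.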

      However, in Fig. 3 and corresponding text, we did not measure the sparseness of active neurons for each orientation. Instead, we used the term ‘sparser representation’ to describe the increased distance between representations of different stimuli near the cardinal orientations. Although this increased distance can be consistent with the traditional concept of sparse coding, to avoid any confusion, we have revised the term ‘sparser representation’ to ‘more dispersed representation’ in the 3rd paragraph in pg. 5 and the 3rd paragraph in pg. 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper reports a number of somewhat disparate findings on a set of colorectal tumours and infiltrating T cells. The main finding is a machine-learning tool that combines two previous state-of-the-art tools, for MHC prediction and T-cell binding prediction, to predict immunogenicity. This is then applied to a small set of neoantigens, and there is a small-scale validation of the prediction at the end.

      Strengths:

      The prediction of immunogenic neoepitopes is an important and unresolved question.

      Weaknesses:

      The paper contains a lot of extraneous material not relevant to the main claim. Conversely, it lacks important detail on the major claim.

      (1) The analysis of T cell repertoire in Figure 2 seems irrelevant to the rest of the paper. As far as I could ascertain, this data is not used further.

      We appreciate the reviewer for their valuable feedback. We concur with the reviewer's observation that the analysis of the TCR repertoire in Figure 2 should be moved to the supplementary section. We have moved Figures 2B to 2F to Supplementary Figure 2.

      However, the analysis of TCR profiles is still presented in Figure 2, as it plays a pivotal role in the process of neoantigen selection. This is because the TCR profiles of eight (out of 28) patients were used for neoantigen prediction. We have added the following sentences to the results section to explain the importance of TCR profiling: “Furthermore, characterizing T cell receptors (TCRs) can complement efforts to predict immunogenicity.” (Results, Lines 311-312, Page 11)

      (2) The key claim of the paper rests on the performance of the ML algorithm combining NETMHC and pmtNET. In turn, this depends on the selection of peptides for training. I am unclear about how the negative peptides were selected. Are they peptides from the same databases as immunogenic peptides but randomised for MHC? It seems as though there will be a lot of overlap between the peptides used for testing the combined algorithm, and the peptides used for training MHCNet and pmtMHC. If this is so, and depending on the choice of negative peptides, it is surely expected that the tools perform better on immunogenic than on non-immunogenic peptides in Figure 3. I don't fully understand panel G, but there seems very little difference between the TCR ranking and the combined. Why does including the TCR ranking have such a deleterious effect on sensitivity?

      We thank the reviewer for their valuable feedback. We believe that by 'MHCNet' and 'pmtMHC' the reviewer means the NetMHCpan and pMTnet tools, respectively. First, the negative peptides, which have been excluded from PRIME (1), were not randomized with MHC (HLA-I) but were randomized with TCR only. Second, the positive peptides for our combined algorithm were selected from multiple databases, such as 10X Genomics, McPAS, VDJdb, IEDB, and TBAdb, whereas NetMHCpan uses peptides from the IEDB database and pMTnet was trained on a dataset entirely different from ours. Therefore, there is little overlap between our training data and the training datasets of NetMHCpan and pMTnet. Thus, the better performance of our tool is not due to overlapping training datasets or to the selection of negative peptides.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
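
      The random 70/30 split described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name, the fixed seed, and the peptide identifiers are hypothetical, and the actual workflow is the one shown in Supplementary Figure 1.

```python
import random


def split_discovery_validation(records, discovery_fraction=0.7, seed=0):
    """Shuffle the records once and cut them into two non-overlapping subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * discovery_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical peptide identifiers, for illustration only.
peptides = [f"pep{i:03d}" for i in range(100)]
discovery, validation = split_discovery_validation(peptides)
```

      Because both subsets are slices of a single shuffled list, no record can appear in both, which is the property referred to as "mutually exclusive" in the revised sentence.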

      Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively. The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
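
      The basic "AND" rule can be sketched as below. The cutoff of 2 for both the HLA percentile and the TCR ranking is an assumption taken from the thresholds mentioned elsewhere in this response, and the function name and example values are hypothetical.

```python
def passes_and_rule(hla_percentile, tcr_rank_percentile,
                    hla_cutoff=2.0, tcr_cutoff=2.0):
    """Keep a peptide only if it passes BOTH the pHLA and the pHLA-TCR filter."""
    return hla_percentile < hla_cutoff and tcr_rank_percentile < tcr_cutoff


# (name, HLA percentile, TCR ranking percentile); made-up values.
candidates = [
    ("pepA", 0.5, 1.0),   # passes both filters
    ("pepB", 0.5, 40.0),  # strong HLA binder, poor TCR ranking
    ("pepC", 10.0, 1.0),  # weak HLA binder, good TCR ranking
]
selected = [name for name, hla, tcr in candidates
            if passes_and_rule(hla, tcr)]
```

      Since a peptide must pass both filters, this rule can never select more peptides than either filter alone, so its sensitivity cannot exceed that of the stricter criterion.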

      (3) The key validation of the model is Figure 5. In 4 patients, the authors report that 6 out of 21 neo-antigen peptides give interferon responses > 2 fold above background. Using NETMHC alone (I presume the tool was used to rank peptides according to binding to the respective HLAs in each individual, but this is not clear), identified 2; using the combined tool identified 4. I don't think this is significant by any measure. I don't understand the score shown in panel E but I don't think it alters the underlying statistic.

      Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      In conclusion, the paper demonstrates that combining MHCNET and pmtMHC results in a modest increase in the ability to discriminate 'immunogenic' from 'non-immunogenic' peptide; however, the strength of this claim is difficult to evaluate without more knowledge about the negative peptides. The experimental validation of this approach in the context of CRC is not convincing.

      Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen selection in CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      Weaknesses:

      (1) While multiple advanced tools and algorithms are used, the study could benefit from a more detailed explanation of the rationale behind algorithm choice and parameter settings, ensuring reproducibility and transparency.

      We thank the reviewer for their comment. We have revised the explanation regarding the rationale behind algorithm choice and parameter settings as follows: “We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with 'k' set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using 9 folds for training and 1 fold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model's ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model's performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features.” (Method, lines 225-234, page 8).

      (2) While pHLA-TCR binding displayed higher specificity, its lower sensitivity compared to pHLA binding suggests a trade-off between the two measures. Optimizing the balance between sensitivity and specificity could be crucial for the practical application of these predictions in clinical settings.

      We appreciate the reviewer's suggestion. Due to the limited availability of patient blood samples and time constraints for validation, we have chosen to prioritize high specificity and positive predictive value to enhance the selection of neoantigens.

      (3) The experimental validation was performed on a limited number of patients (four), which might affect the generalizability of the findings. Increasing the number of patients for validation could provide a more comprehensive assessment of the model's performance.

      This has been addressed earlier. Here, we restate it as follows: Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      This study presents a new approach of combining two measurements (pHLA binding and pHLA-TCR binding) in order to refine predictions of which patient mutations are likely presented to and recognized by the immune system. Improving such predictions would play an important role in making personalized anti-cancer vaccinations more effective.

      Strengths:

      The study combines data from pre-existing tools pVACseq and pMTNet and applies them to a CRC patient population, which the authors show may improve the chance of identifying immunogenic, cancer-derived neoepitopes. Making the datasets collected publicly available would expand beyond the current datasets that typically describe Caucasian patients.

      Weaknesses:

      It is unclear whether the NetMHCpan and pMTNet tools used by the authors are entirely independent, as they appear to have been trained on overlapping datasets, which may explain their similar scores. The pHLA-TCR score seems to be driving the effects, but this is not discussed in detail.

      The HLA percentile from NetMHCpan and the TCR ranking from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. Additionally, we partitioned the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%), ensuring no overlap between the training and testing datasets.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: “To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).

      Due to sample constraints, the authors were only able to do a limited amount of experimental validation to support their model; this raises questions as to how generalizable the presented results are. It would be desirable to use statistical thresholds to justify cutoffs in ELISPOT data.

      We chose a cutoff of 2 for ELISPOT, following the recommendation of the study by Moodie et al. (2). The study provides standardized cutoffs for defining positive responses in ELISPOT assays. It presents revised criteria based on a comprehensive analysis of data from multiple studies, aiming to improve the precision and consistency of immune response measurements across various applications.

      Some of the TCR repertoire metrics presented in Figure 2 are incorrectly described as independent variables and do not meaningfully contribute to the paper. The TCR repertoires may have benefitted from deeper sequencing coverage, as many TCRs appear to be supported only by a single read.

      We appreciate the reviewer’s feedback. We have moved Figures 2B through 2F to Supplementary Figure 2. We agree with the reviewer that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. The TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite the variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Please open source the raw and processed data, code, and software output (NetMHCpan, pMTnet), which are important to verify the results.

      NetMHCpan and pMTNet are publicly available software tools (3, 4). In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model).

      (2) Comparison with more state-of-the-art neoantigen prediction models could provide a more comprehensive view of the combined model's performance relative to the current field.

      To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (5), NeoPeptide (6), dbPepneo (7), Tantigen (8), and TSNAdb (9), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10X Genomics Chromium Single Cell Immune Profiling (10-13). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (14). To address the reviewer's inquiry, we have incorporated these results in Supplementary Table 6.

      (3) While the combined model shows a positive overall rank coverage score, indicating improved ranking accuracy, the scores are relatively low. Further refinement of the model or the inclusion of additional predictive features might enhance the ranking accuracy.

      We appreciate the reviewer’s suggestion. The RankCoverageScore provides an objective evaluation of the rank results derived from the final peptide list generated by the two tools. The combined model achieved a higher RankCoverageScore than pMTNet, indicating its superior ability to identify immunogenic peptides compared to existing in silico tools. In order to provide a more comprehensive assessment, we included an additional four validated samples to recalculate the rank coverage score. The results demonstrate a notable difference between NetMHCpan and the Combined model (-0.37 and 0.04, respectively). We have incorporated these findings into Supplementary Figure 6 to address the reviewer's question. Additionally, we have modified Figure 5E to present a simplified demonstration of the superior performance of the combined model compared to NetMHCpan.

      (4) Collect more public data and fine-tune the model. Then you will get a SOTA model for neoantigen selection. I strongly recommend you write Python scripts and open source.

      We thank the reviewer for their feedback. We have made the raw and processed data, as well as the model, available on GitHub. Additionally, we have gathered more public data and conducted evaluations to assess its efficiency compared to other methods. You can find the repository here: https://github.com/QuynhPham1220/Combined-model.

      Reviewer #3 (Recommendations For The Authors):

      The Methods section seems good, though HLA calling is more accurate using arcasHLA than OptiType. This would be difficult to correct as OptiType is integrated into pVACtools.

      We chose OptiType for its accuracy, which exceeds 99% in identifying HLA-I alleles from RNA-Seq data. This decision was informed by a recent extensive benchmarking study that evaluated its performance against "gold-standard" HLA genotyping data (Li et al. (15)). Furthermore, we tested both tools on the same RNA-Seq data from FFPE samples, and the allele-calling accuracy of OptiType was superior to that of arcasHLA. To address the reviewer's question, we have included these results in Supplementary Table 2, along with the reference supporting this decision (Method, line 200, page 7).

      I am not sufficiently expert in machine learning to assess this part of the methods. TCR beta repertoire analysis of biopsy is highly variable; though my expertise lies largely in sequencing using the 10X genomics platform, typically one sees multiple RNAs per cell. Seeing the majority of TCRs supported by only a single read suggests either problems with RNA capture (particularly in this case where the recovered RNA was split to allow both RNAseq and targeted TCR seq) or that the TCR library was not sequenced deeply enough. I'd like to have seen rarefaction plots of TCR repertoire diversity vs the number of reads to ensure that sufficiently deep sequencing was performed.

      We appreciate the suggestions provided by the reviewer. We agree that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. In addition, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them. We have already added the rarefaction plots of TCR repertoire diversity versus the number of reads in Figure 2C. These have been added to the main text (lines 329-335).
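
      The UMI-based filtering described above can be illustrated with a minimal sketch. It assumes each raw read has already been reduced to a (UMI, CDR3 sequence) pair; the function name, the UMIs, and the CDR3 strings are hypothetical, and the actual pipeline may differ in how reads are grouped.

```python
from collections import Counter


def collapse_umis(reads, min_reads_per_umi=3):
    """Count one molecule per (UMI, CDR3) pair seen in at least N raw reads."""
    reads_per_molecule = Counter(reads)
    clone_counts = Counter()
    for (umi, cdr3), n_reads in reads_per_molecule.items():
        if n_reads >= min_reads_per_umi:
            clone_counts[cdr3] += 1  # one UMI-collapsed read for this clone
    return clone_counts


# Made-up reads: (UMI, CDR3 amino-acid sequence).
reads = (
    [("AAAC", "CASSLGQETQYF")] * 4     # kept: 4 raw reads
    + [("GGTA", "CASSLGQETQYF")] * 3   # kept: 3 raw reads
    + [("TTCG", "CASSPDRGYEQYF")] * 2  # dropped: only 2 raw reads
    + [("CCAT", "CASSQETQYF")] * 1     # dropped: a single raw read
)
clones = collapse_umis(reads)
```

      Under this rule, a clone that appears with a count of one is still backed by at least three raw reads that share a UMI.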

      In order to support the authors' conclusions that MSI-H tumors have fewer TCR clonotypes than MSS tumors (Figure S2a) I would have liked to see Figure 2a annotated so that it was easy to distinguish which patient was in which group, as well as the rarefaction plots suggested above, to be sure that the difference represented a real difference between samples and not technical variance (which might occur due to only 4 samples being in the MSI-H group).

      We thank the reviewer for their recommendation. Indeed, MSI-H tumors are fewer than MSS tumors in our cohort, which is consistent with the proportion typically observed in colorectal cancer, around 15%, as reported in previous studies (16, 17). To provide further clarification on this point, we have included rarefaction plots illustrating TCR repertoire diversity versus the number of reads in Supplementary Figure 3 (line 339). Additionally, MSI-H and MSS samples have been labeled for clarity.

      The authors write: "in accordance with prior investigations, we identified an inverse relationship between TCR clonality and the Shannon index (Supplementary Figure S1)" >> Shannon index is a measure of TCR clonality, not an independent variable. The authors may have meant TCR repertoire richness (the absolute number of TCRs), and the Shannon index (a measure of how many unique TCRs are present in the index).

      We thank the reviewer for their comment regarding the correlation between the number of TCRs and the Shannon index. We have revised the figure to illustrate the relationship between the number of TCRs and the Shannon index, and we have relocated it to Figure 2B.
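
      To make the distinction between richness and the Shannon index concrete, the sketch below computes both from a list of clone counts. The clonality formula (one minus the Shannon index normalized by the log of the richness) is a commonly used convention that we assume here for illustration; it also shows why clonality and the Shannon index are not independent quantities. The example counts are made up.

```python
import math


def richness(clone_counts):
    """Number of distinct clones with at least one count."""
    return sum(1 for c in clone_counts if c > 0)


def shannon_index(clone_counts):
    """Shannon entropy (natural log) of the clone frequency distribution."""
    total = sum(clone_counts)
    return -sum((c / total) * math.log(c / total)
                for c in clone_counts if c > 0)


def clonality(clone_counts):
    """1 - normalized Shannon index; defined as 1.0 for a single clone."""
    n = richness(clone_counts)
    if n < 2:
        return 1.0
    return 1.0 - shannon_index(clone_counts) / math.log(n)


even_repertoire = [10, 10, 10, 10]   # four equally abundant clones
skewed_repertoire = [97, 1, 1, 1]    # same richness, one dominant clone
```

      Both example repertoires have the same richness, yet the skewed one has a lower Shannon index and therefore a higher clonality, so an inverse relationship between these two measures follows from the definition.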

      The authors continue: "As anticipated, we identified only 58 distinct V (Figure 2C) and 13 distinct J segments (Figure 2D), that collectively generated 184,396 clones across the 27 tumor tissue samples, underscoring the conservation of these segments (Figure 2C & D)" >> it is not clear to me what point the authors are making: it is well known that TCR V and J genes are largely shared between Caucasian populations (https://pubmed.ncbi.nlm.nih.gov/10810226/), and though IMGT lists additional forms of these genes, many are quite rare and are typically not included in the reference sequences used by repertoire analysis software. I would clarify the language in this section to avoid the impression that patient repertoires are only using a restricted set of J genes.

      We thank the reviewer for their feedback. We have revised the sentence as follows: “As anticipated, we identified 59 distinct V segments (Supplementary Figure 2C) and 13 distinct J segments (Supplementary Figure 2D), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Supplementary Figure 2C & D).” (Result, lines 354-356, page 12)

      As a result I would suggest moving Figure 2 with the exception of 2A into the supplementals - I would have been more interested in a plot showing the distribution of TCRs by frequency, i.e. what proportion of clones are hyperexpanded, moderately expanded etc. This would be a better measure of the likely immune responses.

      We thank the reviewer for their comment. With the exception of Figure 2A, we have relocated Figures 2B through 2F to Supplementary Figure 2.

      The authors write "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic peptides (Supplementary Table 3)" >> The authors mean to refer to Table S4.

      We appreciate the reviewer's feedback. Here's the revised sentence: "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary Table 5)” (lines 368-370).

      The authors write "As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank < 2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p< 0.00001)" >> this is not surprising, as tools such as NetMHCpan are trained on databases of immunogenic peptides, and thus it is likely that these aren't independent measures (in https://academic.oup.com/nar/article/48/W1/W449/5837056 the authors state that "The training data have been vastly extended by accumulating MHC BA and EL data from the public domain. In particular, EL data were extended to include MA data"). In the pMTNet paper it is stated that pMTNet encoded pMHC information using "the exact data that were used to train the netMHCpan model" >> While I am not sufficiently expert to review details on machine learning training models, it would seem that the pHLA scores from NetMHCpan and pMTNet may not be independent, which would explain the concordance in scores that the authors describe in Figures 3B and 3D. I would invite the authors to comment on this.

      The HLA percentiles from NetMHCpan and TCR rankings from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. NetMHCpan is trained to predict peptide-MHC class I interactions by integrating binding affinity and MS eluted ligand data, using a second output neuron in the NNAlign approach. This setup produces scores for both binding affinity and ligand elution. In contrast, pMTNet predicts TCR binding specificity of class I pMHCs through three steps:

      (1) Training a numeric embedding of pMHCs (class I only) to numerically represent protein sequences of antigens and MHCs.

      (2) Training an embedding of TCR sequences using stacked auto-encoders to numerically encode TCR sequence text strings.

      (3) Creating a deep neural network combining these two embeddings to integrate knowledge from TCRs, antigenic peptide sequences, and MHC alleles. Fine-tuning is employed to finalize the prediction model for TCR-pMHC pairing.

      Therefore, pHLA scores from NetMHCpan and pMTNet are independent. Furthermore, Figures 3B and 3D do not show concordance in scores, as there was no equivalence in the percentage of immunogenic and non-immunogenic peptides in the two groups (≥2 HLA percentile and ≥2 TCR percentile).

      Many of the authors of this paper were also authors of the epiTCR paper, would this not have been a better choice of tool for assessing pHLA-TCR binding than pMTNet?

      When we started this project, EpiTCR had not been completed. Therefore, we chose pMTNet, which had demonstrated good performance and high accuracy at that time. The validated performance of EpiTCR is an ongoing project that will implement immunogenic assays (ELISpot and single-cell sequencing) to assess the prediction and ranking of neoantigens. This study is also mentioned in the discussion: "Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation." (lines 532-535).

      In Figure 3G it would appear that the pHLA-TCR score is driving the interaction, could the authors comment on this?

      The authors sincerely appreciate the reviewer for their valuable feedback. Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively.

      The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.

      In Figure 4A I would invite the authors to comment on how they chose the sample sizes they did for the discovery and validation datasets: the numbers seem rather random. I would question whether a training dataset in which 20% of the peptides are immunogenic accurately represents the case in patients, where I believe immunogenic peptides are less frequent (as in Figure 5).

      We aimed to maximize the number of experimentally validated immunogenic peptides, including those from viruses, with only a small percentage from tumors available for training. This limitation is inherent in the field. However, our ultimate objective is to develop a tool capable of accurately predicting peptide immunogenicity irrespective of their source. Therefore, the current percentage of immunogenic peptides may not accurately reflect real-world patient cases, but this is not crucial to our development goals.

      For Figure 5C I would invite the authors to consider adding a statistical test to justify the cutoff at 2-fold enrichment.

      Thank you for your feedback. Instead of conducting a statistical test, we have implemented standardized cutoffs as defined in the cited study (2). This research introduces refined criteria for identifying positive responses in ELISPOT assays through a comprehensive analysis of data from multiple studies. These criteria aim to improve the accuracy and consistency of immune response measurements across various applications. The reference to this study has been properly incorporated into the manuscript (Method, line 281, page 10).
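
      As a simple illustration of a fixed fold-change cutoff, the sketch below calls a well positive when its spot count is more than two-fold above background. This is only the fold-change part of the rule and not the full set of criteria from the cited study (2); the function names, the strict inequality, the handling of a zero background, and the spot counts are all our assumptions.

```python
def fold_over_background(stimulated_spots, background_spots):
    """Ratio of spots in the peptide-stimulated well to the background well."""
    if background_spots <= 0:
        raise ValueError("background spot count must be positive")
    return stimulated_spots / background_spots


def is_positive(stimulated_spots, background_spots, fold_cutoff=2.0):
    """True when the response is strictly above the fold-change cutoff."""
    return fold_over_background(stimulated_spots, background_spots) > fold_cutoff


# Made-up spot counts: peptide -> (stimulated well, background well).
wells = {"pep1": (45, 15), "pep2": (20, 15), "pep3": (30, 15)}
positives = [name for name, (stim, bg) in wells.items()
             if is_positive(stim, bg)]
```

      With these example counts only the first peptide is called positive; the third sits exactly at two-fold and is excluded by the strict inequality.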

      Minor points:

      "paired white blood cells" >> use "paired Peripheral Blood Mononuclear Cells".

      We appreciate the reviewer for the feedback. We agree with the reviewer's observation. The sentence has been revised as follows: "Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient's HLA-I allele profile and the gene expression levels of mutated genes." (Introduction, lines 55-58, page 2).

      "while RNA sequencing determines the patient's HLA-I allele profile and gene expression levels of mutated genes." >> RNA sequencing covers both the mutant and reference form of the gene, allowing assessment of variant allele frequency.

      "the current approach's impact on patient outcomes remains limited due to the scarcity of effective immunogenic neoantigens identified for each patient" >> Some clearer language here would have been preferred as different tumor types have different mutational loads

      We thank the reviewer for their valuable feedback. We agree with the reviewer's observation. The passage has been revised accordingly: “The current approach's impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens.” (Introduction, lines 62-64, page 3).

      References

      (1) J. Schmidt et al., Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting. Cell Rep Med 2, 100194 (2021).

      (2) Z. Moodie et al., Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 59, 1489-1501 (2010).

      (3) V. Jurtz et al., NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360-3368 (2017).

      (4) T. Lu et al., Deep learning-based prediction of the T cell receptor-antigen binding specificity. Nat Mach Intell 3, 864-875 (2021).

      (5) J. Xia et al., NEPdb: A Database of T-Cell Experimentally-Validated Neoantigens and Pan-Cancer Predicted Neoepitopes for Cancer Immunotherapy. Front Immunol 12, 644637 (2021).

      (6) W. J. Zhou et al., NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. Database (Oxford) 2019 (2019).

      (7) X. Tan et al., dbPepNeo: a manually curated database for human tumor neoantigen peptides. Database (Oxford) 2020 (2020).

      (8) G. Zhang, L. Chitkushev, L. R. Olsen, D. B. Keskin, V. Brusic, TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC Bioinformatics 22, 40 (2021).

      (9) J. Wu et al., TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis. Genomics Proteomics Bioinformatics 16, 276-282 (2018).

      (10) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2.

      (11) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2.

      (12) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2.

      (13) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2.

      (14) A. Montemurro et al., NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRalpha and beta sequence data. Commun Biol 4, 1060 (2021).

      (15) G. Li et al., Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 16, eade2886 (2024).

      (16) Z. Gatalica, S. Vranic, J. Xiu, J. Swensen, S. Reddy, High microsatellite instability (MSI-H) colorectal carcinoma: a brief review of predictive biomarkers in the era of personalized medicine. Fam Cancer 15, 405-412 (2016).

      (17) N. Mulet-Margalef et al., Challenges and Therapeutic Opportunities in the dMMR/MSI-H Colorectal Cancer Landscape. Cancers (Basel) 15 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using UGGT-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing premature degradation of misfolded glycoproteins. They propose a concept in which the fate of glycoproteins is determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:

      The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promotes glycoprotein degradation.

      Weaknesses:

      Less clear, though, is the involvement of UGGT2 in the process. Also, to this reviewer, some data do not necessarily support the conclusion.

      Major criticisms:

      (1) One of the biggest problems I had on reading through this manuscript is that, while the authors appeared to generate UGGTs-KO cells from HCT116 and HeLa cells, it was not clearly indicated which cell line was used for each experiment. I assume that it was HCT116 cells in most cases, but I did not see that it was clearly mentioned. As the expression level of UGGT2 relative to UGGT1 is quite different between the two cell lines, it would be critical to know which cells were used for each experiment.

      Thank you for this comment. We have clarified this point, especially in the figure legends.

      (2) While most of the authors' conclusions are sound, some claims, to this reviewer, were not fully supported by the data. In particular, I cannot help being puzzled by the authors' claim about the involvement of UGGT2 in the ERAD process. In most cases, KO of UGGT2 does not seem to affect the stability of ERAD substrates (e.g., Fig. 1C, 2A, 3D). Where the authors suggest that UGGT2 is also involved in ERAD, the evidence is far from convincing (e.g., Fig. 2D/E). Especially because it has now been suggested that the main role of UGGT2 may be distinct from that of UGGT1, playing a role in lipid quality control (Hung, et al., PNAS 2022), it is imperative to provide convincing evidence if the authors want to claim the involvement of UGGT2 in a protein quality control system. In fact, it was not clear at all whether even UGGT1 is involved in the process in Fig. 2D/E, as the difference, if any, is so subtle. How can the authors be sure that this is significant enough? While the authors claim that the difference is statistically significant (n=3), this may turn out to be an experimental artifact. At the very least, I would urge the authors to try rescue experiments with UGGT1 or 2, to clarify that the defect in UGGT-DKO cells can be reversed. It may also be interesting to test whether the subtle difference the authors observed is indeed N-glycan-dependent by using a non-glycosylated version of the protein (just like NHK-QQQ mutants in Fig. 2C).

      We appreciate this comment. In response, we reevaluated the importance of UGGT2 for ER protein quality control. As this reviewer mentioned, KO of UGGT2 does not affect the stability of ATF6a, NHK, rRI332-Flag or EMC1-△PQQ-Flag (Fig. 1E, 2A, and 3DE). Furthermore, we tested whether overexpression of UGGT2 reverses the phenotype of UGGT-DKO cells with regard to the degradation rate of NHK. It did not affect the degradation rate of NHK, whereas overexpression of UGGT1 restored the degradation rate to that in WT cells.

      Author response image 1.

      Collectively, these facts suggest that the role of UGGT2 in ER protein quality control is rather limited in HCT116 cells. Therefore, we have decided not to mention UGGT2 in the title, and weakened the overall claim that UGGT2 contributes to ER protein quality control. Tissues with high expression of UGGT2 or cultured cells other than HCT116 would be appropriate for revealing the detailed function of UGGT2.

      To this reviewer, it is still possible that the involvement of UGGT1 (or 2, if any) could be totally substrate-dependent, and the substrates used in Fig. 2D or E happen not to be dependent on the action of UGGTs. To the reviewer, even without the data of Fig. 2D and E, the authors provide enough evidence to demonstrate the involvement of UGGT1 in preventing premature degradation of glycoprotein ERAD substrates. I am just afraid that the authors may have overinterpreted the data, as if the UGGTs are involved in stabilization of all glycoproteins destined for ERAD.

      Based on the point this reviewer raised, we decided to delete the previous Fig. 2D and 2E. The efficacy of UGGT1 in preventing early degradation may vary among substrates.

      (3) I am a bit puzzled by the DNJ treatment experiments. First, I do not see the detailed conditions of the DNJ treatment (concentration? Time?). Then, I was a bit surprised to see that so little G3M9 glycan was formed, and that about the same amount of G2M9 was also formed (Figure 1 Figure supplement 4B-D), despite the fact that glucose trimming of newly synthesized glycoproteins is expected to be completely impaired (unless the authors used a DNJ concentration that does not completely impair the trimming of the first Glc). Even considering the involvement of Golgi endo-alpha-mannosidase, a similar amount of G3M9 and G2M9 may suggest that the experimental conditions used for this experiment (i.e. concentration of DNJ, duration of treatment, etc.) are not properly optimized.

      We think that our experimental conditions for DNJ treatment are appropriate for evaluating the effect of DNJ. Based on other papers (Ali and Field, 2000; Karlsson et al., 1993; Lomako et al., 2010; Pearse et al., 2010; Tannous et al., 2015), 0.5 mM DNJ is appropriate. In our previously reported experiment, 16 h treatment with the mannosidase inhibitor kifunensine prior to cell collection was sufficient for N-glycan composition analysis (Ninagawa et al., 2014), and we treated cells for a similar time in Figure 1-Figure Supplement 4 and 5 (and Figure 1-Figure Supplement 6). We could see a clear effect of DNJ in inhibiting degradation of ATF6a with 2 hours of pretreatment (Fig. 1G). Furthermore, our results are reasonable and consistent with previous findings that DNJ increased GM9 the most (Cheatham et al., 2023; Gross et al., 1983; Gross et al., 1986; Romero et al., 1985). In addition to DNJ, we used CST for further experiments in new figures (Fig. 1H and Figure 1-Figure supplement 6). DNJ and CST are inhibitors of glucosidase; DNJ is a stronger inhibitor of glucosidase II, while CST is a stronger inhibitor of glucosidase I (Asano, 2000; Saunier et al., 1982; Szumilo et al., 1987; Zeng et al., 1997). An increase in G3M9 and G2M9 was detected using CST (Figure 1-Figure Supplement 6). Like DNJ, CST also inhibited ATF6a degradation in UGGT-DKO cells (Fig. 1H). These findings show that our experimental conditions using glucosidase inhibitors are appropriate and strongly support our model (Fig. 5). Differences between the effects of DNJ and CST are now described on pages 8 to 10 of our manuscript.

      Reviewer #2 (Public Review):

      In this study, Ninagawa et al. shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha, whose premature degradation hampers the cell's ability to mount an ER stress response.

      While this study convincingly demonstrates early degradation of misfolded glycoproteins in the absence of UGGTs, my major concern is the need for additional experiments to support the "tug of war" model involving UGGTs and EDEMs in influencing the substrate's fate - whether misfolded glycoproteins are pulled into the folding or degradation route. Specifically, it would be valuable to investigate how overexpression of UGGTs and EDEMs in WT cells affects the choice between folding and degradation for misfolded glycoproteins. Considering previous studies indicating that monoglucosylation influences glycoprotein solubility and stability, an essential question is: what is the nature of glycoproteins in UGGTKO/EDEMKO and potentially UGGT/EDEM overexpression cells? Understanding whether these substrates become more soluble/stable when GM9 versus mannose-only translation modification accumulates would provide valuable insights.

      In the new figure 2DE, we conducted overexpression experiments of structure formation factors UGGT1 and/or CNX, and degradation factors EDEMs. While overexpression of structure formation factors (Fig. 2DE) and KO of degradation factors (Ninagawa et al., 2015; Ninagawa et al., 2014) increased stability of substrates, KO of UGGT1 (Fig. 1E, 2A and 3DF) and overexpression of degradation factors (Fig. 2DE) (Hirao et al., 2006; Hosokawa et al., 2001; Mast et al., 2005; Olivari et al., 2005) accelerated degradation of substrates. A comparison of the properties of N-glycan with the normal type and the type without glucoses was already reported (Tannous et al., 2015). The rate of degradation of substrate was unchanged, but efficiency of secretion of substrates was affected.

      The study delves into the physiological role of UGGT, but is limited in scope, focusing solely on the effect of ATF6alpha in UGGT KO cells' stress response. It is crucial for the authors to investigate the broader impact of UGGT KO, including the assessment of basal ER proteotoxicity levels, examination of the general efflux of glycoproteins from ER, and the exploration of the physiological consequences due to UGGT KO. This broader perspective would be valuable for the wider audience. Additionally, the marked increase in ATF4 activity in UGGTKO requires discussion, which the authors currently omit.

      We evaluated the sensitivity of WT and UGGT1-KO cells to ER stress (Figure 4G). KO of UGGT1 increased the sensitivity to ER stress inducer Tg, indicating the importance of UGGT1 for resisting ER stress.

      We have added the following description to the manuscript about ATF4 activity in UGGT1-KO cells: “In addition to this, UGGT1 is necessary for proper functioning of ER resident proteins such as ATF6a (Fig. 4B-F). It is highly possible that ATF6a undergoes structural maintenance by UGGT1, which could be necessary to avoid degradation and maintain proper function, because ATF6a with a more rigid structure tended to remain in UGGT1-KO cells (Fig. 4C). Responses of ERSE and UPRE to ER stress, which require ATF6a, were decreased in UGGT1-KO cells (Fig. 4DE). In contrast, ATF4 reporter activity was increased in UGGT1-KO cells (Fig. 4F), while the basal level of ATF4 in UGGT1-KO cells was comparable with that in WT (Figure 1-Figure supplement 2B). The ATF4 pathway might partially compensate for the function of the ERSE and UPRE pathways in UGGT1-KO cells in acute ER stress.” This is now described on Page 17 in our manuscript.

      The discussion section is brief and could benefit from being a separate section. It is advisable for the authors to explore and suggest other model systems or disease contexts to test UGGT's role in the future. This expansion would help the broader scientific community appreciate the potential applications and implications of this work beyond its current scope.

      Thank you for making this point. The DISCUSSION part has now been separated in our manuscript. We added some points about other model organisms and diseases to the DISCUSSION as follows: “Our work focusing on the function of mammalian UGGT1 greatly advances the understanding of how ER homeostasis is maintained in higher animals. Considering that Saccharomyces cerevisiae does not have a functional orthologue of UGGT1 (Ninagawa et al., 2020a) and that KO of UGGT1 causes embryonic lethality in mice (Molinari et al., 2005), it would be interesting to know at what point the function of UGGT1 became evolutionarily necessary for life. Related to its importance in animals, it would also be of interest to know what kind of diseases UGGT1 is associated with. Recently, it has been reported that UGGT1 is involved in ER retention of Trop-2 mutant proteins, which are encoded by a causative gene of gelatinous drop-like corneal dystrophy (Tax et al., 2024). Not only this, but since the ER is known to be involved in over 60 diseases (Guerriero and Brodsky, 2012), we must investigate how UGGT1 and other ER molecules are involved in diseases.”

      Reviewer #3 (Public Review):

      This manuscript focuses on defining the importance of UGGT1/2 in the process of protein degradation within the ER. The authors prepared HCT116 cells lacking UGGT1, UGGT2, or both UGGT1/UGGT2 (DKO) and then monitored the degradation of specific ERAD substrates. Initially, they focused on the ER stress sensor ATF6 and showed that loss of UGGT1 increased the degradation of this protein. This degradation was suppressed by deletion of ERAD-specific factors (e.g., SEL1L, EDEM) or treatment with mannosidase inhibitors such as kifunensine, indicating that this is mediated through a process involving increased mannose trimming of the ATF6 N-glycan. This increased degradation of ATF6 impaired the function of this ER stress sensor, as expected, reducing the activation of downstream reporters of ER stress-induced ATF6 activation. The authors extended this analysis to monitor the degradation of other well-established ERAD substrates including A1AT-NHK and CD3d, demonstrating similar increases in the degradation of destabilized, misfolding protein substrates in cells deficient in UGGT. Importantly, they did experiments to suggest that re-overexpression of wild-type, but not catalytically deficient, UGGT rescues the increased degradation observed in UGGT1 knockout cells. Further, they demonstrated the dependence of this sensitivity to UGGT depletion on N-glycans using ERAD substrates that lack any glycans. Ultimately, these results suggest a model whereby depletion of UGGT (especially UGGT1, which is the most expressed in these cells) increases degradation of ERAD substrates through a mechanism involving impaired re-glucosylation and subsequent re-entry into the calnexin/calreticulin folding pathway.

      I must say that I was under the impression that the main conclusions of this paper (i.e., UGGT1 functions to slow the degradation of ERAD substrates by allowing re-entry into the lectin folding pathway) were well-established in the literature. However, I was not able to find papers explicitly demonstrating this point. Because of this, I do think that this manuscript is valuable, as it supports a previously assumed assertion of the role of UGGT in ER quality control. However, there are a number of issues in the manuscript that should be addressed.

      Notably, the focus on well-established, trafficking-deficient ERAD substrates, while a traditional approach to studying these types of processes, limits our understanding of global ER quality control of proteins that are trafficked to downstream secretory environments where proteins can be degraded through multiple mechanisms. For example, in Figure 1-Figure Supplement 2, UGGT1/2 knockout does not seem to increase the degradation of secretion-competent proteins such as A1AT or EPO, instead appearing to stabilize these proteins against degradation. They do show reductions in secretion, but it isn't clear exactly how UGGT loss is impacting ER Quality Control of these more relevant types of ER-targeted secretory proteins.

      We appreciate your comment. It is certainly difficult to assess in detail how UGGT1 functions against secretion-competent proteins, but we think that the folding state of these proteins is improved, which avoids their degradation and increases their secretion. In Figure 1-Figure supplement 2E, there is a clear decrease in secretion of EPO in UGGT1-KO cells, suggesting that UGGT1 also inhibits degradation of such substrates. Note that, as shown in Fig. 3A-C, once a protein forms a solid structure, it is rarely degraded in the ER.

      Lastly, I don't understand the link between UGGT, ATF6 degradation, and ATF6 activation. I understand that the idea is that increased ATF6 degradation afforded by UGGT depletion will impair activation of this ER stress sensor, but if that is the case, how does UGGT2 depletion, which only minimally impacts ATF6 degradation (Fig. 1), impact activation to levels similar to the UGGT1 knockout (Fig 4)? This suggests UGGT1/2 may serve different functions beyond just regulating the degradation of this ER stress sensor. Also, the authors should quantify the impaired ATF6 processing shown in Fig 4B-D across multiple replicates.

      In response to this valuable comment, we reevaluated our manuscript. As this reviewer mentioned, the involvement of UGGT2 in the activation of ATF6a cannot be explained only by the folding state of ATF6a. Thus, the question of whether UGGT2 is effective in activating ATF6 is outside the scope of this paper. The main focus of this paper is the contribution of UGGT1 to the ER protein quality control mechanism.

      Ultimately, I do think the data support a role for UGGT (especially UGGT1) in regulating the degradation of ERAD substrates, which provides experimental support for a role long-predicted in the field. However, there are a number of ways this manuscript could be strengthened to further support this role, some of which can be done with data they have in hand (e.g., the stats) or additional new experiments.

      In this revision period, to further elucidate the function of UGGT, we did several additional experiments (new figures Fig. 1H, 2DE, 4G, and Figure 1-Figure Supplement 6). We hope that these will bring our paper up to the level you have requested.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Abbreviations: GlcNAc, N-acetylglucosamines -> why plural?

      Corrected.

      (2) Abstract: to this reviewer, it may not be so common to cite references in the abstract.

      We submitted this manuscript to eLife as a “Research Advance”. The eLife instructions for “Research Advances” include the following description: “A reference to the original eLife article should be included in the abstract, e.g. in the format ‘Previously we showed that XXXX (author, year). Here we show that YYYY.’” We follow this.

      (3) Introduction: "as the site of biosynthesis of approximately one-third of all proteins." Probably this statement needs a citation?

      We added the reference there. You can also confirm this in “The Human Protein Atlas” website. https://www.proteinatlas.org/humanproteome/tissue/secretome

      (4) Figure 1F - the authors claimed that maturation of HA was delayed also in UGGT2 cells, but it was not at all clear to me. Rescue experiments with UGGT2 would be desired.

      We agree with this reviewer, but there was a statistically significant difference at 80 min in the UGGT2-KO strain. Previously, it was reported that the HA maturation rate was not affected by UGGT2 (Hung et al., 2022). We think that the difference is not large. A rescue experiment with UGGT2 on the degradation of NHK was conducted, and is shown in this response to the reviewers.

      (5) Figure 4A, here also the authors claim that UGGT2 is "slightly" involved in folding of ATF6alpha(P) but it is far from convincing to this reviewer.

      Now we also think that involvement of UGGT2 in ER protein quality control should be examined in the future.

      (6) Page 11, line 7 from the bottom: "peak of activation was shifted from 1 hour to 4 hours after the treatment of Tg in UGGT-KO cells". I found this statement a bit awkward; how can the authors be sure that "the peak" is 4 hours when the longest timing tested is 4 hours (i.e. peak may be even later)?

      Corrected. We deleted the description.

      (7) Page 11, line 4 "a more rigid structure that averts degradation" Can the authors speculate what this "rigid" structure actually means? The reviewer has to wonder what kind of change can occur to this protein with or without UGGT1. Binding proteins? The difference in susceptibility against trypsin appears very subtle anyway (Figure 4 Figure Supplement 1).

      Let us add our thoughts here: Poorly structured ATF6a is immediately routed for degradation in UGGT1-KO cells. As a result, ATF6a with a stable or rigid structure remains in the UGGT1-KO strain. ATF6a in a metastable state tends to be degraded without the assistance of UGGT1.

      (8) Figure 1 Figure supplement 2; based on the information provided, I calculate the relative ratio of UGGT2/UGGT1 in HCT116 which is 4.5%, and in HeLa 26%. Am I missing something? Also significant figure, at best, should be 2, not 3 (i.e. 30%, not 29.8%).

      Corrected. Thank you for this comment.

      Reviewer #2 (Recommendations For The Authors):

      (1) The effect in Fig. 2B with UGGT1-D1358A add-back is minimal. Testing the inactive and active add-back on other substrates, such as ATF6alpha, which undergoes a more rapid degradation, would provide a more comprehensive assessment.

      To examine the effect of full-length UGGT1 and its inactive mutant in UGGT1-KO and UGGT2-KO cells on the rate of degradation of endogenous ATF6a, we screened more than 300 colonies for stable expression of full-length Myc-UGGT1/2, UGGT1/2-Flag, and UGGT1/2 (no tag), and their point mutants. However, no cell lines expressing UGGT1/2 at levels close to or above the endogenous level were obtained. The expression level of UGGT1 seemed to be tightly regulated. A low-expressing stable cell line could not recover the phenotype of ATF6a degradation.

      We also tried to measure the degradation rate of exogenously expressed ATF6a. But overexpressed ATF6a is partially transported to the Golgi and cleaved by proteases, which makes it difficult to evaluate only the effect of degradation.

      (2) In reference to this statement on pg. 11:

      "This can be explained by the rigid structure of ATF6(P) lacking structural flexibility to respond to ER stress because the remaining ATF6(P) in UGGT1-KO cells tends to have a more rigid structure that averts degradation, which is supported by its slightly weaker sensitivity to trypsin (Figure 4-figure supplement 1A). "

      The rationale for testing ATF6(P) rigidity via trypsin digestion needs clarification. The authors should provide more background, especially if it relates to previous studies demonstrating UGGT's influence on substrate solubility. If trypsin digestion is indeed addressing this, it should be applied consistently to all tested misfolded glycoproteins, ensuring a comprehensive approach.

      We now provide more background with three references about trypsin digestion. Trypsin digestion allows us to evaluate the structure of proteins originating from the same gene, but it can be difficult to compare the structures of proteins originating from different genes. For example, antitrypsin is resistant to trypsin by its nature, which does not necessarily mean that antitrypsin forms a more stable structure than other proteins. NHK, a truncated version of antitrypsin, is still resistant to trypsin compared with other substrates.

      (3) Many of the figures described in the manuscript weren't referred to a specific panel. For example, pg. 12 "Fig. 1E and Fig.5," the exact panel for Fig. 5 wasn't referenced.

      Thank you for this comment. Corrected.

      (4) For experiments measuring the composition of glycoproteins in different KO lines, it is necessary to do the experiment more than once for conducting statistical analysis and comparisons. Moreover, the authors did not include raw composition data for these experiments. Statistical analysis should also be done for Fig. 4E-F.

      Our N-glycan composition data (Figure 1-Figure supplement 5 and 6C) are consistent with our previous papers (George et al., 2021; George et al., 2020; Ninagawa et al., 2015; Ninagawa et al., 2014). We performed the analysis twice in the previous study; please refer to it for the statistical analysis (George et al., 2020). We have added the raw composition data of N-glycans (Figure 1-Figure supplement 4 and 6B). Statistical analysis is now included in Fig. 4D-F.

      Ali, B.R., and M.C. Field. 2000. Glycopeptide export from mammalian microsomes is independent of calcium and is distinct from oligosaccharide export. Glycobiology. 10:383-391.

      Asano, N. 2000. Glycosidase-Inhibiting Glycomimetic Alkaloids. Biological Activities and Therapeutic Perspectives. Journal of Synthetic Organic Chemistry, Japan. 58:666-675.

      Cheatham, A.M., N.R. Sharma, and P. Satpute-Krishnan. 2023. Competition for calnexin binding regulates secretion and turnover of misfolded GPI-anchored proteins. J Cell Biol. 222.

      George, G., S. Ninagawa, H. Yagi, J.I. Furukawa, N. Hashii, A. Ishii-Watabe, Y. Deng, K. Matsushita, T. Ishikawa, Y.P. Mamahit, Y. Maki, Y. Kajihara, K. Kato, T. Okada, and K. Mori. 2021. Purified EDEM3 or EDEM1 alone produces determinant oligosaccharide structures from M8B in mammalian glycoprotein ERAD. Elife. 10.

      George, G., S. Ninagawa, H. Yagi, T. Saito, T. Ishikawa, T. Sakuma, T. Yamamoto, K. Imami, Y. Ishihama, K. Kato, T. Okada, and K. Mori. 2020. EDEM2 stably disulfide-bonded to TXNDC11 catalyzes the first mannose trimming step in mammalian glycoprotein ERAD. Elife. 9:e53455.

      Gross, V., T. Andus, T.A. Tran-Thi, R.T. Schwarz, K. Decker, and P.C. Heinrich. 1983. 1-deoxynojirimycin impairs oligosaccharide processing of alpha 1-proteinase inhibitor and inhibits its secretion in primary cultures of rat hepatocytes. Journal of Biological Chemistry. 258:12203-12209.

      Gross, V., T.A. Tran-Thi, R.T. Schwarz, A.D. Elbein, K. Decker, and P.C. Heinrich. 1986. Different effects of the glucosidase inhibitors 1-deoxynojirimycin, N-methyl-1-deoxynojirimycin and castanospermine on the glycosylation of rat alpha 1-proteinase inhibitor and alpha 1-acid glycoprotein. Biochem J. 236:853-860.

      Hirao, K., Y. Natsuka, T. Tamura, I. Wada, D. Morito, S. Natsuka, P. Romero, B. Sleno, L.O. Tremblay, A. Herscovics, K. Nagata, and N. Hosokawa. 2006. EDEM3, a soluble EDEM homolog, enhances glycoprotein endoplasmic reticulum-associated degradation and mannose trimming. J Biol Chem. 281:9650-9658.

      Hosokawa, N., I. Wada, K. Hasegawa, T. Yorihuzi, L.O. Tremblay, A. Herscovics, and K. Nagata. 2001. A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO reports. 2:415-422.

      Hung, H.H., Y. Nagatsuka, T. Solda, V.K. Kodali, K. Iwabuchi, H. Kamiguchi, K. Kano, I. Matsuo, K. Ikeda, R.J. Kaufman, M. Molinari, P. Greimel, and Y. Hirabayashi. 2022. Selective involvement of UGGT variant: UGGT2 in protecting mouse embryonic fibroblasts from saturated lipid-induced ER stress. Proc Natl Acad Sci U S A. 119:e2214957119.

      Karlsson, G.B., T.D. Butters, R.A. Dwek, and F.M. Platt. 1993. Effects of the imino sugar N-butyldeoxynojirimycin on the N-glycosylation of recombinant gp120. Journal of Biological Chemistry. 268:570-576.

      Lomako, J., W.M. Lomako, C.A. Carothers Carraway, and K.L. Carraway. 2010. Regulation of the membrane mucin Muc4 in corneal epithelial cells by proteosomal degradation and TGF-beta. Journal of cellular physiology. 223:209-214.

      Mast, S.W., K. Diekman, K. Karaveg, A. Davis, R.N. Sifers, and K.W. Moremen. 2005. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology. 15:421-436.

      Ninagawa, S., T. Okada, Y. Sumitomo, S. Horimoto, T. Sugimoto, T. Ishikawa, S. Takeda, T. Yamamoto, T. Suzuki, Y. Kamiya, K. Kato, and K. Mori. 2015. Forcible destruction of severely misfolded mammalian glycoproteins by the non-glycoprotein ERAD pathway. J Cell Biol. 211:775-784.

      Ninagawa, S., T. Okada, Y. Sumitomo, Y. Kamiya, K. Kato, S. Horimoto, T. Ishikawa, S. Takeda, T. Sakuma, T. Yamamoto, and K. Mori. 2014. EDEM2 initiates mammalian glycoprotein ERAD by catalyzing the first mannose trimming step. J Cell Biol. 206:347-356.

      Olivari, S., C. Galli, H. Alanen, L. Ruddock, and M. Molinari. 2005. A novel stress-induced EDEM variant regulating endoplasmic reticulum-associated glycoprotein degradation. J Biol Chem. 280:2424-2428.

      Pearse, B.R., T. Tamura, J.C. Sunryd, G.A. Grabowski, R.J. Kaufman, and D.N. Hebert. 2010. The role of UDP-Glc:glycoprotein glucosyltransferase 1 in the maturation of an obligate substrate prosaposin. J Cell Biol. 189:829-841.

      Romero, P.A., B. Saunier, and A. Herscovics. 1985. Comparison between 1-deoxynojirimycin and N-methyl-1-deoxynojirimycin as inhibitors of oligosaccharide processing in intestinal epithelial cells. Biochem J. 226:733-740.

      Saunier, B., R.D. Kilker, J.S. Tkacz, A. Quaroni, and A. Herscovics. 1982. Inhibition of N-linked complex oligosaccharide formation by 1-deoxynojirimycin, an inhibitor of processing glucosidases. Journal of Biological Chemistry. 257:14155-14161.

      Szumilo, T., G.P. Kaushal, and A.D. Elbein. 1987. Purification and properties of the glycoprotein processing N-acetylglucosaminyltransferase II from plants. Biochemistry. 26:5498-5505.

      Tannous, A., N. Patel, T. Tamura, and D.N. Hebert. 2015. Reglucosylation by UDP-glucose:glycoprotein glucosyltransferase 1 delays glycoprotein secretion but not degradation. Molecular biology of the cell. 26:390-405.

      Zeng, Y., Y.T. Pan, N. Asano, R.J. Nash, and A.D. Elbein. 1997. Homonojirimycin and N-methyl-homonojirimycin inhibit N-linked oligosaccharide processing. Glycobiology. 7:297-304.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers and editor for their helpful comments and suggestions. In response, we have revised the manuscript in two main ways:

      (1) To address the comments about rearranging figures and tables, we added a new Figure 3 that summarizes neurotransmitter assignments across all neuron classes. Our rationale for this change is detailed below.

      (2) To address the comment on clarifying neurotransmitter synthesis versus uptake, we analyzed two additional reporter alleles that tag the monoamine uptake transporters for 5-HT and potentially tyramine. These results are now presented in a new Figure 8 and corresponding sections in the manuscript. Related tables have been updated to include this expression data. Two more authors have been added due to their contributions to these experiments.

      For more detailed changes, please see our responses to the specific reviewer's comments as well as the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Wang and colleagues conducted a study to determine the neurotransmitter identity of all neurons in C. elegans hermaphrodites and males. They used CRISPR technology to introduce fluorescent gene expression reporters into the genomic loci of NT pathway genes. This approach is expected to better reflect in vivo gene expression compared to other methods like promoter- or fosmid-based transgenes, or available scRNA datasets. The study presents several noteworthy findings, including sexual dimorphisms, patterns of NT co-transmission, neuronal classes that likely use NTs without direct synthesis, and potential identification of unconventional NTs (e.g. betaine releasing neurons). The data is well-described and critically discussed, including a comparison with alternative methods. Although many of the observations and proposals have been previously discussed by the Hobert lab, the current study is particularly valuable due to its comprehensiveness. This NT atlas is the most complete and comprehensive of any nervous system that I am aware of, making it an extremely useful tool for the community. 

      Reviewer #2 (Public Review):

      Summary: 

      Together with the known anatomical connectivity of C. elegans, a neurotransmitter atlas paves the way toward a functional connectivity map. This study refines the expression patterns of key genes for neurotransmission by analyzing the expression patterns from CRISPR-knocked-in GFP reporter strains using the color-coded Neuropal strain to identify neurons. Along with data from previous scRNA sequencing and other reporter strains, examining these expression patterns enhances our understanding of neurotransmitter identity for each neuron in hermaphrodites and the male nervous system. Beyond the known neurotransmitters (GABA, Acetylcholine, Glutamate, dopamine, serotonin, tyramine, octopamine), the atlas also identifies neurons likely using betaine and suggests sets of neurons employing new unknown monoaminergic transmission, or using exclusively peptidergic transmission. 

      Strengths: 

      The use of CRISPR reporter alleles and of the Neuropal strain to assign neurotransmitter usage to each neuron is much more rigorous than previous analysis and reveals intriguing differences between scRNA seq, fosmid reporter, and CRISPR knock-in approaches. Among other mechanisms, these differences between approaches could be attributed to 3'UTR regulatory mechanisms for scRNA vs. knockin or titration of rate-limited negative regulatory mechanisms for fosmid vs. knockin. It would be interesting to discuss this and highlight the occurrences of these potential phenomena for future studies.  

      We recognize that readers of this study may be interested in understanding the differences between the three approaches. Therefore, in the Introduction, we addressed the potential risk of overexpression artifacts associated with multicopy transgenes, such as fosmid-based reporters, which can affect rate-limiting negative regulatory mechanisms. Additionally, in the Discussion, we included a section titled 'Comparing approaches and caveats of expression pattern analysis' to further explore these comparative methods and their associated nuances.

      Weaknesses: 

      For GABAergic transmission, one shortcoming arises from the lack of an improved expression pattern from a knock-in reporter strain for the GABA recapture symporter snf-11. In its absence, it is difficult to reach a final conclusion on GABA recapture vs. GABA clearance for all neurons that express the vesicular GABA transporter (unc-47+) but do not express the GAD/UNC-25 gene, e.g. SIA or R2A neurons. At a minimum, a comparison of the scRNA seq predictions with the expression pattern of the snf-11 fosmid reporter strain would help to better judge the proposed role of each neuron in GABA clearance or recycling.

      The snf-11 fosmid-based reporter data shows very good overlap with scRNA seq predictions (now included in Supp. Table S1). 

      But there are two much stronger reasons why we did not seek to further the analysis of expression of the snf-11 GABA uptaker:

      (1) Due to available anti-GABA staining data, we do know which neurons have the potential to take up GABA (via SNF-11).

      (2) Focusing on SNF-11 function rather than expression, we can ask which neurons lose anti-GABA staining in snf-11 mutants.

      Both of these types of analyses have been done in an earlier study from our lab (Gendrel et al., 2016, PMID 27740909), which, among other things, investigated GABA uptake mechanisms via SNF-11. Apart from analyzing the expression of a fosmid-based snf-11 reporter, we immunostained worms for GABA in both snf-11 mutant and wild type backgrounds (results summarized in Tables 1 and 2 of Gendrel et al.). Of the neurons that typically stain for GABA (Table 1, Gendrel et al.), two neuron classes (ALA and AVF) lost the staining in snf-11 mutants, suggesting that these neurons likely take up GABA via SNF-11. Importantly, one of the neurons the reviewer mentioned, R2A, stains for GABA in both wild type and snf-11 mutants, indicating that it likely does not take up GABA via SNF-11. The other neuron mentioned, SIA, does not stain for GABA in wild type (Table 2, Gendrel et al.) and hence is not a GABA uptake neuron. In cases like SIA and other neurons, where a neuron does not express unc-25 but does express unc-47 reporters (either fosmid or CRISPR reporter alleles), we speculate that UNC-47 transports another neurotransmitter.

      Considering the complexities of different tagging approaches, like T2A-GFP and SL2-GFP cassettes, in capturing post-translational and 3'UTR regulation is important. The current formulation is simplistic. e.g. after SL2 trans-splicing the GFP RNA lacks the 5' regulatory elements, T2A-GFP self-cleavage has its own issues, and the his-44-GFP reporter protein does certainly have a different post-translational life than vesicular transporters or cytoplasmic enzymes. 

      Yes, agreed, these points are mentioned in the Introduction and discussed in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      Do all splicing variants of neurotransmitter-related genes translate into functional proteins? The possibility that some neurons express a non-functional splice variant, leading to his-74-GFP reporter expression without functional neurotransmitter-related protein production is not addressed. 

      We thank the reviewer for bringing up this really interesting point, which we had not considered. First and foremost, with the exception of unc-25 (discussed in the next point), for all other genes that produce multiple splice forms, we made sure to append our tag (at the 5’ or 3’ end) such that the expression of all splice forms is captured. The reviewer raises the interesting point that in an alternative splicing scenario, some of the cells that express the primary transcript may “switch” to an inactive form. While we cannot exclude this possibility, we have confirmed by sequence analysis in WormBase that in five of the six cases where there is alternative splicing, the alternatively spliced exon lies outside the conserved, functionally relevant (enzymatic or structural) domain. In one case, unc-25, a shorter isoform is produced that does cut into the functionally relevant domain; however, since all cells expressing the unc-25 reporter allele also stain positive for GABA, this may not be an issue.

      Also, one tagged splice variant of unc-25 is expected to fail to produce a GFP reporter, can this cause trouble? 

      Yes, there is indeed a third splice variant of unc-25 with an alternative C-terminus. To address potential expression of this isoform, we CRISPR-engineered another reporter, unc-25(ot1536[unc-25b.1::t2a::gfp::h2b]), in which the inserted t2a::gfp::h2b sequences are fused to the C-terminus of the alternative splice form, but we did not observe any expression of this reporter. Now included in the manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, Wang et al. provide the most comprehensive description and comparison of the expression of the different genes required to synthesize, transport, and recycle the most common neurotransmitters (Glutamate, Acetylcholine, GABA, Serotonin, Dopamine, Octopamine, and Tyramine) used by hermaphrodite and male C. elegans. This paper will be a seminal reference in the field. Building on and contrasting observations from previous studies using fosmid, multicopy reporters, and single-cell sequencing, they now describe CRISPR/Cas-9-engineered reporter strains that, in combination with the multicolor pan-neuronal labeling of all C. elegans neurons (NeuroPAL), allow rigorous elucidation of neurotransmitter expression patterns. These novel reporters also illuminate previously unappreciated aspects of neurotransmitter biology in C. elegans, including sexual dimorphism of expression patterns, cotransmission, and the elucidation of cell-specific pathways that might represent new forms of neurotransmission.

      Strengths: 

      The authors set out to establish neurotransmitter identities in C. elegans males and hermaphrodites via varying techniques, including integration of previous studies, examination of expression patterns, and generation of endogenous CRISPR-labeled alleles. Their study is comprehensive, detailed, and rigorous, and achieves the aims. It is an excellent reference for the field, particularly those interested in biosynthetic pathways of neurotransmission and their distribution in vivo, in neuronal and non-neuronal cells. 

      Weaknesses: 

      No weaknesses were noted. The authors do a great job linking their characterizations with other studies and techniques, giving credence to their findings. As the authors note, there are sexually dimorphic differences across animals and varying expression patterns of enzymes. While it is unlikely there will be huge differences in the reported patterns across individual animals, it is possible that these expression patterns could vary developmentally, or based on physiological or environmental conditions. It is unclear from the study how many animals were imaged for each condition, and if the authors noted changes across individuals during development (could be further acknowledged in the discussion?)  

      We have updated the Methods section to specify the number of animals used for imaging. We agree with the reviewer that documenting the developmental dynamics of neurotransmitter expression would be interesting. However, except for one gene (tph-1, Fig. S2), we did not analyze the expression during different developmental stages for most genes in this study. Following the reviewer's suggestion, we have included this as a potential future direction in "Conclusions" at the end of the revised manuscript.

      Recommendations for the authors:

      After the consultation session, a common suggestion from the reviewers is to bring the tables more upfront, perhaps even in the form of legible main Figures and in alphabetical order of neurons; since we believe that the study will be in the long-term often used for these data; while the Figures with fluorescent expression patterns could be moved to the supplemental information. 

      We appreciate the reviewers' and editor's acknowledgment of the tables' possibly frequent usage by the field. We have considered carefully how to order the data presentation. We prefer to keep most of the fluorescent figures in the main text because they convey important subtleties that we want the reader to be aware of.

      To address the suggestion to bring key data more upfront, we have added an entirely new figure (Figure 3), placed before the ensuing data figures, that summarizes the expression patterns of the fluorescent reporters. This new figure (A) summarizes the neurotransmitter use for all neuron classes and (B) illustrates this information within worm schematics, showing the position of neurons in the whole worm. This figure serves as a good overview of neurotransmitter assignments but also specifically refers to the more extensive data and supplementary tables with detailed notes. We believe this solution effectively balances the need for comprehensive information and ease of reference.

      Reviewer #1 (Recommendations for The Authors):

      Suggestions: 

      (1) The study contains up to 10 Figures with gene expression patterns; however, I believe the community will use this paper mostly in the future for its summarizing tables. I wonder if it would be more useful to edit the tables and move them to the main figures while most fluorescent reporter images could be moved to the supplementary part. 

      Yes, as mentioned above, we added a new summary table and schematic upfront. We do prefer to keep the primary data in the main figures. Please see above (Public Review & Response).

      (2) In the section titled 'Neurotransmitter Synthesis versus Uptake', the authors' wording could be more careful. The data suggest, rather than establish, functions for individual neuronal classes, such as clearance neurons or signaling neurons. These functions remain hypotheses until further detailed studies are conducted to test them.

      These are fair points. We have made several improvements: 

      (1) In the referenced section, we added a sentence at the end of the paragraph on betaine to suggest the importance of future functional studies.

      (2) We analyzed reporter allele expression for two additional genes: the known uptake transporter for 5-HT (mod-5, reporter allele vlc47) and the predicted uptake transporter for tyramine (oct-1, reporter allele syb8870). The results from these experiments are presented in the new Figure 8 and discussed in Results and Discussion correspondingly. We also collaborated with Curtis Loer, who conducted anti-5-HT staining in wild type and mod-5 mutant animals (results shown in Figure 12). These experiments have enhanced our understanding of 5-HT uptake mechanisms and potential tyramine uptake mechanisms.

      (3) At the end of the Conclusions, we emphasized the need for future detailed studies to test the functions of neurotransmitter synthesis and uptake.

      (3) Page 21; add to the discussion: neurons could use mainly electrical synapses for communication. Especially for RMG neurons, this might be the case (in addition to neuropeptide communication). 

      “Main usage” is a difficult term to use. If there were neurons that are clearly devoid of any form of synaptic vesicle (small or DCV; note that RMG has plenty of DCVs), but show robust and reproducible electrical synapses, we would agree that such neurons could primarily be a “coupling” neuron. But this call is very hard to make for any C. elegans neuron (RMG included) and hence we prefer to not add further to an already quite long Discussion section.

      (4) Page 23: I believe that multi-copy promoter-based transgenes (despite array suppression mechanisms) could be potentially more sensitive than single-copy insertion of fluorescent reporters. In our lab, we observed this a couple of times. This could be discussed. 

      We discuss this in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      We have also added a third possibility (i.e. technical issues related to neuron-ID) in the revised manuscript.   

      Reviewer #2 (Recommendations For The Authors): 

      Comment during consultation session: As for my feedback on the lack of an SNF-11 reporter strain, exercising more caution in their conclusions would suffice for me. Other comments are simple edits/discussion.  

      Please see above.  

      Several neurotransmitter symporters exist in the C. elegans genome, does any express specifically in the "orphan" UNC-47+ neurons? 

      Yes, good point. We considered this possibility, but of the >10 SLC6-family neurotransmitter transporters, only the classic, de-orphanized ones that we discuss in the paper show robust scRNA signals (as discussed in the paper), and none of those give clues about the orphan unc-47(+) neurons.

      Based on UNC-47+ expression, the article suggests a "Novel inhibitory neurotransmitter". Why would any new neurotransmitter using UNC-47 necessarily be inhibitory? The presence of one potential glycine-gated anion channel and one GPCR in the C. elegans genome seems poor evidence to suggest a sign for glycine or β-alanine transmission.

      Yes, agreed, it does not need to be inhibitory. Fixed in Results and Discussion. 

      To help readers the expression of the knocked in GFP in neurons should not be reported as binary in table S1 which leads to a feeling of strong discrepancy between scRNA seq and CRISPR GFP, which is not the case.  

      There might be some misunderstanding regarding the coloring in this table. To clarify, the green-filled Excel cells denote the expression of reporters utilized in prior studies, rather than the CRISPR reporter alleles. Expression of the CRISPR alleles is instead indicated on the left side of the neuron names, marked as "CRISPR+" in green font. For signifying absence of expression, we used "no CRISPR" in red font in the first submission. We have now changed it into "CRISPR-" for greater clarity.

      The variable expression of reporter GFP between individuals for the same neuron is intriguing. It is unclear if this is observed only for dim neurons or can be more of an ON/OFF expression. 

      Variability only occurs for dim expression. We have now clarified this point in Discussion, "Comparing approaches and caveats of expression pattern analysis".

      The multiple occurrences of co-transmission, especially in male neurons, are interesting. It will be interesting in the future to establish whether the neurotransmitters are synaptically segregated or coreleased. As the section on sexual dimorphism of neurotransmitter usage does not discuss novel information coming from this study, it is not very necessary. 

      Agreed. We added this perspective to the Discussion, "Co-transmission of multiple neurotransmitters".  

      In the abstract, dopamine is missing from the list of main known transmitters.

      Fixed. Thanks for spotting this.

      Reviewer #3 (Recommendations For The Authors): 

      Great article. Minor suggestions to strengthen presentation: 

      Figure 1B is hard to interpret. There could be more intuitive ways of representing the data and the methodologies that support a given expression pattern. Neurons should also be reordered by alphabetical order rather than expression levels to facilitate finding them.  

      We considered alternative ways of presenting this data, but, regrettably, did not come up with a better approach. To clarify, the primary focus of Fig. 1B is to compare expression of previously reported reporters and scRNA data, which was quite literally the initial impetus for our analysis, i.e. we noted strong scRNA signals that had not previously been supported by transgenic reporter data. For a comprehensive version of the table that includes more details on the expression of CRISPR reporter alleles, please refer to Table S1, which we referenced in the figure legend.   

      GFP-only channel images in Figures 3, 4, 5, and 9 sometimes show dim signals that the authors are highlighting as new findings. We recommend using the inverted grayscale version of that channel since the contrast of dim signals is more noticeable to the human eye rather than when the image is colorized. 

      Good point, we implemented these suggestions in the figures the reviewer mentioned, now re-numbered Figures 4, 5, 6, and 12. For Figure 6 (tph-1, bas-1, and cat-1 expression in hermaphrodites), we used a new cat-1 head image to reflect the newly identified ASI and AVL expression that wasn’t readily visible in the original projection used in the earlier version of this manuscript. We also added grayscale images in Figure 13 to reflect dim tbh-1 expression in IL2 neurons more clearly.

      A plan to integrate this new information into WormAtlas would be valuable. The C. elegans community is characterized by the open sharing of information on platforms that are user-friendly and accessible. Ideally, the new information would not just 'erase' what was observed before but would describe the new observations and let the community reach its own conclusions, since there is no perfect method and even these CRISPR/Cas9 reporter strains are only a proxy for gene expression that is subject to post-transcriptional regulation, because they depend on T2A and SL2 sequences.

      We completely agree with the reviewer’s suggestion. We will coordinate with WormAtlas on integrating this new information. 

      In the case of neurons that were removed from using a specific neurotransmitter, like PVQ, what do the authors conclude overall? If PVQ does not use glutamate, are there any new hypotheses as to what it could be using?

      Since all neurons express multiple neuropeptides, we hypothesize neurons such as PVQ may be primarily peptidergic. This is included in Discussion, "Neurons devoid of canonical neurotransmitter pathway genes may define neuropeptide-only neurons".  

      In Table S5, the I4 neuron is listed as a variable for eat-4 expression but in Table S1 it says that there was no CRISPR expression detected. Which one is correct? 

      Thanks for spotting this. Table S5 is correct, we saw very dim and variable expression of the eat-4 reporter allele in I4. Table S1 is fixed now.

      Additional discussion points that might be important for the community: 

      CRISPR strains used here should be deposited in the CGC.

      Yes, all strains generated in this study have already been deposited to CGC. 

      It would be great to have an additional discussion point on how the neural clusters in CeNGEN were defined based on fosmid reporter expression; evaluating the reporters against clusters that were themselves defined by those reporters might make the results confusing.

      Neural cluster definition in CeNGEN did not rely on isolated data points but on the combination of many expression reagents, each with its own shortcomings, but in combination providing reliable identification. Since one feedback we have gotten from many readers of our manuscript is that it is already very long as is, we prefer not to dilute the discussion further.

      It would be important to discuss the rate of neurotransmitter genes that have variable expression patterns. Are any of those genes used in NeuroPAL to define specific neuronal classes? This is important to describe as NeuroPAL labeling is being used to define neuronal identity. 

      All the reporters used in NeuroPAL are promoter-based, very robust and do not include the full loci of genes, so they are not directly comparable with the CRISPR reporter alleles in this study. However, we recognize that some expression pattern variability could be confusing. We have discussed this more in the section "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets. 

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure. 

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization, the syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al, 2020, so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020.
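
      The processing steps described above (index-based positions, alignment to the first motif syllable, and grouping of identical sequences) can be illustrated with a minimal sketch. This is not AVN's implementation: the function name `syntax_raster`, the `motif_start` parameter, the choice to drop introductory notes rather than offset them, and the most-common-first ordering are all assumptions made for illustration.

```python
from collections import Counter


def syntax_raster(bouts, motif_start=None, pad=None):
    """Arrange per-bout syllable label sequences into raster rows.

    bouts: list of label sequences, one per bout (timing is ignored).
    motif_start: hypothetical label of the first motif syllable; if given,
        anything before its first occurrence in a bout is dropped.
    pad: filler used to make all rows the same length.
    """
    aligned = []
    for bout in bouts:
        seq = list(bout)
        if motif_start is not None and motif_start in seq:
            # Align on the first motif syllable (assumption: drop intro notes).
            seq = seq[seq.index(motif_start):]
        aligned.append(tuple(seq))

    # Group identical sequences together, most frequent first
    # (ties broken by the sequence itself so the order is deterministic).
    counts = Counter(aligned)
    ordered = sorted(aligned, key=lambda s: (-counts[s], s))

    width = max((len(s) for s in ordered), default=0)
    return [list(s) + [pad] * (width - len(s)) for s in ordered]


if __name__ == "__main__":
    example = [["i", "i", "a", "b", "c"], ["i", "a", "b"], ["a", "b", "c"]]
    for row in syntax_raster(example, motif_start="a"):
        print(row)
```

      Each row can then be drawn as one line of colored cells, with the column given by the syllable index rather than by onset time.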

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as input. Each triplet is essentially three images stacked together as (anchor, positive, negative): the anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram unrelated to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring that the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then use the earth mover's distance to quantify the similarity of the syllable distributions among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In that paper (https://doi.org/10.7554/eLife.67855, Fig 7), the authors also attempted to perform an analysis on the tutor-pupil song.

      We thank the reviewer for this suggestion, and plan to include a comparison of the triplet loss embedding space to the VAE space for song similarity comparisons in the revised manuscript.
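
      For readers unfamiliar with the triplet objective summarized in the reviewer's comment, a minimal sketch of the standard margin-based formulation follows. It operates on already-computed embedding vectors and uses Euclidean distance; whether the manuscript uses this exact distance or margin is not stated here, so the function names and the default `margin=1.0` are assumptions for illustration only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: zero once the anchor is closer to the
    positive than to the negative by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Negative is far away: the margin is satisfied and the loss vanishes.
    print(triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
    # Negative is nearly as close as the positive: the loss is positive.
    print(triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (0.0, 1.5)))
```

      During training, the network weights are adjusted to reduce this loss averaged over many triplets, which pulls same-label syllables together and pushes other syllables apart in the embedding space.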

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field. 

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding. 

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver. 

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods. 

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac compatible version, should the demand present itself. That said, the python package works on all operating systems, so non-Windows users still have the ability to use AVN that way.  

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches: 

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions.

      (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-andmaximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods. 

      We recognize the similarities between these approaches, and plan to include a comparison of triplet loss embeddings compared with MMD and VAE embeddings compared with MMD and EMD in the revised manuscript. Thank you for this suggestion.  
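
      To make the EMD/MMD comparison concrete, the following is a minimal sketch of both distances on toy data. It is an illustration only, not AVN's or Goffinet et al.'s implementation: it uses one-dimensional samples of equal size rather than 8-dimensional syllable embeddings, and the function names, the RBF kernel, and the bandwidth value are all assumptions chosen for the example.

```python
from math import exp


def emd_1d(xs, ys):
    """Earth mover's distance between two equal-sized 1-D samples.

    For equal-sized, equally weighted 1-D samples the optimal transport
    plan pairs the sorted values, so EMD is the mean absolute difference.
    """
    if len(xs) != len(ys):
        raise ValueError("this simplified version needs equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def rbf(x, y, bandwidth):
    """Gaussian (RBF) kernel; the bandwidth is an assumed hyperparameter."""
    return exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    return sum(rbf(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2 * mean_kernel(xs, ys, bandwidth))


# Hypothetical 1-D "embeddings" for a tutor and two pupils.
tutor = [0.0, 1.0, 2.0]
pupil_close = [0.1, 1.1, 2.1]
pupil_far = [3.0, 4.0, 5.0]

print(emd_1d(tutor, pupil_close), emd_1d(tutor, pupil_far))
print(mmd_squared(tutor, pupil_close), mmd_squared(tutor, pupil_far))
```

      Both measures rank the distant pupil as less similar to the tutor than the close one. Whether they differ meaningfully on real syllable embeddings is the empirical question raised by the reviewer.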

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability. 

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term. 

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field. 

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies. 

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should introduce barely an adoption threshold to other labs. 

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. 

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We will revise the original manuscript to make this clearer. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      There appears to be some misunderstanding regarding our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and provide some additional explanation in the manuscript. First, we are not training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied on its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021.
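
      The distinction between a distance-based embedding objective and a same/different classifier can be illustrated with a minimal sketch of a triplet loss. This is not AVN's training code: the squared Euclidean distance, the margin value, the function names, and the 2-dimensional toy vectors (standing in for 8-dimensional embeddings) are assumptions made for the example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the anchor is closer to the positive (same label)
    than to the negative (different label) by at least `margin`; otherwise
    it grows with the size of the violation. No class label is predicted.
    """
    violation = (squared_distance(anchor, positive)
                 - squared_distance(anchor, negative)
                 + margin)
    return max(0.0, violation)


# Hypothetical embeddings: two renditions of syllable 'a' and one of syllable 'b'.
syllable_a_1 = [0.0, 0.0]
syllable_a_2 = [0.0, 1.0]
syllable_b = [3.0, 4.0]

# Well-separated triplet: same-label pair is much closer, so the loss is zero.
print(triplet_loss(syllable_a_1, syllable_a_2, syllable_b))

# Badly arranged triplet: treating 'b' as the positive yields a large loss.
print(triplet_loss(syllable_a_1, syllable_b, syllable_a_2))
```

      Because the objective only constrains relative distances, the trained network can embed syllable types it never saw during training, which is the property the downstream distribution comparison relies on.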

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we plan to include a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, there appears to be some misunderstanding of our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow.

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research.

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN). 

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform extensive comparison of different clustering methods as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird. 

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared.

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report are likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and nonstationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see supplemental figure 2b), but still very high precision scores (supplemental figure 2a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Fig 3c) or syllable duration entropy (supplemental figure 7a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
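
      The effect of merged syllables on precision, recall, and F1 can be shown with a minimal sketch of onset matching. This is an illustration, not the scoring code used in the manuscript: the greedy matching rule, the 10 ms tolerance, the function names, and the onset times are assumptions for the example.

```python
def count_matched_onsets(true_onsets, pred_onsets, tol=0.01):
    """Count predicted onsets lying within `tol` seconds of a true onset.

    Both lists are walked in time order and each onset is matched at most once.
    """
    true_sorted = sorted(true_onsets)
    pred_sorted = sorted(pred_onsets)
    i = j = matched = 0
    while i < len(true_sorted) and j < len(pred_sorted):
        diff = pred_sorted[j] - true_sorted[i]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # predicted onset with no true counterpart (false positive)
        else:
            i += 1  # true onset that was never predicted (false negative)
    return matched


def onset_scores(true_onsets, pred_onsets, tol=0.01):
    """Return (precision, recall, F1) for onset detection."""
    tp = count_matched_onsets(true_onsets, pred_onsets, tol)
    precision = tp / len(pred_onsets) if pred_onsets else 0.0
    recall = tp / len(true_onsets) if true_onsets else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical bout: the annotator split two short elements (onsets at 0.10 s
# and 0.20 s), while the automated segmentation merged them into one syllable.
manual_onsets = [0.10, 0.20, 0.50]
merged_onsets = [0.10, 0.50]

precision, recall, f1 = onset_scores(manual_onsets, merged_onsets)
print(precision, recall, f1)
```

      In this toy case every predicted onset is correct, so precision stays at 1, while the missed onset of the merged element lowers recall and therefore F1. This is the same pattern described above for ambiguous syllable boundaries.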

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) and be willing to dedicate the time and resources to manually labeling a subset of recordings from each of their birds, we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.  

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well. 

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. We also applied AVN to recordings shared with us by the Wada lab in Japan. While we didn’t have manual annotations for these recordings (which would allow validation of our segmentation and labeling methods), visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data. 

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology and is outside the scope of our current efforts.  

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method. 

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human annotations for each individual bird being analyzed using AVN was never the goal of our pipeline, would require significant changes to AVN’s design, and is outside the scope of this manuscript.  

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one. 

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#SyllableRepetitions

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy. 

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We will expand our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (supplementary figure 2). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments nonvocal sounds.

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types; one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label, but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not to occasional mis-classifications of one manual label type as another. 
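
      The way a split manual label lowers the v-measure can be sketched with a small from-scratch calculation. This is an illustration only: it does not reproduce the alignment of syllables across segmentation schemes described above, and the function names and toy labels are assumptions for the example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Entropy of `target` labels conditioned on `given` labels."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(manual, cluster):
    """Return (homogeneity, completeness, v-measure) for two labelings."""
    h_manual = entropy(manual)
    h_cluster = entropy(cluster)
    homogeneity = 1.0 if h_manual == 0 else 1 - conditional_entropy(manual, cluster) / h_manual
    completeness = 1.0 if h_cluster == 0 else 1 - conditional_entropy(cluster, manual) / h_cluster
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical bird: manual label 'a' is split into two clusters (0 and 1),
# while manual label 'b' maps cleanly onto cluster 2.
manual_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
cluster_labels = [0, 0, 1, 1, 2, 2, 2, 2]

homogeneity, completeness, v = v_measure(manual_labels, cluster_labels)
print(homogeneity, completeness, v)
```

      Here every cluster contains a single manual label, so homogeneity is perfect, but the split of 'a' reduces completeness and pulls the v-measure down. No syllable is confused with a different manual type, matching the splitting-type error described above.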

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and supplemental figure 4b&e. We will also expand our discussion of these different types of errors in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy. 

      We apologize for not making this distinction sufficiently clear in the manuscript and will add additional explanation to the main text to make the reasoning more apparent. We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes unfeasible without dedicated high power computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space.  

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate. 

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. Anecdotally, we observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, but we will add an additional supplementary figure to the revised manuscript showing this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      We would like to see the reviewers' critiques be addressed satisfactorily.

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript reads fairly well, there are a number of minor grammatical edits that would improve the reading of this paper.

      To improve readability, we sent our manuscript out for language polishing by Wiley Editing Services. The changes are marked in red.

      The opening paragraph, while seeking to establish clinical relevance, likely can be removed or tailored.

      We agree with this concern; the first paragraph has been tailored in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Although the authors provided a substantial amount of data to support the conclusion, there are several important issues to be added to strengthen the study, as highlighted below:

      Figure 2: In this figure, the authors provided evidence that TAK1 phosphorylates PLCE1 at serine 1060. To make the data more convincing, the authors need to perform an in vitro kinase assay to confirm this result. Ideally, the in vitro kinase assay also includes a mutant form of PLCE1-S1060A as a control.

      We thank the referee for this constructive comment. Because we cannot perform experiments with radioactive compounds in our institute, the TAK1-induced phosphorylation of PLCE1 at serine 1060 could not be confirmed by a routine in vitro kinase assay, which uses 32P. Instead, we pulled down TAK1 and PLCE1 and incubated the two proteins in a kinase assay buffer. The resulting samples were analyzed by western blot. Our data showed that TAK1 phosphorylates PLCE1 at serine 1060, as evidenced by a strong band for p-PLCE1 S1060 when TAK1 was incubated with PLCE1. For the sample containing TAK1 and PLCE1 S1060A, the band density for p-PLCE1 S1060 was largely decreased. Ideally, there should be no band for p-PLCE1 S1060 when TAK1 is incubated with PLCE1 S1060A. However, we still detected p-PLCE1 S1060 in this reaction, although at a lower level than with wild-type PLCE1. This is likely due to the presence of endogenous wild-type PLCE1 in the TAK1 pull-down samples. These data are presented as Figure S6C in the revised manuscript.

      Figure 4: In this part of the study, the author claimed that TAK1 inhibits PLCE1 enzyme activity. However, they fall short of evidence that this inhibitory effect of TAK1 on PLCE1 enzyme activity is mediated via phosphorylation at S1060.

      We thank the referee for this critical comment. We did measure the effect of TAK1 on the activity of mutant PLCE1, as presented in Figure 4B. The data showed that TAK1 has no inhibitory effect on PLCE1 S1060A enzyme activity. In contrast, TAK1 repressed wild-type PLCE1 activity (Figure 4A). These data indicate that the inhibitory effect of TAK1 on PLCE1 enzyme activity is mediated, at least in part, via phosphorylation at S1060.

      Figures 6 and 7: Here the authors used an ESCC metastasis model in nude mice to establish the role of TAK1 and PLCE1, respectively. However, the effects of TAK1 and PLCE1 are studied separately, and there is no link to show that TAK1 inhibits metastasis via activation of PLCE1. Ideally the authors should use transgenic mice expressing mutant PLCE1-S1060A to support the conclusion.

      We thank the referee for this insightful and critical comment. We agree that transgenic mice expressing mutant PLCE1-S1060A would further strengthen our conclusions. However, due to limited time and resources, we cannot generate such mice.

      Reviewer #3 (Recommendations For The Authors):

      (1) Have the authors ever checked the phosphorylation status of endogenous PLCE1 S1060p in the TAK1 overexpression alone ECA-109 cell line? Does it increase? Similarly, in siMap3k7 ECA-109 cells, does endogenous PLCE1 S1060p reduce?

      We thank the referee for these critical comments. During the revision, we examined whether TAK1 overexpression or knockdown affects endogenous p-PLCE1 S1060 in ECA-109 cells. Our data showed that TAK1 overexpression increased p-PLCE1 S1060, whereas TAK1 knockdown decreased it. These data are presented in Figure S6A, B.

      (2) The authors show that using TAK1 inhibitors cannot completely abolish all the phosphorylation of PLCE1 S1060 in cells and mice. Does it mean some other potential kinases also target PLCE1 S1060?

      We thank the referee for this insightful comment. As the referee notes, TAK1 inhibitors cannot completely abolish the phosphorylation of PLCE1 S1060 in cells and mice. It is therefore likely that other kinases also target PLCE1 S1060. We have added this point to the Discussion in the revised manuscript.

      (3) PLCE1 S1060A completely bans the migration and invasion regulation function of TAK1 (Figure S10), indicating that PLCE1 S1060 is a very unique downstream target of TAK1 in migration and invasion regulation in the ECA-109 cell line. As a MAP3K, TAK1 was documented to regulate migration and invasion through multiple signal transduction pathways such as IKK, JNK, p38 MAPK, et al. Have the authors ever tried to test the effect of overexpression/knockdown of TAK1 on a few of these pathways in the ECA-109 cell line?

      We thank the referee for these constructive comments. During the revision, we analyzed the effects of TAK1 on IKK, JNK, p38 MAPK, and ERK. Our data showed that TAK1 positively regulates these signal transduction pathways. For example, TAK1 overexpression increased p-IKK, p-JNK, p-p38 MAPK, and p-ERK in ECA-109 cells, whereas TAK1 knockdown decreased the levels of these proteins. Although these pathways are affected by TAK1, PLCE1 is likely a unique substrate of TAK1 with respect to the regulation of migration and invasion in ECA-109 cells. We added this content to the Results section of the revised manuscript, and the data are presented in Figure S12A-D.

      (4) Does TAK1 only catalyze the S1060 site on PLCE1 protein?

      We thank the referee for this insightful comment. So far, we have only found that TAK1 phosphorylates the S1060 site on PLCE1. This does not exclude the possibility that TAK1 also phosphorylates other residues on PLCE1.

      (5) Is there any PLCE1 S1060 point mutation existing in ESCC patients? Does it influence the prognosis of ESCC patients?

      We thank the referee for this critical and constructive comment; addressing it would further strengthen the significance of the current study. However, we currently lack sufficient patient tumor samples to address this very important issue.

      (6) What's the effect of TAK1 inhibitor on mice body weight?

      We thank the referee for this critical comment. Since body weight is an important parameter, we measured mouse body weight throughout the experiments. The results showed that the rate of body weight gain is not affected by the TAK1 inhibitor Takinib. These data are included in the revised manuscript as Figure S20A.

      (7) For the control groups of the mouse xenograft tumor model in Figures 6 vs 7, why does the number of metastases behave so differently?

      In Figure 6, the control mice were administered ECA-109 cells via tail vein injection and then treated with vehicle (saline). The control mice in Figure 7 were also administered ECA-109 cells via tail vein injection, but these cells had been transduced with control lentivirus. Because of these differences, the two control groups have different numbers of metastases.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, James Lee, Lu Bai, and colleagues use a multifaceted approach to investigate the relationship between transcription factor condensate formation, transcription, and 3D gene clustering of the MET regulon in the model organism S. cerevisiae. This study represents a second clear example of inducible transcriptional condensates in budding yeast, as most evidence for transcriptional condensates arises from studies of mammalian systems. In addition, this study links the genomic location of transcriptional condensates to the potency of transcription of a reporter gene regulated by the master transcription factor contained in the condensate. The strength of evidence supporting these two conclusions is strong. Less strong is evidence supporting the claim that Met4-containing condensates mediate the clustering of genes in the MET regulon.

      Strengths:

      The manuscript is for the most part clearly written, with the overriding model and specific hypothesis being tested clearly explained. Figure legends are particularly well written. An additional strength of the manuscript is that most of the main conclusions are supported by the data. This includes the propensity of Met4 and Met32 to form puncta-like structures under inducing conditions, formation of Met32-containing LLPS-like droplets in vitro (within which Met4 can colocalize), colocalization of Met4-GFP with Met4-target genes under inducing conditions, enhanced transcription of a Met3pr-GFP reporter when targeted within 1.5 - 5 kb of select Met4 target genes, and most impressively, evidence that several MET genes appear to reposition under transcriptionally inducing conditions. The latter is based on a recently reported novel in vivo methylation assay, MTAC, developed by the Bai lab.

      Weaknesses:

      My principal concern is that the authors fail to show convincing evidence for a key conclusion, highlighted in the title, that nuclear condensates per se drive MET gene clustering. Figure 4E demonstrates that Met4 molecules, not condensates per se, are necessary for fostering distant cis and trans interactions between MET6 and three other Met4 targets under -met inducing conditions. In addition, the paper would be strengthened by discussing a recent study conducted in yeast that comes to many of the same conclusions reported here, including the role of inducible TF condensates in driving 3D genome reorganization (Chowdhary et al, Mol. Cell 2022).

      Following the reviewer’s advice, we carried out MTAC with the VP near MET6 in WT Met4 and ΔIDR2.3 strains (results shown below). The conclusions are somewhat ambiguous. For long-distance interactions with MUP1, YKG9, STR3, and MET13, we indeed observe decreased MTAC signals close to background levels in the ΔIDR2.3 strain, which aligns with the model suggesting that Met4 condensation promotes clustering among Met4 targeted genes. However, we also noticed significant decreases in the local MTAC signals (HIS3 and MET6). It is possible that the changes in Met4 condensates alter the chromosomal folding near MET6, thereby affecting the local MTAC signals. Alternatively, LacI-M.CviPI (the methyltransferase) could be induced to a lesser extent in the ΔIDR2.3 strain, leading to a genome-wide decrease in MTAC signals. Due to this ambiguity, we decided not to include the following plot in the main figure.

      Author response image 1.

      We discussed Hsf1 and added the suggested reference on page 13.

      Other concerns:

      (1) A central premise of the study is that the inducible formation of condensates underpins the induction of MET gene transcription and MET gene clustering. Yet, Figure 1 suggests (and the authors acknowledge) that puncta-like Met4-containing structures pre-exist in the nuclei of non-induced cells. Thus, the transcription and gene reorganization observed is due to a relatively modest increase in condensate-like structures. Are we dealing with two different types of Met4 condensates? (For example, different combinations of Met4 with its partners; Mediator- or Pol II-lacking vs. Mediator- or Pol II-containing; etc.?) At the very least, a comment to this effect is necessary.

      Although Met4 can form smaller puncta in the +met condition (Figure 1A), it cannot be recruited to its target genes due to the absence of its sequence-specific binding partners, Met31 and Met32 (these two factors are actively degraded in the +met condition). Consistently, in the +met condition, Met4 shows extremely low genome-wide ChIP signals (Figure 3C). Therefore, these Met4 puncta in +met do not organize the 3D genome or have gene regulatory functions. This discussion is added on page 12.

      (2) Using an in vitro assay, the authors demonstrate that Met4 colocalizes with Met32 LLPS droplets (Figure 2F). Is the same true in vivo - that is, is Met32 required for Met4 condensation? This could be readily tested using auxin-induced degradation of Met32. Along similar lines, the claim that Met32 is required for MET gene clustering (line 250) requires auxin-induced degradation of this protein.

      As the reviewer pointed out above, cells in the +met condition also show small Met4 puncta. In this condition, Met32 is essentially undetectable (Met31 level is even lower and remains undetectable even in the -met conditions). Therefore, Met4 does not strictly require the presence of Met32 in vivo (may require other factors or modifications). Met4 does not have DNA-binding activity, and therefore it cannot target and organize chromosomes on its own. Although we did not do the Met32 degradation experiment, we measured the 3D genome conformation in +met and showed that there are no detectable interactions among Met4 target genes.

      (3) The authors use a single time point during -met induction (2 h) to evaluate TF clustering, transcription (mRNA abundance), and 3D restructuring. It would be informative to perform a kinetic analysis since such an analysis could reveal whether TF clustering precedes transcriptional induction or MET gene repositioning. Do the latter two phenomena occur concurrently or does one precede the other?

      We appreciate the reviewer’s insightful question. It is indeed intriguing to consider whether TF clustering precedes transcriptional induction and MET gene clustering. However, as mentioned on page 12 of our manuscript, this experiment poses significant challenges. The low intensities of the Met4 and Met32 signals necessitate high excitation for imaging, which also makes them prone to photo-bleaching. Consequently, we have been unable to measure the dynamics of Met4 and Met32 puncta in vivo, let alone co-image them with DNA/RNA. Undertaking this experiment will require considerable effort, which we plan to pursue in the future.

      (4) Based on the MTAC assay, MET13 does not appear to engage in trans interactions with other Met4 targets, whereas MET6 does (Figures 4C and 4E). Does this difference stem from the greater occupancy of Met4 at MET6 vs. MET13, greater association of another Met co-factor with the chromatin of MET6 vs. MET13, or something else?

      We were also surprised by this result, given that MET13 emerged as one of the strongest transcriptional hotspots in our previous screen. It also exhibits one of the highest Met4 ChIP signals and is closely associated with the nuclear pore complex. Our earlier findings indicate that DNA dynamics near the VP significantly influence the MTAC signal; specifically, a VP with constrained motion is less effective at methylating interacting sites (Li et al., 2024). Therefore, it is plausible that MET13 is associated with a large Met4 condensate, which constrains the motion of nearby chromatin and diminishes MTAC efficiency.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript combines live yeast cell imaging and other genomic approaches to study how transcription factor (TF) condensates might help organize and enhance the transcription of the target genes in the methionine starvation response pathway. The authors show that the TFs in this response can form phase-separated condensates through their intrinsically disordered regions (IDRs), and mediate the spatial clustering of the related endogenous genes as well as reporter inserted near the endogenous target loci.

      Strengths:

      This work uses rigorous experimental approaches, such as imaging of endogenously labeled TFs, determining expression and clustering of endogenous target genes, and reporter integration near the endogenous target loci. The importance of TFs is shown by rapid degradation. Single-cell data are combined with genomic sequencing-based assays. Control loci engineered in the same way are usually included. Some of these controls are very helpful in showing the pathway-specific effect of the TF condensates in enhancing transcription.

      Weaknesses:

      Perhaps the biggest weakness of this work is that the role of IDR and phase separation in mediating the target gene clustering is unclear. This is an important question. TF IDRs may have many functions including mediating phase separation and binding to other transcriptional molecules (not limited to proteins and may even include RNAs). The effect of IDR deletion on reduced Fano number in cells could come from reduced binding with other molecules. This should be tested on phase separation of the purified protein after IDR deletion. Also, the authors have not shown IDR deletion affects the clustering of the target genes, so IDR deletion may affect the binding of other molecules (not the general transcription machinery) that are specifically important for target gene transcription. If the self-association of the IDR is the main driving force of the clustering and target gene transcription enhancement, can one replace this IDR with totally unrelated IDRs that have been shown to mediate phase separation in non-transcription systems and still see the gene clustering and transcription enhancement effects? This work has all the setup to test this hypothesis.

      We thank the reviewer for raising this point, and we tried more in vitro and in vivo experiments with Met4 IDR deletions. See the answer to Reviewer 1 for the in vivo 3D mapping experiment.

      We purified Met4-ΔIDR2 with an MBP tag, but its low yield made labeling and conducting thorough experiments challenging. At concentrations above ~10 μM, the protein tends to aggregate, while at lower concentrations, it remains diffusive in solution and does not form condensates. When we mixed purified Met4-ΔIDR2 with Met32, we observed reduced partitioning inside Met32 condensates compared to the full-length Met4. As the reviewer noted, this diminished interaction may contribute to the decreased puncta formation observed in vivo. This result is added to the manuscript on page 11 and supplementary figure 5.

      The Met4 protein was tagged with MBP but Met 32 was not. MBP tag is well known to enhance protein solubility and prevent phase separation. This made the comparison of their in vitro phase behavior very different and led the authors to think that maybe Met32 is the scaffold in the co-condensates. If MBP was necessary to increase yield and solubility during expression and purification, it should be cleaved (a protease cleavage site should be engineered) to allow phase separation in vitro.

      Following the reviewer’s advice, we purified Met4-TEV-MBP so that the MBP can be cleaved off. Unfortunately, concentrated Met4-TEV-MBP needs to be stored at high salt (400 mM) to be soluble. When exchanged into a suitable buffer for TEV cleavage (≤200 mM NaCl), nearly all soluble protein aggregates. Attempts to digest the protein in storage buffer result in observable aggregation before significant cleavage (see below).

      Author response image 2.

      Are ATG36 and LDS2 also supposed to be induced by -met? This should be explained clearly. The signals are high at -met.

      Genomic loci ATG36 and LDS2 were chosen as controls because they are not bound by Met TFs (ChIP-seq tracks) and their expression is not induced by -met (RNA-seq data). This information is added to the manuscript on page 9. When the MET3pr-GFP reporter is inserted into these loci, GFP is induced by -met (because it is driven by the MET3 promoter), but the induction level is lower than for the same reporter inserted into transcriptional hotspots such as MET13 and MET6 (Figure 6E, also see Du et al., PLoS Genetics, 2017).

      ChIP-seq data:

      Author response image 3.

      RNA-seq counts:

      Author response table 1.

      Figure 6B, the Met4-GFP seems to form condensates at all three loci without a very obvious difference, though 6C shows a difference. 6C is from only one picture each. The authors should probably quantify the signals from a large number of randomly selected pictures (cells) and do statistics.

      If we understand this comment correctly, the reviewer is referring to the fact that all three loci in Figure 6B appear to show a peak in GFP intensity. This pattern emerges because these images are averaged over many cells (the number of cells analyzed in 6B has been added to the figure legend). GFP intensities near the center will always be higher because peripheral pixels are more likely to fall outside the nuclear boundaries, where Met4 signals are absent (same as in Figure 3F). Importantly, the MET6 locus shows higher intensity near the center than PUT1 and ATG36, indicating its co-localization with Met4 condensates.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors probe the connections between clustering of the Met4/32 transcription factors (TFs), clustering of their regulatory targets, and transcriptional regulation. While there is an increasing number of studies on TF clustering in vitro and in vivo, there is an important need to probe whether clustering plays a functional role in gene expression. Another important question is whether TF clustering leads to the clustering of relevant gene targets in vivo. Here the authors provide several lines of evidence to make a compelling case that Met4/32 and their target genes cluster and that this leads to an increase in transcription of these genes in the induced state. First, they found that, in the induced state, Met4/32 forms co-localized puncta in vivo. This is supported by in vitro studies showing that these TFs can form condensates in vitro with Met32 being the driver of these condensates. They found that two target genes, MET6 and MET13, have a higher probability of being co-localized with Met4 puncta compared with non-target loci. Using a targeted DNA methylation assay, they found that MET13 and MET6 show Met4-dependent long-range interactions with other Met4-regulated loci, consistent with the clustering of at least some target genes under induced conditions. Finally, by inserting a Met4-regulated reporter gene at variable distances from MET6, they provide evidence that insertion near this gene is a modest hotspot for activity.

      Weaknesses:

      (1) Please provide more information on the assay for puncta formation (Figure 1). It's unclear to me from the description provided how this assay was able to quantitate the number of puncta in cells.

      Due to the variation in puncta size and intensity (as illustrated in Figure 1A), counting the number of puncta would be highly subjective with arbitrary cutoffs. Therefore, we chose to calculate the CV and Fano values instead, which are unbiased measures. Proteins that form puncta will exhibit greater pixel-to-pixel variations in GFP intensity, resulting in higher CV and Fano values.
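
      As a rough illustration of the two measures named above, the sketch below computes them from a flat list of pixel intensities. It assumes the standard definitions (CV = standard deviation / mean, Fano = variance / mean) and a population variance taken over the pixels of a single nucleus; the response does not spell out these details, and the function name `cv_and_fano` and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def cv_and_fano(pixels):
    """Return (CV, Fano factor) for a sequence of pixel intensities.

    Assumed definitions (not taken from the manuscript):
      CV   = standard deviation / mean
      Fano = variance / mean
    Population variance is used; a sample variance would be an equally
    plausible choice.
    """
    mean = fmean(pixels)
    variance = pvariance(pixels, mu=mean)
    return variance ** 0.5 / mean, variance / mean


# Hypothetical nuclei with the same mean intensity (5):
diffuse = [5, 5, 5, 5]      # evenly spread signal
punctate = [1, 1, 1, 17]    # signal concentrated in one bright pixel

diffuse_cv, diffuse_fano = cv_and_fano(diffuse)
punctate_cv, punctate_fano = cv_and_fano(punctate)
```

      With these inputs the punctate nucleus gives larger values for both measures than the diffuse one, without any threshold for what counts as a punctum.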

      (2) How does the number of puncta in cells correspond with the number of Met-regulated genes? What are the implications of this calculation?

      As previously mentioned, defining the exact number of Met4 puncta is challenging. The number of puncta does not necessarily have one-to-one correspondence to the number of Met4 target genes. Some puncta may not be associated with chromosomes, while others may interact with multiple genes.

      (3) A control for chromosomal insertion of the Met-regulated reporter was a GAL4 promoter derivative reporter. However, this control promoter seems 5-10 fold more active than the Met-regulated promoter (Figure 6). It's possible that the high activity from the control promoter overcomes some other limiting step such that chromosomal location isn't important. It would be ideal if the authors used a promoter with comparable activity to the Met-reporter as a control.

      We agree with the reviewer that it would be better to use another promoter with comparable activity. Indeed, this was our rationale for selecting the attenuated GAL1 promoter over the WT version; however, it still exhibited substantially higher activity than MET3pr. Unfortunately, we do not have a promoter from a different pathway that is calibrated to match the activity level of MET3pr. Nonetheless, MET17pr has much higher activity (~3-fold) than MET3pr, and for both promoters we observed a similar degree of stimulation at the hotspot relative to the control locus (1.5- to 2-fold increase in GFP expression) (Figure 6E & F). This suggests that the observed effects are more likely to depend on the activation pathway and TF identity than on promoter strength.

      (4) It seems like transcription from a very large number of genes is altered in the Met4 IDR mutant (Figure 7F). Why is this and could this variability affect the conclusions from this experiment?

      We agree with the reviewer that the ΔIDR2.3 truncation affects the expression of 2711 genes (P-adj < 0.05; 1339 up, 1372 down). We suspect that this is due to the decreased expression of Met4 target genes, leading to altered levels of methionine and other sulfur-containing metabolites. Such changes would have a global impact on gene expression. Importantly, although similar numbers of genes are up- and down-regulated in the ΔIDR2.3 strain, almost all Met4 targets showed decreased expression (Fig 7F). This supports the model in which Met4 condensates lead to increased expression of Met4 target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) The introduction contains multiple miscitations. Rather than gene clustering, most of the studies and reviews cited (e.g., lines 35-39) report interactions between genomic loci (E-E, E-P, and P-P). There are other claims not supported by the papers cited. Moreover, the authors lump together original research papers and reviews within a given group without distinguishing which is which.

      We thank the reviewer for pointing this out. We reorganized the references in the introduction.

      (2) One option to address the concern regarding the lack of evidence that nuclear condensates per se drive MET gene clustering is to test the impact of Met4 ΔIDR2.3 on MTAC signals.

      We carried out the suggested experiment. See answer above (Reviewer #1, Question #1).

      (3) Authors claim that there are significant differences between values depicted in Figures 1B and 3G. Statistical tests are necessary to show this.

      Significance values were calculated in comparison to free GFP using two-tailed Student’s t-test in 1B,1C, and 3G. The corresponding figure legends are updated.

      (4) How are the data in Figures 3F, G, and 6B, C generated? This is unclear from the information provided in the Figure legends and Materials and Methods.

      For each cell, we projected the highest mCherry and GFP intensity at each pixel across all z positions onto a 2D plane (maximum intensity projection, MIP). The MIP images were aligned with the mCherry dot at the center and averaged over all cells. To calculate the GFP intensities as in Figures 3G and 6C, a single line was drawn across the center and the GFP profile was analyzed in ImageJ. We now describe this in the corresponding figure legends, and the Materials and Methods are also updated.
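
      The steps described above can be sketched as follows. This is a minimal illustration rather than the analysis actually used (the response states that the profile was analyzed in ImageJ). It assumes that each cell is a pair of z-stacks stored as nested lists, that the mCherry dot is the brightest pixel of the mCherry projection, and that the dot lies far enough from the image border for a full crop. All function names are hypothetical.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D frames) into 2D by per-pixel maximum."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(frame[r][c] for frame in stack) for c in range(cols)]
            for r in range(rows)]


def brightest_pixel(image):
    """Return (row, col) of the brightest pixel; assumed to mark the mCherry dot."""
    coords = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    return max(coords, key=lambda rc: image[rc[0]][rc[1]])


def crop_centered(image, center, half):
    """Cut a (2*half+1)-pixel square around center; assumes it fits in the image."""
    r0, c0 = center
    return [row[c0 - half:c0 + half + 1]
            for row in image[r0 - half:r0 + half + 1]]


def average_images(images):
    """Pixel-wise mean of equally sized 2D images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


def averaged_gfp_profile(cells, half):
    """Align each cell's GFP projection on its mCherry dot, average over
    cells, and return the GFP profile along the central row."""
    crops = []
    for mcherry_stack, gfp_stack in cells:
        dot = brightest_pixel(max_intensity_projection(mcherry_stack))
        gfp_mip = max_intensity_projection(gfp_stack)
        crops.append(crop_centered(gfp_mip, dot, half))
    return average_images(crops)[half]
```

      In practice an array library would replace the nested lists; plain Python is used here only to keep the sketch self-contained.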

      (5) Typos/ unclear writing: lines 24, 58, 79, 82, 84, 96, 117, 121, 131, 142, 147, 161 (terminus, not "terminal"), 250, 325, 349, 761 (was, not "are"). For several of these: "condense" is not "condensate"; for many others: inappropriate use of "the". Supplementary Figure 1 legend: not "a single nuclei" instead "a single nucleus".

      We thank the reviewer for pointing this out. We tried our best to correct grammatical errors.

      (6) Define GAL1Spr (Figure 6F).

      The GAL1S promoter is an attenuated GAL1 promoter that lacks two of the four Gal4 binding sites. The original paper is now cited in the manuscript on page 10.

      (7) Figure 7B, C: there appears to be an inconsistency between the image and bar graph value for ΔIDR3.

      The Fano values calculated in 7C are averaged over a population of cells (we added the cell numbers to the legend), while the image in 7B is an example of an individual nucleus. There is some cell-to-cell variability in how Met4 appears. To be more representative, we chose a different image for ΔIDR3.

      (8) Supplementary Tables: use descriptive titles for file names.

      This is corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      Figure 4F is not cited in the text, and the color legend seems wrong for targeted and control.

      Figure 4F is now cited in the text. The labels were corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for your thorough evaluation. We are pleased that you find our mouse model for acute retinal artery occlusion to be sophisticated and clinically relevant. Your recognition of the model’s utility in studying the cellular and molecular mechanisms of RAO, as well as its potential for advancing therapeutic research, is highly encouraging and underscores the significance of our work. We are grateful for your supportive feedback.

      Public Reviews:

      Reviewer #1:

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic those of human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survivals, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      The revised manuscript has been significantly improved in clarity and readability. It has addressed all my questions convincingly.

      Thank you for your positive feedback. We are pleased to hear that the revisions have significantly improved the manuscript's clarity and readability, and that we have convincingly addressed all your questions. Your encouraging words are of great importance to us.

      Reviewer #2 (Public Review):

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes of major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach for studying retinal artery occlusion. The study is very comprehensive.

      Thank you for your positive feedback. We are delighted that you find the UPOAO model to be a novel and comprehensive approach to studying retinal artery occlusion. Your recognition of the depth and significance of our study is highly valuable and encourages us in our ongoing research.

      Weaknesses:

      Originally, some statements were incorrect and confusing. However, the authors have made clarifications in the revised manuscript to avoid confusion.

      We sincerely appreciate your meticulous review of the manuscript. We have thoroughly addressed the inaccuracies identified in the revised version and polished the article to improve readability. We apologize for any confusion caused by these inaccuracies. Your careful attention to detail, patience, and meticulous suggestions have significantly improved the clarity and readability of our manuscript.


      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1:

      The revised manuscript has been significantly improved in clarity and readability. It has addressed all my questions convincingly.

      Thank you for your positive feedback. We are pleased to hear that the revisions have significantly improved the manuscript's clarity and readability, and that we have convincingly addressed all your questions. Your encouraging words are of great importance to us.

      Reviewer #2:

      The authors have revised the manuscript and/or provided answers to the majority of prior comments, which have helped to strengthen the work. However, addressing the following concerns is still necessary to further improve the manuscript.

      Thank you for acknowledging our revisions and the improvements made to the manuscript. We appreciate your continued feedback and will address the remaining concerns to further enhance the quality of our work.

      The quantification method of RGCs is described in detail in the response letter, but this detailed methodology was not included in the revised manuscript to clarify the quantification process.

      Thank you for your helpful recommendations. We have added the detailed methodology to the revised manuscript to clarify the quantification process (lines 180-188).

      The graphs in Fig. 3D b-wave and Fig. 3E-b wave are duplicated.

      We apologize for the error in our figures. We have corrected the mistake by replacing the duplicated image in Fig. 3E-b wave with the correct one (line 880). Your careful observation has been very helpful in improving our manuscript. Thank you for bringing this to our attention.

      The quantifications of the thickness of retinal layers in HE-stained sections in Figure 4 (IPL) and Response Figure 2 are incorrect. For mice retina, the thickness of the IPL is approximately 50 µm.

      Thank you for your meticulous review of the manuscript. We have rectified the inaccuracies in the quantification of retinal layer thickness in HE-stained sections in Figure 4, addressing the initial issue with the scale bar.

      We consulted a microscope engineer and, at the engineer's suggestion, used a microscope microscale to calibrate the scale of the fluorescence microscope (BX63; Olympus, Tokyo, Japan).

      We re-measured the thickness of all layers of the HE-stained retinal sections (line 902). The inner retina thickness in Figure 4 has been adjusted under the new scale bar, and the thickness of the outer retinal layers is now displayed in Author response image 1. However, the IPL thickness of the sham eye in the UPOAO model still does not match the commonly reported thickness of 50 µm. We therefore reviewed the literature from our laboratory, focusing on C57BL/6 mice from the same source, and found that the inner retina thickness (GCC+INL) in the HE-stained sections of the sham eye in the UPOAO model (around 80 µm) is consistent with previous findings (see Author response image 2) reported by Kaibao Ji and published in Experimental Eye Research in 2021 [1].

      We captured and analyzed the average retinal thickness of each layer over a long range of 200-1100 μm from the optic nerve head (see Author response image 3, highlighted by the green line). The field region has been corrected in the revised manuscript (line 232). Considering the significant variation in retinal thickness from the optic nerve to the periphery, we consulted literature on multi-point measurements of HE-stained retinas. The average thickness of the GCC layer in the control group was approximately 57 µm at 600 µm from the optic nerve head and about 48 µm at 1200 µm from the optic nerve head in the literature [2] (see Author response image 4). The GCC layer thickness of the sham eye in the UPOAO model is around 50 µm, in alignment with existing literature. In future studies, we will pay more attention to the issue of thickness averaging.

      We appreciate your thorough review and valuable feedback, which has enabled us to correct errors and enhance the accuracy of our research.

      Author response image 1.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      Author response image 2.

      Cited from Ji, K., et al., Resveratrol attenuates retinal ganglion cell loss in a mouse model of retinal ischemia reperfusion injury via multiple pathways. Experimental Eye Research, 2021. 209: p. 108683.

      Author response image 3.

      Schematic diagram illustrating the selection of regions. The figure was captured using a fluorescence microscope (BX63; Olympus, Tokyo, Japan) under a 4X objective. Scale bar=500 µm.

      Author response image 4.

      Cited from Feng, L., et al., Ripa-56 protects retinal ganglion cells in glutamate-induced retinal excitotoxic model of glaucoma. Sci Rep, 2024. 14(1): p. 3834.

      There are some typos in the summary table. For example: 'Amplitudes of a-wave (0.3, 2.0, and 10.0 cd.s/m²)' should be 'Amplitudes of a-wave (0.3, 3.0, and 10.0 cd.s/m²)'; and 'IINL thickness' in HE' should be 'INL thickness'.

      Thank you for pointing out the typos in the summary table (line 1073). We have corrected 'Amplitudes of a-wave (0.3, 2.0, and 10.0 cd.s/m²)' to 'Amplitudes of a-wave (0.3, 3.0, and 10.0 cd.s/m²)' and 'IINL thickness' to 'INL thickness'. Your attention to detail is greatly appreciated and has been very helpful in improving our manuscript.

      References

      (1) Ji, K., et al., Resveratrol attenuates retinal ganglion cell loss in a mouse model of retinal ischemia reperfusion injury via multiple pathways. Experimental Eye Research, 2021. 209: p. 108683.

      (2) Feng, L., et al., Ripa-56 protects retinal ganglion cells in glutamate-induced retinal excitotoxic model of glaucoma. Sci Rep, 2024. 14(1): p. 3834.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I praise the authors for their impressive work; all my major concerns have been addressed. I believe the revised article is much stronger and will surely raise the interest of a broad readership.

      I list in the following a few minor points that the authors might want to consider when finalizing the work:

      - It might be helpful for the reader to know if EPIC-ATAC can also be used on tissues different from tumors and PBMC/blood, and how (i.e. which reference should they use). 

      We thank the reviewer for this comment. In the discussion, we have clarified this point as follows:

      “Although not tested in this work, the TME marker peaks and profiles could be used on normal tissues where immune cells are expected to be present. In cases where specific cell types are expected in a sample but are not part of our list of reference profiles (e.g., neuronal cells in brain tumors or tissues other than human PBMCs or tumor samples), custom marker peaks and reference profiles can be provided to EPIC-ATAC to perform cell-type deconvolution. To this end, users should select markers that are cell-type specific, which could be identified using pairwise differential analysis performed on ATAC-Seq data from sorted cells from the populations of interest, following the approach developed in this work (Figure 1, see Code availability).”

      - In Fig 2 the numbers are hard to read as they are too close or overlapping.

      We have updated Figure 2 to avoid the overlap between the numbers.

      - In Fig 5 I see some squares around the sub-panels, but it might be due to the PDF compression. 

      We do not see these squares in Figure 5 but have seen such squares in Figure 1. We have checked that none of the PDF files uploaded to the eLife submission system contain these squares.

      - In the Introduction, some "deconvolution concepts" are introduced (e.g. Line 63-65), but not explained/illustrated. It might be helpful to refer to a "didactic" review. 

      We have added two references to these sentences in the introduction:

      “As described in more details elsewhere (Avila Cobos et al., 2018; Sturm et al., 2019), many of these tools model bulk data as a mixture of reference profiles either coming from purified cell populations or inferred from single-cell genomic data for each cell type.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
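As a rough illustration of what a "fraction of variance explained" comparison of this kind involves, the sketch below regresses a synthetic activity trace on a synthetic pupil trace and, separately, on a synthetic stimulus trace, and reports R² for each. Everything here is an assumption made for illustration: the data are simulated, the weights (1.0 for stimulus, 0.3 for pupil) are arbitrary, and the single-predictor linear model and the function name `variance_explained` are hypothetical. It is not the analysis used for Fig. S8.

```python
import random


def variance_explained(x, y):
    """Return R^2 of a least-squares line predicting y from a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant predictor or constant target explains nothing
    slope = sxy / sxx
    residual = sum((b - mean_y - slope * (a - mean_x)) ** 2 for a, b in zip(x, y))
    return 1.0 - residual / syy


# Synthetic traces (assumed): activity is driven strongly by the stimulus,
# weakly by pupil size, plus independent noise.
rng = random.Random(0)
n_samples = 5000
pupil = [rng.gauss(0, 1) for _ in range(n_samples)]
stimulus = [rng.gauss(0, 1) for _ in range(n_samples)]
activity = [1.0 * s + 0.3 * p + rng.gauss(0, 1) for s, p in zip(stimulus, pupil)]

r2_pupil = variance_explained(pupil, activity)
r2_stimulus = variance_explained(stimulus, activity)
print(f"pupil R^2 = {r2_pupil:.3f}, stimulus R^2 = {r2_stimulus:.3f}")
```

With real recordings one would typically use cross-validated predictions and multiple predictors per neuron; the single-predictor fit is kept here only to make the quantity being compared explicit.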

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because the mice were in quiet wakefulness in the current study, whereas they were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased to more than 1/3 of the contribution from the stimulus could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and raised a concern that the noise correlations could be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem addressed (but it is addressed indirectly, I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at all any time. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual responses explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree that behavioral studies are outside the scope of the current manuscript, we have included additional analysis and discussion in the revision on whether and how top-down input would affect the NC connectivity. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NC as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus over those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.
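The argument in (1a) can be illustrated with a toy simulation. The sketch below is not taken from the manuscript or the review: the three model neurons, their tuning values, the uniform gain distribution, and the additive Gaussian noise are all assumptions chosen for illustration. On each trial a single shared gain multiplies every neuron's stimulus-driven response, and there is no connectivity between the neurons. Noise correlations are then computed from trial-to-trial residuals after subtracting each neuron's mean response to each stimulus.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_noise_correlations(n_trials=2000, seed=0):
    """Return (NC of a co-tuned pair, NC of a differently tuned pair)."""
    rng = random.Random(seed)
    # Hypothetical tuning: n1 and n2 prefer stimulus A, n3 prefers stimulus B.
    tuning = {
        "n1": {"A": 10.0, "B": 1.0},
        "n2": {"A": 10.0, "B": 1.0},
        "n3": {"A": 1.0, "B": 10.0},
    }
    residuals = {name: [] for name in tuning}
    for stim in ("A", "B"):
        trials = {name: [] for name in tuning}
        for _ in range(n_trials):
            gain = rng.uniform(0.5, 1.5)  # shared multiplicative state on this trial
            for name, rates in tuning.items():
                # Shared gain scales the driven response; private noise is additive.
                trials[name].append(gain * rates[stim] + rng.gauss(0, 1))
        for name, responses in trials.items():
            mean_response = sum(responses) / len(responses)
            residuals[name].extend(r - mean_response for r in responses)
    nc_cotuned = pearson(residuals["n1"], residuals["n2"])
    nc_different = pearson(residuals["n1"], residuals["n3"])
    return nc_cotuned, nc_different


nc_cotuned, nc_different = simulate_noise_correlations()
print(f"co-tuned pair NC = {nc_cotuned:.2f}, differently tuned pair NC = {nc_different:.2f}")
```

In this toy setting the co-tuned pair shows the larger noise correlation even though no neuron is connected to any other, which is the confound the comment describes. Whether it applies to the recorded data depends on how strong and how low-dimensional the shared modulation actually is.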

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the author's main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. It seems that following the author's claims, it would be important to establish if neurons belong to the same groups, rather than tuning similarity is a defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      We apologize for the confusion. We now clearly state in the Abstract and Introduction that the study was carried out in mice.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      We apologize for the confusion. We have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs per that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 
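
      To make the distinction between the two quantities concrete, here is a minimal sketch, not the analysis code used for the manuscript. The function name `summarize_high_nc`, the threshold of 0.2, and the toy values are all hypothetical, and averaging only over the high NC pairs is an assumption made for illustration. It computes, for one V1-HVA pair, the fraction of high NC pairs falling in each tuning group (the kind of quantity on the x-axis of Fig. 4C) and the mean NC within each group (the kind of quantity examined in Fig. 4D).

```python
from collections import defaultdict


def summarize_high_nc(pairs, threshold):
    """Summarize high-NC pairs for one V1-HVA pair.

    pairs: iterable of (group_label, nc_value) for in-group neuron pairs.
    threshold: hypothetical cutoff above which a pair counts as "high NC".
    Returns (fraction, mean_nc), two dicts keyed by group label.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for group, nc in pairs:
        if nc > threshold:
            counts[group] += 1
            sums[group] += nc
    total = sum(counts.values())
    # Share of all high-NC pairs that belongs to each tuning group.
    fraction = {g: counts[g] / total for g in counts} if total else {}
    # Average NC magnitude among the high-NC pairs of each group.
    mean_nc = {g: sums[g] / counts[g] for g in counts}
    return fraction, mean_nc


# Toy data (hypothetical): group G1 has more high-NC pairs than G2,
# but the average magnitude of those pairs is the same.
pairs = [("G1", 0.5), ("G1", 0.25), ("G1", 0.75),
         ("G2", 0.5), ("G2", 0.125), ("G1", 0.0)]
fraction, mean_nc = summarize_high_nc(pairs, threshold=0.2)
print(fraction)
print(mean_nc)
```

      In this toy case the groups differ in the number of high NC pairs but not in their mean NC, which is the pattern the two panels are meant to tell apart.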

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility: "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data was acquired is in my mind missing and if that is the case it is crucial that this be addressed.

      At each step of data analysis, we performed control analyses to assess the fidelity of the conclusions. For example, we did so for the spike train inference (Fig. S4), the GMM clustering (Fig. S1), and the noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
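
      For reference, the hierarchical bootstrap recommended by the reviewer can be sketched as follows. This is a generic illustration under stated assumptions, not an analysis performed in the manuscript: the function names, the animal labels, and the toy values are hypothetical. Animals are resampled with replacement first, and neurons are then resampled within each selected animal, so that the animal remains the top-level experimental unit.

```python
import random
from statistics import mean


def hierarchical_bootstrap(data_by_animal, n_boot=1000, seed=0):
    """Two-level bootstrap of the mean.

    data_by_animal: dict mapping animal id -> list of per-neuron values.
    Returns a list of n_boot bootstrap estimates of the mean.
    """
    rng = random.Random(seed)
    animals = list(data_by_animal)
    boot_means = []
    for _ in range(n_boot):
        sampled_values = []
        # Level 1: resample animals with replacement.
        for animal in rng.choices(animals, k=len(animals)):
            values = data_by_animal[animal]
            # Level 2: resample neurons within the chosen animal.
            sampled_values.extend(rng.choices(values, k=len(values)))
        boot_means.append(mean(sampled_values))
    return boot_means


def percentile_interval(samples, lower=2.5, upper=97.5):
    """Simple percentile interval (nearest-rank) from bootstrap samples."""
    ordered = sorted(samples)

    def pick(p):
        return ordered[round(p / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


# Toy data (hypothetical): per-neuron values grouped by animal.
toy = {
    "mouse_a": [0.10, 0.20, 0.15],
    "mouse_b": [0.30, 0.25],
    "mouse_c": [0.05, 0.10, 0.12, 0.08],
}
boot = hierarchical_bootstrap(toy, n_boot=500, seed=1)
low, high = percentile_interval(boot)
print(low, high)
```

      Because whole animals are drawn at the first level, between-animal variability widens the interval, which is what guards against inflated significance when neurons are pooled.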

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use for example as the mean square error. The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
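
      The matching step can be illustrated with a short sketch. It is a simplified stand-in rather than the code used in the study: the helper names are hypothetical, the toy centers are 4-dimensional instead of 45-dimensional, and pairing each class with its highest-correlation counterpart is an assumption about how "matching the center of the class" might be implemented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def match_centers(centers_a, centers_b):
    """For each center of fit A, find the best-correlated center of fit B.

    Returns a list of (index_in_b, correlation), one entry per center in A.
    """
    matches = []
    for ca in centers_a:
        corrs = [pearson(ca, cb) for cb in centers_b]
        best = max(range(len(corrs)), key=corrs.__getitem__)
        matches.append((best, corrs[best]))
    return matches


# Toy cluster centers (hypothetical) from two fits on different random
# subsets: fit B recovers the same two classes in swapped order, with
# small perturbations.
centers_a = [[1.0, 0.0, -1.0, 2.0], [0.0, 3.0, 1.0, -2.0]]
centers_b = [[0.1, 2.9, 1.0, -2.1], [1.1, 0.0, -0.9, 2.0]]
matches = match_centers(centers_a, centers_b)
print(matches)
```

      A high correlation for every matched pair indicates that the class centers are reproducible across subsets, which is the sense of consistency reported in Fig. 1C.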

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections' which I presume mean correlations is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our method section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
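
      As a rough illustration of how the two measures differ, the sketch below uses the common textbook definitions, which may differ in detail from the formal definitions in the Methods. Signal correlation is taken as the Pearson correlation of trial-averaged responses across stimuli, and noise correlation as the Pearson correlation of trial-to-trial residuals after subtracting each stimulus mean. All names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def signal_correlation(resp_a, resp_b):
    """Correlation of trial-averaged responses across stimuli.

    resp_a, resp_b: lists indexed as [stimulus][trial], recorded
    simultaneously so that trials line up between the two neurons.
    """
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    return pearson(mean_a, mean_b)


def noise_correlation(resp_a, resp_b):
    """Correlation of trial-to-trial residuals around each stimulus mean."""
    resid_a, resid_b = [], []
    for trials_a, trials_b in zip(resp_a, resp_b):
        ma = sum(trials_a) / len(trials_a)
        mb = sum(trials_b) / len(trials_b)
        resid_a.extend(x - ma for x in trials_a)
        resid_b.extend(y - mb for y in trials_b)
    return pearson(resid_a, resid_b)


# Toy responses (hypothetical): 3 stimuli x 4 trials per neuron.
# The neurons share tuning but fluctuate in opposite directions.
neuron_a = [[1, 3, 1, 3], [4, 6, 4, 6], [7, 9, 7, 9]]
neuron_b = [[3, 1, 3, 1], [5, 3, 5, 3], [7, 5, 7, 5]]
print(signal_correlation(neuron_a, neuron_b))
print(noise_correlation(neuron_a, neuron_b))
```

      The toy pair has similar tuning but opposite trial-to-trial fluctuations, which shows that the two measures are computed from different parts of the response and need not agree.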

      Number of visual stimulation trials is not stated in Methods. Only stated figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) is of the same nature every time. This enables us to use the same mapping procedure no matter what reporters mice might be expressing (and the pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised the Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular, apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affect decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning.

      Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

      Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy, anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. The biggest drawback is that it does not provide evidence for the idea that a match between chronotype and delay matters is especially relevant for people with depression or continuous measures like anhedonia and apathy. It is unclear whether disorders further interact with chronotype and time of day to determine a bias against effort. On the other hand, the study does provide evidence that future studies should consider such interactions when examining questions about effort expenditure in psychiatric disorders.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.
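
      To illustrate what it means for the choice bias to act "like an intercept", here is a deliberately simplified sketch. It assumes a plain logistic choice rule with linear reward and effort terms, which is not necessarily the model fitted in the manuscript, and the function name, parameter names, and values are hypothetical.

```python
from math import exp


def p_accept(reward, effort, bias, reward_sens, effort_sens):
    """Probability of accepting an effort-reward challenge.

    The bias term shifts acceptance for every offer alike, independent of
    the reward and effort on that trial (the role of an intercept).
    """
    utility = bias + reward_sens * reward - effort_sens * effort
    return 1.0 / (1.0 + exp(-utility))


# Same offers, two hypothetical participants who differ only in bias.
for reward, effort in [(1, 3), (2, 2), (4, 1)]:
    neutral = p_accept(reward, effort, bias=0.0,
                       reward_sens=1.0, effort_sens=1.0)
    negative = p_accept(reward, effort, bias=-1.0,
                        reward_sens=1.0, effort_sens=1.0)
    print(reward, effort, round(neutral, 3), round(negative, 3))
```

      Under this assumed rule, a more negative bias lowers the acceptance probability for every offer, whereas the sensitivity parameters change how strongly acceptance depends on the specific reward and effort levels.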

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria.

      Weaknesses:

      The authors do not explore how chronotype and depression are related (does one mediate the effect of the other etc). Both variables are included in the same model in the revised article now which is a great improvement, but it also means psychopathology and circadian rhythms are treated as distinct phenomena and their relationship in predicting effort-reward preferences is not examined.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Two points in response to changes the authors made:

      (1) "motivational tendency" is in our opinion not an improved phrase over "choice bias". A paper by Jon Roiser calls it "overall bias to accept effortful challenges" (but that's maybe too long?)

      We thank the reviewer for their suggestion of renaming our computational parameter and agree it would be of value to introduce and label this parameter in line with other work, improving consistency across the literature. Hence, we have updated our manuscript and now introduce the parameter as bias to accept effortful challenges for reward and refer to the parameter as acceptance bias thereafter.

      We have updated this nomenclature throughout the manuscript text, figures and supplement.

      (2) The new title "Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making" sounds slightly causal (as would be the case in a longitudinal or intervention study). Maybe instead the authors could use "are associated with" or similar?

      We agree with the reviewers that our current title could be interpreted in a causal manner. We have updated our title to now read A common alteration in effort-based decision-making in apathy, anhedonia, and late circadian rhythm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of oxytocin (OT) neurons in the paraventricular nucleus (PVN) and their projections to the medial prefrontal cortex (mPFC) in regulating pup care and infanticide behaviors in mandarin voles. The researchers used techniques like immunofluorescence, optogenetics, OT sensors, and peripheral OT administration. Activating OT neurons in the PVN reduced the time it took pup-caring male voles to approach and retrieve pups, facilitating pup-care behavior. However, this activation had no effect on females. Interestingly, this same PVN OT neuron activation also reduced the time for both male and female infanticidal voles to approach and attack pups, suggesting PVN OT neuron activity can promote pup care while inhibiting infanticide behavior. Inhibition of these neurons promoted infanticide. Stimulating PVN->mPFC OT projections facilitated pup care in males and in infanticide-prone voles, activation of these terminals prolonged latency to approach and attack. Inhibition of PVN->mPFC OT projections promoted infanticide. Peripheral OT administration increased pup care in males and reduced infanticide in both sexes. However, some results differed in females, suggesting other mechanisms may regulate female pup care.

      Strengths:

      This multi-faceted approach provides converging evidence, strengthens the conclusions drawn from the study, and makes them very convincing. Additionally, the study examines both pup care and infanticide behaviors, offering insights into the mechanisms underlying these contrasting behaviors. The inclusion of both male and female voles allows for the exploration of potential sex differences in the regulation of pup-directed behaviors. The peripheral OT administration experiments also provide valuable information for potential clinical applications and wildlife management strategies.

      Weaknesses:

      While the study presents exciting findings, there are several weaknesses that should be addressed. The sample sizes used in some experiments, such as the Fos study and optogenetic manipulations, appear to be small, which may limit the statistical power and generalizability of the results. Effect sizes are not reported, making it difficult to evaluate the practical significance of the findings. The imaging parameters and analysis details for the Fos study are not clearly described, hindering the interpretation of these results (i.e., was the entire PVN counted?). Also, does the Fos colocalization align with previous studies that look at PVN Fos and maternal/ paternal care? Additionally, the study lacks electrophysiological data to support the optogenetic findings, which could provide insights into the neural mechanisms underlying the observed behaviors. 

      Some previous studies (He et al., 2019; Mei, Yan, Yin, Sullivan, & Lin, 2023) also used small sample sizes in morphological experiments, and such samples may still be representative. We agree with the reviewer that results from larger samples would be more statistically powerful and generalizable, and we will pay attention to this issue in future studies. As the reviewer suggested, we have added effect sizes, including d, η2 and odds ratios, to both the source data and the main text. We have added the objective magnification to the figure legend. The imaging parameters and analysis details for the Fos study have also been added to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides, so that each slide held 6 brain slices spaced 160 µm apart. The PVN area was determined based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive and double-labeled neurons were counted. Our results on Fos and OT colocalization are consistent with previous studies. In a previous study on virgin male prairie voles, OT and Fos colabeled neurons in the PVN increased after exposure to conspecific pups and experiencing paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-fos colabeled neurons in the PVN significantly increased after the animals became parents, which may reflect the shift from virgin to parent (Kelly, Hiura, Saunders, & Ophir, 2017). To support the optogenetic findings, we used c-Fos expression as a marker of neuronal activity and found significant increases/decreases in c-Fos-positive neurons induced by optogenetic activation/inhibition (Supplementary Data Fig. 1). Additionally, using OT1.0 sensors, we found that optogenetic inhibition of OT neurons reduced OT release. Based on these two experiments, we verified that the optogenetic manipulation in the present study is valid and that the results of the optogenetic experiments are reliable (Supplementary Data Fig. 5).
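
      For readers less familiar with the effect sizes named above, the sketch below shows their standard textbook forms. It is illustrative only: the exact variants reported in the manuscript (for example, which standard deviation enters d) are not specified here, and the function names and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    return ss_between / ss_total


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: counts with / without the outcome in group 1.
    c, d: counts with / without the outcome in group 2.
    """
    return (a * d) / (b * c)


# Toy numbers (hypothetical), e.g. latencies in two groups and counts of
# animals showing a behavior.
print(cohens_d([2, 4, 6], [1, 3, 5]))
print(eta_squared([[2, 4, 6], [1, 3, 5]]))
print(odds_ratio(8, 2, 3, 7))
```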

      The study has several limitations that warrant further discussion. Firstly, the potential effects of manipulating OT neurons on the release of other neurotransmitters (or the influence of other neurochemicals or brain regions) on pup-directed behaviors, especially in females, are not fully explored. Additionally, it is unclear whether back-propagation of action potentials during optogenetic manipulations causes the same behavioral effect as direct stimulation of PVN OT cells. Moreover, the authors do not address whether the observed changes in behavior could be explained by overall increases or decreases in locomotor activity.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, other neurochemicals may also be released during optogenetic manipulations because OT neurons may release them as well. In one of our previous studies, activation of OT neuron projections from the PVN to the VTA, as well as to the NAc, also altered pup-directed behaviors, which may also be accompanied by dopamine release (He et al., 2021). In addition, backpropagation of action potentials during optogenetic manipulations may cause the same behavioral effect as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated further in future studies. For the optogenetic experiments, we referred to previous research (Mei et al., 2023; Murugan et al., 2017), and we also verified the reliability of the methods in our study. To exclude effects of locomotor activity on pup-directed behaviors, we also tested the effect of optogenetic manipulations on the locomotor activity of the experimental animals and found that they did not change levels of locomotor activity (Supplementary Data Fig. 6).

      The authors do not specify the percentage of PVN->mPFC neurons labeled that were OT-positive, nor do they directly compare the sexes in their behavioral analysis (or if they did, it is not clear statistically). While the authors propose that the sex difference in pup-directed behaviors is due to females having greater OT expression, they do not provide evidence to support this claim from their labeling data. It is also uncertain whether more OT neurons were manipulated in females compared to males. The study could benefit from a more comprehensive discussion of other factors that could influence the neural circuit under investigation, especially in females.

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed virus-infected, OT-positive (red) neurons in the PVN (yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4). In addition, as the reviewers suggested, we compared the numbers of OT neurons, activated OT neurons (OT and Fos double-labeled neurons) and levels of OT release between males and females. We found that females had more activated OT neurons (Figure 1d, g) and released higher levels of OT into the mPFC (Figure 4d, e) than males. This has been added to the Results and Discussion. We did not analyze whether more OT neurons were manipulated in females than in males, which is indeed a limitation of this study that requires our attention.

      As the reviewers suggested, we also discussed other factors that could influence the neural circuit under investigation. In addition to OT neurons, OTR neurons may also regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin- and oxytocin receptor-expressing neurons (Okabe et al., 2017). Other brain regions, such as the preoptic area (POA), may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies and greater c-Fos expression in the POA; concentrations of OT in the POA were also significantly increased, and the facilitation of alloparental behavior by repeated exposure to pups occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that PVN OT is involved in the care of pups by male voles (He et al., 2021): PVN to ventral tegmental area (VTA) OT projections, as well as VTA to nucleus accumbens (NAc) DA projections, are involved in pup care, and inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021). The effects of these factors on pup-directed responses should also be considered in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This series of experiments studied the involvement of PVN OT neurons and their projection to the mPFC in pup-care and attack behavior in virgin male and female Mandarin voles. Using Fos visualization, optogenetics, fiber photometry, and IP injection of OT the results converge on OT regulating caregiving and attacks on pups. Some sex differences were found in the effects of the manipulations.

      Strengths:

      Major strengths are the modern multi-method approaches and involving both sexes of Mandarin vole in every experiment.

      Weaknesses:

      Weaknesses include the lack of some specific details in the methods that would help readers interpret the results. These include:

      (1) No description of diffusion of centrally injected agents.

      Thank you for this comment. Only individuals with appropriate viral expression and optical fiber implant location were included in the statistical analysis; all others were excluded. For the optogenetic experiments, the virus (AAV2/9-mOXT-hCHR2(H134R)–mCherry-ER2-WPRE-pA or rAAV-mOXT-eNpHR3.0-mCherry-WPRE-hGH-pA) was designed and constructed to infect only OT neurons, which limited the diffusion of the virus. For the fiber photometry experiments, expression of the OT1.0 sensor was largely restricted to the mPFC, and individuals with an incorrect optical fiber position were not included in the statistical analysis. The diffusion of the optogenetic viruses and the OT1.0 sensor is shown in Supplementary Data Fig. 7.

      (2) Whether all central targets were consistent across animals included in the data analyses. This includes that it is not stated whether the medial prelimbic mPFC target shown in Figure 4 was used in all optogenetic study animals; if that is the case, there is no discussion of that subregion's function compared to other mPFC subregions.

      As shown in Figure 4 and in the schematic diagram of the optogenetic experiment, the central targets of virus infection and the fiber locations were consistent across the animals included in the data analysis; animals that did not meet this criterion were excluded. In the present study, viruses were injected into the prelimbic (PrL) region. The PrL and infralimbic (IL) regions of the mPFC play different roles in different social interaction contexts (Bravo-Rivera, Roman-Ortiz, Brignoni-Perez, Sotres-Bayon, & Quirk, 2014; Moscarello & LeDoux, 2013). One study has shown that the PrL region of the mPFC contributes to active avoidance in situations where conflict needs to be mitigated, but also contributes to the retention of conflict responses for reward (Capuzzo & Floresco, 2020). This may indicate that the suppression of infanticide by PVN to mPFC OT projections is a behavioral consequence of active conflict avoidance. In a study on pain in rats, OT neuron projections from the PVN to the PrL were found to increase the responsiveness of cell populations in the PrL, suggesting that OT may act by altering the local excitation-inhibition (E/I) balance in the PrL (Liu et al., 2023). A study on anxiety-related behaviors in male rats suggests that the anxiolytic effects of OT in the mPFC are specific to the PrL, and not the infralimbic or anterior cingulate regions, and that this is achieved primarily through the engagement of GABAergic neurons, which ultimately modulate downstream anxiety-related brain regions, including the amygdala (Sabihi, Dong, Maurer, Post, & Leuner, 2017). This finding may suggest possible downstream pathways for further research.

      (3) How groups of pup-care and infanticidal animals were created since there was no obvious pretest mentioned so perhaps there was the testing of a large number of animals until getting enough subjects in each group.  

      Before the experiments, we exposed the animals to pups; subjects exhibited pup care, infanticide, or neglect. We grouped subjects according to their behavioral responses to pups, and individuals that neglected pups were excluded.

      (4) The apparent use of a 20-minute baseline data collection period for photometry that started right after the animals were stressed from handling and placement in the novel testing chamber.

      In fiber photometric experiments, all experimental animals were required to acclimatize to the environment for at least 20 minutes prior to the experiment as described in the Methods section. The time 0 in Fig. 4 represents the point in time when a behavior or a segment of behavior started and is not the actual time 0 at which the test was started.

      (5) A weakness in the results reporting is that it's unclear what statistics are reported (2 x 2 ANOVA main effect of interaction results, t-test results) and that the degrees of freedom expected for the 2 X 2 ANOVAs in some cases don't appear to match the numbers of subjects shown in the graphs; including sample sizes in each group would be helpful because the graph panels are very small and data points overlap.

      Thank you for your suggestion. We now report the statistical methods used and the sample size of each experimental group in the figure legends.

      The additional context that could help readers of this study is that the authors overlook some important mPFC and pup caregiving and infanticide studies in the introduction which would help put this work in better context in terms of what is known about the mPFC and these behaviors. These previous studies include Febo et al., 2010; Febo 2012; Pereira and Morrell, 2011 and 2020; and a very relevant study by Alsina-Llanes and Olazábal, 2021 on mPFC lesions and infanticide in virgin male and female mice. The introduction states that nothing is known about the mPFC and infanticide. In the introduction and discussion, stating the species and sex of the animals tested in all the previous studies mentioned would be useful. The authors also discuss PVN OT cell stimulation findings seen in other rodents, so the work seems less conceptually novel. Overall, the findings add to the knowledge about OT regulation of pup-directed behavior in male and female rodents, especially the PVN-mPFC OT projection.

      We greatly appreciate these valuable references and have cited them in the Introduction and Discussion. We agree with the reviewer that the statement that nothing is known about the mPFC and infanticide is incorrect. The statement should instead be that it remains unclear whether mPFC OT projections are involved in paternal care and infanticide. A study in mother rats indicated that inactivation or inhibition of neuronal activity in the mPFC largely reduced pup retrieval and grouping (Febo, Felix-Ortiz, & Johnson, 2010). A subsequent study on firing patterns in the mPFC of mother rats suggested that sensory-motor processing occurs in the mPFC and may affect decision-making about maternal care of pups (Febo, 2012). A study in new mother rats examining different regions of the mPFC (anterior cingulate (Cg1), PrL, IL) identified an involvement of the IL cortex in biased preference decision-making in favour of the offspring (Pereira & Morrell, 2020). A study on maternal motivation in rats suggests that in the early postpartum period the IL and Cg1 subregions of the mPFC are the motivating circuits for pup-specific biases, while the PrL subregion is recruited and contributes to the expression of maternal behaviors in the late postpartum period (Pereira & Morrell, 2011).

      Reviewer #3 (Public Review):

      Summary:

      Here Li et al. examine pup-directed behavior in virgin Mandarin voles. Some males and females tend towards infanticide, others tend towards pup care. c-Fos staining showed more oxytocin cells activated in the paraventricular nucleus (PVN) of the hypothalamus in animals expressing pup care behaviors than in infanticidal animals. Optogenetic stimulation of PVN oxytocin neurons (with an oxytocin-specific virus to express the opsin transgene) increased pup-care, or in infanticidal voles increased latency towards approach and attack.

      Suppressing the activity of PVN oxytocin neurons promoted infanticide. The use of a recent oxytocin GRAB sensor (OT1.0) showed changes in medial prefrontal cortex (mPFC) signals as measured with photometry in both sexes. Activating mPFC oxytocin projections increased latency to approach and attack in infanticidal females and males (similar to the effects of peripheral oxytocin injections), whereas in pup-caring animals only males showed a decrease in approach. Inhibiting these projections increased infanticidal behaviors in both females and males and had no effect on pup caretaking.

      Strengths:

      Adopting these methods for Mandarin voles is an impressive accomplishment, especially the valuable data provided by the oxytocin GRAB sensor. This is a major achievement and helps promote systems neuroscience in voles.

      Weaknesses:

      The study would be strengthened by an initial figure summarizing the behavioral phenotypes of voles expressing pup care vs infanticide: the percentages and behavioral scores of individual male and female nulliparous animals for the behaviors examined here. Do the authors have data about the housing or life history/experiences of these animals? How bimodal and robust are these behavioral tendencies in the population?

      As noted in our response to Reviewer 2, animals generally exhibit three types of behavioral responses toward pups; data on the percentages of these behavioral types in the population will be included in another study from our lab. The reviewer's suggestion of scoring the behaviors is an inspiring idea that will help us to parse these behaviors more fully. Mandarin voles were captured from the wild in Henan, China. The experimental subjects were F2 generation voles reared in the Experimental Animal Centre of Shaanxi Normal University. In our observations, pup care and infanticide behaviors were conserved across several pup exposures, especially pup care behaviors; for infanticide behaviors, we did not conduct further pup exposures in order to protect the pups.

      Optogenetics with the oxytocin promoter virus is a nice advance here. More details about their preparation and methods should be in the main text, and not simply relegated to the methods section. For optogenetic stimulation in Figure 2, how were the stimulation parameters chosen? There is a worry that oxytocin neurons can co-release other factors- are the authors sure that oxytocin is being released by optogenetic stimulation as opposed to other transmitters or peptides, and acting through the oxytocin receptor (as opposed to a vasopressin receptor)?

      As the reviewer suggested, more detailed information about virus construction and the choice of optogenetic stimulation parameters has been added to the revised manuscript. For the construction of the CHR2 and mCherry viruses used in optogenetic manipulation, we refer to a previous study that constructed an rAAV expressing Venus from a 2.6 kb region upstream of OT exon 1, which is conserved in mammalian species (Knobloch et al., 2012). For the eNpHR 3.0 virus, expression of the vector is driven by the mouse OXT promoter, a 1 kb promoter upstream of exon 1 of the OXT gene, which has been shown to induce cell type-specific expression in OXT cells (Peñagarikano et al., 2015). Details about the construction of the OT1.0 sensor can be found in the work of Professor Li's group (Qian et al., 2023). The maps of the viral vectors and the OT1.0 sensor are shown below.

      The optogenetic stimulation parameters were based on a previous study (He et al., 2021). However, our original description of the parameters was not sufficiently detailed, so further information has been added to the Methods. In the pup-directed pup care behavioral test, light stimulation lasted for 11 min. The parameters used for optogenetic manipulation of PVN OT neurons were ~3 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, and those used for optogenetic manipulation of PVN OT neurons projecting to the mPFC were ~10 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, to cover the entire interaction. We performed fiber photometry experiments to determine the role that OT plays in behavior, and these results and the optogenetic results support each other. In addition, we further confirmed the effect of optogenetic manipulation on OT release by combining optogenetic inhibition with the OT1.0 sensor (Supplementary Data Fig. 2). It has been previously shown that OT is able to act specifically on OTR in the mPFC-PL (Sabihi et al., 2017). Our study focuses on oxytocin neurons and oxytocin release, and more research is needed to build a more complete picture of how the OTR and other factors in the mPFC are involved in these behaviors.
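
      For illustration, the pulse count and total light exposure implied by the stated protocol can be worked out as follows. This is a minimal sketch, assuming that the 20 Hz, 20 ms pulse train runs only during each 8 s ON epoch and that the 10 s ON/OFF cycle repeats back to back for the full 11 min; the variable names are illustrative and do not come from the manuscript.

```python
# Timing parameters quoted in the response. They are the same for PVN
# soma stimulation and PVN-to-mPFC fiber stimulation; only the power differs.
pulse_rate_hz = 20       # pulses per second during an ON epoch
pulse_width_s = 0.020    # 20 ms per pulse
on_epoch_s = 8           # pulse train ON
off_epoch_s = 2          # pulse train OFF
session_s = 11 * 60      # 11 min behavioral test

# Assumption: ON/OFF cycles repeat without gaps for the whole session.
cycles = session_s // (on_epoch_s + off_epoch_s)
pulses_per_epoch = pulse_rate_hz * on_epoch_s
total_pulses = cycles * pulses_per_epoch

# Fraction of time the light is actually on.
duty_within_train = pulse_rate_hz * pulse_width_s
duty_overall = duty_within_train * on_epoch_s / (on_epoch_s + off_epoch_s)
light_on_s = total_pulses * pulse_width_s

print(cycles, pulses_per_epoch, total_pulses)
print(round(duty_within_train, 2), round(duty_overall, 2), round(light_on_s, 1))
```

      Under these assumptions the light is on for roughly a third of the session, which is the kind of figure that helps when comparing this protocol with protocols in other studies.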

      Author response image 1.

      Author response image 2.

       

      Given that they are studying changes in latency to approach/attack, having some controls for motion when oxytocin neurons are activated or suppressed might be nice. Oxytocin is reported to be an anxiolytic and a sedative at high levels.

      As noted in our response to Reviewer 1, to exclude effects of locomotor activity on pup-directed behaviors, we also investigated the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change levels of locomotor activity (Supplementary Data Fig. 6).

      The OT1.0 sensor is also amazing, these data are quite remarkable. However, photometry is known to be susceptible to motion artifacts and I didn't see much in the methods about controls or correction for this. It's also surprising to see such dramatic, sudden, and large-scale suppression of oxytocin signaling in the mPFC in the infanticidal animals - does this mean there is a substantial tonic level of oxytocin release in the cortex under baseline conditions?

      The fiber photometry recording system used in the present study automatically excludes the effects of motion artifacts by simultaneously recording signals excited by a 405 nm light source. As shown in the formula below, the z-score data were calculated and presented, and increases and decreases in the OT signal are expressed relative to the baseline. With a smooth baseline, a decrease in signal is generally amplified by this calculation. In our experiments combining optogenetic inhibition with the OT1.0 sensor, we found that there was a certain level of OT release at baseline, which leaves room for a decrease in the signal recorded by the OT1.0 sensor.
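
      The formula referred to above is not reproduced in this response. As a generic illustration only, and not the authors' calculation, the sketch below shows one common way to use a 405 nm reference channel for motion correction and then z-score against a baseline window. The function names, the linear-fit correction, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev

def fit_reference(signal, reference):
    """Least-squares line that maps the 405 nm reference onto the signal channel."""
    mean_ref, mean_sig = mean(reference), mean(signal)
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    cov = sum((r - mean_ref) * (s - mean_sig) for r, s in zip(reference, signal))
    slope = cov / var_ref
    intercept = mean_sig - slope * mean_ref
    return [slope * r + intercept for r in reference]

def delta_f_over_f(signal, reference):
    """dF/F after removing the component shared with the reference channel."""
    fitted = fit_reference(signal, reference)
    return [(s - f) / f for s, f in zip(signal, fitted)]

def zscore_to_baseline(trace, n_baseline):
    """Z-score a trace against the mean and SD of its first n_baseline samples."""
    baseline = trace[:n_baseline]
    mu, sigma = mean(baseline), pstdev(baseline)
    return [(x - mu) / sigma for x in trace]

# Toy data (hypothetical): the signal channel is a scaled copy of the
# reference, i.e. pure motion, so the corrected dF/F is approximately zero.
reference = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03]
motion_only = [2.0 * r + 0.5 for r in reference]
print(delta_f_over_f(motion_only, reference))

# A toy trace with a late drop, z-scored against its first four samples.
print(zscore_to_baseline([0.0, 2.0, 0.0, 2.0, -4.0], n_baseline=4))
```

      Because the z-score divides by the baseline standard deviation, a very quiet baseline inflates the apparent size of any later change, which is consistent with the remark above that a smooth baseline amplifies a decreasing signal.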

      Figure 5 is difficult to parse as-is, and relates to an important consideration for this study: how extensive is the oxytocin neuron projection from PVN to mPFC?

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed virus-infected, OT-positive (red) neurons in the PVN (yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).

      In Figures 6 and 7, the authors use the phrase 'projection terminals'; however, to my knowledge, there have not been terminals (i.e., presynaptic formations opposed to a target postsynaptic site) observed in oxytocin neuron projections into target central regions.

      Following your suggestion, we replaced ‘terminals’ with ‘fibers’ to describe the projections more accurately.

      Projection-based inhibition as in Figure 7 remains a controversial issue, as it is unclear if the opsin activation can be fast enough to reduce the fast axonal/terminal action potential. Do the authors have confirmation that this works, perhaps with the oxytocin GRAB OT sensor?

      Thank you for your suggestion. We measured OT release using the OT1.0 sensor while the OT neuron projections in the mPFC were optogenetically inhibited. The results showed that optogenetic inhibition of OT neuron fibers in the mPFC significantly reduced OT release, which validates the projection-based inhibition method (Supplementary Data Fig. 5).

      As females and males had similar GRAB OT1.0 responses in mPFC, why would the behavioral effects of increasing activity be different between the sexes?

      In the present study, females released higher levels of OT into the mPFC than males when the different behaviors occurred (Figure 4d, e). In addition, females already approached and retrieved pups more rapidly than males before optogenetic activation, which may be why this manipulation had no effect in females.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Check for spelling and grammar errors throughout.

      We thank the reviewer for this suggestion; we have checked and revised the article.

      (2) Report effect sizes for all significant findings to allow evaluation of practical significance.

      As the reviewer suggested, we have added effect sizes both in the source data and in the main text, including d, η², and odds ratios.
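
      For readers unfamiliar with these three effect sizes, the sketch below shows their textbook definitions. It is illustrative only: the response does not state which variants were used (for example, η² versus partial η² for the 2 x 2 ANOVAs), the function names are hypothetical, and the numbers are toy values, not data from the study.

```python
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def eta_squared(groups):
    """One-way eta squared: between-group sum of squares over total sum of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Toy values only (not data from the study).
print(cohens_d([2, 4, 6], [1, 3, 5]))
print(eta_squared([[1, 2, 3], [4, 5, 6]]))
print(odds_ratio(10, 5, 4, 8))
```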

      (3) Provide detailed information on the imaging parameters and analysis methods used in the Fos study.

      The imaging parameters and analysis details for the Fos study have also been added to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides; each slide held 6 brain slices spaced 160 µm apart. The PVN area was determined based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive, and merged (double-labeled) neurons were counted.
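
      The 160 µm spacing follows from the collection scheme if consecutive slices are distributed across the 4 slides in rotation. The sketch below makes that arithmetic explicit; the round-robin assignment is an assumption inferred from the description, and the names are illustrative.

```python
SLICE_THICKNESS_UM = 40
N_SLIDES = 4
SLICES_PER_SLIDE = 6

def assign_slices(n_slides=N_SLIDES, per_slide=SLICES_PER_SLIDE):
    """Assign consecutive slice indices to slides in rotation (assumed scheme)."""
    slides = {s: [] for s in range(n_slides)}
    for idx in range(n_slides * per_slide):
        slides[idx % n_slides].append(idx)
    return slides

slides = assign_slices()

# Adjacent slices on one slide are N_SLIDES slices apart in the tissue.
spacing_um = N_SLIDES * SLICE_THICKNESS_UM
depths_um = [i * SLICE_THICKNESS_UM for i in slides[0]]
total_span_um = N_SLIDES * SLICES_PER_SLIDE * SLICE_THICKNESS_UM

print(slides[0])
print(spacing_um, depths_um, total_span_um)
```

      Under this scheme, counting cells on a single slide samples every fourth slice, so one slide covers the same rostro-caudal extent as the full series at a quarter of the sampling density.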

      (4) Compare the Fos colocalization results with previous studies examining PVN Fos and maternal/paternal care to contextualize the findings.

      Our Fos and OT colocalization results are consistent with previous studies. In a previous study of virgin male prairie voles, OT and Fos co-labeled neurons in the PVN increased after exposure to conspecific pups and experience of paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-Fos co-labeled neurons in the PVN increased significantly after the animals became parents, which may reflect the shift from virgin to parent (Kelly et al., 2017).

      (5) Discuss the limitations of the study, such as the potential effects of manipulating OT neurons on the release of other transmitters or the influence of other neurochemicals or brain regions on pup-directed behaviors, especially in females.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, other neurochemicals may also be released during optogenetic manipulation, because OT neurons can release other neurochemicals as well. In one of our previous studies, activation of the OT neuron projections from the PVN to the VTA as well as to the NAc also altered pup-directed behaviors, which may also be accompanied by dopamine release (He et al., 2021). In addition, back-propagation of action potentials during optogenetic manipulation may cause the same behavioral effects as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated further in future studies.

      (6) Address the possibility of back-propagation of action potentials in the optogenetic manipulations causing the same behavioral effects as PVN OT cell stimulation.

      We agree with the reviewer that optogenetic manipulation may induce back-propagation of action potentials, which could produce the same behavioral effects as OT cell stimulation. We will address this issue in future studies.

      (7) Investigate whether changes in locomotor behavior could explain the observed effects on pup-directed behaviors.

      To exclude effects of locomotor activity on pup-directed behaviors, we also investigated the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change levels of locomotor activity (Supplementary Data Fig. 6).

      (8) Report the percentage of PVN->mPFC neurons labeled that were OT-positive.

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed virus-infected, OT-positive (red) neurons in the PVN (yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).

      (9)  Directly compare the sexes in the behavioral analysis and discuss any potential sex differences.

      We agree with the reviewer's suggestion and have added comparisons between two sexes and discussion about relevant results. 

      (10) If available, report and discuss the OT expression levels and the number of OT neurons manipulated in each sex.

      In the present study, we counted the number of OT cells but did not measure the level of OT expression using WB or qPCR. In addition, the percentages of CHR2(H134R) and eNpHR3.0 virus-infected neurons among all OT-positive neurons are presented (Supplementary Data Fig. 7), but we do not know how many cells were actually manipulated during the optogenetic experiments.

      (11) Expand the discussion to include what could be regulating or interacting with the OT circuit under investigation, particularly in females where the effects were less pronounced.

      As the reviewers suggested, we have also added relevant discussion. In addition to OT neurons, OTR neurons may also regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin- and oxytocin receptor-expressing neurons (Okabe et al., 2017). Other brain regions, such as the preoptic area (POA), may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies and greater c-Fos expression in the POA; concentrations of OT in the POA were also significantly increased, and the facilitation of alloparental behavior by repeated exposure to pups occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that OT from the PVN is involved in the care of pups by male voles (He et al., 2021). That study suggests that PVN to ventral tegmental area (VTA) OT projections, as well as VTA to nucleus accumbens (NAc) DA projections, are involved in the care of pups by male voles. Inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021).

      Reviewer #2 (Recommendations For The Authors):

      A few additional things the authors may want to consider:

      (1) I don't understand the subject numbers in the peripheral OT study data shown in Figure 8. Panels p and q have 69 females shown and 50 males. Was there a second, much larger, IP injection study conducted that was different than the subjects shown in panels a-o that had ~5 subjects per treatment group per sex?

      We apologize for the confusion. More animals were used to test the effects of OT on infanticide behaviors in our pre-test. These data, combined with data from the formal pharmacological experiment, are presented in Fig. 8p, q. After OT treatment, changes in detailed, specific behaviors were recorded in only a subset of animals. We have clarified this in the revised manuscript.

      (2) The authors suggest higher baseline OT release in the female mPFC, which makes sense and helps explain some of their results. It seems that the data in Figure 1 show what is probably no sex difference in OT cell numbers in the PVN of Mandarin voles, which is unlike the old studies in mice or rats. If readers look at the data in Figure 1 showing what seems to be no sex difference in OT cell number, the authors' argument in the discussion about mPFC OT release levels higher in females would be inconsistent with their own data shown. The authors have the brain sections they need to help support or undermine this argument in the discussion, so maybe it would be useful to analyze the OT cell numbers across the PVN and report it in this paper or briefly mention it in the discussion.

      We compared the numbers of OT neurons, the numbers of activated OT neurons (OT and Fos double-labeled neurons), and the level of OT release between males and females. We found that females had more activated OT neurons (Figure 1d, g) and released higher levels of OT into the mPFC (Figure 4d, e) than males. These results have been added to the Results and Discussion. The inconsistency between our OT cell numbers and those of previous studies may be due to the method of cell counting, as we did not count all slides consecutively.

      (3) The discussion suggests visual cues are involved in mPFC OT release relevant for pup care or infanticide, but this is a very odd claim for nocturnal animals that live and nest with their pups in underground burrows.

      We apologize for the confusion. Here, we cited the finding in mice that activation of PVN OT neurons induced by visual stimulation promoted pup care in order to support our finding that the activity of PVN OT cells is involved in pup care, rather than to illustrate a role of visual stimulation in voles. We have clarified this in the revised manuscript.

      (4) The lack of decrease in mPFC OT release in the 2nd and 3rd approaches to pups is probably because the release was so high after the 1st approach that it didn't have time to drop before the subsequent approaches. The authors don't state how long those between-approach intervals were on average to help readers interpret this result.

      As described in our Methods, we left about 60 s between behavioral tests to allow the signal to return to the baseline level.

      (5) Do PVN-mPFC OT somata collateralize to other brain sites? Could mPFC terminal stimulation activate entire PVN cells and every site they project to? A caveat could be mentioned in the discussion if there's support for this from other optogenetic and PVN OT cell projection studies.

      We verified the OT projections from the PVN to the mPFC to validate the optogenetic manipulation of this pathway, but did not investigate whether the OT neurons projecting from the PVN to the mPFC also send collaterals to other brain regions. We suggest that mPFC terminal stimulation activates only the PVN OT cells that project to the mPFC; whether other OT neurons were also activated remains unclear.

      (6) I don't see an ethics statement related to the experiments obviously having to involve pup injury or death. Nothing is said in methods about what happened after adult subjects attacked pups. I assumed the tests were quickly terminated and pups euthanized.

      In case the pups were attacked, we removed them immediately to avoid unnecessary injuries, and injured pups were euthanized.

      (7) The authors could be more specific about what psychological diseases they refer to in the abstract and elsewhere that are relevant to this study. Depression? Rare cases of psychosis? Even within the already rare parental psychosis, infanticide is tragic but rare.

      Infanticide is caused by a variety of factors, among which mental illness, especially depression and psychosis, is often a major risk factor (Milia & Noonan, 2022; Naviaux, Janne, & Gourdin, 2020). In humans, infanticide has been used to refer to the killing, neglect, or abuse of newborn babies and older children (Jackson, 2006). We therefore believe that research on the neural mechanisms of infanticide can also contribute to the understanding and treatment of attacks on children, physical and verbal abuse, and direct killing of babies.

      (8) Figure 8 - in one case the "*" is a chi-square result, correct?

      Thank you for checking this carefully. In Figure 8p, q, we applied the chi-square test and have now stated this in the legend.
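
      For reference, a chi-square test of independence on a 2 x 2 table (for example, treatment by infanticidal or not) can be sketched as below. This is a generic Pearson chi-square without continuity correction; whether the authors applied a correction is not stated, the function name is hypothetical, and the counts are toy values rather than the data in Figure 8.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Toy counts only: rows are treatment groups, columns are outcomes.
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(stat, p_value)
```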

      Reviewer #3 (Recommendations For The Authors):

      The only other thing is a typo on line 135: the authors mean 'stimulation' instead of 'simulation'.

      Corrected.

      References

      Bravo-Rivera, C., Roman-Ortiz, C., Brignoni-Perez, E., Sotres-Bayon, F., & Quirk, G. J. (2014). Neural structures mediating expression and extinction of platform-mediated avoidance. J Neurosci, 34(29), 9736-9742. doi:10.1523/jneurosci.0191-14.2014

      Capuzzo, G., & Floresco, S. B. (2020). Prelimbic and Infralimbic Prefrontal Regulation of Active and Inhibitory Avoidance and Reward-Seeking. J Neurosci, 40(24), 4773-4787. doi:10.1523/jneurosci.0414-20.2020

      Febo, M. (2012). Firing patterns of maternal rat prelimbic neurons during spontaneous contact with pups. Brain Res Bull, 88(5), 534-542. doi:10.1016/j.brainresbull.2012.05.012

      Febo, M., Felix-Ortiz, A. C., & Johnson, T. R. (2010). Inactivation or inhibition of neuronal activity in the medial prefrontal cortex largely reduces pup retrieval and grouping in maternal rats. Brain Res, 1325, 77-88. doi:10.1016/j.brainres.2010.02.027

      He, Z., Young, L., Ma, X. M., Guo, Q., Wang, L., Yang, Y., . . . Tai, F. (2019). Increased anxiety and decreased sociability induced by paternal deprivation involve the PVN-PrL OTergic pathway. Elife, 8. doi:10.7554/eLife.44026

      He, Z., Zhang, L., Hou, W., Zhang, X., Young, L. J., Li, L., . . . Tai, F. (2021). Paraventricular Nucleus Oxytocin Subsystems Promote Active Paternal Behaviors in Mandarin Voles. J Neurosci, 41(31), 6699-6713. doi:10.1523/jneurosci.2864-20.2021

      Jackson, M. (2006). Infanticide. The Lancet, 367(9513), 809. doi:10.1016/S0140-6736(06)68323-2

      Kelly, A. M., Hiura, L. C., Saunders, A. G., & Ophir, A. G. (2017). Oxytocin Neurons Exhibit Extensive Functional Plasticity Due To Offspring Age in Mothers and Fathers. Integr Comp Biol, 57(3), 603-618. doi:10.1093/icb/icx036

      Kenkel, W. M., Paredes, J., Yee, J. R., Pournajafi-Nazarloo, H., Bales, K. L., & Carter, C. S. (2012). Neuroendocrine and behavioural responses to exposure to an infant in male prairie voles. J Neuroendocrinol, 24(6), 874-886. doi:10.1111/j.1365-2826.2012.02301.x

      Knobloch, H. S., Charlet, A., Hoffmann, L. C., Eliava, M., Khrulev, S., Cetin, A. H., . . . Grinevich, V. (2012). Evoked axonal oxytocin release in the central amygdala attenuates fear response. Neuron, 73(3), 553-566. doi:10.1016/j.neuron.2011.11.030

      Liu, Y., Li, A., Bair-Marshall, C., Xu, H., Jee, H. J., Zhu, E., . . . Wang, J. (2023). Oxytocin promotes prefrontal population activity via the PVN-PFC pathway to regulate pain. Neuron, 111(11), 1795-1811.e1797. doi:10.1016/j.neuron.2023.03.014

      Mei, L., Yan, R., Yin, L., Sullivan, R. M., & Lin, D. (2023). Antagonistic circuits mediating infanticide and maternal care in female mice. Nature, 618(7967), 1006-1016. doi:10.1038/s41586-023-06147-9

      Milia, G., & Noonan, M. (2022). Experiences and perspectives of women who have committed neonaticide, infanticide and filicide: A systematic review and qualitative evidence synthesis. J Psychiatr Ment Health Nurs, 29(6), 813-828. doi:10.1111/jpm.12828

      Moscarello, J. M., & LeDoux, J. E. (2013). Active avoidance learning requires prefrontal suppression of amygdala-mediated defensive reactions. J Neurosci, 33(9), 3815-3823. doi:10.1523/jneurosci.2596-12.2013

      Murugan, M., Jang, H. J., Park, M., Miller, E. M., Cox, J., Taliaferro, J. P., . . . Witten, I. B. (2017). Combined Social and Spatial Coding in a Descending Projection from the Prefrontal Cortex. Cell, 171(7), 1663-1677.e1616. doi:10.1016/j.cell.2017.11.002

      Naviaux, A. F., Janne, P., & Gourdin, M. (2020). Psychiatric Considerations on Infanticide: Throwing the Baby out with the Bathwater. Psychiatr Danub, 32(Suppl 1), 24-28. 

      Okabe, S., Tsuneoka, Y., Takahashi, A., Ooyama, R., Watarai, A., Maeda, S., . . . Kikusui, T. (2017). Pup exposure facilitates retrieving behavior via the oxytocin neural system in female mice. Psychoneuroendocrinology, 79, 20-30. doi:10.1016/j.psyneuen.2017.01.036

      Peñagarikano, O., Lázaro, M. T., Lu, X. H., Gordon, A., Dong, H., Lam, H. A., . . . Geschwind, D. H. (2015). Exogenous and evoked oxytocin restores social behavior in the Cntnap2 mouse model of autism. Sci Transl Med, 7(271), 271ra278. doi:10.1126/scitranslmed.3010257

      Pereira, M., & Morrell, J. I. (2011). Functional mapping of the neural circuitry of rat maternal motivation: effects of site-specific transient neural inactivation. J Neuroendocrinol, 23(11), 1020-1035. doi:10.1111/j.1365-2826.2011.02200.x

      Pereira, M., & Morrell, J. I. (2020). Infralimbic Cortex Biases Preference Decision Making for Offspring over Competing Cocaine-Associated Stimuli in New Mother Rats. eNeuro, 7(4). doi:10.1523/eneuro.0460-19.2020

      Qian, T., Wang, H., Wang, P., Geng, L., Mei, L., Osakada, T., . . . Li, Y. (2023). A genetically encoded sensor measures temporal oxytocin release from different neuronal compartments. Nat Biotechnol, 41(7), 944-957. doi:10.1038/s41587-022-01561-2

      Sabihi, S., Dong, S. M., Maurer, S. D., Post, C., & Leuner, B. (2017). Oxytocin in the medial prefrontal cortex attenuates anxiety: Anatomical and receptor specificity and mechanism of action. Neuropharmacology, 125, 1-12. doi:10.1016/j.neuropharm.2017.06.024

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 Public:

      - The authors should carefully address the potential confounding of not counterbalancing the conditions of the first trial in both interoceptive tasks for the 9-month and 18-month age groups. The results of these groups could indeed be driven by having seen the synchronous trial first. 

      Upon addressing this comment, we noticed an error in our presentation scripts that resulted in a fixed-experimental design for most of the infants. Therefore, it is crucial to investigate the impact of the fixed-experimental design on our results. We have conducted extensive additional analyses comparing data from infants with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended, which can be found in Supplementary Materials A. In summary, we find no evidence that the fixed-order design strongly affected the findings: looking behavior did not differ systematically between randomization orders, and looking patterns across ages and tasks indicate that we adequately captured the variance associated with these features. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its implications for the interpretation of the results.

      For instance, on pages 30 and 31 we have added the following paragraphs:

      “The data presented in this study have several limitations. First, due to an error in our experimental scripts we unintentionally used a fixed-order design, in which almost all infants saw the same fixed order of condition (always starting with a synchronous trial), image assigned to condition, and location of the image (left/right) instead of a semi-randomized design. Such a fixed-order design has several important limitations as visual preferences might be influenced by the experimental design, i.e., the first trial always being synchronous might have influenced a mean group preference. Further, we cannot rule out that mean group preferences were influenced by the stimuli used (as in most cases the same stimuli were used for synchronous/asynchronous trials) or by the location of the image in a given trial (left/right). Still, there is no strong theoretical argument as to why the image used or its location should have an impact on infants’ preferences. The stimuli were selected to be similar to each other, in order not to evoke a priori preferences. To further illustrate the impact of the fixed-order design we have conducted several additional analyses (see Supplementary Materials A), which do not indicate a strong impact of the fixed-order design. Specifically, we find no evidence for systematic differences between infants tested with the fixed design and infants tested with a randomized design.

      Despite these limitations, fixed-order designs also hold advantages, as they are more suitable for investigating individual differences (Dang et al., 2020; Hedge et al., 2018). When each participant is exposed to the same procedure, individual differences are less likely to be attributed to effects of randomization and more likely to reflect real differences between participants. Also, when considering the impact of the randomization, one must consider our results in relation to earlier studies (Maister et al., 2017; Weijs et al., 2022; Imafuku et al., 2023), some of which used the exact same stimuli as we did (Maister et al., 2017), with fully randomized designs. Results of these studies indicate no differences in looking times depending on the stimulus assigned to each condition and no systematic preferences for one of the stimuli.”

      - The conclusion that cardiac interoception remains stable across infancy is not fully warranted by the data. Given the small sample size of 18-month-old toddlers included in the final analyses, it might be misleading to state this without including the caveat that the study may be underpowered. In other words, the small sample size could explain the direction of the results for this age group. 

      We agree with the reviewer and now explicitly acknowledge this issue in the discussion, p. 23:

      “However, due to the small sample size at 18 months the results regarding changes and stability of interoceptive sensitivity in the second year of life must be considered speculative and need to be validated in further research.”

      Reviewer #1 (Recommendations For The Authors): 

      Below are some comments that the authors may wish to take into account: 

      - Why did the authors choose to apply different statistical analyses across the dataset (i.e. Bayesian t-test is used with the 3-month-old sample, whereas a paired t-test is used for the 9 and 18-month-olds)? 

      The use of different statistical analyses was driven by the timeline of the project, as we had to update our initial plans. Due to challenges related to the Covid-19 pandemic, it was not possible to recruit 3-month-old babies for our study at the time we started the data collection. Thus, we first collected the 9- and 18-month-olds, and the 3-month-olds later. For the 9- and 18-month-old samples we aimed at directly replicating the approach by Maister et al. (2017). However, for the 3-month-olds we wanted to focus more on classification of the strength of evidence in favor of/against an effect, taking the results of the equivalence tests for the 9- and 18-month-olds into account.

      The following parts have been added to the manuscript to clarify our approach:

      Sample (p. 33): “The 3-month-old sample was tested after completion of the 9- and 18-month-old samples. Initially, we had planned to start data collection with the 3-month-old sample. However, due to the Covid-19 pandemic this was not possible.”

      Statistical analysis (p. 41): “At 3 months we used a Bayesian paired t-test as the data collection was done after having collected the 9- and 18-month-old samples. Our intention in the analysis of the 3-month-old sample was to focus more strongly on strength of evidence in favor of/against an effect instead of a binary classification for/against an effect.”
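
      The difference between the two reporting styles can be sketched in a few lines. The block below is a minimal illustration, not the authors' analysis code: it computes a paired t statistic and then a JZS-type Bayes factor by numerical integration. The Cauchy prior scale r = 0.707, the function names, and the example looking times are all assumptions made for illustration; the response does not state which prior or software was used.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, n) for a paired t-test on two equal-length samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n


def jzs_bf10(t, n, r=math.sqrt(2) / 2, steps=20000):
    """Approximate JZS Bayes factor (H1 over H0) for a paired/one-sample t.

    r is the assumed Cauchy prior scale on the standardized effect size.
    """
    df = n - 1

    def integrand(g):
        a = 1.0 + n * g * r * r
        return (a ** -0.5
                * (1.0 + t * t / (a * df)) ** (-(df + 1) / 2)
                * (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g)))

    # Map g in (0, inf) onto u in (0, 1) via g = u / (1 - u),
    # then integrate with the midpoint rule.
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += integrand(u / (1 - u)) / (1 - u) ** 2
    marginal_h1 = total * h
    marginal_h0 = (1.0 + t * t / df) ** (-(df + 1) / 2)
    return marginal_h1 / marginal_h0


# Hypothetical looking times (seconds), for illustration only.
sync = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1]
asyn = [4.9, 5.8, 5.0, 6.2, 5.6, 5.4]
t_stat, n_pairs = paired_t(sync, asyn)
print(t_stat, jzs_bf10(t_stat, n_pairs))
```

      A Bayes factor below 1 favors the null and above 1 favors an effect, so the output is graded evidence in either direction, whereas the paired t-test yields a single significance decision.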

      - I found the way in which sample sizes are reported a little unclear. This may be due to having the Results section before the Methods section (in line with journal requirements), but it would be helpful if the authors could clarify their sample size from the outset. For example, sample size for the 3-month-olds first says N = 80 (page 9), but then it becomes apparent that N = 53 completed the iBEAT and N = 40 completed the iBREATH. I think for the purpose of explaining the results, it might be more helpful to the reader to only know the final sample size and then specify recruited participants and dropout in the Methods. 

      We have adapted the description of sample sizes in the Results section. We now only refer to the number of infants included in a given analysis when reporting the results of the analysis. In addition, we have added the following clarification for the MEGA analysis (p. 11): “This approach allowed us to include 135 observations for the iBEATs from 125 infants, and 120 observations for the iBREATH from 107 infants. The sample size differs slightly from our preregistered approach given that we used the same preprocessing approach for the MEGA-analysis for all samples.”

      In addition, we now refer to the sample of the MEGA-analysis in the abstract, to make the understanding of our approach more intuitive.

      - I think the sentence "Interestingly, we find evidence for a positive relationship between cardiac and respiratory perception in our 18-month-old sample" at page 25 could be deleted given that the small sample size of 18-month-olds suggests this result should be interpreted with caution. The authors already explained this in the earlier paragraph (page 24) and simply re-stating this (weak) effect without further elaborating may not be necessary. 

      We have removed the sentence.

      - In multiple places in the manuscript, the authors hint at the association between interoception and certain social and self-related abilities (e.g. joint attention, mirror self-recognition), however, these are not fully elaborated on. Could the authors elaborate on the relation between mirror self-recognition and respiratory interoception (page 30)? Why would the ability to recognise the self-face be associated with the individual's ability to perceive their breathing pattern? How these two processes may be linked is not immediately obvious. 

      We have rephrased the sentence on page 30 to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      - Aren't the 18-month-old infants effectively 19-month-olds? The mean age is 576.65 days, and the age window of recruitment was between 18 and 20 months. 

      We have added a sentence clarifying how we refer to the infants’ age ranges. “To stay coherent, we refer to each age group throughout the manuscript with regard to the lower end of the age range in which we included infants (e.g., we tested infants between 9 and 10 months, but refer to them as the 9-month-old group).”

      Reviewer #2 Public:

      Weaknesses: 

      (1) My primary concern is that this study did not counterbalance the conditions of the first trial in both iBEAT and iBREATH tests for the 9-month and 18-month age groups. In these tests, the first trial invariably involved a synchronous stimulus. I believe that the order of trials can significantly influence an infant's looking duration, and this oversight could potentially impact the results, especially where a marked preference for synchronous stimuli was observed among infants. 

      Upon conducting further analyses to address this comment, we noticed an error in our presentation scripts that resulted in the inadvertent use of a fixed-experimental design for most infants. Therefore, we have conducted extensive additional analysis which can be found in Supplementary Materials A. Specifically, we compared data from infants who were tested with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its potential implications for the interpretation of the results.

      (2) The analysis indicated that the study's sample size was too small to effectively assess the effects within each age group. This limitation fundamentally undermines the reliability of the findings. 

      We have added a statement addressing this issue to the limitation section: “The reduced sample size might have impacted the statistical power to detect mean preferences for some age groups. Still, it must be noted that even the smaller sample sizes included were of similar size as used in previous studies on infant interoceptive sensitivity (Imafuku et al., 2023; Maister et al., 2017; Weijs et al., 2023).”

      (3) The authors attribute the infants' preferential-looking behavior solely to the effects of familiarity and novelty. However, the meaning of "familiarity" in relation to external stimuli moving in sync with an infant's heartbeat or breathing is not clearly defined. A deeper exploration of the underlying mechanisms driving this behavior, such as from the perspectives of attention and perception, is necessary. 

      We have adapted the respective paragraph in the discussion to clarify the term familiarity, and to also address that other aspects of attention and perception might be relevant (p. 25):

      “In this context familiarity might refer to the infant’s perception of congruence between internal signal and external stimuli which might drive the infant’s attention. Specifically, the synchronous condition should be easier to process due to the intersensory redundancy and predictability between interoceptive and external signals.”

      “However, it is important to consider that other cognitive and attentional mechanisms could also influence these responses.”

      Reviewer #2 (Recommendations For The Authors):  

      Introduction: 

      (1) The relevance of respiration to self-regulation and social interaction was not clearly described. 

      We have rephrased the relevant section to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      (2) In the last line of page 5, it might be more appropriate to use the term "meta-cognitive awareness" instead of "meta-perception," as the latter can refer to a different concept. 

      We have changed the word as recommended. 

      (3) The authors predicted a positive correlation in sensitivity between the cardiac and respiratory domains, despite studies in adults suggesting these are not related. How did the authors arrive at this prediction, and how do they interpret the results showing a correlation only in 18-month-olds, the age group closest to adults in this study? 

      We have elaborated on our reasoning for our prediction (p. 7): “Adult cardiac and respiratory interoception paradigms typically use two conceptually different paradigms. Thus, null results in the adult literature might be due to the unique characteristics of those paradigms.”

      Further, we have expanded on this result in the discussion (p. 24): “Still, we find a relationship between cardiac and respiratory signals in the oldest sample tested here, the 18-month-olds, which is closest to adults. Although this effect needs to be interpreted with caution due to the small sample size, this might indicate that using conceptually similar experimental paradigms might be a promising avenue to investigate relationships between different interoceptive modalities in adults.”

      Results: 

      (4) Please provide the descriptive statistics (means and standard deviations of looking time) for each independent condition, especially for the 18-month and 3-month age groups where this information is missing and only differences in looking times between conditions were mentioned. Furthermore, since the asynchronous condition includes both fast and slow stimuli, descriptive statistics for each should be included to help readers determine whether effects are due to synchronicity or stimulus speed. 

      We have added the means and standard deviations of looking times to synchronous and asynchronous trials to the results section. Mean looking times to both types of asynchronous trials can be found in Supplementary Materials C. We have added the standard deviations to this part.

      (5) Regarding the MEGA analysis for iBEATs, where a main effect of condition was found (OR = 1.13, t(1769) = 2.541, p = .011), are these t-value and p-value based on the GLMM analysis, or did the authors conduct a separate t-test? This query arises because the p-value of the main effect differs from that in Table 2. Also, is it conventional to present GLMM results in the manner of Table 2, comparing specific level combinations (i.e., synchronous condition and 3-month age group), instead of listing main effects and interactions? 

      Thank you very much for pointing out that the results of the GLMM were not reported as precisely as possible, which might lead to confusion about the presented p-values. The main effect of condition refers to a post-hoc comparison using estimated marginal means from the GLMM across all age groups, while Table 2 refers to the main effect of condition for the 3-month age group.

      To make the results more accessible we have restructured parts of the manuscript following your suggestions: In the main manuscript we now focus on the interaction effects for condition and age, as well as the post hoc comparison, while we now report null-full model comparison, and tables for all age groups in the supplements. 

      We have added the following clarifying sentences to the manuscript, p. 12:

      “In reporting these results we focus on whether we found evidence for interactions between age groups, and whether we found evidence for a general effect across age groups. In-depth results and tables can be found in Supplementary Materials C. 

      […]

      Next, we computed post hoc comparisons using estimated marginal means from the MEGA-analysis across all age groups to investigate whether we find indications for a similar effect across ages.”
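
      To make the relation between a reported odds ratio and its test statistic concrete, the sketch below back-calculates the log-odds estimate, its standard error, and an approximate 95% interval from the values quoted in comment (5). It assumes that the reported t is the estimate divided by its standard error on the log-odds scale and that a normal approximation is adequate; neither is stated in the response, and the function name is hypothetical. Because the inputs are rounded, the outputs are approximate.

```python
import math


def odds_ratio_summary(odds_ratio, t_value, z_crit=1.96):
    """Recover the log-odds estimate, SE, and Wald interval from OR and t.

    Assumes t = estimate / SE on the log-odds scale (an assumption).
    """
    beta = math.log(odds_ratio)          # estimate on the log-odds scale
    se = beta / t_value                  # implied standard error
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return beta, se, (lower, upper)


# Values quoted in the reviewer's comment: OR = 1.13, t(1769) = 2.541.
beta, se, ci = odds_ratio_summary(1.13, 2.541)
print(beta, se, ci)
```

      An interval that excludes 1 on the odds-ratio scale corresponds to a significant effect at the chosen level, which is why the same contrast can look different when read from a table of model coefficients for a single reference level.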

      (6) I am confused about the results indicating a significant effect of condition for the iBREATH dataset excluding 18-month-olds (Table 5, OR = 1.15, t(1050) = 2.397, p = .017), as the description in Table 5 suggests no statistical significance (p = .070). The decision to exclude the 18-month group seems arbitrary, particularly since the age-by-condition interaction was not significant in the GLMM across all three age groups. 

      Thank you very much for the comment; we have removed the analysis excluding the 18-month-old group.

      (7) Regarding the relationship between cardiac and respiratory interoceptive sensitivity, the statement "However, we found a significant interaction between iBEATs scores and age at the 18-month level" (p16) seems unclear. Clarification is needed, as mentioning age interaction at a specific age stage is unusual. A pairwise comparison between 3 and 9 months should also be included. 

      Thank you for pointing out that the results could be presented more clearly! Similar to the other MEGA analyses we have put detailed tables of the results of the beta regression in the supplements and have kept a single table with the most important results in the main manuscript. Further, we have clarified the text passage as follows: “However, we found a significant interaction between the iBEATs scores and age, specifically comparing the 3- and 18-month-old groups (β = 3.13, SE = 1.41, p = .027). This interaction indicates that the relationship between iBEATs and iBREATH scores changes between 3 and 18 months of age.”  Also, we have now included a pairwise comparison between 3- and 9-month-olds. 

      Discussion: 

      (8) In pages 27-28, the authors discuss the results of the specification curve analysis, but there is no explanation for the 7th entry (statistical analysis) in Table 9. This entry seems particularly important. 

      We did not include an explanation for the 7th entry, as the impact of the statistical test used was comparatively less pronounced. However, to acknowledge this result we have added the following sentence to the discussion: “Moreover, the statistical test used (paired t-test vs linear mixed model, Table 9, 7th entry) had a rather small impact on the results. However, given the large number of analyses conducted, this might be related to not being able to precisely formulate the model to fit the complexity of the data for each specification.”

      Methods: 

      (9) What were the colors of the stimuli? 

      We have added the colors of the stimuli to the methods section. Further, the stimuli can be found in the OSF project associated with the manuscript.

      (10) The percentage of trials excluded during preprocessing should be stated. Additionally, the number of trials included in the statistical analyses for each condition (including synchronous, fast, and slow) should be detailed separately. 

      We have added information on numbers of trials completed and included in Table 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      The spatial transcriptomics data presented in the figures could be considered a valuable resource if shared, with the potential for improved and clarified analyses. The primary conclusion of the paper, that C. violaceum infection in the liver cannot be contained without macrophages, would benefit from clarification.

      We thank the reviewer for their time and effort in evaluating our manuscript.

      While the spatial transcriptomic data generated in the figures are interesting and valuable, they could benefit from additional information. The manual selection of regions of granulomas for analysis could use additional context - was the rest of the liver not sequenced, or excluded for other reasons? Including a healthy liver in the analysis could serve as a control for any lasting effects at the final time point of 21 days.

      We revised the text in the methods section to include additional information about manual selection of regions. The entire tissue section was sequenced, but using H&E as a guide, we manually selected each representative lesion and a surrounding layer of healthy hepatocytes at each timepoint. We agree that an uninfected control could be useful; however, we did not include an uninfected mouse in the experiment because we were most interested in the cells that make up the granuloma, not hepatocytes outside the lesion. Additionally, we find that at the 21 DPI timepoint the surrounding hepatocytes appear to have returned to a homeostatic transcriptional state; at 21 DPI the majority of mice have undetectable CFU burdens.

      Providing more context for the scalebars throughout the spatial analyses, such as whether the data are raw counts or normalized based on the number of reads per spatial spot, would be helpful for interpretation, as changes in expression could signal changes in the numbers of cells or changes in the gene expression of cells.

      The scalebars for the SpatialFeaturePlots display the normalized gene expression values. The data are normalized based on the number of reads per spatial spot, using the sctransform method published in (Hafemeister & Satija, 2019). We agree that the changes in expression could result from changes in cell numbers and/or changes in gene expression on a per cell basis. However, the sctransform method is designed to preserve biological variation while minimizing technical effects observed in transcriptomics platforms. Regardless of the heterogeneity of sequencing depth, it is clear from these plots that gene expression changes dynamically over time and space, which was the focus of our analysis. We have updated the figure legends to clarify scalebar units, and revised the methods section. 
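
      The idea of normalizing by reads per spot can be illustrated with a simplified calculation. The sketch below is not the sctransform implementation, which fits a regularized negative binomial regression per gene; it uses a simpler analytic Pearson residual with a fixed overdispersion parameter as a stand-in. The function name, the value of theta, and the toy counts are assumptions for illustration only.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Depth-aware Pearson residuals for a spots-by-genes count matrix.

    counts: list of spots, each a list of raw gene counts.
    theta:  fixed negative binomial overdispersion (an assumed value).
    """
    total = sum(sum(row) for row in counts)
    spot_depth = [sum(row) for row in counts]
    gene_total = [sum(col) for col in zip(*counts)]
    residuals = []
    for row, depth in zip(counts, spot_depth):
        out = []
        for x, g in zip(row, gene_total):
            mu = depth * g / total  # count expected from depth alone
            if mu == 0:
                out.append(0.0)
                continue
            out.append((x - mu) / math.sqrt(mu + mu * mu / theta))
        residuals.append(out)
    return residuals


# Toy example: two spots with identical composition but 10x different depth.
toy = [[10, 30], [100, 300]]
print(pearson_residuals(toy))
```

      In the toy example the two spots differ only in sequencing depth, so every residual is zero; a nonzero residual therefore reflects a departure from what depth alone would predict, although it still cannot separate more cells from more transcripts per cell.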

      In Figure 4, qualitative measurements are valuable, but having an idea of the raw data for a few of the pursued chemokines/receptors would aid interpretation.

      All of the SpatialFeaturePlots utilized to generate Figure 4 have been included in the manuscript, either in the main figures or in the supplemental figures. For example, the SpatialFeaturePlots of Cxcl4, Cxcl9, and Cxcl10 are all in Figure 4 – figure supplement 1.

      In Figure 4 it would also be beneficial to clarify whether the reported values are across all clusters and consider focusing on clusters with the greatest change in expression.

      Figure 4 summarizes the expression of each gene at each timepoint for the entire selected area, independently of cluster identity. Different clusters do show variability in the relative change in expression. To better show these data, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster, many of which include chemokines (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.   

      Figures 5E and F would benefit from clarification regarding the x-axis units and whether the expression levels are summed across all clusters for each time point.

      Figures 5E and 5F display the normalized gene expression values for all spots (independent of cluster identity) at each timepoint. We have updated the figure legend to reflect this clarification.

      Additionally, information on the sequencing depth of the samples would be helpful, particularly as shallow sequencing of RNA can result in poor capture of low-expression transcripts.

      We agree with the reviewer that sequencing depth is an additional factor to take into consideration. We have included an additional supplemental figure (Figure 1 – figure supplement 1A-B) to display raw counts spatially at the various timepoints, and within each cluster.

      Regarding the conclusion of the essentiality of macrophages in granuloma formation, it may be prudent to further investigate the role of macrophages versus CCR2. Consideration of experiments deleting macrophages directly, instead of CCR2, could provide more definitive evidence of the necessity of macrophage migration in containing infections.

      While CCR2 is expressed on a number of other cells besides monocytes, it is well-documented that loss of CCR2 results in accumulation of monocytes in the bone marrow and a significant reduction in the blood-monocyte population. As a result, monocytes are not recruited to the site of infection in numerous prior publications in the field; we confirm this as shown by flow cytometry and IHC. Nonetheless, future studies will aim to rescue Ccr2–/– mice via adoptive transfer of monocytes to further show that monocyte-derived macrophages are essential for defense against infection. We also intend to perform clodronate depletion experiments at various timepoints, however, clodronate will also deplete Kupffer cells and has off-target effects on neutrophils. Overall, the established importance of CCR2 for monocyte egress from the bone marrow and our observation that the macrophage ring fails to form give us sufficient confidence to conclude that monocyte-derived macrophages are essential for this innate granuloma.

      Analyzing total cell counts in the liver after infection could provide insight into whether the decrease in the fraction of macrophages is due to decreased numbers or infiltration of other cell types...

      Our flow data suggest that the decrease in macrophages in Ccr2–/– mice is due to both a decrease in macrophage number and an increase in the infiltration of other cell types (namely neutrophils). To better illustrate this, we now include an additional quantification of the total cell counts in the liver and spleen (new Figure 6 – figure supplement 1), which supports our conclusion that Ccr2–/– mice have a defect in granuloma macrophage numbers. We have also repeated the experiment to reach sufficient numbers to perform statistical analysis (revised Figure 6F–K).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      We thank the reviewer for their thoughtful comments and suggestions.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result...

      We agree with the reviewer, and the limited solubility of reparixin and other chemokine-receptor inhibitors is a major caveat of this study and others in the field. In future studies, there are several other inhibitors that could be used to further assess the role of CXCR1/2.

      On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control. I would hesitate to agree with the authors' interpretation that their data proves macrophages are serving as a physical barrier from the uninvolved liver. It is possible and likely that they are contributing to bacterial control through direct immunological activity and not simply as a structural barrier.

      We agree that macrophages do not form a physical or structural barrier, a word that implies epithelial-like function. Instead, we agree that macrophages mostly act immunologically. We revised the text to remove the term barrier.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per time point. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability.

      Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous, both between and within individuals. It will be difficult to assert how broadly generalizable the transcriptomic features are to other C. violaceum granulomas...

      We thank the reviewers for highlighting this key difference between granulomas in other infectious diseases, and granulomas induced by C. violaceum. Based on many prior experiments, we observe that C. violaceum-induced granulomas are very reproducible between and within individuals (highlighted in our previous publication). As this is a major advantage of this model system, we chose specific timepoints based on key events that consistently occur in the majority of lesions assessed at each timepoint, allowing us to be confident in the selection of representative granulomas. However, it is worth noting that granulomas within an individual mouse are seeded and resolved somewhat asynchronously. This did indeed affect our spatial transcriptomic data, as the 7 DPI timepoint was not histologically representative of a typical 7 DPI granuloma. Therefore, we excluded the 7 DPI timepoint from our analyses.

      Furthermore, this undermines any opportunity for statistical testing of features between time points, limiting the potential value of the temporal data.

      We agree with the reviewer that there is much more characterization and quantification that can be done. As demonstrated by the abundance of spatial and temporal data for the chemokine family alone, the spatial transcriptomics dataset is rich and will likely supply us with many years of analyses and investigations. Our current approach is to use the spatial transcriptomics dataset as a hypothesis-generating tool, followed by in vivo studies that seek to uncover physiological relevance for our observations. In the current paper, the strength of the spatial transcriptomic data for CCL2, CCL7 and their receptor CCR2 prompted us to study Ccr2–/– mice. These mice then prove the relevance of the spatial transcriptomic data. In regard to conclusions about temporal changes in chemokine expression, in this manuscript we do not make conclusions that CCL2 is important at one timepoint but not another. We are characterizing the broad temporal trends of expression in order to cast a broad net to inform future in vivo studies. There is much work for us to do to explore all the induced chemokines and their receptors.

      Another caveat to these data is the limited or incompletely informative data analysis. The authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, it would be beneficial to see more general analyses considering Visium captures the entire transcriptome. Some important questions that are left unanswered from this study are:

      What major genes defined each spatial cluster?...

      The initial characterization of each spatial cluster was performed in Harvest et al., 2023. In brief, we used a mixture of published single-cell sequencing data, histological-based parameters, and ImmGen to define each cluster. We have not re-stated those methods in the current manuscript, but instead reference our prior paper.

      What were the top differentially expressed genes across time points of infection?...

      Though the top differentially expressed genes for each cluster can be informative in some situations, we chose a more targeted approach because of the obvious importance of chemokines. Nonetheless, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.  

      Did the authors choose to focus on chemokines/receptors purely from a hypothesis perspective or did chemokines represent a major signature in the transcriptomic differences across time points?

      We chose to focus on chemokines because of their obvious importance for recruitment of immune cells. They were also among the highest induced genes in the spatial transcriptome (new Table 4).

      In addition to the absence of deep characterization of the spatial transcriptomic data, the study lacks sufficient quantitative analysis to back up the authors' qualitative assessments...

      See above comment regarding statistical comparisons.

      Furthermore, the authors are underutilizing the spatial information provided by Visium with no spatial analysis conducted to quantify the patterning of expression patterns or spatial correlation between factors.

      Several factors make quantification challenging. Lesions grow considerably in size in the first few days of infection, and then shrink in size in the latter days. This makes quantification challenging between timepoints. Radial quantification is also challenging due to the irregular shapes of each granuloma (see comment below for further discussion). Most importantly, the key next experiments are to validate the importance of each chemokine and receptor in vivo. Once we know which ones are the most important, this will justify putting more effort into spatial quantitative analysis and patterning of expression for those chemokines. 

      Impact:

      The authors' analysis helps highlight the chemokine profiles of protective, yet host protective granulomas. As the authors comment on in their discussion, these findings have important similarities and differences with other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate strategies for host-directed therapy.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The Visium analysis would be strengthened by

      (1) Showing several histology examples of granulomas at each timepoint to help aid the reader in seeing how 'representative' each Visium sample is...

      These histological analyses are performed in our previous manuscript, and indeed were a crucial aspect of the initial characterization of the spatial transcriptomics dataset, which was performed in Harvest et al., 2023. Full liver sections are shown in that paper at each timepoint, and readers can see that the architecture is highly reproducible.

      (2) Validating their results in other tissues, either with Visium or with more targeted assays for their study's key molecules, such as immunohistochemistry or in situ hybridization

      We agree on the importance of validation studies and have plans to perform single-cell RNA sequencing experiments to further enhance resolution. With key genes in mind, we then plan to perform more in vivo studies to assess physiological relevance of upregulated genes in specific cell types.

      At the very least it would be important to validate the expression of CXCL1 and CXCL2 in other tissues and at the protein level, given the importance of those findings

      We think that the reviewer is asking us to validate that CXCL1 and CXCL2 are actually expressed given the negative reparixin data. However, if we do prove that they are expressed, this will not resolve whether they have critical roles in neutrophil recruitment. To prove this, we would need either a better CXCR2 inhibitor or Cxcr2 knockout mice. Therefore, we are saving further exploration for the future. Regarding validating other chemokines, we establish that CCR2 is critical, and we now show by immunofluorescence and ELISA (new Figure 7 – figure supplement 4) that CCL2 is highly expressed in WT mice, and Ccr2–/– mice actually have strongly elevated CCL2 expression at 3 DPI compared to WT mice.

      In Figure 1B, the UMAP here is largely uninformative. To display the clusters, the authors should instead show a heatmap or equivalent visualization of which genes defined each cluster. It would be helpful for the authors to also write out the full name of each cluster before using the abbreviations shown.

      Please see our previous comment about the initial characterization of clusters performed in Harvest et al., 2023, which details the characteristic genes for each cluster. We have written the full names of each cluster in the legend of Figure 1.

      In Figure 1C, the authors use a binary representation of whether a cluster is present or not at a particular time point. However, the spot size is arbitrary, and the colors of the dots are the same as the cluster color code. It is not clear what threshold the authors (or SpatialDimPlots) use to declare a given cluster is present at a given time point. Therefore, this chart does not give any sense of the extent of each cluster's presence at each time. The authors should revisualize these data to display the abundance of each cluster at each timepoint. This could simply be done by adjusting the size of the circle or using a more traditional heatmap.

      We have now updated this graphic to display the extent of a cluster’s presence, with the size of each dot corresponding to the abundance of each cluster.

      In Figures 2 and 3 the authors describe the kinetics of each chemokine by cluster. While the dynamic expression is evident in the images, it is challenging to determine which clusters are driving expression in the absence of cluster annotation in those figures. The authors should support their visual findings with quantification of each factor in each cluster across time points.

      In Figure 5, violin plots are shown for Cxcl1 and Ccl2 that depict gene expression by each cluster. However, because each capture area is approximately 50 µm in diameter, the data do not achieve single-cell resolution and are not as informative as one would hope. Therefore, violin plots for each chemokine were not shown, though we have generated these graphics. We did not add these graphics to the revision because we did not think readers would generally want to see several pages of violin plots in the supplement. As mentioned, we plan to do single-cell RNA sequencing to further assess chemokine expression by each cell type present within the granulomas at key timepoints.

      With respect to the lack of spatial analysis, the authors describe certain transcript signals (i.e., peripheral region versus central region of the granuloma) across each lesion. To back up these qualitative assertions, the authors could use line profiles from the center of each granuloma to the outside to plot the variation in expression of each transcript over radial space. This would provide a more direct way to determine the spatial coordination between various transcripts.

      We considered using line profiles to quantify spatial variation within each lesion at each timepoint. However, this was exceptionally challenging due to the asymmetrical nature of some lesions, and the size discrepancy at different timepoints as the granulomas grow (during infection) and shrink (during resolution). When attempting to decide where to draw the line profiles, we determined that this approach did not enhance our analyses beyond using the cluster overlay and H&E to identify and interrogate different clusters.
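The line-profile idea raised above can be illustrated with a minimal sketch. This is not an analysis performed in the manuscript; the spot coordinates, expression values, and the function name `radial_profile` are hypothetical. It averages one transcript's signal in concentric rings around a chosen center, and it assumes a roughly circular lesion, which is exactly the assumption that asymmetrical granulomas would violate.

```python
import math
from collections import defaultdict

def radial_profile(spots, center, bin_width):
    """Average expression of one transcript in concentric rings around `center`.

    spots: iterable of (x, y, expression) tuples, one per capture spot.
    Returns a list of (ring_inner_radius, mean_expression), sorted by radius.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    cx, cy = center
    for x, y, expr in spots:
        # Assign the spot to a ring by its Euclidean distance from the center.
        ring = int(math.hypot(x - cx, y - cy) // bin_width)
        sums[ring] += expr
        counts[ring] += 1
    return [(ring * bin_width, sums[ring] / counts[ring]) for ring in sorted(sums)]

# Hypothetical spots: low signal near the core, high signal in an outer ring.
toy_spots = [
    (0, 0, 1.0), (10, 0, 1.0), (0, -10, 4.0),
    (100, 0, 8.0), (0, 110, 10.0),
    (220, 0, 2.0),
]
profile = radial_profile(toy_spots, center=(0, 0), bin_width=100)
```

For an irregular lesion, the distance to a single center would need to be replaced by some other measure (for example, distance to the lesion boundary), and comparing timepoints would require normalizing for lesion size.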

      The data visualization in Figure 4 seems unnecessarily confusing. The authors put the transcriptomic signal into categories of 'absent', 'low', 'medium', and 'high.' Why not simply use a continuous scale? The data would also benefit from hierarchical clustering of the heatmap rows to highlight chemokines and their receptors with similar expression patterns across time.

      We considered using a continuous scale as suggested by the reviewer. However, we chose not to create a continuous scale because quantitation is challenging due to the size changes in the lesions over time, such that larger lesions have greater inclusion of surrounding hepatocytes as well as necrotic cores, which would dilute the signal if averaged with the active immunologic granuloma zones. Figure 4 was intended to simplify the entirety of the SpatialFeaturePlots in an easy-to-digest manner, to aid in hypothesis generation as we consider the potential function of each chemokine and receptor in this model. We chose to organize each chemokine ligand based on family, maintaining a numerical order to allow Figure 4 to serve as a quick reference for anyone who is interested in a particular chemokine ligand or receptor.

      Do the authors feel confident in the transcriptomic signal coming from regions of necrosis? Given that many of their bright signals are coming from within clusters annotated as necrosis or necrosis-adjacent this raises an important technical consideration. Can the authors use the H&E image to estimate the cellular density (based on nuclear counts) in each region annotated by Visium? Are there any studies supporting the accurate performance of spatial transcriptomic methods in necrosis? Necrosis can be a source of non-specific binding during in situ hybridization assays.

      The reviewer raises a good point. A defining characteristic of the areas of necrosis is the lack of defined cell borders, with faded or absent nuclei. In these regions, it is impossible to estimate cellular density. Given these concerns, we have included an additional figure (new Figure 1 – figure supplement 1A-B) to display raw counts in each cluster across all timepoints. Though regions of necrosis do display lower read quantity compared to other areas, we are still confident in the positive transcriptomic signal coming from adjacent regions because there are plenty of negative examples in which expression is not detected. In other words, temporal and spatial upregulation of key genes is still observed in the tissues, and future experiments will aim to interrogate the physiological relevance of each gene, while validating the spatial transcriptomics data with other methodologies.

      The methods should include a much more detailed description of the tissue preparation and collection for the Visium experiment. The section on the computational analysis of the Visium data is also extremely limited. At a minimum, the authors should include details on how they performed clustering of the Visium regions.

      The detailed description of tissue preparation, computational analysis, and clustering is in our previous manuscript, from which this dataset originates. We can add a direct quote of the methodology if the reviewer requests.

      The cluster labels in Figure 5 A-B are very difficult to see. Furthermore, it would help if the authors displayed the annotated cluster names (i.e., those shown in 5C) instead of their numerical coding for a more direct interpretation of the data.

      We agree and have updated this figure with annotated cluster names.

      The scale bars in Figure 7 are very difficult to see.

      The scale bars in histology images were kept small intentionally so as not to occlude data, and eLife is an online-only, digital media platform which allows readers to sufficiently zoom on high-resolution histology images. We have increased the DPI resolution for histology images to further aid in visualization.

      The information presented in Tables 2 and 3 is greatly appreciated and will really help guide the reader through the analyses.

      We assembled this information for our own learning about chemokines and hope that it is useful for the reader.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …the degree to which the predictions can vary according to environmental composition remains difficult to quantify, and the work does not address the sensitivity of the modeling predictions beyond a simulated medium containing 33 root exudates. I find this especially important given that relatively few (84 of 243) species were predicted to grow even after cross-feeding, suggesting that a richer medium could lead to different interaction network structures. While the authors do state the importance of environmental composition and have carefully designed an in silico medium, I believe that simulating a broader set of resource pools would add necessary insight into both the predictive power of the models themselves and trophic interactions in the rhizosphere more generally.

      The original analyses were indeed focused on a single well-defined environment supporting the growth of only a subset of the species. We have added a paragraph to the discussion section dealing with the potential limitations of this approach. 

      On line 289 we write:

      "Overall, the successive iterations connected 84 out of 243 native members of the apple rhizosphere GSMM community via trophic exchanges. The inability of the remaining bacteria to grow, despite being part of the native root microbiome, possibly reflects the selectiveness of the root environment, which fully supports the nutritional demands of only part of the soil species, whereas specific compounds that might be essential to other species are less abundant¹. It is important to note that the specific exudate profile used here represents a snapshot of the root metabolome, as root secretion profiles are highly dynamic, reflecting both environmental and plant developmental conditions. A possible complementary explanation for the observed selective growth might be the partiality of our simulation platform, which examined only plant-bacteria and bacteria-bacteria interactions while ignoring other critical components of the rhizosphere system such as fungi, archaea, protists and mesofauna, as well as less abundant bacterial species, all of which are known to interact metabolically². Finally, the MAG collection, while relatively substantial, represents only part of the microbial community. Accordingly, the iterative growth simulations represent a subset of the overall hierarchical-trophic exchanges in the root environment, necessarily reflecting the partiality of the dataset."

      In addition, we have tried to better explain the advantages of a limited/defined medium for such an analysis. On Line 231 we add:

      "By avoiding the inclusion of non-exudate organic metabolites, the true-to-source rhizosphere environment was designed to reveal the hierarchical directionality of the trophic exchanges in soil, as rich media often mask various trophic interactions taking place in native communities³"

      More generally, beyond the above justification of our specific medium selection, we agree that simulating a broader set of resource pools would contribute to a more comprehensive understanding of the trophic interactions. Therefore, we conducted the analysis in an additional environment, in which cellulose was used as an input. We were able to follow its well-documented degradation via multiple steps, conducted by different community members, to serve as a benchmark to our suggested framework. 

      On line 357 we add:

      "To validate the ability of MCSM to capture trophic dependencies and succession, we further tested whether it can trace the well-documented example of cellulose degradation - a multi-step process conducted by several bacterial strains that carry out the conversion of cellulose and its oligosaccharide derivatives into ethanol, acetate and glucose, which are all eventually oxidized to CO₂⁴. Here, the simulation followed the trophic interactions in an environment provided with cellulose oligosaccharides (4 and 6 glucose units) on the 1st iteration (Supp. Table 3). The trophic successions detected along the iterations captured the reported multi-step process (Supp. Fig. 7)."
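The iterative logic described in the quoted passage can be sketched as follows. This is a deliberately simplified illustration, not the MCSM implementation: in the actual framework growth is decided by simulating genome-scale metabolic models, whereas here a hypothetical species simply "grows" once every metabolite it requires is present in the medium. All species names, metabolite sets, and the function name `iterate_growth` are assumptions made for the example.

```python
def iterate_growth(species, initial_medium, max_iterations=10):
    """Return (iteration at which each species first grew, final medium).

    species: dict mapping name -> (required_metabolites, secreted_metabolites),
    both given as sets. A species grows in the first iteration in which all of
    its required metabolites are available; its secretions are then added to
    the medium for the following iteration.
    """
    medium = set(initial_medium)
    grew_at = {}
    for iteration in range(1, max_iterations + 1):
        # Decide growth against the medium as it stood at the start of the iteration.
        newly_grown = [
            name for name, (needs, _) in species.items()
            if name not in grew_at and needs <= medium
        ]
        if not newly_grown:
            break  # no new growers, so the medium can no longer change
        for name in newly_grown:
            grew_at[name] = iteration
            medium |= species[name][1]
    return grew_at, medium

# Hypothetical three-step cellulose succession plus one species that never grows.
toy_species = {
    "primary_degrader": ({"cellulose_oligomer"}, {"glucose", "acetate"}),
    "fermenter": ({"glucose"}, {"ethanol"}),
    "oxidizer": ({"ethanol", "acetate"}, {"CO2"}),
    "non_grower": ({"unavailable_cofactor"}, set()),
}
grew_at, final_medium = iterate_growth(toy_species, {"cellulose_oligomer"})
```

In this toy run the primary degrader grows in the first iteration, the fermenter in the second, and the oxidizer in the third, while the species whose requirement is never supplied does not grow at all, mirroring the situation in which only a subset of the community is connected by trophic exchanges.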

      Finally, we have included additional text regarding the challenge of defining our simulation environment in the Discussion section. 

      On line 532 we add:

      "In the current study, the root environment was represented by a single pool of resources (metabolites). As genuine root environments are highly dynamic and responsive to stimuli, a single environment can represent, at best, a temporary snapshot of the conditions. Conducting simulations with several sets of resource pools (e.g., representing temporal variations in exudation profile) can add insights regarding their effect on trophic interactions and community dynamics. In parallel, confirming predictions made in various environments will support an iterative process that will strengthen the predictive power of the framework and improve its accuracy as a tool for generating testable hypotheses. Similarly, complementing the genomics-based approaches used here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances⁵."

      And we add in Line 520:

      "For these reasons, among others, the framework presented here is not intended to be used as a stand-alone tool for determining microbial function. The framework presented is designed to be used as a platform to generate educated hypotheses regarding bacterial function in a specific environment in conjunction with actual carbon substrates available in the particular ecosystem under study. The hypotheses generated provide a starting point for experimental testing required to gain actual, targeted and feasible applicable insights⁶,⁷. While recognizing its limitations, this framework is in fact highly versatile and can be used for the characterization of a variety of microbial communities and environments. Given a set of MAGs derived from a specific environment and environmental metabolomics data, this computational framework provides a generic simulation platform for a wide and diverse range of future applications." 

      Reviewer #2 (Public review):

      There are two main drawbacks to approaches like the one described here, both only partially related to the authors' work yet with great impact on the presented framework. First, the usage of automatic GSMM reconstruction requires great caution. It is indicative that even the semi-curated AGORA models are still considered reconstructions, and the user is expected to parameterize them into a model. In this study, CarveMe was used. CarveMe is a well-known tool with several pros [1]. Yet, several challenges need to be considered when using it [2]. For example, the biomass function used might lead to an overestimation of auxotrophies. Also, as its authors admit in their reply paper, CarveMe does gap fill in a way [3]; models are constructed to ensure no gaps and also secure a minimum growth. However, curation of such a high number of GSMMs is probably not an option. Further, even if FVA is way more useful than FBA for the authors' aim, it does not yet ensure that when a species secretes one compound (let's say metabolite A), the same flux vector, i.e. the same metabolic functioning profile, secretes another compound (metabolite B) at the same time, even if the FVA solution suggests that metabolite B could be secreted in general.

      We thank Reviewer #2 for highlighting this key limitation of our analysis. Below and in the 'recommendations to authors' section we address these concerns. 

      Concerning the first point raised (models' accuracy) we have now clearly acknowledged in the text the limitations of using an automated GSMM reconstruction tool such as CarveMe. More generally, the framework applied here was built in order to meet the challenges of analyzing high-throughput data while acknowledging the inherent potential of introducing inaccuracies. Pros & cons are now discussed. 

      On line 507 we write:

      "Moreover, the use of an automatic GSMM reconstruction tool (CarveMe⁸), though increasingly used for depicting phenotypic landscapes, is typically less accurate than manual curation of metabolic models⁹. This approach typically neglects specialized functions involving secondary metabolism¹⁰ and introduces additional biases such as the overestimation of auxotrophies¹¹,¹². Nevertheless, manual curation is practically unrealistic for hundreds of MAGs, an expected outcome considering the volume of current sequencing projects. As the primary motivation of this framework is the development of a tool capable of transforming high-throughput, low-cost genomic information into testable predictions, the use of automatic metabolic network reconstruction tools was favored, despite their inherent limitations, in pursuit of addressing the necessity of pipelines systematically analyzing metagenomics data." 

      Regarding the use of FVA solutions, such solutions indeed return all potential metabolic fluxes in GSMMs (the ranges of all fluxes satisfying the objective function, which by default is set to biomass increase) in a given environment. However, as indicated by the reviewer, predicted fluxes do not necessarily co-occur (i.e., when one metabolite is secreted, another metabolite is not necessarily secreted too); yet they provide the full set of potential solutions (unlike the single solution provided by FBA). A possible strategy to reduce inflated predictions provided by FVA and further constrain the solution space (reduce the set of metabolic fluxes) can be the incorporation of additional 'omics data layers, as was done, for example, in the work of Zampieri et al.⁵ Such an approach could allow, for instance, blocking fluxes through reactions in the network reconstructions that are not active in situ, thereby imposing further constraints and narrowing the solution space. We now refer in the text to this limitation and to potential routes to overcome it. 
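The distinction between per-flux ranges and co-occurring fluxes can be made concrete with a toy example. The sketch below is an illustration only and not the pipeline used in the study: the two-route network, the uptake bound, and all variable names are hypothetical, and a brute-force enumeration over integer fluxes stands in for a linear-programming solver. Two alternative routes each yield a biomass precursor together with a different by-product (A or B), so each secretion has a wide range at optimal growth, yet no single optimal flux vector reaches both maxima.

```python
from itertools import product

UPTAKE_MAX = 10  # hypothetical upper bound on substrate uptake

def feasible_flux_vectors():
    """Enumerate integer flux vectors (v_A, v_B) for a toy network.

    Route A: substrate -> precursor + by-product A (flux v_A)
    Route B: substrate -> precursor + by-product B (flux v_B)
    Biomass consumes the precursor, so the biomass flux equals v_A + v_B,
    and total uptake v_A + v_B cannot exceed UPTAKE_MAX.
    """
    for v_a, v_b in product(range(UPTAKE_MAX + 1), repeat=2):
        if v_a + v_b <= UPTAKE_MAX:
            yield v_a, v_b

vectors = list(feasible_flux_vectors())

# FBA-like step: the maximal biomass flux.
max_biomass = max(v_a + v_b for v_a, v_b in vectors)

# FVA-like step: range of each secretion flux over all optimal vectors.
optimal = [(v_a, v_b) for v_a, v_b in vectors if v_a + v_b == max_biomass]
range_a = (min(v_a for v_a, _ in optimal), max(v_a for v_a, _ in optimal))
range_b = (min(v_b for _, v_b in optimal), max(v_b for _, v_b in optimal))

# Co-occurrence check: secretion of B in the optimal vectors where A is maximal.
b_when_a_is_max = {v_b for v_a, v_b in optimal if v_a == range_a[1]}
```

Here both ranges span 0 to 10, which on their own would suggest that A and B can each be secreted at a high rate, while `b_when_a_is_max` shows that B is not secreted at all in the optimal vector that maximizes A.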

      On line 541 we now write:

      Similarly, complementing the genomics-based approaches done here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances⁵.  

      Reviewer #3 (Public review):

      When presenting a computational framework, best practices include running it on artificial (synthetic) data where the ground truth is known and therefore the precision and accuracy of the method may be assessed. This is not an optional step, the same way that positive/negative controls in lab experiments are not optional. Without this validation step, the manuscript is severely limited. The authors should ask themselves: what have we done to convince the reader that the framework actually works, at least on our minimal synthetic data? 

      Thank you for this suggestion. To validate the ability of MCSM to capture trophic succession, we conducted an additional analysis testing whether it can track the well documented example of cellulose degradation - a multi-step process conducted by several bacterial strains. This example has been included in the manuscript to serve as a case study (i.e. positive control) for metabolic interactions occurring within the bacterial community (Supp. Fig. 7). 

      On line 357 we add:

      "To validate the ability of MCSM to capture trophic dependencies and succession, we further tested whether it can track the well-documented example of cellulose degradation - a multi-step process conducted by several bacterial strains that carry out the conversion of cellulose and its oligosaccharide derivatives into ethanol, acetate and glucose, which are all eventually oxidized to CO₂⁴. Here, the simulation followed the trophic interactions in an environment provided with cellulose oligosaccharides (4 and 6 glucose units) on the 1st iteration (Supp. Table 3). The trophic successions detected along the iterations captured the reported multi-step process (Supp. Fig. 7)."

      "Supplementary Figure 7. Application of MCSM over the process of cellulose decomposition as described by Kato et al.⁴ 5-partite network exhibiting the uptake of cellulose oligomers (4 and 6 units of connected D-glucose) by primary decomposers, through secretion of intermediate compounds and their metabolization by secondary decomposers to CO₂. Distribution of phyla of primary and secondary decomposers is denoted by pie charts. Though MAGs were not constructed for the original species as in Kato et al., among the primary consumers, species corresponding to the Acidobacteria (Acidobacteriales)¹³, Actinobacteria¹⁴, Bacteroidetes¹⁵, Proteobacteria (Xanthomonadales)¹⁶ and Verrucobacteria¹⁷ groups are found to be capable of degrading cellulose compounds via enzymatic mechanisms."

      More generally, beyond the above addition, the relevance of the framework to the analysis of the data is discussed throughout the analysis (in the original version of the manuscript). We have scrutinized each of our observations in light of currently available information and provided corroborating evidence, as well as a few discrepancies, for multiple steps in the analysis. Examples include the following discussions:

      On line 312, we discuss the biological relevance of taxonomic classes classified as primary versus secondary degraders

      "As in the full GSMM data set (Community bar, Fig. 3C), most of the species which grew in the 1st iteration belonged to the phyla Acidobacteriota, Proteobacteria, and Bacteroidota. This result concurred with findings from the work of Zhalnina et al, which reported that bacteria assigned to these phyla are the primary beneficiaries of root exudates18. Species from three out of the 17 phyla that did not grow in the first iteration - Elusimicrobiota, Chlamydiota, and Fibrobacterota, did grow on the 2nd iteration (Fig. 3C). Members of these phyla are known for their specialized metabolic dependencies. Such is the case for example with members of the Elusimicrobiota phylum, which include mostly uncultured species whose nutritional preferences are likely to be selective19.

      At the order level, bacteria classified as Sphingomonadales (class Alphaproteobacteria), a group known to include typical inhabitants of the root environment20, grew in the initial Root environment. In comparison, other root-inhabiting groups including the orders Rhizobiales and Burkholderiales20, did not grow in the first iteration. Rhizobiales and Burkholderiales did, however, grow in the second and third iterations, respectively, indicating that in the simulations, the growth of these groups was dependent on exchange metabolites secreted by other community members (Supp. Fig. 4)."

      On line 331, we provide support to the classification of specific metabolites as exchange molecules

      "Overall, 158 organic compounds were secreted throughout the MCSM simulation (from which 12 compounds overlapped with the original exudate medium). These compounds varied in their distribution and were mapped into 12 biochemical categories (Fig. 3D). Whereas plant secretions are a source of various organic compounds, microbial secretions provide a source of multiple vitamins and co-factors not secreted by the plant. Microbial-secreted compounds included siderophores (staphyloferrin, salmochelin, pyoverdine, and enterochelin), vitamins (pyridoxine, pantothenate, and thiamin), and coenzymes (coenzyme A, flavin adenine dinucleotide, and flavin mononucleotide) – all known to be exchange compounds in microbial communities21,22. In addition, microbial secretions included 11 amino acids (arginine, lysine, threonine, alanine, serine, phenylalanine, tyrosine, leucine, glutamate, isoleucine, and methionine), also known as a common exchange currency in microbial communities23. Some microbial-secreted compounds, such as phenols and alkaloids, were reported to be produced by plants as secondary metabolites24,25. Additional information regarding mean uptake and secretion degrees of compounds classified to biochemical groups is found in Supp. Fig. 5."

      On line 432, we provide corroborative support to the classification of exudates as associated with beneficial/non beneficial root communities

      "Notably, the S-classified root exudates included compounds reported to support dysbiosis and ARD progression. For example, the S-classified compounds gallic acid and caffeic acid (3,4-dihidroxy-trans-cinnamate) are phenylpropanoids – phenylalanine intermediate phenolic compounds secreted from plant roots following exposure to replant pathogens26. Though secretion of these compounds is considered a defense response, it is hypothesized that high levels of phenolic compounds can have autotoxic effects, potentially exacerbating ARD. Additionally, it was shown that genes associated with the production of caffeic acid were upregulated in ARD-infected apple roots, relative to those grown in γ-irradiated ARD soil27,28, and that root and soil extracts from replant-diseased trees inhibited apple seedling growth and resulted in increased seedling root production of caffeic acid29."

      On line 446, we provide supporting evidence for the classification of secreted compounds as associated with beneficial/non-beneficial root communities

      "Several secreted compounds classified as healthy exchanges (H) were reported to be potentially associated with beneficial functions. For instance, the compounds L-Sorbose (EX_srb__L_e) and Phenylacetaladehyde (EX_pacald_e), both over-represented in H paths (Fig. 5C), have been shown to inhibit the growth of fungal pathogens associated with replant disease30,31.

      Phenylacetaladehyde has also been reported to have nematicidal qualities32."

      On line 453 we discuss the correspondence of specific exudate uptakes and compound secretions via specific subnetwork motifs (PM) and their literature/experimental evidence 

      "Combining both exudate uptake data and metabolite secretion data, the full H-classified PM path 4-Hydroxybenzoate; GSMM_091; catechol (Fig. 4C; the consumed exudate, the GSMM, and the secreted compound, respectively) provides an exemplary model for how the proposed framework can be used to guide the design of strategies which support specific, advantageous exchanges within the rhizobiome. The root exudate 4-Hydroxybenzoate is metabolized by GSMM_091 (class Verrucomicrobiae, order Pedosphaerales) to catechol. Catechol is a precursor of a number of catecholamines, a group of compounds which was recently shown to increase apple tolerance to ARD symptoms when added to orchard6,33. This analysis (PM; Fig 4C), leads to formulating the testable prediction that 4-Hydroxybenzoate can serve as a selective enhancer of catecholamine synthesizing bacteria associated with reduced ARD symptoms, and therefore serve as a potential source for indigenously produced beneficial compounds."

      Moreover, we perceive our analysis as a strategy for integrating high-throughput genomic data into testable predictions that narrow the solution space, while acknowledging potential inaccuracies that are inherent to the analysis. We have revised the text in order to clearly acknowledge this limitation.

      On line 497 we write: 

      "The framework we present is currently conceptual."

      On line 520 we write: 

      "For these reasons, among others, the framework presented here is not intended to be used as a stand-alone tool for determining microbial function. The framework presented is designed to be used as a platform to generate educated hypotheses regarding bacterial function in a specific environment in conjunction with actual carbon substrates available in the particular ecosystem under study. The hypotheses generated provide a start point for experimental testing required to gain actual, targeted and feasibly applicable insights6,7."

      On line 532 we add: 

      "In the current study, the root environment was represented by a single pool of resources (metabolites). As genuine root environments are highly dynamic and responsive to stimuli, a single environment can represent, at best, a temporary snapshot of the conditions. Conductance of simulations with several sets of resource pools (e.g., representing temporal variations in exudation profile) can add insights regarding their effect on trophic interactions and community dynamics. In parallel, confirming predictions made in various environments will support an iterative process that will strengthen the predictive power of the framework and improve its accuracy as a tool for generating testable hypotheses. Similarly, complementing the genomicsbased approaches used here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 219: "Feasibility" - this term/concept may be difficult to understand for readers unfamiliar with GSMMs. I would recommend either clarifying or rephrasing, perhaps as "simulations confirmed the existence of a feasible solution space for all the 243 models, as well as their capacity to predict growth in the respective environment."

      Thanks, done. We have modified this section as suggested (line 221). 

      (2) Line 244: How does MCSM fit within/build upon existing frameworks that simulate patterns of niche construction and cross-feeding with constraint-based modeling?

      This is now addressed. On line 250 we write:  

      "Unlike tools designed for modelling microbial interactions34,35, MCSM bypasses the need for defining a community objective function as the growth of each species is simulated individually. Trophic interactions are then inferred by the extent to which compounds secreted by bacteria could support the growth of other community members."

      (3) Figure 4A: While illustrating the general complexity of the predicted trophic interactions, the density of the network makes it very difficult to interpret specific exchanges. Moreover, the naming conventions of the metabolites make it difficult to understand what they represent. I would recommend either restructuring the graph such that the label of each node is legible, or removing the labels altogether.

      Thanks, done. Labels were removed and a zoom-in window on the exchanges highlighted in Figure 4C was added. The caption was revised to indicate that node colors correspond to differential abundance classification of GSMMs in the different plots (H, S, NA are Healthy, Sick, Not-Associated, respectively).

      Reviewer #2 (Recommendations for the authors):

      CarveMe solves a Mixed Integer Linear Program (MILP) that enforces network connectivity, thus requiring gapless pathways. It's puzzling how to deal with such a great number of GSMMs that is for sure, especially when coming from such an environment as soil and the vast majority of their corresponding MAGs represent most likely novel taxa. One alternative approach for using CarveMe might be to use the rich medium as a medium to gap-fill during the reconstruction. In this case, the gene annotation scores that CarveMe calculates in its initial step, are used to prioritise the reactions selected for gap-filling. This would lead to a new series of challenges but might be a useful comparison with the current GSMMs of the study.

      Though CarveMe indeed includes a gap-filling option, here we have purposely avoided it, as we aimed to adhere to the genomic content of the corresponding genomes and to avoid masking the metabolic dependencies that emerge from their incompleteness. This is noted in the Methods section, which we revised to emphasize the adherence to the genomic content of the models:

      On line 615 we now write:

      "All GSMMs were drafted without gap filling in order to adhere to genomic content and to avoid masking metabolic co-dependencies51"

      More generally, we now refer to the limitation of automatic reconstruction in the context of the current analysis. On line 507 we write:

      "Moreover, the use of an automatic GSMM reconstruction tool (CarveMe8), though increasingly used for depicting phenotypic landscapes, is typically less accurate than manual curation of metabolic models9. This approach typically neglects specialized functions involving secondary metabolism10 and introduces additional biases such as the overestimation of auxotrophies11,12. Nevertheless, manual curation is practically non-realistic for hundreds of MAGs, an expected outcome considering the volume of nowadays sequencing projects. As the primary motivation of this framework is the development of a tool capable of transforming high-throughput, low-cost genomic information into testable predictions, the use of automatic, semi-curated, metabolic network reconstruction tools was favored, despite their inherent limitations, in pursuit of developing pipelines for the systematic analysis of metagenomics data."

      Thermodynamically infeasible loops have been a challenge in constraint-based analysis [1]. However, for the case of FBA and FVA, time-efficient implementations are already available. Therefore, I would suggest using the loopless flag of the cobrapy package when performing FVA.

      Also, it would be nice to show/discuss how many exchange reactions each GSMM includes and what is the number of those with at least a non-zero minimum or maximum in the FVA using each of the three media.

      Done. In Supplementary Figure 4, we added a graphic summary of active FVA ranges for each GSMM in the three different environments (exchange reactions, non-zero flux). Additionally, we analyzed a subset of models and compared their regular FVA results vs loopless FVA results.

      On line 217 we write:

      "The number of active exchange fluxes in each medium corresponds with the respective growth performances displaying noticably higher number of potentially active fluxes in the rich environment (also when applying loopless FVA) (Supp. Fig. 4). Overall, Simulations confirmed the existence of a feasible solution space for  all the 243 models as well as their capacity to predict growth in the respective environemnt (Supp. Data 5)."

      "Supplementary Figure 4. FVA performances of GSMMs in different environments (Supp. Fig.

      3; Supp. Data 5). A. Distribution of potentially active exchange reactions (non-zero minimum FVA flux) in the different environments. Solid line inside each violin indicates the interquartile range (IQR). White point in IQR indicates the median value. Whiskers extending from the IQR indicate the range within 1.5 times the IQR from the quartiles. Violin width at a given value represents the density of data points at that value. B. Loopless FVA scores compared to regular FVA for models in the 3 different environments. Bars indicate the count of active fluxes (nonzero minimum FVA flux). Only a subset of models was used for this analysis."
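
      As a rough illustration of the kind of summary discussed here, the following Python sketch counts "active" exchange reactions from FVA ranges and compares a regular run with a loopless one. It rests on several assumptions that are not taken from the manuscript: the ranges are supplied as plain (minimum, maximum) tuples with made-up values, the reaction identifiers are placeholders, and a reaction is counted as active when either bound is non-zero (the reviewer's wording; the figure legend refers to the minimum only). In practice such ranges would come from an FVA routine, for example cobrapy's `flux_variability_analysis`, which accepts a `loopless` argument.

```python
def count_active_exchanges(fva_ranges, tol=1e-9):
    """Count exchange reactions with a non-zero minimum or maximum FVA flux.

    fva_ranges: dict mapping reaction id -> (minimum flux, maximum flux)
    tol: absolute tolerance below which a flux is treated as zero
    """
    return sum(
        1
        for fmin, fmax in fva_ranges.values()
        if abs(fmin) > tol or abs(fmax) > tol
    )


# Made-up FVA ranges for one hypothetical model in one medium.
regular_fva = {
    "EX_a_e": (-10.0, 0.0),       # uptake possible
    "EX_b_e": (0.0, 5.0),         # secretion possible
    "EX_c_e": (-1000.0, 1000.0),  # unbounded range, shown only for contrast
    "EX_d_e": (0.0, 0.0),         # inactive
}
loopless_fva = {
    "EX_a_e": (-10.0, 0.0),
    "EX_b_e": (0.0, 5.0),
    "EX_c_e": (0.0, 0.0),
    "EX_d_e": (0.0, 0.0),
}

n_regular = count_active_exchanges(regular_fva)
n_loopless = count_active_exchanges(loopless_fva)
print(f"active exchanges: regular={n_regular}, loopless={n_loopless}")
```

      Repeating such a count per model and per medium would give the distributions summarized in a violin plot; the tolerance is a design choice that guards against solver noise being counted as flux.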

      This brings us to the main challenge of your framework in my opinion: FVA returns the minimum and the maximum a flux may get. However, it does not ensure that when a metabolite is being secreted, another does the same too. That could lead to an overrepresentation of secreted metabolites after each iteration. To my understanding, unbiased methods focusing on metabolite exchanges would be a much better alternative for such questions. Unbiased constraint-based methods are known for requiring essential computational requirements, yet when focusing on specific parts of the models, recent implementations support them. A great showcase of such techniques is presented in [2].

      Indeed, FVA solutions return all potential metabolic fluxes in GSMMs (ranges of all fluxes satisfying the objective function, which by default is set to biomass increase), but they do not ensure that all fluxes actually co-occur (i.e., that when one metabolite is secreted, another metabolite is necessarily secreted too). However, though FVA solutions do not necessarily ensure co-occurrence regarding secretion and uptake, they provide a broader metabolic picture (the full set of potential solutions), unlike the arbitrary single solution provided by FBA, which is limited in providing information about potential secretions and uptakes in a specific environment. Here, we tried to elucidate the connection between a specific environment (root exudates) and the growth and metabolic capabilities of native bacteria. To the best of our understanding, unbiased approaches (such as the one displayed in Wedmark et al.36) are not environment-dependent but rather calculate all possible metabolic elements and routes within a metabolic network. Therefore, FVA is well adapted to exploring environment-dependent growth. The sensitivity of FVA-predicted active fluxes to the environments is now also implied by Supp. Fig. 3B, demonstrating that the number of potential active fluxes is proportional to growth performances. In addition, inquiring into all possible metabolic routes across a large dataset of hundreds of MAGs is central to the current analysis; thus, the easy implementation of FVA further justifies its use in the current study.

      An alternative strategy to reduce inflated FVA predictions and further constrain the solution space of predicted active fluxes can be the incorporation of additional layers of 'omics data, as was done, for example, in the work of Zampieri et al5. Such an approach could allow, for instance, removing reactions from the network reconstructions if they do not come into play in situ, and therefore impose further constraints and narrow down the solution space. Currently, the complexity of the soil community might impede or at least constrain a high-coverage recovery of transcriptomic data, though future works utilizing additional layers of 'omics data are expected to significantly reduce the number of potential solutions and thus improve the accuracy of GEM predictions.

      This is now discussed in the text. In line 541 we write:

      "Similarly, complementing the genomic-based approaches done here, with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."  

      In case it was the first version of CheckM used, the authors could consider repeating this check with CheckM2. As they state in line 293, Archaea may play an essential role in the community. Yet, among the high-quality MAGs only one corresponded to Archaea. However, that is quite possible to be the case because CheckM underestimates the completeness of archaeal genomes. If CheckM2 suggests that archaeal MAGs could be used, these would probably benefit a lot for the aim of the study.

      The analysis was conducted with the first version of CheckM to assess MAG quality. In future analyses we will use CheckM2. However, even before MAG recovery, we already knew from the work of Berihu et al. that archaeal species have a very low representation in the metagenomics data used here (Berihu et al., Additional data 2. Supp. fig. 4; "others" group)6, with less than 0.5% of the contigs mapped to archaeal genomes. The overall taxonomic distribution of the high-quality MAGs was compared to the distribution inferred from the non-binned data (contigs) and amplicon sequencing, and the three different data sets are very similar (Fig. 2).

      On line 130 we write:

      "Overall, the taxonomic distribution of the MAG collection corresponded with the profile reported for the same samples using alternative taxonomic classification approaches such as 16S rRNA amplicon sequencing and gene-based taxonomic annotations of the non-binned shotgun contigs

      (Fig. 2B)."

      The visualisation of the network in Figure 4A is hard to follow. An alternative could be a 5-partite plot having taxa in columns one, three, and five and compounds in the other two. An alternative visualisation is necessary.

      The full list of the 5- and 3-partite graphs is provided in Supplementary Data 10 (now also noted in the figure legend). Figure 4 was revised to improve its visualization. Labels were removed and zoom-ins on 5- and 3-partite plots were added (PMM and PM subnetworks, respectively).

      Line 509: If I get the point of the authors right, they refer to the "from shotgun data to GEMs" approach. I would suggest skipping this statement. Here is a recent study implementing this: https://doi.org/10.1016/j.crmeth.2022.100383.

      Thank you for your comment and reference. The intention behind the phrase in line 509 (in previous version) was to refer to going from metagenomics data to GEMs in soil-rhizosphere microbiome while linking environmental inputs (crop-plants exudates metabolomics data) and the agricultural-related metabolic function of bacteria. This phrase has been modified to clearly make a more modest claim while acknowledging other related studies.

      On line 548 we write:

      "Where recent studies begin to apply GSMM reconstruction and analysis starting from MAGs5,37, this work applies the MAGs to GSMMs approach to conduct a large-scale CBM analysis over high-quality MAGs derived from a native rhizosphere and explore the complex network of interactions in light of the functioning of the respective agro-ecosystem."

      Line 820: Reference format is broken.

      Corrected.

      In the caption of Figure 4, please add the meaning of H, S, and NA so it is self-explanatory.

      Done. In Figure 4 legend we added:

      "Node colors correspond to differential abundance classification of GSMMs in the different plots; H, S, NA are Healthy, Sick, Not-Associated, respectively."

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 4A is unreadable. It is not clear what insight the reader could gain by examining this figure.

      Thanks. The figure was revised. Labels were removed and a zoom-in window on the exchanges highlighted in Figure 4C was added. The caption was revised to indicate that node colors correspond to differential abundance classification of GSMMs in the different plots (H, S, NA are Healthy, Sick, Not-Associated, respectively).

      (2) In Figure 5, it is not apparent what the units of "prevalence" are, that is, what is the scale. What does 140 mean? How does that compare to 350?

      Thanks. Prevalence in the context of Figure 5B,C refers to the count of the compounds in each category (significantly affiliated with either healthy or symptomized soils) in sub-network motifs corresponding to this DA classification. We revised the figures (Y axes) and legend to be more specific (B: # of exudates; C: # of secreted compounds).

      "B. Bar plot indicating the number of exudates significantly associated with H or S-classified PM sub-networks (Hypergeometric test; FDR <= 0.05; green: healthy-H, red: sick-S). C. Bar plots indicate the number of secreted compounds in PM sub-networks, which are significantly associated with H-classified (upper, colored green), or S-classified (lower, colored red) (Hypergeometric test; FDR <= 0.05)."

      References

      (1) Buée, M., de Boer, W., Martin, F., van Overbeek, L. & Jurkevitch, E. The rhizosphere zoo: An overview of plant-associated communities of microorganisms, including phages, bacteria, archaea, and fungi, and of some of their structuring factors. Plant Soil 321, 189–212 (2009).

      (2) Bardgett, R. D. & Van Der Putten, W. H. Belowground biodiversity and ecosystem functioning. Nature 515, 505–511 (2014).

      (3) Opatovsky, I. et al. Modeling trophic dependencies and exchanges among insects’ bacterial symbionts in a host-simulated environment. BMC Genomics 19, 1–14 (2018).

      (4) Kato, S., Haruta, S., Cui, Z. J., Ishii, M. & Igarashi, Y. Stable coexistence of five bacterial strains as a cellulose-degrading community. Appl. Environ. Microbiol. 71, 7099–7106 (2005).

      (5) Zampieri, G., Campanaro, S., Angione, C. & Treu, L. Metatranscriptomics-guided genome-scale metabolic modeling of microbial communities. Cell Reports Methods 3, 100383 (2023).

      (6) Berihu, M. et al. A framework for the targeted recruitment of crop-beneficial soil taxa based on network analysis of metagenomics data. Microbiome 1–21 (2023) doi:10.1186/s40168-022-01438-1.

      (7) Dhakar, K. et al. Modeling-Guided Amendments Lead to Enhanced Biodegradation in Soil. mSystems 7, (2022).

      (8) Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).

      (9) Henry, C. S. et al. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982 (2010).

      (10) Freilich, S. et al. Competitive and cooperative metabolic interactions in bacterial communities. Nat. Commun. 2, (2011).

      (11) Price, M. Erroneous predictions of auxotrophies by CarveMe. Nat. Ecol. Evol. 7, 194–195 (2023).

      (12) Machado, D. & Patil, K. R. Reply to: Erroneous predictions of auxotrophies by CarveMe. Nat. Ecol. Evol. 7, 196–197 (2023).

      (13) Kulichevskaya, I. S. et al. Acidicapsa borealis gen. nov., sp. nov. and Acidicapsa ligni sp. nov., subdivision 1 Acidobacteria from Sphagnum peat and decaying wood. Int. J. Syst. Evol. Microbiol. 62, 1512–1520 (2012).

      (14) Depart-, M. & Building, L. S. Lignocellulose-degrading actinomycetes. 46, 145–163 (1987).

      (15) Thomas, F., Hehemann, J. H., Rebuffet, E., Czjzek, M. & Michel, G. Environmental and gut Bacteroidetes: The food connection. Front. Microbiol. 2, 1–16 (2011).

      (16) Dow, J. M. & Daniels, M. J. Pathogenicity determinants and global regulation of pathogenicity of Xanthomonas campestris pv. campestris. Curr. Top. Microbiol. Immunol. 192, 29–41 (1994).

      (17) Bergmann, G. T. et al. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil Biol. Biochem. 43, 1450–1455 (2011).

      (18) Zhalnina, K. et al. Dynamic root exudate chemistry and microbial substrate preferences drive patterns in rhizosphere microbial community assembly. Nat. Microbiol. 3, 470–480 (2018).

      (19) Uzun, M. et al. Recovery and genome reconstruction of novel magnetotactic Elusimicrobiota from bog soil. ISME J. 1–11 (2022) doi:10.1038/s41396-022-01339-z.

      (20) Lei, S. et al. Analysis of the community composition and bacterial diversity of the rhizosphere microbiome across different plant taxa. Microbiologyopen 8, 1–10 (2019).

      (21) Ghosh, S. K., Banerjee, S. & Sengupta, C. Bioassay, characterization and estimation of siderophores from some important antagonistic fungi. J. Biopestic. 10, 105–112 (2017).

      (22) Lu, X., Heal, K. R., Ingalls, A. E., Doxey, A. C. & Neufeld, J. D. Metagenomic and chemical characterization of soil cobalamin production. ISME J. 14, 53–66 (2020).

      (23) Mee, M. T., Collins, J. J., Church, G. M. & Wang, H. H. Syntrophic exchange in synthetic microbial communities. Proc. Natl. Acad. Sci. U. S. A. 111, (2014).

      (24) Justin, K., Edmond, S., Ally, M. & Xin, H. Plant Secondary Metabolites: Biosynthesis, Classification, Function and Pharmacological Properties. J. Pharm. Pharmacol. 2, 377–392 (2014).

      (25) Yang, W. et al. A Genomic Analysis of Bacillus megaterium HT517 Reveals the Genetic Basis of Its Abilities to Promote Growth and Control Disease in Greenhouse Tomato. Genet. Res. (Camb). 2022, (2022).

      (26) Balbín-Suárez, A. et al. Root exposure to apple replant disease soil triggers local defense response and rhizoplane microbiome dysbiosis. FEMS Microbiol. Ecol. 97, 1–14 (2021).

      (27) Weiß, S., Liu, B., Reckwell, D., Beerhues, L. & Winkelmann, T. Impaired defense reactions in apple replant disease-Affected roots of Malus domestica ‘M26’. Tree Physiol. 37, 1672–1685 (2017).

      (28) Weiß, S., Bartsch, M. & Winkelmann, T. Transcriptomic analysis of molecular responses in Malus domestica ‘M26’ roots affected by apple replant disease. Plant Mol. Biol. 94, 303–318 (2017).

      (29) Sun, N. et al. Effects of Organic Acid Root Exudates of Malus hupehensis Rehd. Derived from Soil and Root Leaching Liquor from Orchards with Apple Replant Disease. Plants 11, (2022).

      (30) Howell, C. R. Seed Treatment with L-Sorbose to Control Damping-Off of Cotton Seedlings by Rhizoctonia solani. Phytopathology 68, 1096 (1978).

      (31) Zou, C. S., Mo, M. H., Gu, Y. Q., Zhou, J. P. & Zhang, K. Q. Possible contributions of volatile-producing bacteria to soil fungistasis. Soil Biol. Biochem. 39, 2371–2379 (2007).

      (32) Gomes, V. A. et al. Activity of papaya seeds (Carica papaya) against Meloidogyne incognita as a soil biofumigant. J. Pest Sci. (2004). 93, 783–792 (2020).

      (33) Gao, T. et al. Exogenous dopamine and overexpression of the dopamine synthase gene MdTYDC alleviated apple replant disease. Tree Physiol. 41, 1524–1541 (2021).

      (34) Diener, C., Gibbons, S. M. & Resendis-Antonio, O. MICOM: Metagenome-Scale Modeling To Infer Metabolic Interactions in the Gut Microbiota. mSystems 5, (2020).

      (35) Dukovski, I. et al. A metabolic modeling platform for the computation of microbial ecosystems in time and space (COMETS). Nat. Protoc. 16, 5030–5082 (2021).

      (36) Katarina Wedmark, Y., Olav Vik, J. & Øyås, O. A hierarchy of metabolite exchanges in metabolic models of microbial species and communities. bioRxiv 1–19 (2023).

      (37) Zorrilla, F., Buric, F., Patil, K. R. & Zelezniak, A. MetaGEM: Reconstruction of genome scale metabolic models directly from metagenomes. Nucleic Acids Res. 49, (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The researchers demonstrated that when cytokine priming is combined with exposure to pathogens or pathogen-associated molecular patterns, human alveolar macrophages and monocyte-derived macrophages undergo metabolic adaptations, becoming more glycolytic while reducing oxidative phosphorylation. This metabolic plasticity is greater in monocyte-derived macrophages than in alveolar macrophages.

      Strengths:

      This study presents evidence of metabolic reprogramming in human macrophages, which significantly contributes to our existing understanding of this field primarily derived from murine models.

      Weaknesses:

      The study has limited conceptual novelty.

      We acknowledge that the study has limited conceptual novelty; however, the current manuscript provides the field with evidence of the changes in the phenotype and functions of human macrophages in response to IFN-γ or IL-4, which is currently lacking in the literature. Moreover, our data show for the first time that human airway macrophages change their function in response to IFN-γ.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to functionally characterize primary human airway macrophages and monocyte-derived macrophages, correlating their glycolytic shift in metabolism. They conducted this macrophage characterization in response to type II interferon and IL-4 priming signals, followed by different stimuli of irradiated Mycobacterium tuberculosis and LPS.

      Strengths:

      (1) The study employs a thorough measurement of metabolic shift in metabolism by assessing extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) of differentially polarized primary human macrophages using the Seahorse XFe24 Analyzer.

      (2) The effect of differential metabolic shift on the expression of different surface markers for macrophage activation is evaluated through immunofluorescence flow cytometry and cytokine measurement via ELISA.

      (3) The authors have achieved their aim of preliminarily characterizing the glycolysis-dependent cytokine profile and activation marker expression of IFN-g and IL-4 primed primary human macrophages.

      (4) The results of the study support its conclusion of glycolysis-dependent phenotypical differences in cytokine secretion and activation marker expression of AMs and MDMs.

      Weaknesses:

      (1) The data are presented in duplicates for cross-analyses.

      (2) The data presented support a distinct functional profile of airway macrophages (AMs) compared to monocyte (blood)-derived macrophages (MDMs) in response to the same priming signals. However, the study does not attempt to explore the underlying mechanism for this difference.

      (3) The study is descriptive in nature, and the results validate IFN-g-mediated glycolytic reprogramming in primary human macrophages without providing mechanistic insights.

      (1) We acknowledge that the data are presented in duplicate for cross-analyses. This duplication allowed us (A) to examine the effect of IFN-γ or IL-4 on primary human airway and monocyte-derived macrophages in the presence or absence of distinct stimulations and (B) to directly compare the fold change in function occurring in the AM with the changes in the MDM.

      (2 & 3) We acknowledge that our study is descriptive; however, by inhibiting glycolysis using 2DG we have demonstrated that increased flux through glycolysis is mechanistically required to mediate enhanced cytokine responses in both primary human AM and MDM primed with IFN-γ. We acknowledge that we have not determined the differential molecular mechanisms downstream of IFN-γ in the AM versus the MDM. IFN-γ promotes both pro- and anti-inflammatory cytokines in AM, and this was reduced by inhibiting glycolysis with 2DG. This identifies glycolysis as a key mechanistic pathway which can be therapeutically targeted in AM to modulate inflammation. Mechanistic studies on human AM are limited due to the low number of AM retrieved from BAL samples. Nevertheless, the differences between AM and MDM identified in the current study indicate that future mechanistic studies are warranted to identify, for example, why IFN-γ promotes IL-10 in AM and not MDM, and why TNF is differentially regulated by glycolysis in the two macrophage subpopulations.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors explore the contribution of metabolism to the response of two subpopulations of macrophages to bacterial pathogens commonly encountered in the human lung, as well as the influence of priming signals typically produced at a site of inflammation. The two subpopulations are resident airway macrophages (AM) isolated via bronchoalveolar lavage and monocyte-derived macrophages (MDM) isolated from human blood and differentiated using human serum. The two cell types were primed using IFNγ and IL-4, which are produced at sites of inflammation as part of initiation and resolution of inflammation respectively, followed by stimulation with either irradiated Mycobacterium tuberculosis (Mtb) or LPS to simulate interaction with a bacterial pathogen. The authors use human cells for this work, which makes use of widely reported and thoroughly described priming signals, as well as model antigens. This makes the observations on the functional response of these two subpopulations relevant to human health and disease. To examine the relationship between metabolism and functional response, the authors measure rates of oxidative phosphorylation and glycolysis under baseline conditions, primed using IFNγ or IL-4, and primed and stimulated with Mtb or LPS.

      Strengths:

      • The data indicate that both populations of macrophages increase metabolic rates when primed, but MDMs decrease their rates of oxidative phosphorylation after IL-4 priming and bacterial exposure while AMs do not.

      • It is demonstrated that glycolysis rates are directly linked to the expression of surface molecules involved in T-cell stimulation and while secretion of TNFα in AM is dependent on glycolysis, in MDM this is not the case. IL-1β is regulated by glycolysis only after IFN-γ priming in both MDM and AM populations. It is also demonstrated that Mtb and LPS stimulation produces responses that are not metabolically consistent across the two macrophage populations. The Mtb-induced response in MDMs differed from the LPS response, in that it relies on glycolysis, while this relationship is reversed in AMs. The difference in metabolic contributions to functional outcomes between these two macrophage populations is significant, despite acknowledgement of the reductive nature of the system by the authors.

      • The observations that AM and MDM rely on glycolysis for the production of cytokines during a response to bacterial pathogens in the lung, but that only MDM shift to Warburg Metabolism, though this shift is blocked following exposure to IL-4, are supported by the data and a significant contribution to the study of the innate immune response.

      Weaknesses:

      • It is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. ECAR and OCR analyses were therefore difficult to interpret.

      All data sets have been presented and analysed relative to the unprimed, unstimulated control to show the effect of both priming and subsequent stimulation. A second analysis was then conducted in which each data set was normalised to its own baseline and expressed as percentage change. Each of the unprimed, IFN-γ primed and IL-4 primed data sets was therefore set to 100% at baseline in order to assess the effect of stimulation independently of the baseline priming effect. For clarity we have removed the following line:

      “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”

      We have amended the text in the manuscript (lines 164-173) to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D).  These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure 1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”

      • The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.

      Our data suggest that the MDM are more phenotypically plastic (in terms of their ability to alter expression of cell surface markers in response to cytokine cues), whereas AM have a greater ability to alter cytokine production, our measure of functional plasticity. We have now defined the terms ‘functional plasticity’ and ‘phenotypic plasticity’ in the context of our paper in lines 60-63. To account for the different culture and plating requirements of MDM versus AM, cytokine production was analysed relative to the average of the unprimed Mtb or LPS control of the respective MDM or AM. This allowed us to draw more accurate comparisons between the two macrophage populations by examining their relative ability to increase their cytokine production (expressed as fold change), rather than defining this functional plasticity only in terms of the concentrations of cytokine produced in culture.

      We have therefore added the following sentence into the conclusion of the manuscript. “Cumulatively, the data presented herein suggests that the MDM maybe more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to modulate cytokine production after exposure Th1 and Th2 cytokines.”

      We have edited the discussion (lines 421-423) to clarify the following "have increased ability to produce all cytokines assayed in response to Mtb stimulation" and changed it to “stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”   

      • The claim that AM are better for "innate training" via IFNγ may not be consistent with increased IL1β and a later claim that MDM have increased production and are "associated with optimal training."

      We have removed the word “better” and now simply state that AM are a tractable target to induce innate training in the human lung.

      • Statistical analyses may not appropriately support some of the conclusions.

      We have consulted with a statistician. Please see response to reviewer 3 recommendations for authors point 1 below.  

      • AM populations would benefit from further definition-presumably this is a heterogenous, mixed population.

      The AM used in the current study are routinely >97% CD68+CD14+ (Author response image 1). However, we acknowledge that tissue resident macrophages represent a spectrum of phenotypes. Given the limited cell numbers of primary human AM derived from BALF, we have not attempted to define the function of discrete subpopulations of AM.

      • The term "functional plasticity" could also be more stringently defined for the purposes of this study.

      We define functional plasticity as the macrophages’ ability to alter their production of cytokines in response to external cues such as IFN-γ and IL-4, whereas phenotypic plasticity is measured as the ability to alter the cell surface expression of activation markers. We have now defined this in the manuscript (lines 60-63).

      Author response image 1.

      Expression of macrophage markers on AM. 

      Conclusion:

      Overall, the authors succeed in their goals of investigating how inflammatory and anti-inflammatory cytokine priming contributes to the metabolic reprogramming of AM and MDM populations. Their conclusions regarding the relationship between cytokine secretion and inflammatory molecule expression in response to bacterial stimuli are supported by the data. The involvement of metabolism in innate immune cell function is relevant when devising treatment strategies that target the innate immune response during infection. The data presented in this paper further our understanding of that relationship and advance the field of innate immune cell biology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1)  Authors are suggested to provide rationale for their choice of cytokines as IFN-gamma and IL-4. This will be useful for the readers.

      We have updated the following sentence (lines 44-46) in the manuscript to add more rationale for the choice of IFN-γ and IL-4. “There is a paucity of data on the role of metabolism in response to Th1 or Th2 microenvironments induced by cytokines-such as IFN-γ or IL-4 respectively, in human macrophages, especially in tissue resident macrophages, such as AM.”

      (2)  Authors have shown the final outcome of metabolic reprogramming in terms of expression of HLADR and CD-40, and cytokine release. What pathways/receptors are activated or associated with IL-4 and IFN-gamma priming as a first line of response?

      The relationship between IFN-γ- or IL-4-induced expression of CD40 is established in haematological cell lines and fibroblasts as well as APC, with roles for the JAK/STAT pathways and upregulation of IRFs defined (1-3). Similarly, the relationship between exogenous IFN-γ and upregulation of HLA-DR expression on human monocytes or endothelial cells is established (4, 5). Whilst our work does not outline the signalling pathways downstream of Th1 or Th2 cytokine priming, we have shown for the first time that glycolysis mechanistically underpins the shift in phenotype and function observed in human macrophages upon priming with IFN-γ or IL-4.

      (3)  What are the intracellular signals leading to glycolytic shift?

      One of the most likely mechanisms underpinning the shift to glycolytic metabolism is the stabilisation of HIF-1α mediated by activation of mTOR (see the response below and Author response image 2).

      (4)  Additional evidence is required to show Warburg effect such as stabilization and activation of HIF1alpha.

      We acknowledge that we have not shown the activation and stabilisation of HIF-1α, however, we have provided functional evidence of increased glycolysis with concomitant decreased oxidative phosphorylation indicative of Warburg metabolism.

      To address this gap in evidence, we have reworded the manuscript to describe this functional change as “Warburg-like metabolism” throughout. In addition, we have undertaken Western blotting to provide evidence of mTOR activation when cells are primed with IFN-γ (Author response image 2).

      Author response image 2.

      IFN-γ activates mTOR in primary human monocytes. Monocytes were isolated from healthy donor PBMC using magnetic separation. Monocytes were left untreated (-) or treated with rapamycin as a negative control (Rap; 50 nM), IFN-γ (10 ng/ml), or IFN-γ and rapamycin simultaneously (IFN-γ + Rap) for 15 minutes. Phosphorylation of S6 was used as a readout of mTOR activation and measured by western blot, using β-actin as a control. The blot (A) and densitometry results (B), expressed as the relative expression of pS6:β-actin, are shown. Graphs show data from n=1 for unprimed (black dot) versus IFN-γ primed (red) monocytes, with and without rapamycin. ImageLab (Bio-Rad) software was used to perform densitometric analysis.

      (5)  What is the importance of showing percentage change vs fold change in figure 1 (1C vs 1A)?

      All data sets have been presented and analysed relative to the unprimed, unstimulated control to show the effect of first priming and then subsequent stimulation (Figure 1A). A second analysis was then conducted in which each data set was normalised to its own baseline and expressed as percentage change (Figure 1C). Each of the unprimed, IFN-γ primed and IL-4 primed data sets was therefore set to 100% at baseline to assess the effect of stimulation independently of the pre-existing effect of priming on baseline metabolism. For clarity we have removed the following line:

      “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”

      We have amended the text (lines 164-173) in the manuscript to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D).  These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”
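
      The difference between the two normalisations described in this response can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis pipeline: the ECAR values, the dictionary layout and both function names are assumptions made up for the example.

```python
# Illustrative only: the ECAR values below are invented, not data from the study.
# Keys are (priming, stimulation); values are ECAR in arbitrary units.
ecar = {
    ("unprimed", "unstimulated"): 20.0,
    ("unprimed", "LPS"): 40.0,
    ("IFN-g", "unstimulated"): 30.0,
    ("IFN-g", "LPS"): 60.0,
}


def fold_change_vs_global_control(values, control=("unprimed", "unstimulated")):
    """Express every condition relative to the unprimed, unstimulated control."""
    reference = values[control]
    return {key: value / reference for key, value in values.items()}


def percent_of_own_baseline(values, baseline_stimulation="unstimulated"):
    """Express every condition as a percentage of the unstimulated rate
    measured in the same priming group, so each baseline becomes 100%."""
    result = {}
    for (priming, stimulation), value in values.items():
        baseline = values[(priming, baseline_stimulation)]
        result[(priming, stimulation)] = 100.0 * value / baseline
    return result


fold = fold_change_vs_global_control(ecar)
percent = percent_of_own_baseline(ecar)
```

      With these invented numbers, the fold-change view shows IFN-γ primed cells above the control both before and after stimulation, while the percentage view puts both priming groups at 100% at baseline and at the same relative increase after LPS. That is the pattern the response describes: a shifted baseline with a similar capacity to respond.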

      (6)  Why IL-4 primed cells have lower glycolysis than unprimed control cells even in absence of pathogen in Figure 1A?

      IL-4 primed AM do not have statistically significant changes in glycolysis compared with unprimed control cells in the absence of stimulation.  

      Reviewer #2 (Recommendations For The Authors):

      The manuscript entitled "Human airway macrophages are metabolically reprogrammed by IFN-γ resulting in glycolysis dependent functional plasticity" by Cox et al., characterizes glycolytic-linked cytokine secretion and surface receptor expression of primary human airway macrophages (AM) and monocyte-derived macrophages (MDM). The authors primed the primary macrophages with type II interferon (IFN-γ) or interleukin-4 (IL-4) into Th1 and Th2 polarized states. This was followed by measurement of the shift in macrophage metabolism to glycolysis (ECAR measurement) and/or oxidative phosphorylation (OCR measurement) in response to lipopolysaccharide and irradiated Mycobacterium tuberculosis. The authors then utilize 2-DG (an inhibitor of glycolysis) to show the reliance of glycolytic shift in metabolism to drive the expression of different macrophage activation markers in MDMs and cytokine secretion in AMs.

      Significance:

      The study provides important validation of IFN-γ-mediated glycolytic shift and its correlated functionalities in primary human macrophage populations.

      Highlights: The study characterizes glycolytic-linked cytokine secretion and expression of macrophage activation markers in primary human resident (lung) and monocyte (blood)-derived macrophages. The study also shows data in support of IFN-γ alone in mediating glycolytic reprogramming of human primary macrophages.

      Limitations:

      The study lacks novelty and does not provide any new or different information in relation to IFN-γ-mediated glycolytic shift in the metabolism of human macrophages.

      Major comments:

      (1) The authors have relied on irradiated Mycobacterium tuberculosis (Mtb) and LPS stimulation to measure different correlates of macrophage functions. Additionally, the authors have discussed their results with irradiated Mtb with that of infection with live Mtb. There are also recent reports that show Mtb infection limiting glycolytic reprogramming in murine and human macrophages (PMID: 31914380) in contrast to their observation with irradiated Mtb. The authors should also include live Mtb infection or other replicative live bacterium for the induction of surface activation markers and cytokine release in their setup.

      We thank the reviewer for this suggestion; however, this is beyond the scope of the current study which was to assess AM and MDM in the context of immune stimulation in a reductive manner using TLR4 ligand LPS and a more complete whole bacteria stimulation. The selected bacterial ligands were employed in the study to allow us to model an optimal macrophage host response. This minimises the confounding variable of live bacteria which can perturb cellular metabolism and immune responses, which we have highlighted in the discussion. Since both LPS and irradiated Mtb induced similar metabolic and phenotypic profiles, it is likely that the effects of priming are maintained with diverse stimuli.  

      (2) The authors should add a quantitative measure (like extracellular lactate secretion or ECAR level) for the extent of glycolytic inhibition by the use of 5 mM 2-DG in their setup.

      We would like to draw the attention of the reviewer to the data presented in supplementary figure 2B, demonstrating that 2DG at 5 mM lowers ECAR by an average of approximately 40% at both 1 and 24 h post stimulation with iH37Rv. In addition, we have acknowledged in the study limitations (lines 477-480) that 5 mM 2DG does not fully inhibit glycolysis.

      (3) Percent change and fold change have been used to show the same or similar result in Fig. 1 and 2. Whereas, supplementary Fig. 1 shows absolute ECAR/OCR values in addition to fold change. The authors can plot either fold change or percent change in different measurements to avoid confusion. For example, do ECAR changes upon LPS stimulation in Fig. 1A and 1C come from the same dataset? One of the data points in percent change shows a decrease in percent ECAR change under no cytokine control, whereas all the data points in fold change show an increase.

      We have addressed this comment above in response to reviewer 1 point 5 (recommendations for the authors).

      We thank the reviewer for highlighting this single error in the data points for percent change. We have fixed this data point which was a result of a calculation error. All data throughout the manuscript has now been rechecked.   

      Minor comments:

      (1) The manuscript for review should be line-marked for referencing and commenting during review.

      We have now included line-marking on the manuscript.  

      (2) The authors can depict marker legends differently for all figures. In all figures, circles to squares or triangles represent treatment/stimulation with iH37Rv or LPS. The authors can depict this as circles to squares/triangles in contrast to different legends.

      We have changed the legend to include a more detailed description of the data represented, adding information on the colours and symbols used in the figures.

      (3) Describe bars in supplementary figure 1A - 1H in its legend?

      We thank the reviewer for highlighting this oversight. We have amended the legend to state “error bars represent standard deviation”.

      (4) Discuss the significant increase in CD86 expression in IFN-γ and IL-4 primed unstimulated AMs in Fig. 3E.

      We have updated the results section to state that IFN-γ increased the expression of CD86 when isolated in the absence of bacterial stimulations in Fig. 3E (lines 271-272). There is no significant increase in CD86 by IL-4 primed unstimulated AM. IL-4 primed human AM only upregulated CD86 when treated with 2DG or in the presence of stimulation.  

      (5) Contrary to Fig. 2, the data points of unstimulated cells in Fig. 4 vary for different treatment conditions (no cytokine, IFN-γ, and IL-4) for each cytokine measurement. What is the difference between unstimulated cells in Fig. 4 (for each cytokine) from that of Fig. 2 (for each receptor MFI)?

      Unstimulated cells change their surface activation markers and phenotype in response to IFN-γ and IL-4 in Fig. 2. For Fig. 4, IFN-γ and IL-4 are not sufficient to induce cytokine secretion in the absence of stimulation with bacterial ligands.  

      (6) The methodology for seeding and treatment of cells is reemphasized for almost all results. Defining macrophage priming and stimulation of macrophages in the method section and once at the start of results should be fine.

      Plating happens differently for Seahorse compared to the flow cytometric phenotyping and ELISA for cytokine production. For clarity we have stated and reemphasized the seeding and treatment of cells throughout the results section.  

      (7) Clarify "IL-4 reduced glycolysis in response to LPS stimulation" in relation to the results depicted in Fig. 1A and 1C. Similarly, clarify "IL-4 resulting in reduced IL-1β and IL-10 production" in relation to Fig. 4E.

      For clarity we have added the following lines (157-160, 164-170) to the manuscript:  

      “IL-4 primed iH37Rv stimulated AM increased ECAR to similar extent as unprimed controls (Figure 1A; left). Conversely, IL-4 primed AM stimulated with LPS AM did not increase their ECAR to the same extent as controls (Figure 1A; right), suggesting that IL-4 reduces the AM ability to increase ECAR in response to LPS stimulation.”   

      “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D). These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C).”

      For clarity we have amended the sentence the reviewer has highlighted (lines 214-215): “IL-4 primed AM had reduced fold change in glycolysis upon stimulation with LPS compared with controls”.

      Since IFN-γ priming induced large effect sizes, we statistically analysed the IL-4 primed and unprimed data sets in the absence of the IFN-γ primed data sets to determine how IL-4 influenced macrophage function. The only data in which this yielded any statistical significance were the cytokine production data. We have now clarified this in the methods and relevant figure legends by stating, “Statistically significant differences were determined using two-way ANOVA with a Tukey post-test (A-D); *P≤0.05, **P≤0.01, ***P≤0.001, ****P≤0.0001 or #P≤0.05, ##P≤0.01 (where IFN-γ primed data sets were excluded for post-test analysis to analyse statistical differences between no cytokine and IL-4 treated data sets).”

      To further clarify this, we have amended the text of the manuscript (lines 307-310) to reflect this. “All stimulated AM secreted IL-10 regardless of priming (Figure 4E). IFN-γ significantly enhanced iH37Rv induced IL-10 in AM compared to unprimed or IL-4 primed comparators (Figure 4E). IL-4 priming of human AM significantly reduced IL-10 production in response to iH37Rv compared with unprimed AM (Figure 4E). LPS strongly induced IL-10 production in unprimed MDM, which was significantly attenuated by either IFN-γ or IL-4 priming (Figure 4F).”  

      (8) Clarify whether data points in unstimulated, iH37Rv stimulated, and LPS-stimulated control cells in Fig. 3A - 3F are from independent experiments from those in Fig. 2A - 2F? The distribution of data points of control (no 2-DG treatment) in Fig. 3 is highly similar to the corresponding data points in Fig. 2. Similarly, provide clarification for similarity in Fig. 5A - 5F and Fig. 4A - 4F.

      The data illustrated in Figures 2 and 3 are from one very large dataset, as are the data in Figures 4 and 5. This large experiment was designed to test the effect of priming macrophages with IFN-γ or IL-4 (in the presence or absence of stimulation), and also to determine whether the differential responses elicited by priming were dependent on glycolysis (by inhibiting with 2DG). For clarity and transparency, the same stimulated dataset is repeated in both figures. Given the size and complexity of the experiment, we chose to present the data this way to aid the reader.

      (9) Clarify the statement "where data was reanalyzed in the absence of IFN-γ" in the section pertaining to Statistical analysis. The authors should clearly mention nature of biological and technical replicates for each experiment in its figure legend. The authors should also confirm multiple comparison correction in all 2-way ANOVA tests done in each figure legend.

      We have amended the text (lines 133-136) to clarify this point “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated data sets.”  

      Figures represent biological replicates (which are the average of technical replicates, presented as a single data point). This is indicated by the following sentence in each figure legend: “Each linked data point represents the average of technical duplicates for one individual biological donor”.  

      Each legend has been amended to include the multiple comparison post-test applied.

      (10) Discuss the differences and similarities of IFN-γ driven metabolic reprogramming of primary murine macrophages with the results of this study relative to cytokine secretion and activation marker expression.

      We have added additional discussion and detail comparing human and murine macrophages in lines 381-382, 403, 407 and 412-415 of the manuscript.

      (11) The repetitive data plots of similar results can be significantly reduced to improve the interpretation of the results.

      Plotting the data in this way gives a clearer understanding and representation of the data. The repeated data plots make it possible first to delineate the effect of priming and of priming plus stimulation and then, separately, to examine the differences in AM versus MDM. The repetition of the primed data points then allows the reader to determine the effect of inhibiting glycolysis with 2DG on unprimed and primed macrophages (with and without stimulation).

      Reviewer #3 (Recommendations For The Authors):

      The methods used and data reported in this manuscript contribute to our understanding of the role of metabolism in programming of macrophages during priming. Suggestions for improving the presentation and interpretation of results include:

      • Consult with a statistician regarding analyses of the multiple conditions used during these assays. The use of repeated statistical analyses with different comparison groups in the same figure/data set seems atypical and should either be amended or fully justified in the text. Also, use of two-way vs. one-way ANOVA should be evaluated and clarified.

      We have now consulted a statistician. We have amended the text (lines 133-136) to clarify this point “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated groups.”  

      There are two variables in the data sets, cytokine priming and stimulation status; we therefore opted for a two-way ANOVA rather than a one-way ANOVA. There are three stimulation groups: unstimulated, Mtb-stimulated and LPS-stimulated. Cytokine priming also has three groups: no cytokine, IFN-γ, or IL-4. The two variables (priming and stimulation) therefore have three levels each, six levels in total, which combine into nine treatment conditions. A two-way ANOVA with multiple comparisons tests helps pinpoint exactly which groups (e.g., the six different levels of the 'stimulation' and 'cytokine' treatments) are significantly different from each other. This was important for understanding the specific effects of our treatments. The reader can therefore also deduce how these treatment conditions compare to each other.

      In contrast, performing multiple single comparisons independently of the rest of the dataset (e.g. t tests) increases the risk of false positives (type 1 error). ANOVA with multiple comparisons post-tests adjusts for this, helping to reduce the likelihood of a type 1 error. These tests are more stringent, and it is therefore harder to obtain P values <0.05. Hence, if we compared all treatment groups without adjustment, we would increase the chance of finding false positives due to the sheer number of comparisons, leading to biased and incorrect conclusions.

      In our case, multiple comparisons tests were essential after the two-way ANOVA because they helped to objectively identify specific treatment group differences and control the overall error rate when we were extracting our conclusions, thereby reducing any risk of biases in our conclusions.
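
      The inflation of the type 1 error rate with unadjusted comparisons can be made concrete with a short calculation. The sketch below is an illustration only. It assumes independent comparisons, which pairwise tests on shared groups are not, so the figure is a rough guide. It also uses a Bonferroni threshold as a simple stand-in, not the Tukey post-test used in the study. All function names are hypothetical.

```python
from math import comb


def n_pairwise_comparisons(n_groups):
    """Number of distinct pairs that can be formed from n_groups groups."""
    return comb(n_groups, 2)


def familywise_error_rate(n_comparisons, alpha=0.05):
    """Probability of at least one false positive across n_comparisons
    unadjusted tests, assuming the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise rate at or
    below alpha (a simpler, more conservative correction than Tukey's)."""
    return alpha / n_comparisons


# Three priming groups crossed with three stimulation groups give nine
# priming-by-stimulation combinations.
n_conditions = 3 * 3
n_tests = n_pairwise_comparisons(n_conditions)
unadjusted_rate = familywise_error_rate(n_tests)
per_test_alpha = bonferroni_threshold(n_tests)
```

      Under these assumptions, nine conditions yield 36 pairwise comparisons, and the chance of at least one false positive at an unadjusted threshold of 0.05 is roughly 0.84. This is why a post-test that controls the overall error rate is needed.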

      A one-way ANOVA is used to test the effect of a single variable with more than two groups contained in the dataset. For example, in our case if you only want to test how different 'stimulation' groups affect ECAR or OCR, only in unprimed macrophages, a one-way ANOVA would be used.

      The current study used two-way ANOVA to test the effects of two variables (priming and stimulation, or in some cases priming and inhibition) each containing 3 groups, and see if there is any interaction between the two factors. For example, in our case this allowed us to examine how the 'stimulation' and the 'cytokine' priming affect ECAR/OCR levels and to determine if the effect of 'stimulation' depends on the 'cytokine' priming.
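
      To show what the interaction term in a two-way ANOVA measures, the following sketch partitions the total sum of squares for a balanced two-factor design. It is not the authors' analysis: the values are invented, the function name is hypothetical, donors are treated as independent replicates (the study links data points by donor), and no post-test or P value is computed.

```python
from statistics import mean

# Invented ECAR-like values for illustration only; these are not study data.
# data[priming][stimulation] holds the replicates for one treatment condition.
data = {
    "none": {"unstim": [10.0, 12.0], "Mtb": [20.0, 22.0], "LPS": [18.0, 20.0]},
    "IFN-g": {"unstim": [15.0, 17.0], "Mtb": [30.0, 34.0], "LPS": [28.0, 30.0]},
    "IL-4": {"unstim": [9.0, 11.0], "Mtb": [19.0, 21.0], "LPS": [12.0, 14.0]},
}


def two_way_anova(table):
    """Return sums of squares, degrees of freedom and F statistics for a
    balanced two-factor design with equal replicates in every cell."""
    rows = list(table)
    cols = list(table[rows[0]])
    n = len(table[rows[0]][cols[0]])  # replicates per cell, assumed equal
    cells = [(r, c) for r in rows for c in cols]

    grand = mean(v for r, c in cells for v in table[r][c])
    row_mean = {r: mean(v for c in cols for v in table[r][c]) for r in rows}
    col_mean = {c: mean(v for r in rows for v in table[r][c]) for c in cols}
    cell_mean = {(r, c): mean(table[r][c]) for r, c in cells}

    ss = {
        "priming": n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows),
        "stimulation": n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols),
        # Interaction: how far each cell mean departs from the additive
        # prediction built from its row and column means.
        "interaction": n * sum(
            (cell_mean[r, c] - row_mean[r] - col_mean[c] + grand) ** 2
            for r, c in cells
        ),
        "error": sum((v - cell_mean[r, c]) ** 2 for r, c in cells for v in table[r][c]),
        "total": sum((v - grand) ** 2 for r, c in cells for v in table[r][c]),
    }
    df = {
        "priming": len(rows) - 1,
        "stimulation": len(cols) - 1,
        "interaction": (len(rows) - 1) * (len(cols) - 1),
        "error": len(rows) * len(cols) * (n - 1),
    }
    mean_square_error = ss["error"] / df["error"]
    f_stat = {
        name: (ss[name] / df[name]) / mean_square_error
        for name in ("priming", "stimulation", "interaction")
    }
    return ss, df, f_stat


ss, df, f_stat = two_way_anova(data)
```

      A large interaction F statistic relative to its degrees of freedom indicates that the effect of stimulation depends on the priming condition, which is the question a one-way ANOVA cannot address.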

      • More justification could be given for the dose of IFNγ used for priming. Inflammatory priming is typically performed with a "low-dose" treatment (e.g., ~1 ng/ml), whereas the authors use 10 ng/ml, which would be considered a high dose. It would be useful to repeat select experiments with a more standard low-dose treatment of IFNg to demonstrate that this is also sufficient to induce the observed metabolic changes.

      Previous work has identified little difference in the response of AM and peripheral monocytes to low versus high doses of IFN-γ (6). We have inserted the following into the study limitations (lines 479-481).  

      “Furthermore, only one dose of IFN-γ was utilised due to limitations in AM yield, however, recently both low and high doses of IFN-γ have been shown to have similar effects on AM in vitro (6).”

      • Check for accuracy of the Fig.4 legend. Also check that 4G and 4B math is consistent.

      The legend for Figure 4 has been amended to replace the incorrect A,B with G,H. The math has been double checked for accuracy and is correct. 3 out of 10 MDM donors produced IL-1β in the absence of IFN-γ in Figure 4B; the average used to calculate the data represented in Figure 4G was therefore brought down markedly by donors who produced little or no IL-1β.
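
      The effect of non-producing donors on a fold change calculated against a control average can be illustrated with invented numbers. The concentrations and the function name below are hypothetical and are not taken from Figure 4. The sketch only shows why a low control mean yields a large fold change.

```python
from statistics import mean

# Hypothetical IL-1β concentrations (pg/ml) for ten donors; not study data.
# Seven donors produce nothing without priming, three produce some.
unprimed_control = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 50.0, 100.0, 150.0]
primed = [200.0] * 10


def fold_change_vs_control_mean(treated, control):
    """Divide each treated value by the mean of the control group."""
    reference = mean(control)
    if reference == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return [value / reference for value in treated]


# Mean over all ten donors, including the non-producers.
fold_all_donors = fold_change_vs_control_mean(primed, unprimed_control)

# Mean over the three producing donors only, for comparison.
producers = [value for value in unprimed_control if value > 0]
fold_producers_only = fold_change_vs_control_mean(primed, producers)
```

      In this made-up case, the same primed concentration corresponds to a fold change of about 6.7 against the all-donor mean and 2.0 against the producer-only mean.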

      • Functional plasticity is a vague term and difficult to interpret in this context. It is stated that AM have greater functional plasticity, but MDMs appear to have greater capacity to secrete IL-1β and respond more robustly to IL-4 in terms of T cell stimulation. On that note, the claims regarding antigen presentation would be more impactful if a direct comparison of antigen presentation capacity was made between AM and MDM.

      Our data suggest that AM have a greater ability to alter cytokine production, such as IL-1β. To account for the different culture and plating requirements of MDM versus AM, cytokine concentrations were normalised and expressed as fold change. This gives a more controlled and accurate comparison of the ability of IFN-γ or IL-4 to modulate cytokine production in AM compared with MDM.

      The terms ‘functional plasticity’ and ‘phenotypic plasticity’ have now been defined in the manuscript in lines 60-63.

      We have therefore added the following sentence into the conclusion of the manuscript (lines 490-493). “Cumulatively, the data presented herein suggests that the MDM maybe more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to produce cytokine after exposure Th1 and Th2 cytokines.”

      However, we acknowledge that the MDM may be regarded as more plastic because of their ability to respond robustly to IL-4, whereas the phenotypic and functional changes in the AM in response to IL-4 are more limited. Whilst the focus of our work was to determine if AM are a tractable target to promote immunity in the lungs through upregulation of pro-inflammatory effector function, their ability to downregulate inflammation in response to IL-4 is less profound than that of MDM.

      We acknowledge the shortcomings of our work which did not allow us to directly measure antigen processing in the AM, due to limitations in the cellular yield from BALF. We have edited the text (lines 251-252 and 286) to clarify this for the reader.  

      • Inconsistent normalization complicates interpretation of metabolic data. For example, it is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. Check harmony of methods for analysis of "metabolic assays" with Fig.1 data, axis, and legend.

      We have addressed this comment, which is similar to points made by the other reviewers and amended the manuscript to increase clarity. These changes are outlined in the response to reviewer 1, point 5 (recommendations for the author). In addition, we have amended the metabolic assay method (lines 111-112) to state that “Post stimulation the ECAR and OCR were continually sampled at 20-minute intervals for times indicated.”

      • A direct comparison of cytokine production after priming and stimulation with Mtb or LPS is limited by inconsistent axes. The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.

      We have amended the text to clarify this issue (lines 313-315). “These data suggest that the AM have greater functional plasticity in terms of their ability to upregulate cytokine production in response to IFN-γ, compared with the MDM. IFN-γ primed AM have enhanced IL-10 and TNF production in response to Mtb and LPS, respectively.”  

      We have amended the manuscript and have replaced “IFN-γ primed AM have increased ability to produce all cytokines assayed in response to Mtb stimulation” with the following (lines 421-423) “IFNγ primed AM stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”

      • AM populations could be defined experimentally.

      Airway macrophages were adherence-purified from bronchoalveolar lavage fluid and defined as CD68+CD14+, as per rebuttal figure 1. The purpose of this study was to examine if human peripherally derived or lung resident macrophages were plastic in response to the classical polarising cytokines IFN-γ and IL-4. We have identified that the AM and MDM do indeed have different functional and metabolic responses to these cytokines. However, determining functional differences within the AM subpopulations is beyond the scope of the current study and hampered by low cell numbers in human BALF.

      References

      (1) Conzelmann M, Wagner AH, Hildebrandt A, Rodionova E, Hess M, Zota A, Giese T, Falk CS, Ho AD, Dreger P, Hecker M, Luft T. IFN-γ activated JAK1 shifts CD40-induced cytokine profiles in human antigen-presenting cells toward high IL-12p70 and low IL-10 production. Biochemical pharmacology 2010; 80: 2074-2086.

      (2) Fries KM, Sempowski GD, Gaspari AA, Blieden T, Looney RJ, Phipps RP. CD40 Expression by human fibroblasts. Clinical Immunology and Immunopathology 1995; 77: 42-51.

      (3) Gu W, Chen J, Yang L, Zhao KN. TNF-α promotes IFN-γ-induced CD40 expression and antigen process in Myb-transformed hematological cells. TheScientificWorldJournal 2012; 2012: 621969.

      (4) Hershman MJ, Appel SH, Wellhausen SR, Sonnenfeld G, Polk HC, Jr. Interferon-gamma treatment increases HLA-DR expression on monocytes in severely injured patients. Clinical and experimental immunology 1989; 77: 67-70.

      (5) Maenaka A, Kenta I, Ota A, Miwa Y, Ohashi W, Horimi K, Matsuoka Y, Ohnishi M, Uchida K, Kobayashi T. Interferon-γ-induced HLA Class II expression on endothelial cells is decreased by inhibition of mTOR and HMG-CoA reductase. FEBS open bio 2020; 10: 927-936.

      (6) Thiel BA, Lundberg KC, Schlatzer D, Jarvela J, Li Q, Shaw R, Reba SM, Fletcher S, Beckloff SE, Chance MR, Boom WH, Silver RF, Bebek G. Human alveolar macrophages display marked hyporesponsiveness to IFN-γ in both proteomic and gene expression analysis. PLoS One 2024; 19: e0295312.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      It is suggested that for each limb the RG (rhythm generator) can operate in three different regimes: a non-oscillating state-machine regime, a flexor-driven oscillatory regime, and a classical half-center oscillatory regime. This means that the field can move away from the old concept that there is only room for the classic half-center organization.

      Strengths:

      A major benefit of the present paper is that a bridge was made between various CPG concepts ("a potential contradiction between the classical half-center and flexor-driven concepts of spinal RG operation"). Another important step forward is the proposal about the neural control of slow gait ("at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs for phase transitions, which can come from limb sensory feedback and/or volitional inputs (e.g. from the motor cortex)").

      Weaknesses:

      Some references are missing

      We thank the Reviewer for the thoughtful and constructive comments. We have added text to address the Reviewer’s specific recommendations, along with several references suggested by the Reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The biologically realistic model of the locomotor circuits developed by this group continues to define the state of the art for understanding spinal genesis of locomotion. Here the authors have achieved a new level of analysis of this model to generate surprising and potentially transformative new insights. They show that these circuits can operate in three very distinct states and that, in the intact cord, these states come into successive operation as the speed of locomotion increases. Equally important, they show that in spinal injury the model is "stuck" in the low speed "state machine" behavior.

      Strengths:

      There are many strengths for the simulation results presented here. The model itself has been closely tuned to match a huge range of experimental data and thus has a high degree of plausibility. The novel insight presented here, with the three different states, constitutes a truly major advance in the understanding of neural genesis of locomotion in spinal circuits. The authors systematically consider how the states of the model relate to presently available data from animal studies. Equally important, they provide a number of intriguing and testable predictions. It is likely that these insights are the most important achieved in the past 10 years. It is highly likely that the proposed multi-state behavior will have a transformative effect on this field.

      Weaknesses:

      I have no major weaknesses. A moderate concern is that the authors should consider some basic sensitivity analyses to determine if the 3 state behavior is especially sensitive to any of the major circuit parameters - e.g. connection strengths in the oscillators or?

      We thank the Reviewer for the thoughtful and constructive comments. The sensitivity analysis has been included as a Supplemental file.

      Reviewer #3 (Public Review):

      Summary:

      This work probes the control of walking in cats at different speeds and different states (split-belt and regular treadmill walking). Since the time of Sherrington there has been ongoing debate on this issue. The authors provide modeling data showing that they could reproduce data from cats walking on a specialized treadmill allowing for regular and split-belt walking. The data suggest that a non-oscillating state-machine regime best explains slow walking - where phase transitions are handled by external inputs into the spinal network. They then show that at higher speeds a flexor-driven and then a classical half-center regime dominates. In spinal animals, it appears that a non-oscillating state-machine regime best explains the experimental data. The model is adapted from their previous work, and raises interesting questions regarding the operation of spinal networks, that, at low speeds, challenge assumptions regarding central pattern generator function. This is an interesting study. I have a few issues with the general validity of the treadmill data at low speeds, which I suspect can be clarified by the authors.

      Strengths:

      The study has several strengths. Firstly the detailed model has been well established by the authors and provides details that relate to experimental data such as commissural interneurons (V0c and V0d), along with V3 and V2a interneuron data. Sensory input along with descending drive is also modelled and moreover the model reproduces many experimental data findings. Moreover, the idea that sensory feedback is more crucial at lower speeds, also is confirmed by presynaptic inhibition increasing with descending drive. The inclusion of experimental data from split-belt treadmills, and the ability of the model to reproduce findings here is a definite plus.

      Weaknesses:

      Conceptually, this is a very useful study which provides interesting modeling data regarding the idea that the network can operate in different regimes, especially at lower speeds. The modelling data speaks for itself, but on the other hand, sensory feedback also provides generalized excitation of neurons which in turn project to the CPG. That is they are not considered part of the CPG proper. In these scenarios, it is possible that an appropriate excitatory drive could be provided to the network itself to move it beyond the state-machine state - into an oscillatory state. Did the authors consider that possibility? This is important since work using L-DOPA, for example, in cats or pharmacological activation of isolated spinal cord circuits, shows the CPG capable of producing locomotion without sensory or descending input.

      We thank the Reviewer for the thoughtful and constructive comments. We have added text and references and discussed the issues raised by the Reviewer. In particular, in the section “Model limitations and future directions” we now acknowledge that afferent feedback can provide some constant level of excitation to the RG circuits after spinal transection, which can partly compensate for the lack of supraspinal drive and hence affect (shift) the timing of transitions between the considered regimes. We mention that this is one of the limitations of the present model. The potential effects of neuroactive drugs, like DOPA, on CPG circuits after spinal transection were left out because they are outside the scope of the present modeling studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      specific feedback to the authors:

      Nevertheless, there are some minor points, worth considering.

      Link to HUMAN DATA

      Here the authors may be interested to know that human data supports their proposal. This is relevant since there is ample evidence for the operation of spinal CPG's in humans (Duysens and van de Crommert,1998). The present model predicts that the basic output of the CPG remains even at very slow speeds, thus leading to similarity in EMG output. This prediction fits the experimental data (den Otter AR, Geurts AC, Mulder T, Duysens J. Speed related changes in muscle activity from normal to very slow walking speeds. Gait Posture. 2004 Jun;19(3):270-8). To investigate whether the basic CPG output remains basically the same even at very slow speeds (as also predicted by the current model), humans walked slowly on a treadmill (speeds as slow as 0.28 m s−1). Results showed that the phasing of muscle activity remained relatively stable over walking speeds despite substantial changes in its amplitude. Some minor additions were seen, consistent with the increased demands of postural stability. Similar results were obtained in another study: Hof AL, Elzinga H, Grimmius W, Halbertsma JP. Speed dependence of averaged EMG profiles in walking. Gait Posture. 2002 Aug;16(1):78-86. doi: 10.1016/s0966-6362(01)00206-5. PMID: 12127190.

      These authors wrote: "The finding that the EMG profiles of many muscles at a wide range of speeds can be represented by addition of few basic patterns is consistent with the notion of a central pattern generator (CPG) for human walking". The basic idea is that the same CPG can provide the motor program at slow and fast speeds but that the drive to the CPG differs. This difference is accentuated under some conditions in pathology, such as in Parkinson's Kinesia Paradoxa. It was argued that the paradox is not really a paradox but is explained as the CPGs are driven by different systems at slow and at fast speeds (Duysens J, Nonnekes J. Parkinson's Kinesia Paradoxa Is Not a Paradox. Mov Disord. 2021 May;36(5):1115-1118. doi: 10.1002/mds.28550. Epub 2021 Mar 3. PMID: 33656203.)

      These ideas are well in line with the current proposal ("Based on our predictions, slow (conditionally exploratory) locomotion is not "automatic", but requires volitional (e.g. cortical) signals to trigger step-by-step phase transitions because the spinal network operates in a state-machine regime. In contrast, locomotion at moderate to high speeds (conditionally escape locomotion) occurs automatically under the control of spinal rhythm-generating circuits receiving supraspinal drives that define locomotor speed, unless voluntary modifications or precise stepping are required to navigate complex terrain").

      As mentioned in the present paper, other examples exist from pathology ("...Another important implication of our results relates to the recovery of walking in movement disorders, where the recovered pattern is generally very slow. For example, in people with spinal cord injury, the recovered walking pattern is generally less than 0.1 m/s and completely lacks automaticity 77-79. Based on our predictions, because the spinal locomotor network operates in a state-machine regime at these slow speeds, subjects need volition, additional external drive (e.g., epidural spinal cord stimulation) or to make use of limb sensory feedback by changing their posture to perform phase transitions"). As mentioned above, another example is provided by Parkinson's disease. The authors may also be interested in work on flexible generators in SCI: Danner SM, Hofstoetter US, Freundl B, Binder H, Mayr W, Rattay F, Minassian K. Human spinal locomotor control is based on flexibly organized burst generators. Brain. 2015 Mar;138(Pt 3):577-88. doi: 10.1093/brain/awu372. Epub 2015 Jan 12. PMID: 25582580; PMCID: PMC4408427.

      We thank the reviewer for these additional and interesting insights. We added a new paragraph in the Discussion to bolster the link with human data that includes references suggested by the Reviewer.

      CHAIN OF REFLEXES

      It reads: "... in opposition to the previously prevailing viewpoint of Charles Sherrington 21,22 that locomotion is generated through a chain of reflexes, i.e., critically depends on limb sensory feedback (reviewed in 23)." This is correct but incomplete. The reference cited (23: Stuart, D.G. and Hultborn, H, "Thomas Graham Brown (1882-1965), Anders Lundberg (1920-), and the neural control of stepping," Brain Res. Rev. 59(1), 74-95 (2008)) actually reads: "Despite the above findings, the doctrinaire position in the early 1900s was that the rhythm and pattern of hind limb stepping movements was attributable to sequential hind limb reflexes. According to Graham Brown (1911c) this viewpoint was largely due to the arguments of Sherrington and a Belgian physiologist, Maurice Philippson (1877-1938). Philippson studied stepping movements in chronically maintained spinal dogs, using techniques he had acquired in the Strasbourg laboratory of the distinguished German physiologist, Friedrich Goltz (1834-1902). He also analyzed kinematically moving pictures of dog locomotion, which had been sent to him by the renowned French physiologist, Etienne-Jules Marey (1830-1904). Philippson (1905) certainly presented arguments explaining his perception of how sequential spinal reflexes contributed to the four phases of the step cycle (see Fig. 1 in Clarac, 2008). In retrospect, it is likely that Graham Brown was correct in attributing to Philippson and Sherrington the then-prevailing viewpoint that reflexes controlled spinal stepping. It is puzzling, nonetheless, that far less was said then and even now about Philippson's belief that the spinal control was due to a combination of central and reflex mechanisms (Clarac, 2008) [footnotes 4, 5]. Footnote 4: We are indebted to François Clarac for drawing to our attention Philippson's statement on p. 37 of his 1905 article that "Nos expériences prouvent d'une part que la moelle lombaire séparée du reste de l'axe cérébro-spinal est capable de produire les mouvements coordonnés dans les deux types de locomotion, trot et gallop. [Our experiments prove that one side of the spinal cord separated from the cerebro-spinal axis is able to produce coordinated movements in two types of locomotion, trot and gallop]." Then, on p. 39 Philippson (1905) states that "Nous voyons donc, en résumé que la coordination locomotrice est une fonction exclusivement médullaire, soutenue d'une part par des enchainements de réflexes directs et croisés, dont l'excitant est tantot le contact avec le sol, tantot le mouvement même du membre. [In summary, we see that locomotor coordination is an exclusive function of the spinal cord supported by a sequencing of direct and crossed reflexes, which are activated sometimes by contact with the ground and sometimes even by leg movement]. A coté de cette coordination basée sur des excitations périphériques, il y a une coordination centrale provenant des voies d'association intra-médullaires. [In conjunction with this peripherally excited coordination, there is a central coordination arising from intraspinal pathways]." (The English translations have also been kindly supplied by François Clarac.) Clearly, Philippson believed in both a central spinal and a reflex control of stepping! Footnote 5: In part 1 of his 1913/1916 review Graham Brown discussed Philippson's 1905 article in much detail (pp. 345-350 in Graham Brown, 1913b). He concludes with the statement that "... Philippson die wesentlichen Factoren des Fortbewegungsaktes in das exterozeptive Nervensystem verlegt. Er nimmt an, dass die zyklischen Bewegungen automatisch durch äussere Reize erhalten werden, welche in sich selbst rhythmisch als Folge der Reflexakte welche sie selbst erzeugen, wiederholt werden. [Philippson assigns the important factors of the act of locomotion to the exteroceptive nervous system. He assumes that the cyclic movements are automatically maintained by external stimuli which, by themselves, are rhythmically repeated as a consequence of the reflexive actions that they generate themselves]." (English translation kindly supplied by Wulfila Gronenberg). This interpretation clearly ignores Philippson's emphasis on a central spinal component in the control of stepping."

      Hence it is a simplification to give all the credit for the chain-of-reflexes idea to Sherrington and to ignore the role of Philippson.

      We again thank the Reviewer for these additional and interesting insights. We added the Philippson (1905) and Clarac (2008) references. The important contribution of Philippson is now indicated.

      GTO Ib feedback

      It reads: "This effect and the role of Ib feedback from extensor afferents has been demonstrated and described in many studies in cats during real and fictive locomotion 2,57-59."

      These citations are appropriate but it is surprising to see that the Hultborn contribution is limited to the Gossard reference while the even more important earlier reference to Conway et al is missing (Conway BA, Hultborn H, Kiehn O. Proprioceptive input resets central locomotor rhythm in the spinal cat. Exp Brain Res. 1987;68(3):643-56. doi: 10.1007/BF00249807. PMID: 3691733).

      Yes, the Conway et al. reference has been added.

      Other species

      The authors may also look at other species. The flexible arrangement of the CPGs, as described in this article, is fully in line with work on other species showing CPG networks capable of supporting not only gait but also scratching, swimming, etc. (Berkowitz A, Hao ZZ. Partly shared spinal cord networks for locomotion and scratching. Integr Comp Biol. 2011 Dec;51(6):890-902. doi: 10.1093/icb/icr041. Epub 2011 Jun 22. PMID: 21700568. Berkowitz A, Roberts A, Soffe SR. Roles for multifunctional and specialized spinal interneurons during motor pattern generation in tadpoles, zebrafish larvae, and turtles. Front Behav Neurosci. 2010 Jun 28;4:36. doi: 10.3389/fnbeh.2010.00036. PMID: 20631847; PMCID: PMC2903196.)

      Similar ideas about flexible coupling can also be found in: Juvin L, Simmers J, Morin D. Locomotor rhythmogenesis in the isolated rat spinal cord: a phase-coupled set of symmetrical flexion extension oscillators. J Physiol. 2007 Aug 15;583(Pt 1):115-28. doi: 10.1113/jphysiol.2007.133413. Epub 2007 Jun 14. PMID: 17569737; PMCID: PMC2277226. Or zebrafish: Harris-Warrick RM. Neuromodulation and flexibility in Central Pattern Generator networks. Curr Opin Neurobiol. 2011 Oct;21(5):685-92. doi: 10.1016/j.conb.2011.05.011. Epub 2011 Jun 7. PMID: 21646013; PMCID: PMC3171584.

      We added a sentence in the Discussion along with supporting references.

      Standing

      In the view of the present reviewer, the model could even be extended to standing in humans. It reads: "at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs"; similarly (personal experience) when going from sit to stand: as soon as weight is over support, extension is initiated and the body rises, as one would expect when the extensor center is activated by reinforcing load feedback, replacing GTO inhibition (Faist M, Hoefer C, Hodapp M, Dietz V, Berger W, Duysens J. In humans Ib facilitation depends on locomotion while suppression of Ib inhibition requires loading. Brain Res. 2006 Mar 3;1076(1):87-92. doi:

      Yes, we agree that the model could be extended to standing and the transition from standing to walking is particularly interesting. However, for this paper, we will keep the focus on locomotion over a range of speeds.

      Reviewer #2 (Recommendations For The Authors):

      The presentation is exceedingly well done and very clear.

      A moderate concern is that the authors do not make use of the capacity of computer simulations for sensitivity analyses. Perhaps these have been previously published? In any case, the question here is whether the 3 state behavior is especially sensitive to excitability of one of the main classes of neurons or a crucial set of connections.

      The sensitivity analysis has been performed and included as a Supplemental file.

      Minor point. I have but two minor points. A bit more explanation should be provided for the use of the term "state machine" to describe the lowest speed state. Perhaps this is a term from control theory? In any case, it is not clear why this term is appropriate for a state in which the oscillator circuits are "stuck" in a constant output form and need to be "pushed" by sensory input.

      Yes, we now provide a definition in the Introduction.

      Minor point: it is of course likely that neuromodulation of multiple types of spinal neurons occurs via inputs that activate G protein coupled receptors. These types of inputs are absent from the model, which is fine, but some sort of brief discussion should be included. One possibility is to note that the circuit achieves transitions between different states without the need for neuromodulatory inputs. This appears to me to be a very interesting and surprising insight.

      In the section “Model limitations and future directions” in the Discussion, we now mention that the term “supraspinal drive” in our model represents supraspinal inputs that provide both electrical and neuromodulatory effects on spinal neurons, increasing their excitability, and that these effects disappear after spinal transection. We think it is still too early to simulate the exact effects of descending neuromodulation, since there are almost no data on the effects of different modulators on specific types of spinal interneurons.

      Reviewer #3 (Recommendations For The Authors):

      Minor Comments  

      Page numbers would be useful.

      Abstract

      Following spinal transection, the network can only operate in a state-machine regime. This is a bit strong since it applies to computational data. Clarify this statement.

      We agree. Sentence has been changed to: “Following spinal transection, the model predicts that the spinal network can only operate in the state-machine regime.”

      Introduction

      Intro - "This is somewhat surprising...". It gives the impression that spinal cats are autonomously stable on the belt. They are stabilized by the experimenter.

      The text has been changed to: “This is somewhat surprising because intact and spinal cats rely on different control mechanisms. Intact cats walking freely on a treadmill engage vision for orientation in space and their supraspinal structures process visual information and send inputs to the spinal cord to control locomotion on a treadmill that maintains a fixed position of the animal relative to the external space. Spinal cats, whose position on the treadmill relative to the external space is fixed by an experimenter, can only use sensory feedback from the hindlimbs to adjust locomotion to the treadmill speed.”

      "Cannot consistently perform treadmill locomotion" - likely a context-dependent result. Certainly, cats can do this easily off a treadmill - stalking, for example. Perhaps somewhere, mention that treadmill locomotion is not entirely similar to overground locomotion.

      We completely agree. Stalking is an excellent example showing that during overground locomotion slow movements (and related phase transitions) can be controlled by additional voluntary commands from supraspinal structures. This differs from simple treadmill locomotion, which is performed outside of specific goal- or task-dependent contexts. Based on this, we suggest a difference between relatively slow (exploratory-type, including stalking) and relatively fast (escape-type) overground locomotion. We added the following sentence to the Introduction: “This is evidently context dependent and specific for the treadmill locomotion as cats, humans and other animals can voluntarily decide to perform consistent overground locomotion at slow speeds.”

      The authors introduce the concept of the state machine regime. In my opinion, this could use some more explanation and citations to the literature. Was it a term coined by the authors, or is there literature reinforcing this point?

      This is a computer science and automata theory term that has already been used in descriptions of locomotion (see our references in the 2nd paragraph of Discussion). We added a definition and corresponding references in the Introduction.

      In terms of sensory feedback, particularly group II input, it would be interesting to calculate if the conduction delay to the spinal cord at higher speeds would have a certain cutoff point at which it would no longer be timed effectively for phase transitions. This could reinforce your point.

      This is an interesting proposition but it is unlikely to be a factor over the range of speeds that we investigated (0.1 to 1.0 m/s). Assuming that group II afferents transmit their signals to spinal circuits at a latency of 10-20 ms, this is more than enough time to affect phase transitions, even at the highest speed considered. This might be a factor at very high speeds (e.g. galloping) or in small animals with high stepping frequencies.
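      The argument above can be made concrete with a back-of-the-envelope check that expresses the afferent latency as a fraction of a locomotor phase. This is only a sketch: the 10-20 ms latencies are the values assumed in the response, whereas the phase duration used below is a hypothetical placeholder, not a measurement from the study.

```python
def latency_fraction(latency_s, phase_duration_s):
    """Return an afferent latency as a fraction of a locomotor phase duration."""
    if phase_duration_s <= 0:
        raise ValueError("phase duration must be positive")
    return latency_s / phase_duration_s


# Hypothetical placeholder: an assumed shortest phase duration at the
# highest speed considered. Replace with a measured value before drawing
# any conclusion.
ASSUMED_SHORTEST_PHASE_S = 0.2

for latency_s in (0.010, 0.020):  # latencies assumed in the response
    fraction = latency_fraction(latency_s, ASSUMED_SHORTEST_PHASE_S)
    print(f"latency {latency_s * 1000:.0f} ms = {fraction:.1%} of the phase")
```

      The fraction grows as phases shorten, which is why conduction delay could start to matter at very high stepping frequencies, as noted above for galloping or small animals.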

      Results.

      The assertion that intact cats are inconsistent in terms of walking at slow speeds needs to be bolstered. For example, if a raised platform were built for a tray of food, would the intact cat consistently walk at slower speeds and eat? I suspect so. By the same token, would they walk slowly during bipedal walking? It is pretty easy to check this. Also, reports from the literature show differential effects of runway versus treadmill gait analysis, specifically when afferent input is removed.

      The Reviewer is correct that raising a platform for a food tray or even having intact cats walk with their hindlimbs only (with forelimbs on a stationary platform) may allow for consistent stepping at slow speeds (0.1 – 0.3 m/s). However, this effectively removes voluntary control of locomotion and makes the pattern more automatic (spinal + limb sensory feedback). These examples provide additional specific contexts, and we have already mentioned (see above) that slow locomotion of intact cats is context dependent.

      "We believe that intact animals walking on a treadmill..." Citations for this? Certainly, this is not a new point.

      No, this is not new. We changed the sentence and added a reference to the statement: “Intact animals walking on a treadmill use visual cues and supraspinal signals to adjust their speed and maintain a fixed position relative to the external space”, with reference to Salinas et al. (Salinas, M.M., Wilken, J.M., and Dingwell, J.B., "How humans use visual optic flow to regulate stepping during walking," Gait Posture 57, 15-20, 2017).

      The presentation of the results is somewhat disjointed. The intact data is presented for tied and split-belt results, but this is not addressed explicitly until figure 4. Would it not be better to create a figure incorporating both intact and modelling data and present the intact data where appropriate?

      We tried to do this initially, but it required changing the style of the whole paper, so we decided against it. We therefore prefer to keep the presentation of results as it is now.

      Regarding the role of sensory feedback being especially important at low speeds, it is interesting that egr3+ mice (lacking spindle input) show an inability to walk at high speeds >40 cm/s but can walk at lower speeds (up to 7 cm/s) (Takeoka et al 2014). Similar findings were found with a lesion affecting Group I afferents in general (Takeoka and Arber 2019). Also, Grillner and colleagues show that cats can produce fictive locomotion in the absence of sensory input.

      In the Takeoka experiments it is difficult to assess the effect of removing somatosensory feedback because animals can simply decide not to step at higher speeds to avoid injury. Their mice deprived of somatosensory feedback can walk at slow speeds, likely thanks to voluntary commands, and cannot do so at higher speeds because (1) somatosensory feedback may indeed be necessary and/or (2) they feel threatened by impaired posture and poor control in general. In other words, they may choose not to walk at faster speeds to avoid injury.

      Fictive locomotion by definition is without phasic somatosensory feedback, as the animals are curarized or studies are performed in isolated spinal cord preparations. Depending on the preparation, pharmacology or brainstem stimulation is required to evoke fictive locomotion. If animals are deafferented, pharmacology or brainstem stimulation is required to induce fictive locomotion to offset the loss of spinal neuronal excitability provided by primary afferents. At the same time, our preliminary analysis of old fictive locomotion data in the University of Manitoba Spinal Cord center (Drs. Markin and Rybak had official access to this database during our collaboration with Dr. David McCrea) has shown that the frequency of stable fictive locomotion in cats usually exceeded 0.6 - 0.7 Hz, which approximately corresponds to speeds above 0.3 - 0.4 m/s. These data and estimates are only approximate; they have not been statistically analyzed or published and hence have not been included in our paper.

      Discussion. The statement that sensory feedback is required for animals to locomote may need to be qualified. Animals need some sensory feedback to locomote is perhaps better. For example, lesion studies by Rossignol in the early 2000s showed that cutaneous feedback from the paw was seemingly quite critical (in spinal cats). Also, see previous comments above.

      We changed this to: “… requires some sensory feedback to locomote, …”

      Figures

      Figure 1C. This figure is somewhat confusing. If intact cats do not walk (arrow), how are the data for swing and stance computed? Also raw traces would be useful to indicate that there is variability. Also, while duration is useful, would you not want to illustrate the co-efficient of variation as well as another way to show that the stepping pattern was inconsistent?

      This is probably a misunderstanding. The left panel of Fig. 1C superimposes data of intact cats from panel A (with speeds ranging from 0.4 m/s to 1.0 m/s) and data of spinal cats from panel B (with speeds ranging from 0.1 m/s to 1.0 m/s). Therefore, the left part of this panel (speeds from 0.1 m/s to 0.4 m/s, pointed out by the arrow) corresponds only to spinal cats, not to intact cats. The standard deviations of all measurements are shown. All these figures were reproduced from previous publications. We did not apply new statistical analysis to these previously published data/figures.

      Figure 4. 'All supraspinal drives (and their suppression of sensory feedback) are eliminated from the schematic shown in A. ' However, it is labelled 'brainstem drives,' which is confusing. Moreover, many of the abbreviations are confusing. Do you need l-SF-E1 in the figure, or could you call it 'Feedback 1' and then refer to l-SF-E1 in the legend? The same goes for βr, etc. Can they move to the legend?

      In the intact model (Fig. 4A), we have supraspinal drives (𝛼𝐿 and 𝛼𝑅, and 𝛾𝐿 and 𝛾𝑅), some of which provide presynaptic inhibition of sensory feedback (SF-E1 and SF-E2), as shown in Fig. 4A. In the spinal-transected model (Fig. 4B), the above brainstem drives and their effects (presynaptic inhibition) on both feedback types are eliminated (therefore, there is no label “Brainstem drives” in Fig. 4B). Also, we do not see a strong reason to change the feedback names, since they are explained in the text.

      I appreciate the detail of these figures, but they are difficult to conceptualize. They are useful in the context of 3C. Perhaps move this figure to supplementary and then show the proposed schematics for the system operating at slow, medium, and fast speeds in a replacement figure?

      We apologize for the resistance, but we would like to keep the current presentation.

      There is a lack of raw data (model or experimental) reinforcing the figures. I would add these to all figures, which would nicely complement the graphs.

      These raw data can be found in the cited manuscripts. It would be the same figures.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values, along with methods from queueing theory that rely on various assumptions, to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the author's proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer on readers benefiting from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formula placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI, or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
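
      The aggregation step described in this response can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy MOI values are hypothetical and are not taken from the manuscript or its code.

```python
from collections import Counter

def population_queue_length_distribution(moi_estimates):
    """Empirical MOI distribution obtained by pooling one MOI estimate per sampled host.

    The pooled distribution stands in for the steady-state queue length
    distribution of a single host (hypothetical helper, for illustration only).
    """
    counts = Counter(moi_estimates)
    n_hosts = len(moi_estimates)
    return {moi: counts[moi] / n_hosts for moi in sorted(counts)}

def foi_per_host_per_year(total_infections_per_year, n_hosts):
    """True per-host FOI in a simulation: total infections acquired by all hosts / number of hosts."""
    return total_infections_per_year / n_hosts

# Hypothetical MOI estimates, one per sampled host.
moi = [1, 2, 2, 4, 5, 4, 4, 1]
dist = population_queue_length_distribution(moi)
print(dist)
print(foi_per_host_per_year(500, 100))
```

      Multiplying a per-host estimate by the population size recovers the total number of infections acquired per year, which is the sense in which the estimator is combined with demographic information.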

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods exhibit a performance similar to that in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
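
      The two summary measures mentioned here (whether the truth falls within the entire bootstrap distribution, and the relative deviation of the bootstrap median from the truth) could be computed along the following lines. This is a minimal sketch; the function names and the example numbers are hypothetical and do not come from the study.

```python
import statistics

def truth_captured(bootstrap_estimates, true_foi):
    """True if the true FOI lies within the full range of the bootstrap distribution."""
    return min(bootstrap_estimates) <= true_foi <= max(bootstrap_estimates)

def relative_deviation(bootstrap_estimates, true_foi):
    """Signed relative difference between the bootstrap median and the true FOI."""
    return (statistics.median(bootstrap_estimates) - true_foi) / true_foi

def summarize(simulations):
    """simulations: list of (true_foi, bootstrap_estimates) pairs, one per simulation run."""
    captured = [truth_captured(boot, truth) for truth, boot in simulations]
    deviations = [relative_deviation(boot, truth) for truth, boot in simulations]
    return sum(captured) / len(captured), deviations

# Hypothetical simulation outputs: (true FOI, bootstrap FOI estimates).
sims = [
    (5.0, [4.2, 4.8, 5.1, 5.5]),
    (3.0, [2.0, 2.2, 2.5, 2.8]),
]
coverage, deviations = summarize(sims)
print(coverage, deviations)
```

      A negative relative deviation indicates underestimation, which is the pattern the reviewer asked to have quantified.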

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
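
      One plausible way to write the likelihood described in this response is given below. This is a hedged sketch and not the authors' formulation: the notation is assumed here, with $n_i$ the MOI estimate of sampled host $i$, $N$ the number of sampled hosts, $\pi_n(\mu_A, \sigma_A^2)$ the steady-state probability of queue length $n$ under the two-moment approximation for inter-arrival times with mean $\mu_A$ and variance $\sigma_A^2$, and $\mathcal{G}$ the grid of tested parameter combinations. It also assumes that MOI estimates are independent across hosts.

```latex
\ell(\mu_A, \sigma_A^2) = \sum_{i=1}^{N} \log \pi_{n_i}(\mu_A, \sigma_A^2),
\qquad
(\hat{\mu}_A, \hat{\sigma}_A^2) = \arg\max_{(\mu_A, \sigma_A^2) \in \mathcal{G}} \ell(\mu_A, \sigma_A^2)
```

      Under this form, the maximizer is found only among the grid points in $\mathcal{G}$, so how closely it approximates the maximum likelihood estimate depends on the grid resolution, which is why refining the grid is a relevant check.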

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population. 

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is being used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize in future data collection and sampling efforts the subpopulation class who has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c has therefore a small impact for the part of the grid resulting in low flows of customers/infection, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15-20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
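
      The argument about the capacity parameter c can be illustrated with a simplified stand-in model. The sketch below is NOT the two-moment approximation used in the manuscript: it assumes Poisson arrivals, for which the steady-state queue length of a loss system with capacity c is a truncated Poisson distribution (Erlang's result) governed by the offered load, i.e. the arrival rate multiplied by the mean duration of infection. The load values and function names are hypothetical.

```python
import math

def truncated_poisson_pmf(load, c):
    """Steady-state queue-length pmf of a loss system with Poisson arrivals and capacity c.

    load = arrival rate * mean service time (here: FOI * mean infection duration).
    Weights are computed in log space to avoid overflow for large n.
    """
    weights = [math.exp(n * math.log(load) - math.lgamma(n + 1)) for n in range(c + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def max_abs_difference(p, q):
    """Largest pointwise gap between two pmfs, padding the shorter one with zeros."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return max(abs(a - b) for a, b in zip(p, q))

# Hypothetical offered loads: one well below the cap, one above it.
gaps = {
    load: max_abs_difference(truncated_poisson_pmf(load, 25), truncated_poisson_pmf(load, 60))
    for load in (5.0, 40.0)
}
for load, gap in gaps.items():
    print(f"load={load}: max pmf gap between c=25 and c=60 is {gap:.3g}")
```

      In this simplified setting, raising c leaves the distribution essentially unchanged when the flow of infections is low, and changes it only for flows high enough to fill the system, which is the part of the grid the response argues is not supported by the empirical MOI distribution.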

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden with MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. This assumes that, had those 1-5-year-olds not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded from the estimation.
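
      The difference between imputing and excluding the drug-treated individuals can be made concrete with a small sketch. It is an illustration only: the function name, the toy MOI values, and the choice to redraw the imputed values from the original untreated pool within each bootstrap replicate are assumptions, not a description of the authors' implementation.

```python
import random

def bootstrap_replicate(untreated_moi, n_treated, rng, impute=True):
    """One bootstrap replicate of the MOI sample.

    Untreated hosts are resampled with replacement. If impute is True, each
    treated host is assigned an MOI drawn from the untreated pool; otherwise
    treated hosts are left out of the replicate entirely.
    """
    replicate = [rng.choice(untreated_moi) for _ in untreated_moi]
    if impute:
        replicate += [rng.choice(untreated_moi) for _ in range(n_treated)]
    return replicate

rng = random.Random(0)
untreated = [1, 2, 2, 3, 5, 4, 1, 6]  # hypothetical MOI estimates of non-treated hosts
n_treated = 4                          # hypothetical number of drug-treated hosts

with_imputation = bootstrap_replicate(untreated, n_treated, rng, impute=True)
with_exclusion = bootstrap_replicate(untreated, n_treated, rng, impute=False)
print(len(with_imputation), len(with_exclusion))
```

      Because every imputed value is drawn from the untreated pool, imputation enlarges the sample without adding information, which is how it can narrow the resulting bootstrap distribution relative to exclusion.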

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions, which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      In what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present, is not clear to us. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested, is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.
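
      The idea of selecting the FOI value that maximizes the likelihood of an observed MOI distribution can be illustrated with a deliberately simplified sketch. It is not the two-moment approximation used in the manuscript. It assumes an M/G/∞ queue, in which infections arrive at a constant rate (the FOI) and each lasts a known mean duration, so that the stationary MOI is Poisson with mean FOI × duration. The sample values, the mean duration, and the function names are all hypothetical.

```python
import math

def poisson_log_likelihood(moi_counts, mean_moi):
    """Log-likelihood of the observed MOI values under Poisson(mean_moi)."""
    return sum(k * math.log(mean_moi) - mean_moi - math.lgamma(k + 1)
               for k in moi_counts)

def estimate_foi(moi_counts, mean_duration_years, grid):
    """Return the candidate FOI on `grid` that maximizes the likelihood.

    Assumption (not from the manuscript): an M/G/infinity queue, so that
    MOI ~ Poisson(FOI * mean_duration_years) at stationarity.
    """
    return max(grid, key=lambda foi: poisson_log_likelihood(
        moi_counts, foi * mean_duration_years))

# Hypothetical cross-sectional sample: one MOI value per host (0 = uninfected).
moi_sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]
duration = 0.5                            # assumed mean infection duration (years)
grid = [i / 100 for i in range(1, 2001)]  # candidate FOI values, 0.01 to 20 per year

foi_hat = estimate_foi(moi_sample, duration, grid)
# Under this toy model the maximum-likelihood estimate has a closed form:
closed_form = (sum(moi_sample) / len(moi_sample)) / duration
print(foi_hat, closed_form)
```

      In this toy setting the grid search and the closed form agree, which is a useful sanity check; with a finite carrying capacity or a non-Poisson arrival process, as in the queuing methods discussed above, no such closed form should be expected.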

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ but not substantially across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.
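
      One possible form such a formal test could take is a two-sample permutation test. The sketch below is only an illustration under stated assumptions: the response does not say which test will be used, the statistic here (absolute difference in mean MOI) is our choice, and the function name and sample values are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(treated, untreated, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean MOI.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so that p is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(untreated))
    pooled = list(treated) + list(untreated)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical MOI values for drug-treated and non-treated hosts.
treated_moi = [1, 2, 2, 3, 1, 2, 4, 2]
untreated_moi = [2, 2, 3, 1, 2, 3, 2, 1]
p_value = permutation_test(treated_moi, untreated_moi)
print(p_value)
```

      A test on the means only detects a shift in location; comparing the full shape of the two MOI distributions would need a different statistic.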

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases in general. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected).  For malaria, force of infection refers to the number of blood-stage new infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree however with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1) Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2) Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3) Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4) Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5) Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author response:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”).

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate a LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The conclusions of the in vitro experiments using cultured hippocampal slices were well supported by the data, but aspects of the in vivo experiments and proteomic studies need additional clarification.

      (1) In contrast to the in vitro experiments in which a γ-secretase inhibitor was used to exclude possible effects of Aβ, this possibility was not examined in in-vivo experiments assessing synapse loss and function (Figure 3) and cognitive function (Figure 4). The absence of plaque formation (Figure 4B) is not sufficient to exclude the possibility that Aβ is involved. The potential involvement of Aβ is an important consideration given the 4-month duration of protein expression in the in vivo studies.

      Response: We thank the reviewer for raising this question. While our current data do not exclude the potential involvement of Aβ-induced toxicity in the synaptic and cognitive dysfunction observed in mice overexpressing β-CTF, addressing this directly remains challenging. Treatment with γ-secretase inhibitors could potentially shed light on this issue. However, γ-secretase inhibitors are known to cause brain dysfunction by themselves, likely because they block the γ-cleavage of other essential molecules, such as Notch[1, 2]. As a result, this approach is unlikely to provide a definitive answer, which also prevents us from pursuing it further in vivo. We hope the reviewer understands this limitation and agrees to a discussion of this issue in the revised manuscript instead.

      (2) The possibility that the results of the proteomic studies conducted in primary cultured hippocampal neurons depend in part on Aβ was also not taken into consideration.

      Response: We thank the reviewer for raising this interesting question. In the revised manuscript, we plan to address this experimentally by using a γ-secretase inhibitor to investigate the potential contribution of Aβ in this study.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors' use of sparse expression to examine the role of β-CTF on spine loss could be a useful general tool for examining synapses in brain tissue.

      Response: We thank the reviewer for these comments. Indeed, it is a very robust assay and we would like to share this method with the scientific community as soon as possible.

      Additional context that might help readers interpret or understand the significance of the work:

      The discovery of BACE1 stimulated an international effort to develop BACE1 inhibitors to treat Alzheimer's disease. BACE1 inhibitors block the formation of β-CTF which, in turn, prevents the formation of Aβ and other fragments. Unfortunately, BACE1 inhibitors not only did not improve cognition in patients with Alzheimer's disease, they appeared to worsen it, suggesting that producing β-CTF actually facilitates learning and memory. Therefore, it seems unlikely that the disruptive effects of β-CTF on endosomes plays a significant role in human disease. Insights from the authors that shed further light on this issue would be welcome.

      Response: We would like to express our gratitude to the reviewer for raising this interesting question. It remains puzzling why BACE1 inhibition has failed to yield benefits in AD patients, while amyloid clearance via Aβ antibodies has been shown to slow disease progression. One possible explanation is that pharmacological inhibition of BACE1 may not be as effective as genetic removal. Indeed, genetic depletion of BACE1 leads to the clearance of existing amyloid plaques[3], whereas its pharmacological inhibition slows plaque growth and prevents the formation of new plaques but does not stop the growth of the existing ones[4]. We think the negative results of BACE1 inhibitors in clinical trials may not be sufficient to rule out the potential contribution of β-CTF to AD pathogenesis. Given that cognitive function continues to deteriorate rapidly in plaque-free patients after 1.5 years of treatment with Aβ antibodies in phase three clinical studies[5], it is important to consider the possible role of other Aβ-related fragments, such as β-CTF. We will include some further discussion in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigate the potential role of other cleavage products of amyloid precursor protein (APP) in neurodegeneration. They combine in vitro and in vivo experiments, revealing that β-CTF, a product cleaved by BACE1, promotes synaptic loss independently of Aβ. Furthermore, they suggest that β-CTF may interact with Rab5, leading to endosomal dysfunction and contributing to the loss of synaptic proteins.

      Response: We would like to thank the reviewer for his/her insightful suggestions. We have addressed the specific comments in the following sections.

      Weaknesses:

      Most experiments were conducted in vitro using overexpressed β-CTF. Additionally, the study does not elucidate the mechanisms by which β-CTF disrupts endosomal function and induces synaptic degeneration.

      Response: We would like to thank the reviewer for this insightful comment. While a significant portion of our experiments were conducted in vitro, the main findings were also confirmed in vivo (Figures 3 and 4). Repeating all the experiments in vivo would be challenging and may not be necessary. Regarding the use of overexpressed β-CTF, we acknowledge that this is a common issue in neurodegenerative disease studies. These diseases progress slowly over many years, sometimes even decades in patients. To model this progression in cell or mouse models within a time frame feasible for research, overexpression of certain proteins is often required. While not ideal, it is sometimes unavoidable. Since β-CTF levels are elevated in AD patients[6], its overexpression is a reasonable approach to investigate its potential effects.

      We did not further investigate the mechanisms by which β-CTF disrupted endosomal function because our preliminary results align with previous findings. Kim et al. demonstrated that β-CTF recruits APPL1 (a Rab5 effector) via the YENPTY motif to Rab5 endosomes, where it stabilizes active GTP-Rab5, leading to pathologically accelerated endocytosis, endosome swelling and selectively impaired transport of Rab5 endosomes[6]. In our manuscript, we observed that co-expression of Rab5S34N with β-CTF effectively mitigated β-CTF-induced spine loss in hippocampal slice cultures (Figures 6I-J), indicating that Rab5 overactivation-induced endosomal dysfunction contributed to β-CTF-induced spine loss, which was consistent with their conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Most previous studies have focused on the contributions of Abeta and amyloid plaques in the neuronal degeneration associated with Alzheimer's disease, especially in the context of impaired synaptic transmission and plasticity which underlies the impaired cognitive functions, a hallmark in AD. But processes independent of Abeta and plaques are much less explored, and to some extent, the contributions of these processes are less well understood. Luo et al. addressed this important question with an array of approaches, and their findings generally support the contribution of beta-CTF-dependent but non-Abeta-dependent process to the impaired synaptic properties in the neurons. Interestingly, the above process appears to operate in a cell-autonomous manner. This cell-autonomous effect of beta-CTF as reported here may facilitate our understanding of some potentially important cellular processes related to neurodegeneration. Although these findings are valuable, it is key to understand the probability of this process occurring in a more natural condition, such as when this process occurs in many neurons at the same time. This will put the authors' findings into a context for a better understanding of their contribution to either physiological or pathological processes, such as Alzheimer's. The experiments and results using the cell system are quite solid, but the in vivo results are incomplete and hence less convincing (see below). The mechanistic analysis is interesting but primitive and does not add much more weight to the significance. Hence, further efforts from the authors are required to clarify and solidify their results, in order to provide a complete picture and support for the authors' conclusions.

      Response: We would like to thank the reviewer for the constructive suggestions. We have addressed the specific comments in the following sections.

      Strengths:

      (1) The authors have addressed an interesting and potentially important question

      (2) The analysis using the cell system is solid and provides strong support for the authors' major conclusions. This analysis has used various technical approaches to support the authors' conclusions from different aspects and most of these results are consistent with each other.

      Response: We would like to thank the reviewer for these comments.

      Weaknesses:

      (1) The relevance of the authors' major findings to the pathology, especially the Abeta-dependent processes is less clear, and hence the importance of these findings may be limited.

      Response: We would like to thank the reviewer for pointing this out. Phase 3 clinical trial data for Aβ antibodies show that cognitive function continues to decline rapidly, even in plaque-free patients, after 1.5 years of treatment[5]. This suggests that plaque-independent mechanisms may drive AD progression. Therefore, it is crucial to consider the potential contributions of other Aβ species or related fragments, such as alternative forms of Aβ and β-CTF. While it is too early to definitively predict how β-CTF contributes to AD progression, it is notable that β-CTF, rather than Aβ, induced synaptic deficits in mice, which recapitulates a key pathological feature of AD. Ultimately, the true role of β-CTF in AD pathogenesis can only be confirmed through clinical studies.

      (2) In vivo analysis is incomplete, with certain caveats in the experimental procedures and some of the results need to be further explored to confirm the findings.

      Response: We would like to thank the reviewer for this suggestion. We plan to correct these caveats in the revised manuscript.

      (3) The mechanistic analysis is rather primitive and does not add further significance.

      Response: We would like to thank the reviewer for this comment. We did not delve further into the underlying mechanisms because our analysis indicates that Rab5 dysfunction underlies β-CTF-induced endosomal dysfunction, which is consistent with another study and has been addressed in detail there[6]. We hope the reviewer understands that our focus in this paper is on how β-CTF triggers synaptic deficits, which is why we did not investigate the mechanisms of β-CTF-induced endosomal dysfunction further.

      References:

      1. GÜNER G, LICHTENTHALER S F. The substrate repertoire of γ-secretase/presenilin [J]. Seminars in cell & developmental biology, 2020, 105: 27-42.
      2. DOODY R S, RAMAN R, FARLOW M, et al. A phase 3 trial of semagacestat for treatment of Alzheimer's disease [J]. The New England journal of medicine, 2013, 369(4): 341-50.
      3. HU X, DAS B, HOU H, et al. BACE1 deletion in the adult mouse reverses preformed amyloid deposition and improves cognitive functions [J]. The Journal of experimental medicine, 2018, 215(3): 927-40.
      4. PETERS F, SALIHOGLU H, RODRIGUES E, et al. BACE1 inhibition more effectively suppresses initiation than progression of β-amyloid pathology [J]. Acta Neuropathol, 2018, 135(5): 695-710.
      5. SIMS J R, ZIMMER J A, EVANS C D, et al. Donanemab in Early Symptomatic Alzheimer Disease: The TRAILBLAZER-ALZ 2 Randomized Clinical Trial [J]. Jama, 2023, 330(6): 512-27.
      6. KIM S, SATO Y, MOHAN P S, et al. Evidence that the rab5 effector APPL1 mediates APP-βCTF-induced dysfunction of endosomes in Down syndrome and Alzheimer's disease [J]. Molecular psychiatry, 2016, 21(5): 707-16.

    1. Author Response

      eLife assessment

      Tilk and colleagues present a computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding – an important finding with general implications for our understanding of cancer evolution.

      We thank the reviewers for these kind words. The summary statement and public review highlight our work in understanding how human tumors phenotypically respond to mutational load by assessing changes in gene expression. This work provides a mechanistic underpinning to our previous finding that the accumulation of passenger mutations in tumors creates a substantial cost because even substantially damaging passenger mutations can fix in non-recombining clonal tumor lineages. At the same time, we believe the summary statement and the public review do not mention a key remaining part of our paper that validates our findings and establishes causal connections between protein misfolding due to coding passenger mutations and tumor fitness. Specifically, we replicate and cross-validate our findings in human tumors by examining expression responses in an independent dataset of cancer cell lines (CCLE), where we demonstrate similar expression responses to an accumulation of mutations, indicating generic, cell intrinsic responses. We then establish a causal link by demonstrating that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability through perturbation experiments via shRNA knock-down and treatment with targeted agents. These analyses and results are important because they show that the adaptive responses we observe are evidence of a generic, cell intrinsic phenomenon that cannot be explained by organismal effects, such as aging, changes in the immune system or microenvironment.

      Joint Public Review:

      Tilk and colleagues present a computational investigation of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental and that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding.

      The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and additional effects for tumor homogeneity and type. This analysis identified a large number of genes (5000) that are more highly expressed at high mutational load at an FDR of 0.05. These genes are enriched in many core categories, most prominently in the proteasome, translation, and mitochondrial translation. The authors then proceed to investigate specific categories of upregulated genes further.
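
      The structure of such a model can be sketched in a few lines. The example below is an illustration only and is not the GLMM used in the paper: it assumes that cancer type enters as a fixed intercept (handled by centring within each type), it ignores the tumor homogeneity term, and its function name and data are hypothetical.

```python
import math
from collections import defaultdict

def within_type_slope(records):
    """Slope of log10(expression) on log10(mutation count), per gene.

    `records` is an iterable of (cancer_type, mutation_count, expression).
    Values are centred within each cancer type before the slope is
    computed, so baseline expression differences between types do not
    contribute to the estimate (a fixed-effects simplification).
    """
    groups = defaultdict(list)
    for cancer_type, n_mut, expr in records:
        groups[cancer_type].append((math.log10(n_mut), math.log10(expr)))
    sxy = sxx = 0.0
    for points in groups.values():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in points)
        sxx += sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical data: two cancer types with different baselines, same slope 0.2.
records = [(ctype, n, base * n ** 0.2)
           for ctype, base in (("A", 1.0), ("B", 100.0))
           for n in (10, 100, 1000)]
print(within_type_slope(records))
```

      Centring within type is what keeps systematic expression differences between tumor classes from being read as an effect of mutational load, which is the confounding concern raised in point 1 below.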

      The individual reviews, and the discussion among the reviewers, raised several issues that could potentially undermine or weaken some of the findings presented in this paper.

      1) Systematic differences in expression of some genes from one tumor class to another might generate spurious associations with mutational load (ML), which would affect the results presented in Figs 1 and 3. The case of a causal link between ML and over-expression of genes that mitigate deleterious effects of misfolding would be stronger if these results were replicated within single cancer types with many samples with different ML (similar to how Fig S6 relates to Fig 3). A related concern might be an association between increased variance of expression and ML. The compositional nature of expression data could generate trends like the ones shown in Fig. 2 with changing variance.

      We agree with the reviewers that possible confounders should be considered since TCGA data is heterogeneous. In this paper, we investigated possible confounders such as multicollinearity with different mutational types (SNVs and CNVs), controlled for expression responses within cancer types in the GLMM, and used the jackknifing procedure to ensure that no one cancer type dominates the signal. However, in principle unknown hidden confounders could remain, which is why a large part of our paper was focused on validating these effects in an independent dataset (CCLE) where many other covariates are not relevant (immune system, donor variability, stage, age, sex, etc.). Importantly, we also used data from perturbation screens that are completely orthogonal to expression responses in CCLE to get at a cause and effect. 

      Our reasoning for using all of the data in Figure 1 while controlling for differences due to cancer type in the GLMM was to maximize the variation in mutational load across all of the samples in this dataset to identify what genes increase in expression as mutational load increases over 5 orders of magnitude. As noted here, we also already further validated that the signal we observe in Figure 1 is still robust for our gene sets of interest within cancer types in Supplemental Figure 6.

      2) Fig 4, Fig S5 and Fig S8 show results for the regression coefficient of expression on ML after leaving out one cancer at a time. All of us initially read this as results for 'one cancer at a time', rather than 'leave-one-out'. These figures are used to argue that the results are not driven by specific cancer types. However, this analysis would not reveal if the signal was driven by a (small) subset of cancer types. To justify claims like "significant negative relationship between mutational load and cell viability across almost all cancer types", one needs to analyze individual cancer types. Results for specific genes, rather than broad groups would also help interpret these results.

      We grouped genes together in Figure 4 because the shRNA screen was done on a single gene at a time, and we were interested in measuring the joint effect on viability after knocking down all of the genes in a given complex.

      Given that the expression responses in Figure 3 already validate within cancer types in TCGA in Supplemental Figure 6, we believe that it’s very unlikely that the signal we observe is driven by individual cancer types or smaller groups of cancer types. In addition, we did not perform a within-cancer analysis in CCLE for Figure 4, because not all available cancer types in CCLE were profiled evenly in the shRNA screen (Total < 300). The vast majority of cancer types in the CCLE shRNA screen (23/26) have fewer than 20 samples per group, which we believe is too few to yield meaningful results that are not driven by noise.

      3) You use different model architecture for the TCGA and CCLE analysis because you suspect that the sample size imbalance in the latter might mean that a GLMM can not capture the different variance components accurately. Did you test this? Could you downsample to avoid this? Cancer type is likely a strong confounder of ML.

      That was indeed our reasoning: within-group sample sizes in CCLE are too low to robustly estimate variance within cancer types. Given that many cancer types have <20 samples within each group, we don’t think that evenly downsampling would enable us to get an estimate not driven by noise. As noted above, our approach to control for this was to perform a jackknifing procedure that eliminates a single cancer type at a time and re-estimates the effect.
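      The leave-one-cancer-type-out jackknife described above can be illustrated with a minimal sketch. This is not the authors' code: an ordinary least-squares slope stands in for their regression model, and the function names and toy data are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (a stand-in for the real model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def leave_one_group_out_slopes(x, y, groups):
    """Re-estimate the slope once per group, each time dropping that group."""
    slopes = {}
    for held_out in sorted(set(groups)):
        keep = [i for i, g in enumerate(groups) if g != held_out]
        slopes[held_out] = ols_slope([x[i] for i in keep], [y[i] for i in keep])
    return slopes

# Toy data: mutational load (x), expression (y), cancer type labels (groups).
load = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expr = [2.0 * v + 1.0 for v in load]
cancer_type = ["a", "a", "b", "b", "c", "c"]
slopes = leave_one_group_out_slopes(load, expr, cancer_type)
```

      If the estimate changes little whichever group is held out, no single group dominates the signal. As the reviewers point out, this check alone would not reveal a signal carried by a small subset of several groups.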

      4) In the splicing analysis (Fig 2 and Fig S4), you report a 10% variation in splicing for a 100-fold variation in ML. This weak trend is replicated in very similar ways for many different types of alternative splicing events. It is not clear why different events (exon skipping, intron retention, etc) should respond in the same way to ML. A weak but homogeneous effect like the one shown here might result from some common confounder (see point 1). Similarly, it is not clear why with increasing intron retention PSI threshold the fraction of under-expressed transcripts would decrease and not increase.

      We agree that the effects of the different alternative splicing events are complex. Our focus was on intron retention, which is known to occur in cancer (Lindeboom et al., 2016, Nature Genetics), and our analysis is consistent with the idea that damaging passenger mutations can shift cellular phenotypic states that require the use of many different mechanisms to mitigate protein misfolding.

      For Figure S4, as the PSI threshold for calling an alternative splicing event increases, fewer samples are called as having an intron retention event in the gene. This uniformly decreases the numerator across all the mutational load bins, so that when the threshold is increased the fraction of under-expressed transcripts with intron retention events is lower.

    1. Author Response

      We thank the reviewers for their positive comments and constructive feedback following their thorough reading of the manuscript. In this provisional reply we will briefly address the reviewers’ comments and suggestions point by point. In the forthcoming revised manuscript, we will more thoroughly address the reviewers’ comments and provide additional supporting data.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail given that in its current form risks to indicate that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we will revise the text to clarify that the random clustering is random with respect to any future, novel environment. The cause of clustering could be prior experiences (e.g. Bourjaily M & Miller P, Front. Comput. Neurosci. 5:37, 2011) or developmental programming (e.g. Perin R, Berger TK, & Markram H, Proc. Natl. Acad. Sci. USA 108:5419, 2011).

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (2.1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2.2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Based on our finding that preplay occurs only in networks that sustain cluster activity over multiple decoding time bins (Figure 5d-e), our understanding of the model’s function is consistent with the reviewer’s first explanation. We will provide additional analysis in the forthcoming revised manuscript in order to directly test the first explanation and will also test the intriguing possibility that the reviewer’s second suggestion contributes to above-chance preplay.
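      The reviewer's first explanation (2.1) can be made concrete with a toy sketch. It assumes each decoding time bin is labelled by a single active cluster, which is a simplification, and the function names are hypothetical. Sustained activation of cluster A followed by cluster B gives one switch, and permuting the time bins can only preserve or break that continuity.

```python
import random

def count_switches(labels):
    """Number of adjacent time bins whose active cluster differs."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def time_bin_shuffle(labels, seed=0):
    """Return a copy of the sequence with the order of time bins permuted."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# One population burst event: cluster A is active for 5 bins, then cluster B.
event = ["A"] * 5 + ["B"] * 5
shuffled = time_bin_shuffle(event)

original_switches = count_switches(event)
shuffled_switches = count_switches(shuffled)
```

      With two clusters the unshuffled event has the minimum possible number of switches, so a shuffle can never look more sequential than the original. This matches the reviewer's a-b-a-b example.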

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate, a network similar to ring lattice can still have the balance of cluster isolation and cluster overlap, while it will have small SWI due to a long path across some node pairs. Both cluster structure and long-range connection could contribute to SWI. The authors only discuss the necessity of cluster structure, but why is the long-range connection important should also be discussed. I guess long-range connection could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We will revise the text and add illustrative figures.

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We will test another type of small-world network if time permits.

      Our discussion of “cluster overlap” is specific to our type of small-world network, in which there is no pre-determined spatial dimension (unlike the ring network of Watts and Strogatz). Because clusters map randomly to location once a particular spatial context is imposed, the random overlap between clusters produces long-range connections in that context (and in any other context). One can therefore think of the amount of overlap between clusters as representing the number of long-range connections in a Watts-Strogatz model. We wish to reiterate, however, that such models involve a spatial topology within the network, which ours does not include.
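      To make the two ingredients of a small-world index concrete, the sketch below computes the mean clustering coefficient and mean shortest path length of a ring lattice, the example the reviewer suggests. The ratio form of the index, (C/C_rand)/(L/L_rand), is one common definition and is an assumption here, since the exact definition used in the manuscript is not restated in this response. All function names are hypothetical.

```python
from collections import deque

def ring_lattice(n, k):
    """Undirected ring where each node links to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def mean_clustering(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        nbrs = list(nbrs)
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in range(d) for b in range(a + 1, d)
                    if nbrs[b] in adj[nbrs[a]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)

def mean_path_length(adj):
    """Mean shortest path over reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def small_world_sigma(c, l, c_rand, l_rand):
    """Assumed index: clustering and path length, each relative to a random graph."""
    return (c / c_rand) / (l / l_rand)

lattice = ring_lattice(20, 4)
c_lattice = mean_clustering(lattice)
l_lattice = mean_path_length(lattice)
```

      A ring lattice scores high on clustering but also has long paths between opposite sides of the ring, so the index stays small unless long-range connections shorten those paths.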

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Yes, we are highly confident that the clusters in our network would correspond to the functional assemblies that have been studied through assembly analysis and will present the relevant data in a revision.

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally and will test this if time permits.

      Reviewer # 2

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We agree that this is an important question, and we plan to run further simulations where we test the effects of varying the simulated speed. We will present results in the resubmission.

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to different models of feedforward input is important and we plan to do this in our revised manuscript for the linear track and W-track.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 7b the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson RE et al, Hippocampus 6:281, 1996; Pavlides C, et al, Neurobiol Learn Mem 161:122, 2019). Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The correlation is low because cluster membership is combinatorial: cells that share membership in one cluster can also belong to separate, distinct other clusters, rendering their activity less correlated than might be anticipated. In our revised manuscript we will address this point more carefully and cite the relevant experimental support.

      Reviewer # 3

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input so will present such additional results in our revised version. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      First, we apologize for a lack of clarity if we have caused confusion about the type of inputs (linear and cluster-dependent as we had attempted to portray prominently in Figure 1, where it is described in the caption, l. 156-157, and Results, l. 189-190 & l. 497-499, as well as in the Methods, l. 671-683) and if we implied an absence of spatially-tuned information in the network. In the revision we will clarify that for reliable place fields to appear, the network must receive spatial information and that one point of our paper is that the information need not arrive as peaks of external input already resembling place cells or grid cells. We chose linearly ramping boundary inputs as the minimally place-field like stimulus (that still contains spatial information) but in our revision we will include alternatives. We should note that during sleep, when “preplay” occurs, there is no such spatial bias (which is why preplay can equally correlate with place field sequences in any context). In the revision, we will update Figure 1 to show more clearly the cluster-dependent linearly ramping input received by some specific cells with both similar and different place fields.

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells.

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we will revise the text to specifically address CA3 connectivity (Guzman et al., Science 353 (6304), 1117-1123 2016) and the small-world structure therein due to the presence of “assemblies”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable insights into how chromatin-bound PfMORC controls gene expression in the asexual blood stage of Plasmodium falciparum. By interacting with key nuclear proteins, PfMORC appears to affect expression of genes relating to host invasion and subtelomeric var genes. Correlating transcriptomic data with in vivo chromatin insights, the study provides solid evidence for the central role of PfMORC in epigenetic transcriptional regulation through modulation of chromatin compaction.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study provides valuable insights into the role of PfMORC in Plasmodium's epigenetic regulation, backed by a comprehensive methodological approach. The overarching goal was to understand the role of PfMORC in epigenetic regulation during asexual blood stage development, particularly its interactions with ApiAP2 TFs and its potential involvement in the regulation of genes vital for Plasmodium virulence. To achieve this, they conducted various analyses. These include a proteomic analysis to identify nuclear proteins interacting with PfMORC, a study to determine the genome-wide localization of PfMORC at multiple developmental stages, and a transcriptomic analysis in PfMORCHA-glmS knockdown parasites. Taken together, this study suggests that PfMORC is involved in chromatin assemblies that contribute to the epigenetic modulation of transcription during the asexual blood stage development.

      Strengths:

      The study employed a multi-faceted approach, combining proteomic, genomic, and transcriptomic analyses, providing a holistic view of PfMORC's role. The proteomic analysis successfully identified several nuclear proteins that may interact with PfMORC. The genome-wide localization offered valuable insights into PfMORC's function, especially its predominant recruitment to subtelomeric regions. The results align with previous findings on PfMORC's interaction with ApiAP2 TFs. Notably, the authors meticulously contextualized their findings with prior research, including pre-prints, adding credibility to their work.

      Weaknesses:

      While the study identifies potential interacting partners and loci of binding, direct functional outcomes of these interactions remain an inference. The authors heavily rely on past research for some of their claims. While it strengthens some assertions, it might indicate a lack of direct evidence in the current study for particular aspects. The declaration that PfMORC may serve as an attractive drug target is substantial. While the data suggests its involvement in essential processes, further studies are required to validate its feasibility as a drug target.

      Reviewer #2 (Public Review):

      Summary:

      This paper, entitled "Plasmodium falciparum MORC protein modulates gene expression through interaction with heterochromatin", describes the role of PfMORC during the intra-erythrocytic cycle of Plasmodium falciparum. Garcia et al. investigated the PfMORC-interacting proteins and PfMORC genomic distribution in trophozoites and schizonts. They also examined the transcriptome of the parasites after partial knockdown of the transcript.

      Strengths:

      This study is a significant advance in the knowledge of the role of PfMORC in heterochromatin assembly. It provides an in-depth analysis of the PfMORC genomic localization and its correlation with other chromatin marks and ApiAP2 transcription factor binding.

      Weaknesses:

      However, most of the conclusions are based on the function of interacting proteins and the genomic localization of the protein. The authors did not investigate the direct effects of PfMORC depletion on heterochromatin marks. Furthermore, the results of the transcriptomic analysis are puzzling as 50% of the transcripts are downregulated, a phenotype not expected for a heterochromatin marker.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      • Figure 1A and Table 1: the authors should incorporate a volcano plot in their proteomic results presentation. This graphical representation can provide a more intuitive grasp of the most relevant proteins associated with PfMORC in terms of both their abundance and significance. It will aid in swiftly pinpointing proteins with the most notable differential associations. This will complement the comprehensive overview provided by the authors, referencing past research where PfMORC was detailed.

      We thank the reviewer for the suggestion. We agree that the volcano plot we now provide gives a comprehensive view of the associations between PfMORC and other cellular proteins. The volcano plot, presented in the revised manuscript as Figure 1A, was generated using the normalized MS/MS counts from the anti-GFP and 3D7 (control) proteomics datasets (n=3). The potential PfMORC interacting proteins were determined using the fold changes and p-values between the two datasets, as provided in Table 1.

      Several protein interactors were strongly supported by statistical analysis (p-value), while others showed weaker p-value due to variability between replicates. Indeed, the total number of proteins identified in the three replicates, shown in the Venn diagram (Supplemental Figure 1D), exhibits a good overlap between the replicates but a lower number of identified proteins in the GFP-E1 sample. This variability was observed also in the statistical analysis. Indeed, by analyzing the GFP/3D7 ratios, some proteins have a significant difference in abundance (fold change greater than 1.5x) in one of the groups but do not meet the statistical threshold. For more clarity, we have included the -log p-value for the proteins listed in Table 1.

      Overall, these results demonstrate that many ApiAP2 proteins and several chromatin-associated factors interact with PfMORC.
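      For readers unfamiliar with volcano plots, the sketch below shows how one protein's coordinates could be derived from replicate counts and a p-value. It is illustrative only: the pseudocount, the function name, and the toy counts are assumptions, and the p-value is taken as given instead of being recomputed from the replicates.

```python
import math
from statistics import mean

def volcano_point(bait_counts, control_counts, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 p) for one protein.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a protein is absent from the control pulldown.
    """
    fold = (mean(bait_counts) + pseudocount) / (mean(control_counts) + pseudocount)
    return math.log2(fold), -math.log10(p_value)

# Toy example: three anti-GFP replicates versus three 3D7 control replicates.
x, y = volcano_point([7, 7, 7], [1, 1, 1], p_value=0.01)
```

      Proteins with consistent enrichment across replicates land in the upper right of the plot. A protein with a large fold change but variable replicates has a high x value and a low y value, which is the situation the response describes for the GFP-E1 sample.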

      • Given the plethora of proteins detected in the PfMORC eluate, it raises the question of how many are genuine MORC interactors versus those that are merely nearby molecules acting adjacently. These might incidentally end up in the immunoprecipitate due to unintended interactions with DNA or chromatin. While the M&M section mentions that the beads were thoroughly washed, there is no specification about the washing buffer or its stringency (i.e., salinity level). At higher salinities, one could isolate core complexes of interactors associated with DNA or even RNA carryover.

      We apologize for this omission and have now added the buffer composition used to wash the beads. This section now reads "To perform the co-immunoprecipitation we followed the manufacturer protocol (ChromoTek, gta-20). Samples were lysed in modified RIPA buffer (50 mM Tris, pH 7.5, 150 mM NaCl, 0.5% sodium deoxycholate, 1% Nonidet P-40, 10 µg/ml aprotinin, 10 µg/ml leupeptin, 10 µg/ml, 1 mM phenylmethylsulfonyl fluoride, benzamidine) for 30 min on ice. The lysate was precleared with 50 µl of protein A/G-Agarose beads at 4°C for 1 h and clarified by centrifugation at 10,000 × g for 10 min. The precleared lysate was incubated overnight with an anti-GFP antibody using anti-GFP-Trap-A beads (ChromoTek, gta-20). The magnetic beads were then pelleted using a magnet (Invitrogen) and washed 3 times with wash buffer (10 mM Tris/Cl pH 7.5, 150 mM NaCl, 0.05 % Nonidet™ P40 Substitute, 0.5 mM EDTA)."

      We used the same salt concentration for immunoprecipitation as was used in the lysis buffer to minimize the binding of non-specific proteins. The wash buffer composition is updated in the revised manuscript. The immunoprecipitations were done in biological triplicates to ensure reproducibility and statistical support. A number of proteins are common across all three replicates. We also used wild-type parasites (non-GFP) as a negative control to eliminate non-specific hits, and we used a log2-fold change ≥1.5 relative to wild type parasites as our cutoff between the comparison groups.

      We believe that these conditions provide the stringency required to identify high confidence PfMORC interacting proteins, although this still leaves a possibility for additional lower affinity interactions. Future studies will certainly follow up candidate interaction partners to better define this complex. However, the complexity of this complex resembles that reported previously in Toxoplasma gondii (Farhat et al. 2020, Nat Microbiol), as well as that in another report on the PfMORC complexes: https://elifesciences.org/reviewed-preprints/92499

      • The authors demonstrate that PfMORC creates distinct peaks in and around HP1-bound areas (Figure 2F), hinting at a specific role for PfMORC in heterochromatin compaction, boundary definition, and gene silencing. This pattern is clearly depicted in an example in Figure 2F. It would be beneficial to know if this enrichment profile is replicated elsewhere and, if so, it would be worthwhile to quantify it.

      This is an excellent point. Yes, this pattern is seen across the entire genome, where PfMORC is apposed to PfHP1-bound heterochromatic regions. As indicated in the manuscript, we have quantified this effect genome-wide; however, since we already display compiled data for Chromosome 2 (at both chromosome ends) pertaining to the position of PfMORC relative to PfHP1 we do not feel it is essential to provide such a figure for the entire genome as it does not alter the central message of our manuscript. Figure 2F is representative of the genome-wide distribution of PfMORC relative to PfHP1. The raw genome-wide data are available in Supplementary Information for further inspection of specific loci on other chromosomes.

      Recommendations for improving the writing and presentation.

      MAIN TEXT

      Panel e, referenced both in the main text and legend, is missing from Figure 4. This missing panel represents a significant finding of the study, highlighting according to the authors a low correlation between ChIP-seq gene targets and RNA-seq DEGs. This observation implies that PfMORC's global occupancy is more aligned with shaping chromatin architecture than directly regulating specific gene targets. In light of this, the authors should rephrase parts of their manuscript (including abstract and title) to avoid suggesting that PfMORC acts primarily (directly) as a gene regulator, emphasizing instead its role in influencing the topological structure of chromosomes.

      We have modified the title as suggested by the reviewer to more accurately reflect that PfMORC modulates chromatin architecture rather than acting as a direct regulator of specific genes. Our new title is: A Plasmodium falciparum MORC protein complex modulates epigenetic control of gene expression through interaction with heterochromatin

      We apologize for the omission of Figure 4e, which is now included in the revised manuscript. We found PfMORC occupancy on all chromosomes at subtelomeric regions, which are known to harbor genes related to immune evasion and antigenic variation (including most of the var genes). This study is also in agreement with Bryant et al. (PMID 32816370) which reported PfMORC occupancy along with PfISW1 at var gene promoters. PfMORC has also been identified in complexes with various ApiAP2 proteins in a proteome-wide study (Hillier et al. Cell Rep, PMID 31390575), as well as in immunoprecipitations of PfAP2-G2 (Singh et al., Mol Micro, PMID 33368818) and PfAP2-P (Subudhi et al., Nat Microbiol, PMID 37884813). The recent study by Subudhi et al. reports that PfAP2-P is involved in the regulation of var gene expression, antigenic variation, trophozoite development and parasite egress. It is therefore possible that PfMORC may have different effects on transcriptional regulation through interactions with different ApiAP2 transcription factors. Our comparison of PfMORC with known ApiAP2 protein occupancy reveals a high level of overlap, indicating that PfMORC may affect gene expression in various ways throughout the asexual cycle. Additionally, Hillier et al. show that PfMORC interaction is not limited to ApiAP2 but also implicates several other chromatin remodellers, which is consistent with our own results. We do not imply direct regulation of transcription via PfMORC in our manuscript. To the contrary, we suggest that it interacts with heterochromatin and thereby plays a role in the epigenetic control of asexual blood stage transcriptional regulation which is also clarified in the revised abstract.

      Another limitation of the differential gene expression analysis was the use of the glmS ribozyme system, which resulted in only 50% depletion of the PfMORC transcript. The remaining PfMORC may still be sufficient to sustain gene expression, so some changes may have gone undetected. Therefore, it is challenging to conclude that PfMORC functions only in chromatin architecture and not in gene expression.

      If we believe that PfMORC in Plasmodium isn't mainly adjusting gene expression, the authors' suggestion that MORC is targeted by some AP2s becomes puzzling. How do we make sense of these different ideas? The authors need to clarify this to maintain consistency in their findings.

      Based on our data, we hypothesize that PfMORC acts as an accessory protein for ApiAP2 transcription factors. In a number of studies, including ours and the concurrent publication in eLife (https://elifesciences.org/reviewed-preprints/92499), PfMORC co-immunoprecipitated with several ApiAP2 proteins, suggesting it has multiple functions. In our previous study we showed that PfMORC expression is highest in mid and late asexual stages. A comparison of PfMORC occupancy with that of six ApiAP2 proteins (which have different expression profiles) suggests plasticity in PfMORC function. We have revised our discussion to make this hypothesis more transparent for the readers.

      The authors should cite Farhat et al. 2020 (Extended Data Fig. 1a), as it similarly identified 3 different ELM2-containing proteins in Toxoplasma MORC-associated complexes. This previous work provides context and supports the observations made with PfMORC in this study.

      Thank you for the suggestion and pointing out this omission. We have indeed cited the work of the Farhat group in the original manuscript and have now included this additional reference to corroborate the text and provide further support to our conclusions.

      Minor corrections to the text and figures.

      • Panel e is missing from Figure 4.

      As mentioned above Panel e is now included in Figure 4.

      • The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. As it stands, this is not really up to standard.

      We have elaborated the captions with more detailed descriptions, and we now provide additional information where further clarification was necessary.

      Reviewer #2 (Recommendations For The Authors):

      • The study lacks a direct correlation between the inferred function of PfMORC and the heterochromatin state of the genome after its depletion. It would be interesting to perform chip-seq on known heterochromatin markers such as H3K9me3, HP1 or H3K36me2/3 to measure the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      While the proposed experiments are certainly interesting, they are beyond the scope of this study. The current manuscript is focused on PfMORC occupancy, its interacting partners, and its impact on differential gene regulation after PfMORC depletion in asexual parasites. Nonetheless, we did in fact compare PfMORC occupancy with that of various heterochromatin markers (H2A.Z, H3K9ac, H3K4me3, H3K27ac, H3K18ac, H3K9me3, H3K36me2/3, H4K20me3, and H3K4me1) at the 30hpi and 40hpi time points. These data are presented in Supplemental Figure 9. We did not find any significant colocalization, but documented the presence of PfMORC in H3K36me2-depleted regions.

      • The PfMORC depletion was performed using a glms-based genetic system and the reviewer did not find any quantification of the depletion level at 24h or 36h. This is particularly important as the authors present RNA-seq data at these time points.

      We would like to clarify that RNA-seq was performed on 32hpi parasites after approximately 48 h treatment with 2.5 mM GlcN. At the trophozoite and schizont stage, PfMORC expression is high, which is why we selected these time points for RNA-seq (32hpi) and ChIP-seq (30hpi and 40hpi). PfMORC protein expression after GlcN treatment is analyzed in our previous paper (Singh et al., Sci Rep, PMID 33479315), where treatment with 2.5 mM GlcN leads to 50% reduction in PfMORC transcript at 32hpi. This is referenced in the Results section; we decided not to repeat the same experiment in the current manuscript.

      • The authors performed a thorough analysis of the correlations between ApiAP2 binding, histone modification and genomic localization of PfMORC (their chip-seq data). However, they found an inverse relationship between H3K36me2, a known histone repressive mark, and PfMORC genomic localization. This is particularly surprising when PfMORC itself is presented as a heterochromatin marker. The wording of this data is confusing in the results section (lines 257-258) and never discussed further. This important data should at least be discussed to make sense of this apparent contradiction.

      H3K36me2 indeed acts as a global repressive mark in P. falciparum. However, our hypothesis implies that PfMORC not only overlaps with H3K36me2-depleted regions, but also interacts with other epigenetic regulators. Therefore, we propose that PfMORC is part of chromatin remodeling complexes involved in heterochromatin dynamics. Moreover, we did not see any overlap with several other heterochromatin markers, suggesting that PfMORC has a unique binding preference not shared with them. Based on this study and parallel work submitted by Chahine et al. (https://elifesciences.org/reviewed-preprints/92499#abstract), it is evident that PfMORC is crucial for gene regulation and chromatin structure maintenance, as shown in other organisms. Currently, we do not know what the apparent mutual exclusion between H3K36me2 and PfMORC implies mechanistically or how PfMORC interaction with heterochromatin aids in chromatin integrity. In Arabidopsis thaliana, MORC binding leads to chromatin compaction and reduces DNA accessibility to transcription factors, thereby repressing gene expression. In P. falciparum, overlap in the binding regions of PfMORC with those of different transcription factors suggests several possibilities that require further investigation. Since there is only one gene encoding a PfMORC protein in P. falciparum, it is possible that PfMORC function is not limited to chromatin integrity, but that it may also modulate gene expression at different stages. Fully exploring the function of PfMORC will require investigating the functional role of the other interacting partners we and others have identified.

      We have modified the result section per the reviewer's suggestion, and we now also discuss this finding in more detail in the discussion section.

      • The ChIP-seq data are central to this manuscript. However, the presentation of this data in Figure 2A suggests that it is very noisy (particularly for Chr1). It would be of interest to present the called peaks together with the normalized data so that the reader can assess the quality of the ChIP-seq data.

      Our results clearly demonstrate the enrichment of PfMORC in sub-telomeric regions and internal heterochromatic islands. These results are consistent across all of our replicates taken at two independent time points of parasite asexual blood stage development and correlate well with the results of Le Roch: https://elifesciences.org/reviewed-preprints/92499. The raw data files have been provided and can be re-analyzed by any user.

      • The RNA-seq data showed that only a few genes are affected after 24 h of PfMORC depletion. Furthermore, there is an equal number of up- and down-regulated genes. It is not clear why depletion of a heterochromatin marker would induce down-regulation of genes. How these data relate to the partial depletion of PfMORC is not discussed.

      We would like to clarify that the RNA-seq experiment was performed at 32hpi following GlcN-induced knockdown, as previously described (Singh et al., Sci Rep, PMID 33479315). Briefly, synchronous early trophozoite-stage (24hpi) PfMORCglmS-HA parasites were treated with 2.5 mM GlcN until they reached the trophozoite stage (32hpi) in the next cycle. These parasites were then collected for analysis by RNA-seq. We did not detect a substantial log-fold change at this point because only 50% of the transcripts were depleted in the glmS-based PfMORC knockdown system. However, we observed a distinctive pattern of upregulated (60) and downregulated (103) DEGs, which consist of egress-related genes or surface antigens. We believe that PfMORC interacts with different ApiAP2 proteins, as shown in Figure 3A, and consequently exhibits multiple functions. This finding has now been corroborated in several other recent studies (see response to Reviewer 1 above).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Thank you for all your recommendations to improve the manuscript. We took them into account and tried to integrate them as much as possible in the paper. I understand that the main issue is the lack of genetic lineage tracing. Unfortunately, I am no longer in a position to perform experiments and, as a consequence, we cannot provide these data. However, we previously performed several experiments that attest to the ductal origin of the beta cells. As a reminder, we used an experimental setting in which beta cell regeneration occurs from the ducts in the pancreatic tail; we used a genetic approach to overexpress CaN specifically in the ducts of the pancreas; and we investigated the function of CaN under Notch repression, which is known to trigger beta cell formation from the ducts. Altogether, our data underline the contribution of the ductal cells. In addition, as recommended by the editors, we showed that while the proportion of EdU+ ductal cells increases (Figure 5C-D), the number of ductal cells remains constant (Figure 5A supplemental). We have added a paragraph to the discussion to recall all these points in the manuscript.

      We thank you greatly for your time and consideration for this work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) The authors claimed that they examined the arterial and venous identity of the hyperbranched vessels via live imaging analysis of the high glucose-treated Tg(flt1:YFP::kdrl:ras-mCherry) line, and revealed that the hyperbranched ectopic vessels comprised arteries and veins. That's good, of course. However, there are no relevant results in Figure 2. Please revise it.

      Thank you very much for the suggestion. We’ve added this part of the results in Figure 2i and j.

      (2) In Figures 3f and 3g, some of the ECs protruded long and intricate sprouts, and nearly all the ECs within an ISV underwent the outgrowth of filopodia in some extreme cases (Figure 3g), suggesting that the high glucose treatment induced the endothelial differentiation into tip cell-like cells. The findings are surprising and interesting. In order to further confirm the author's conclusion, in situ hybridization experiments are more appropriate to show the expression changes of tip cell-like cell marker genes in the high glucose-treated embryos.

      Thank you very much for your constructive suggestions. We have performed the analysis of single-cell RNA-seq data, and the results showed that the tip cell marker genes such as esm1, apln, and cxcr4a were significantly up-regulated in arterial and capillary ECs after high glucose treatment. The results were integrated into Figure 3 of the revised manuscript.

      (3) Embryos treated with AS1842856 or injected with foxo1a-MO exhibited excessive angiogenesis (Figure 5g-i), suggesting the transcription activity of foxo1 is required to maintain the quiescent state of endothelial cells. Did the downregulation of foxo1a lead to the differentiation of endothelial cells into tip-cell-like cells?

      Thank you very much for the question. We examined our results carefully and marked these tip cell-like cells with arrowheads in Figure 5h of the revised manuscript.

      (4) Foxo1a was significantly downregulated in arterial and capillary ECs after high glucose treatment (Figure 5c-e). More importantly, whether overexpression of foxo1a in the high glucose-treated embryos could eliminate the hyperangiogenic characteristics?

      Thank you for the great questions. We performed rescue experiments, and the results suggested that the overexpression of foxo1a partially mitigated the excessive angiogenesis induced by high glucose treatment. These results were integrated into Figure 6 of the revised manuscript.

      (5) The authors' results found that foxo1a was enriched in both the predicted binding sites of marcksl1a by ChIP-PCR experiments (Figure 7d). This result is reliable. However, whether these two sites are important for marcksl1a gene transcription needs to be confirmed by relevant experiments, such as luciferase reporter assays.

      We’ve performed the luciferase reporter assays and added these data to Figure 8f and g.

      Reviewer #2:

      Suggested major experiments:

      (1) A previous study (Jorgens et al., Diabetes 64, 2015) reported that high tissue glucose levels increased reactive dicarbonyl methylglyoxal (MG) concentrations in zebrafish embryos and triggered the formation of hyperbranched ISVs. Additionally, they illustrated that MG induced the vascular hyperbranching phenotype via enhancing phosphorylated VEGFR and pAKT signaling cascade. The authors must examine whether both pVEGFR and pAKT are increased in noncaloric monosaccharide (NMS)-treated embryos. The authors need also to test the crosstalks between VEGFR/AKT signaling and foxo1a-Marcksl1a pathway in glucose or NMS-treated embryos.

      Thank you very much for your suggestion. We treated the embryos with AS1842856 (foxo1 inhibitor) and Lenvatinib (VEGFR inhibitor), and the results showed that Lenvatinib treatment attenuated the excessive angiogenesis induced by foxo1 inhibition. We also examined the expression level of vegfaa after AS1842856 treatment; the results suggested that foxo1 inhibition did not affect the expression of vegfaa.

      Author response image 1.

      (2) In this manuscript, the authors performed single endothelial cell sequencing in glucose-treated embryos, and found reduced foxo1a expression and upregulated marcksl1a. Based on these data, the authors demonstrated that glucose and NMS induced excessive angiogenesis through the foxo1a-marcksl1a pathway. The authors must conduct endothelial scRNA-seq in NMS-treated embryos, and analyze and compare the datasets with scRNA-seq datasets from glucose-treated endothelial cells, considering the focus of the paper. In addition, ASBs have been suggested as healthy alternatives to sugar-sweetened beverages. The authors also need to examine carefully whether metabolic gene programs are altered in glucose-treated endothelial cells, which was mentioned in the Jorgens et al. paper.

      Thank you very much for your constructive suggestions. We have performed the whole embryo transcriptome sequencing after high D-glucose and L-glucose treatment. We analyzed and compared the differentially expressed genes of control, high D-glucose-treated, and high L-glucose-treated embryos. The results revealed that 1259 and 1074 genes were up-regulated significantly in high D-glucose and high L-glucose treated embryos, respectively, compared with control.

      We also analyzed some metabolism-related genes and found that some genes involved in gluconeogenesis, glycolysis, and oxidative phosphorylation were significantly changed. The results were integrated into Supplementary Figures 12 and 13 of the revised manuscript.

      (3) Glucose or NMS treatments induce the hyperbranched endothelial vessels from the dorsal aorta and ISVs but not cardinal veins. In Figure 4i, the arterial and capillary cell population is increased in glucose-treated embryos, but the venous cell population seems to be reduced. The authors need to check whether arterial/venous differentiation and proliferation are affected in glucose- and NMS-treated embryos.

      Thank you for your suggestions. We examined arterial/venous differentiation using the Tg(flt1BAC:YFP::kdrl:ras-mCherry) zebrafish line, in which YFP is mainly expressed in arterial endothelial cells. We found that the endothelial cells of the excessively formed blood vessels induced by high glucose treatment are mainly arterial (Figure 2j). This might explain why the arterial and capillary cell population was increased in glucose-treated embryos.

      (4) The manuscript proposes that excessively branched vessels within ISVs arise from the ectopic activation of quiescent endothelial cells (ECs) into tip cells. To confirm this process, the authors need to detect some specific tip cell markers to demonstrate their ectopic activation.

      Thank you for your constructive suggestions. We have performed the analysis of single-cell RNA-seq data, and the results showed that the tip cell marker genes such as esm1, apln, and cxcr4a were significantly up-regulated in arterial and capillary ECs after high glucose treatment. The results were integrated into Figure 3 of the revised manuscript.

      (5) Disaccharides such as lactose, maltose, and sucrose did not exhibit a notable induction of excessive angiogenic phenotype. However, the specific treatment concentrations utilized in the study were not delineated. Therefore, further investigation is warranted to determine whether increased disaccharide concentrations can cause vascular hyperbranching phenotype.

      Thank you very much for the suggestions. We’ve described the concentrations of monosaccharides and disaccharides in the materials and methods section of the revised manuscript. Following the suggestion, we treated zebrafish embryos with a higher concentration of the disaccharide. The results showed that higher concentrations of disaccharide treatment also caused excessive angiogenesis in zebrafish embryos. These results were integrated into supplementary Figure 8 of the revised manuscript.

      (6) The authors claim that glucose and NMS (such as L-glucose) induce excessive angiogenesis through the foxo1a-marcksl1a pathway. Following exposure to elevated glucose levels, a substantial down-regulation of foxo1a was observed in arterial and capillary endothelial cells. This down-regulation led to the release of foxo1a inhibition on marcksl1a, subsequently resulting in an augmented expression of marcksl1a and the manifestation of a vascular phenotype. Consequently, it is imperative to investigate whether foxo1a overexpression can attenuate marcksl1a expression and mitigate the vascular phenotype induced by monosaccharides. Sufficient data support is needed for the conclusion that monosaccharides induce angiogenesis via the foxo1a-marcksl1a pathway.

      Thank you very much for your constructive suggestions.

      We confirmed the expression of marcksl1a in foxo1a-overexpressed embryos. The results indicated that foxo1a overexpression significantly attenuated marcksl1a expression. The results were integrated into Figure 8c. We also performed the rescue experiments, which indicated that overexpression of foxo1a partially mitigated the excessive angiogenesis induced by high glucose treatment. These results were integrated into Figure 6 of the revised manuscript.

      Minor corrections:

      (1) Figure 2i, j has no corresponding graphs.

      We’ve made the change in Figure 2.

      (2) Figure 2h has no vertical coordinates.

      We’ve made the change in Figure 2.

      (3) All Figures should be referenced within the manuscript.

      We’ve checked our manuscript carefully and made the corrections.

      (4) The concentrations of monosaccharides and disaccharides employed in this study must be distinctly elucidated within the manuscript and annotated using the internationally recognized unit notation.

      We’ve checked our manuscript carefully and described the concentrations of monosaccharides and disaccharides in the revised materials and methods section.

      Reviewer #3:

      (1) A possible limitation of the study is that the mechanism leading to angiogenesis in the retinal circulation and in peripheral vasculature is certainly different as diabetes is associated with excessive angiogenesis in the retina and a defect in angiogenesis in the peripheral circulation as shown by a reduced post-ischemic revascularization (see Silvestre et al.: DOI: 10.1152/physrev.00006.2013).

      Thank you very much for your suggestions. As you said, the peripheral blood vessel model in this study does not fully represent individuals with diabetic retinopathy, which is a limitation. Nevertheless, the phenotype and mechanism of excessive angiogenesis in peripheral blood vessels in the high glucose model may provide a reference for excessive angiogenesis in the retina, as the two might share a similar etiology and similar regulatory mechanisms.

      (2) Another limitation is that angiogenesis in the embryo is not fully representative of the excessive angiogenesis observed in the diabetic retinal circulation. It would be of interest to analyse the retinal vascular tree in adult fish submitted to high glucose and to ASB.

      In our future study, we will try to observe the angiogenesis phenotype in the diabetic retina and improve the disease model.

      (3) Line 52: "Endothelial cell dysfunction (ECD)" instead of "Endothelial dysfunction (ECD)".

      We’ve made the correction in the revised manuscript.

      (4) The authors should elaborate more on the observation showing that L-glucose, D-mannose, D-ribose, and L-arabinose, which could not be digested by animals, also induce excessive angiogenesis. Is the effect indirect?

      In the current manuscript, we conducted an in vivo live imaging analysis to show the phenotype of excessive angiogenesis caused by those noncaloric monosaccharides. However, we did not find differences in the phenotypes of embryos treated with noncaloric and caloric monosaccharides. Therefore, we supposed that the mechanisms underlying the phenotypes were similar. The effect might be indirect.

    1. Author Response:

      The reviewers suggested that we determine whether the functions of TopAI, YjhQ, and/or YjhP are connected to antibiotic susceptibility. 

      We fully agree with the reviewers that the function of TopAI/YjhQ/YjhP is an important topic. Our preliminary studies (not included in the paper) failed to identify a function connected to antibiotic susceptibility, although these studies were far from exhaustive. There are many environmental stressors that can stall ribosomes, making it challenging to find the functionally relevant stressor(s). We feel that further work on this topic is outside the scope of this manuscript.

      The reviewers suggested that the SHAPE data are inconsistent with our conclusions about translation of toiL.

      We believe the SHAPE data are consistent with our model, although we acknowledge that interpretation of base reactivity is somewhat subjective. We will address the reviewers’ comments on this topic in more detail in our full response.

      The reviewers suggested that published Ribo-Seq data are inconsistent with our data showing that toiL start codon/Shine-Dalgarno mutations have no effect on expression of luciferase reporters in the absence of antibiotics. 

      Our assays with these mutations looked at expression of topAI, not toiL. Our model predicts that mutations that prevent toiL translation will not induce expression of the downstream genes. We did not look at the effect of these mutations on expression of toiL itself.

      The reviewers suggested we use RNA-seq to complement the Ribo-seq data for cells grown +/- tetracycline (Figure 5).

      In principle, RNA-seq data would allow us to determine whether tetracycline specifically induces translation of topAI, as opposed to only increasing the RNA level. We did not generate RNA-seq data because prior work from other groups suggests that topAI is too weakly expressed to accurately measure translation efficiency in non-inducing conditions. However, the major conclusion from Figure 5 is that tetracycline stalls ribosomes at start codons, including the start codon of toiL.
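
      To make the reasoning about translation efficiency concrete, here is a minimal sketch of how it is commonly computed: ribosome footprint density divided by mRNA abundance for the same gene. It is an illustration only, not the authors' analysis. The function name, the RPKM units, and the min_rna threshold are assumptions; the threshold stands in for the point that a weakly expressed transcript gives an unreliable ratio.

```python
def translation_efficiency(ribo_rpkm, rna_rpkm, min_rna=1.0):
    """Ribo-seq density divided by RNA-seq density for one gene.

    Returns None when the RNA-seq signal is below min_rna (a
    hypothetical threshold), because dividing by a very small and
    noisy mRNA estimate gives an unreliable ratio.
    """
    if rna_rpkm < min_rna:
        return None
    return ribo_rpkm / rna_rpkm


def efficiency_change(te_treated, te_untreated):
    """Fold change in translation efficiency between two conditions.

    Returns None if either estimate is unavailable or the untreated
    efficiency is zero.
    """
    if te_treated is None or not te_untreated:
        return None
    return te_treated / te_untreated
```

      Under these assumptions, a gene whose RNA-seq signal falls below the threshold in the non-inducing condition yields no efficiency estimate, so an induction-specific change in translation cannot be separated from a change in RNA level.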

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary:

      The question of whether eyespots mimic eyes has certainly been around for a very long time and led to a good deal of debate and contention. This isn't purely an issue of how eyespots work either, but more widely an example of the potential pitfalls of adopting 'just-so-stories' in biology before conducting the appropriate experiments. Recent years have seen a range of studies testing eye mimicry, often purporting to find evidence for or against it, and not always entirely objectively. Thus, the current study is very welcome, rigorously analysing the findings across a suite of papers based on evidence/effect sizes in a meta-analysis.

      Strengths:

      The work is very well conducted, robust, objective, and makes a range of valuable contributions and conclusions, with an extensive use of literature for the research. I have no issues with the analysis undertaken, just some minor comments on the manuscript. The results and conclusions are compelling. It's probably fair to say that the topic needs more experiments to really reach firm conclusions but the authors do a good job of acknowledging this and highlighting where that future work would be best placed.

      Weaknesses:

      There are few weaknesses in this work, just some minor amendments to the text for clarity and information.

      We greatly appreciate Reviewer 1’s positive comments on our manuscript. We also revised our manuscript text and a figure in accordance with Reviewer 1’s recommendations.

      Reviewer #2 (Public Review):

      Many prey animals have eyespot-like markings (called eyespots) which have been shown in experiments to hinder predation. However, why eyespots are effective against predation has been debated. The authors attempt to use a meta-analytical approach to address the issue of whether eye-mimicry or conspicuousness makes eyespots effective against predation. They state that their results support the importance of conspicuousness. However, I am not convinced by this.

      There have been many experimental studies that have weighed in on the debate. Experiments have included manipulating target eyespot properties to make them more or less conspicuous, or to make them more or less similar to eyes. Each study has used its own set of protocols. Experiments have been done indoors with a single predator species, and outdoors where, presumably, a large number of predator species predated upon targets. The targets (i.e, prey with eyespot-like markings) have varied from simple triangular paper pieces with circles printed on them to real lepidopteran wings. Some studies have suggested that conspicuousness is important and eye-mimicry is ineffective, while other studies have suggested that more eye-like targets are better protected. Therefore, there is no consensus across experiments on the eye-mimicry versus conspicuousness debate.

      The authors enter the picture with their meta-analysis. The manuscript is well-written and easy to follow. The meta-analysis appears well-carried out, statistically. Their results suggest that conspicuousness is effective, while eye-mimicry is not. I am not convinced that their meta-analysis provides strong enough evidence for this conclusion. The studies that are part of the meta-analysis are varied in terms of protocols, and no single protocol is necessarily better than another. Support for conspicuousness has come primarily from one research group (as acknowledged by the authors), based on a particular set of protocols.

      Furthermore, although conspicuousness is amenable to being quantified, e.g., using contrast or size of stimuli, assessment of 'similarity to eyes' is inherently subjective. Therefore, manipulation of 'similarity to eyes' in some studies may have been subtle enough that there was no effect.

      There are a few experiments that have indeed supported eye-mimicry. The results from experiments so far suggest that both eye-mimicry and conspicuousness are effective, possibly depending on the predator(s). Importantly, conspicuousness can benefit from eye-mimicry, while eye-mimicry can benefit from conspicuousness.

      Therefore, I argue that generalizing based on a meta-analysis of a small number of studies that conspicuousness is more important than eye-mimicry is not justified. To summarize, I am not convinced that the current study rules out the importance of eye-mimicry in the evolution of eyespots, although I agree with the authors that conspicuousness is important.

      We understand Reviewer 2’s concerns and have addressed them by adding some sentences to the discussion (L506-508, L538-540). In addition, our findings, which were guided by current knowledge, support the conspicuousness hypothesis, but we acknowledge that the two hypotheses are not mutually exclusive (L110-112). We also do not reject the eye mimicry hypothesis. As we have demonstrated, there are still several gaps in the current literature and our understanding (L501-553). Our aim is for this research to stimulate further studies on this intriguing topic and to foster more fruitful discussions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Lines 59/60: "it is possible that eyespots do not involve mimicry of eyes..."

      The sentence was revised (L59). To enhance readability, we have integrated Reviewer 1's suggestions by simplifying the relevant section instead of using the suggested sentence.

      Line 61: not necessarily aposematism. They might work simply through neophobia, unfamiliarity, etc even without unprofitability

      We changed the text in line with the comment from Reviewer 1 (L61-63).

      Lines 62/63 - this is a little hard to follow because I think you really mean both studies of real lepidopterans as well as artificial targets. Need to explain a bit more clearly.

      We provided an additional explanation of our included primary study type (L64-65).

      Lines 93/94 - not quite that they have nothing to do with predator avoidance, but more that any subjective resemblance to eyes is coincidental, or simply as a result of those marking properties being more effective through conspicuousness in their own right.

      Line 94 - similarly, not just aposematism. You explain the possible reasons above on l92 as also being neophobia, etc.

      We agreed with Reviewer 1’s comments and added more explanations about the conspicuousness hypothesis (L96-97). We have also rewritten the sentences that could be misleading to readers (L428).

      Line 96 - this is perhaps a bit misleading as it seems to conflate mechanism and function. The eye mimicry vs conspicuousness debate is largely about how the so-called 'intimidation' function of eyespots works. That is, how eyespots prevent predators from attacking. The deflection hypothesis is a second function of eyespots, which might also work via conspicuousness or eye mimicry (e.g. if predators try to peck at 'eyes') but has been less central to the mimicry debate.

      The explanations and suggestions from Reviewer 1 are very helpful. We revised this part of our manuscript (L103-108) and Figure 1 and its legend to make it clearer that the eyespot hypothesis and the conspicuousness hypothesis explain anti-predator functions from a different perspective than the deflection hypothesis.

      There is a third function of eyespots too, that being as mate selection traits. Note that Figure 1 should also be altered to reflect these points.

      We wanted to focus on explaining why eyespot patterns can contribute to prey survival. Therefore, we did not state in this paragraph that eyespot patterns function as mate selection traits. Instead, we mention this in the Discussion (L455-L465), which we have rewritten to be clearer (L456).

      Were there enough studies on non-avian predators to analyse in any way? 

      We found a few studies on non-avian predators (e.g. fish, invertebrates, or reptiles), but not enough to conduct a meta-analysis.

      Line 171/72 - why? Can you explain, please.

      The reason we excluded studies that used bright or contrasting patterns as control stimuli in our meta-analysis is to ensure comparability across primary studies. We added an explanation in the text (L180-181).

      Line 177 - can you clarify this?

      Without control stimuli, it is challenging to accurately assess the effect of eyespots or other conspicuous patterns on predation avoidance. Control stimuli allow for a comparison of the effect of eyespots or patterns. We added a more detailed explanation to clarify here (L186-188).

      Line 309 - presumably you mean 33 papers, each of which may have multiple experiments? I might have missed it, but how many individual experiments in total? 

      There were 164 individual experiments. We have now added that information in the manuscript (L320).

      Line 320 - paper shaped in a triangle mostly?

      We cannot say that most artificial prey were triangular. After excluding the caterpillar type, 57.4% were triangular, while the remaining 43.6% were rectangular (Figure 2b).

      Line 406: Stevens.

      We fixed this name, thank you (L417).

      Discussion - nice, balanced and thorough. Much of the work done has been in Northern Europe where eyespot species are less common. Perhaps things may differ in areas where eyespots are more prevalent.

      We appreciate Reviewer 1’s kind words and comments. We agree with your comments and reflected them in our manuscript (L542-545).

      Line 477 - True, and predators often have forward-facing eyes making it likely both would often be seen, but a pair of eyes may not be absolutely crucial to avoidance since sometimes a prey animal may only see one eye of a predator (e.g. if the other is occluded, or only one side of the head is visible).

      We are grateful for Reviewer 1's comment. We added a sentence noting that eyespots do not necessarily have to be in pairs to resemble eyes (L490-L492).

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain. They claim that ORF1p is expressed in the human and mouse brain at a steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is not surprising, but important to document.  

      Thank you for recognizing the importance of this study. The two cited papers have indeed reported the presence of full-length transcripts in the mouse and human brain. However, the first report (PMID: 38773348) showed evidence of flL1 RNA and ORF1 protein expression in the mouse hippocampus (but not elsewhere), and the second (PMID: 37910626) shows full-length LINE-1 RNA expression and H3K4me3-ChIP data in the frontal and temporal lobe of the human brain, but not protein expression.

      Strengths:

      Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the evidence for steady-state expression of ORF1p in the mouse brain appears robust. 

      Weaknesses: 

      Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments: 

      (1) The expression of ORF1p in the human brain shown in Figure 1j is not convincing. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not nonspecific labelling? Additional validations and controls are needed to verify the specificity of this signal. 

      We have validated the antibody (Abcam 245249 - https://www.abcam.com/en-us/products/primary-antibodies/line-1-orf1p-antibody-epr22227-6-ab245249), which we use for Western blotting experiments such as the one in Fig1j, by several means. We have done immunoprecipitations (IPs) and co-immunoprecipitations (co-IPs) followed by quantitative mass spectrometry (LC-MS/MS). We efficiently detect ORF1p in IPs (Western blot) and by quantitative mass spectrometry (5 independent samples per IP-ORF1p and IP-IgG: ORF1p/IgG ratio: 40.86; adj p-value 8.7e-07; human neurons in culture). We also did co-IPs followed by Western blot using two different antibodies, the Millipore or the Abcam antibody to immunoprecipitate and the Abcam antibody for Western blotting (the Millipore antibody does not work well on WB in our hands), which consistently showed a double band, indicating that both bands are ORF1p-derived. We can add these data to the revised manuscript, although some of them (the MS data) are the subject of another study in preparation. Abcam also reports a double band, and they suspect that the lower band is a truncated form (see the link to their website above). ORF1p Western blots done by other labs with different antibodies have also detected a second band in human samples:

      (1) Sato, S. et al. LINE-1 ORF1p as a candidate biomarker in high grade serous ovarian carcinoma. Sci Rep 13, 1537 (2023) in Figure 1D

      (2) McKerrow, W. et al. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 119, e2115999119 (2022), showing a Western blot of an inducible LINE-1 (ORFeus) detected by the MABC1152 ORF1p antibody from Millipore Sigma in Figure 7

      (3) Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418, in mouse ES cells with an antibody made in-house from another lab (gift), in Figure 2B

      The lower band might thus be a truncated form of ORF1p or a degradation product which appears to be shared by mouse and human ORF1p. We will mention this in the revised version of the paper. In addition, we have used the very well characterized antibody from Millipore (https://www.merckmillipore.com/CH/en/product/Anti-LINE-1-ORF1p-Antibody-clone-4H1,MM_NF-MABC1152?ReferrerURL=https%3A%2F%2Fwww.google.com%2F) for immunostainings and detect ORF1p staining in human neurons in the very same brain regions (Fig 2H) including the cerebellum (selectively in Purkinje cells as in mice in Fig1B panel 10: human images not shown). 

      Altogether, based on our experimental validations and evidence from the literature, we are very confident that it is ORF1p that we detect on the blots. 

      (2) The data shown in Figure 2g are not convincing. How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? Extensive additional validations and  controls are needed to verify the specificity of this signal.

      Figure 2g shows a Western blot using an extensively used and well characterized ORF1p antibody from Abcam (mouse ORF1p - https://www.abcam.com/en-us/products/primary-antibodies/line-1-orf1p-antibody-epr21844108-ab216324; cited in at least 11 publications) after FACS-sorting of neurons (NeuN+) of the mouse brain. We have validated this ORF1p antibody ourselves in IPs (see Fig 6A) and co-IP followed by mass spectrometry (LC-MS/MS; see Fig 6, where we detect ORF1p exclusively in the 5 independent ORF1p-IP samples and not at all in 5 independent IgG-IP control samples, see Suppl Table 2). Together, this makes us very confident that we are looking at a specific ORF1p signal. Please note that in the IP of ORF1p shown in Fig6A, there is a double band as well, strongly suggesting that the lower band might be a truncated or processed form of ORF1p. As stated above, this double band has been detected in another study (Walter et al. eLife 2016;5:e11418. DOI: 10.7554/eLife.11418) in mouse ES cells using an in-house generated antibody against mouse ORF1p. Thus, with either commercial or in-house generated antibodies in some mouse and human samples, there is a double band corresponding to full-length ORF1p and a truncated or processed version of it.

      We noticed that we have not added the references of the primary antibodies used in Western blot experiments in the manuscript, which will be corrected in the revised version.  

      (3) The data showing a reduction in ORF1p expression in the aged mouse brain is confusing and maybe even misleading. Although there is an increase in the intensity of the ORF1p signal in ORF1p+ cells, the data clearly shows that fewer cells express ORF1p in the aged brain. Whether these changes indicate an overall loss or gain of ORF1p expression in the aged brain is not resolved. Thus, conclusions should be more carefully phrased in this section. It is important to show the quantification of NeuN+ and NeuN- cells in young vs aged (not only the proportions as shown in Figure 3b) to determine if the difference in the number of ORF1p+ cells is due to loss of neurons or perhaps a sampling issue. Moreover, it would be essential to perform WB and/or proteomics experiments to complement the IHC data for the aged mouse samples.

      The data presented in Fig3 C-I show a modest but widespread and reproducible increase in expression of ORF1p per cell. What decreases is the proportion of ORF1p+/NeuN+ cells (Fig3A, B), indicating that fewer cells might express ORF1p in the brain. However, the proportion or number/mm2 of ORF1p+ cells overall does not decrease significantly, and neither does the proportion or number/mm2 of NeuN+ cells (data will be added to the revision). We show data of the % of NeuN+ and NeuN- cells in the ventral midbrain (Suppl Fig3C, quantified on confocal images), which indeed indicates that in this region there are fewer neurons in the aged mouse brain compared to the young. There might thus be a very regional decrease in neurons with age in the midbrain motor region. We will, however, as suggested, plot the number of NeuN+ and NeuN- cells per mm2 for the whole brain as well as the different regions in young vs aged to compare actual cell numbers per volume. While it is true that we cannot say that there is an overall loss or gain of ORF1p expression in the aged mouse brain, we believe that this is not of the highest importance, as what most likely matters biologically in the context of aging is the quantity of ORF1p per cell (and possibly full-length LINE-1 RNA and ORF2p) and not “per brain”.

      We also plan on doing Western blots on mouse brain tissues from young and aged individuals, however, we might run into limits regarding tissue availability of aged mice. 

      (4) The transcriptomic data presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). Given the read length and the unstranded sequencing approach, I would at least ask the authors to add genome browser tracks of the upregulated loci so that we can properly assess the clarity of the results. I would also suggest adding the mappability profile of the elements in question. In addition, since this manuscript focuses on ORF1p, it would be essential to document changes in protein levels (and not just transcripts) in the ageing human brain. 

      We agree that there are limitations to the analysis of TEs with short read sequencing and we will add more text on this aspect in a revised version. The approaches shown in PMID: 38773348 & PMID: 37910626 or even a combination of them, would be ideal of course. However, here we reanalyzed a unique existing dataset (Dong et al, Nature Neuroscience, 2018; http://dx.doi.org/10.1038/s41593-018-0223-0), which contains RNA-seq data of human post-mortem dopaminergic neurons in a relatively high number of brain-healthy individuals of a wide age range including some “young” individuals which is rare in post-mortem studies. Such data is unfortunately not available with long read sequencing or any other more appropriate approach yet. Limitations are evident, but all limitations will apply equally to both groups of individuals that we compare. We will add genome browser tracks of the differentially expressed elements. The general mappability profile of the full-length LINE-1 “UIDs” is shown in Suppl Fig 6A. We will color-highlight the specific elements in this graph and will add genome browser data for these elements in a revised version. 

      We will not be able to document changes in protein levels in aged human dopaminergic neurons as we do not have access to this material. We have tried to obtain human substantia nigra tissues but were not able to get sufficient amounts to do laser-capture microdissection or FACS analyses, especially of young individuals. There are still important limitations to tissue availability, especially of regions of interest like the substantia nigra pars compacta affected in Parkinson’s disease.

      (5) More information is needed on RNAseq of microdissections of dopaminergic neurons from 'healthy' postmortem samples of different ages. No further information on these samples is provided. I would suggest adding a table with the clinical information of these samples (especially age, sex, and cause of death). The authors should also discuss whether this experiment has sufficient power. The human ageing cohort seems very small to me. 

      This is a re-analysis of a published dataset (Dong et al, Nat Neurosci, 2018; doi:10.1038/s41593-018-0223-0), available through dbgap (phs001556.v1.p1). In this original article, the criteria for inclusion as a brain-healthy control were as follows:

      “…Subjects… were without clinicopathological diagnosis of a neurodegenerative disease meeting the following stringent inclusion and exclusion criteria. Inclusion criteria: (i) absence of clinical or neuropathological diagnosis of a neurodegenerative disease, for example, PD according to the UKPDBB criteria47, Alzheimer’s disease according to NIA-Reagan criteria48, or dementia with Lewy bodies by revised consensus criteria49; for the purpose of this analysis incidental Lewy body cases (not meeting clinicopathological diagnostic criteria for PD or other neurodegenerative disease) were accepted for inclusion; (ii) PMI ≤ 48 h; (iii) RIN50 ≥ 6.0 by Agilent Bioanalyzer (good RNA integrity); and (iv) visible ribosomal peaks on the electropherogram. Exclusion criteria were: (i) a primary intracerebral event as the cause of death; (2) brain tumor (except incidental meningiomas); (3) systemic disorders likely to cause chronic brain damage.”

      We do not have access to the cause of death, but we will add available metadata to the manuscript.

      We will perform a post-hoc power analysis and add the result to the revision. 

      (6) The findings in this manuscript apply to both human and mouse brains. However, the landscape of the evolutionarily young L1 subfamilies between these two species is very different and should be part of the discussion. For example, the regulatory sequences that drive L1 expression are quite different in human and mouse L1s. This should be discussed. 

      Indeed, they are very different. We will add this to the discussion.  

      (7) On page 3 the authors write: "generally accepted that TE activation can be both, a cause and consequence of aging". This statement does not reflect the current state of the field. On the contrary, this is still an area of extensive investigation and many of the findings supporting this hypothesis need to be confirmed in independent studies. This statement should be revised to reflect this reality. 

      We agree, this is overstated, we will change this sentence accordingly.  

      Reviewer #2 (Public Review):

      Summary: 

      Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p expression across the entire mouse brain, in young and aged animals, and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS. 

      Strengths: 

      A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in the brain is also a strength in that it provides a novel dataset for future studies. 

      Thank you for highlighting the strength of our study. 

      Weaknesses: 

      The main weakness of the study is that cell type specificity of ORF1p expression was not examined beyond neuron (NeuN+) vs non-neuron (NeuN-). Indeed, a recent study (Bodea et al. 2024, Nature Neuroscience) found that ORF1p expression is characteristic of parvalbumin-positive interneurons, and it would be very interesting to query whether other neuronal subtypes in different brain regions are distinguished by ORF1p expression. 

      We agree that this point is important to address. We do provide indications for this in the manuscript. For instance, we detect staining in mouse and human Purkinje cells of the cerebellum in accordance with data from Takahashi et al, Neuron, 2022; DOI: 10.1016/j.neuron.2022.08.011. We also know from previous work, that in the mouse ventral midbrain, dopaminergic neurons (TH+/NeuN+) express ORF1p and that these neurons express higher levels of ORF1p than adjacent non-dopaminergic neurons (TH-/NeuN+; Blaudin de Thé et al, EMBO J, 2018). Others have shown evidence of full-length L1 RNA expression in both excitatory and inhibitory neurons but much less expression in non-neuronal cells (Garza et al, SciAdv, 2023). In sum, although this has not been investigated systematically brain-wide, it does not seem as if ORF1p expression is restricted to PV cells overall. We will deepen the discussion of this aspect in the revised manuscript. To address this question experimentally, we will try to perform ORF1p stainings on different brain regions together with PV stainings and add this data to a revised version, if possible.  

      The data suggesting that ORF1p expression is increased in aged mouse brains is intriguing, although it seems to be based upon modestly (up to 27%, dependent on brain region) higher intensity of ORF1p staining rather than a higher proportion of ORF1p+ neurons. Indeed, the proportion of NeuN+/ORF1p+ cells actually decreased in aged animals. It is difficult to interpret the significance and validity of the increase in intensity, as Hoechst staining of DNA, rather than immunostaining for a protein known to be stably expressed in young and aged neurons, was used as a control for staining intensity.

      It would have been indeed interesting to have another marker than DNA as a control. However, this requires a protein that is indeed stably expressed throughout the brain and throughout age. We are not aware of a protein for which this has been established. DNA staining with Hoechst does control for technical artefacts. We have whole-brain imaging data for the protein Rbfox3 (NeuN) which we used as a marker for cell identity. If this protein turns out to be stable, we could add this data to a revised version. 

      The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue. 

      As stated in the manuscript, the list of previously published datasets does include a dataset of ORF1p-interacting proteins in mouse spermatocytes (please see lines 434-435: “ORF1p interactors found in mouse spermatocytes were also present in our analysis including CNOT10, CNOT11, PRKRA and FXR2 among others (Suppl_Table4).”). The reference is De Luca, C., Gupta, A. & Bortvin, A. Retrotransposon LINE-1 bodies in the cytoplasm of piRNA-deficient mouse spermatocytes: Ribonucleoproteins overcoming the integrated stress response. PLoS Genet 19, e1010797 (2023). We indeed did not validate any interactors, for economic reasons and because of time constraints (post-doc leaving). However, we feel that the significant overlap with previously published interactors highlights the validity of our data, and we anticipate that this list of ORF1p protein interactors in the mouse brain will be of further use for the community.

      The authors achieved the goals of broadly characterizing ORF1p expression across different regions of the mouse brain, and identifying putative ORF1p interactors in the mouse brain. However, findings from both parts of the study are somewhat superficial in depth. 

      This provides a useful dataset to the field, which likely will be used to justify and support numerous future studies into L1 activity in the aging mammalian brain and in neurodegenerative disease. Similarly, the list of ORF1p interacting proteins in the brain will likely be taken up and studied in greater depth. 

      Reviewer #3 (Public Review):

      The question about whether L1 exhibits normal/homeostatic expression in the brain (and in general) is interesting and important. L1 is thought to be repressed in most somatic cells (with the exception of some stem/progenitor compartments). However, to our knowledge, this has not been authoritatively / systematically examined and the literature is still developing with respect to this topic. The full gamut of biological and pathobiological roles of L1 remains to be shown and elucidated and this area has garnered rapidly increasing interest, year-by-year. With respect to the brain, L1 (and repeat sequences in general) have been linked with neurodegeneration, and this is thought to be an aging-related consequence or contributor (or both) of inflammation. This study provides an impressive and apparently comprehensive imaging analysis of differential L1 ORF1p expression in mouse brain (with some supporting analysis of the human brain), compatible with a narrative of non-pathological expression of retrotransposition-competent L1 sequences. We believe this will encourage and support further research into the functional roles of L1 in normal brain function and how this may give way to pathological consequences in concert with aging. However, we have concerns with conclusions drawn, in some cases regardless of the lack of statistical support from the data. We note a lack of clarity about how the 3rd party pre-trained machine learning models perform on the authors' imaging data (validation/monitoring tests are not reported), as well as issues (among others) with the particular implementation of co-immunoprecipitation (ORF1p is not among the highly enriched proteins and apparently does not reach statistical significance for the comparison) - neither of which may be sufficiently rigorous.  

      Thank you for your comments on our manuscript. 

      In a revised version and a more in-depth response, we will address the concerns about the machine learning paradigm. Concerning the co-IP-MS, we can confirm that ORF1p is among the highly enriched proteins: it was not found in the IgG control (in 5 independent samples), only in the ORF1p-IP (in 5 out of 5 independent samples). This is what the infinity sign in Suppl Table 2 indicates, and it is why no p-value is assigned: an enrichment ratio with zero in the denominator is infinite and does not allow a p-value to be calculated. We will make this clearer in a revised version of the manuscript.
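
      As a minimal sketch of why the table shows an infinite ratio, the snippet below computes a fold enrichment from replicate intensities. The function name `fold_enrichment` and the intensity values are hypothetical and purely illustrative; they are not taken from Suppl Table 2 or from the software actually used for the analysis.

```python
import math


def fold_enrichment(ip_values, control_values):
    """Return mean(IP) / mean(control) for replicate intensities.

    If the protein is absent from every control replicate, the denominator
    is zero and the enrichment is reported as infinite.
    """
    mean_ip = sum(ip_values) / len(ip_values)
    mean_control = sum(control_values) / len(control_values)
    if mean_control == 0:
        # Detected only in the IP: the ratio is unbounded.
        return math.inf if mean_ip > 0 else math.nan
    return mean_ip / mean_control


# Illustrative, made-up intensities: detected in 5/5 IP samples, 0/5 controls.
orf1p_ip = [5.0, 6.0, 4.0, 5.5, 4.5]
igg_control = [0.0, 0.0, 0.0, 0.0, 0.0]

ratio = fold_enrichment(orf1p_ip, igg_control)
print(ratio)
```

      With an infinite ratio, a test on log-transformed ratios has no finite value to work with, which is consistent with the entry being reported without a p-value.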

    1. Author response:

      We thank the reviewers for their thoughtful comments and suggestions! We greatly appreciate the feedback and are committed to addressing all the points raised by the reviewers to strengthen our manuscript.

      We plan to conduct additional local structural analyses to better demonstrate our observations of PROTAC-induced LYS-GLY interactions and lysine associability. Specifically, we will add more in-depth analysis such as computing dihedral entropies and Root Mean Square Fluctuation (RMSF) of nearby side chains and integrating various structural alignments to provide better visualization and understanding of the local structural arrangements. We plan to extend and add simulations when needed. We will review the latest available crystal and cryo-EM structures. If new structures are available, we will incorporate them into our revised analysis and discussion.
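
      For readers less familiar with the metric, RMSF for an atom is the square root of its mean squared displacement from its time-averaged position. The sketch below illustrates that definition only; it assumes the trajectory frames have already been aligned to a common reference, and the function name `rmsf` and the toy coordinates are hypothetical rather than part of our analysis pipeline.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory: list of frames, each frame a list of (x, y, z) tuples,
    one tuple per atom. Frames are assumed to be pre-aligned.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = [
            sum(frame[atom][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared displacement from the average position.
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy two-frame trajectory: atom 0 is fixed, atom 1 moves between x = +1 and x = -1.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
print(rmsf(toy))
```

      In this toy case the fixed atom has zero fluctuation, while the second atom fluctuates by one length unit around its mean position.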

      In the revision, additional figures will be included to offer a more comprehensive assessment of local conformational changes. We will also ensure that explanations of technical terminology are clear to non-expert readers and will address the grammatical and terminology errors highlighted by the reviewers. We will refine our language to more accurately describe the focus on structural dynamics in our study.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work makes a considerable effort to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their role in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and are capable of adjusting their responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.

      We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.

      Weaknesses:

      (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all of the cited work was done in rodents. Whether the monkey IC shares similar inputs is not clear.

      We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the Inferior Colliculus (IC). While it is true that most studies on dopaminergic inputs to the IC have been conducted in rodents, to our knowledge, no studies have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:

      - Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored."

      - Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates."

      (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli have been presented to the monkeys during training. Therefore, there are actually no novel stimuli, only deviant stimuli. This reflects that the authors have misunderstood the basic concept.

      We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, given the nature of our experimental design where all stimuli were familiar to the monkeys during training, the term "deviation" more accurately describes the stimuli used in our study rather than "novelty."

      To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.

      We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.

      (3) Most of the conclusions were made based on correlational analysis or speculation without providing causal evidence.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is inherent to many electrophysiological studies, particularly those conducted in behaving primates, where direct manipulation of specific neural circuits to establish causality is often challenging.

      This limitation becomes even more complex when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could potentially affect auditory responses in downstream pathways, which, in turn, may influence sensory prediction and decision-making processes. Moreover, we hypothesize that the sensory prediction and reward signals observed in the IC may not have direct causal effects but may instead be driven by top-down projections from higher cognitive regions. However, it is important to emphasize that our study provides novel evidence that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research into the underlying mechanisms and broader functional implications of these signals.

      To address this, we have taken the following steps in our revised manuscript:

      (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we now refer to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.

      (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships. We acknowledge the need for further investigation to substantiate the causal links between IC activity and cognitive functions such as sensory prediction, decision-making, and reward processing.

      We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.

      (4) Results are presented in a very "straightforward" manner with too many detailed descriptions of phenomena but lack summary and information synthesis. For example, the first section of Results is very long but did not convey clear information.

      We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.

      To address this concern, we have made the following revisions to the manuscript:

      (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.

      (2) Enhanced Section Summaries: We have added summary statements at the end of each major results section to synthesize the findings and highlight their significance. This should help guide the reader through the narrative and emphasize the key takeaways from each part of the study.

      (3) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.

      We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.

      (5) The logic between different sections of Results is not clear.

      We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.

      To address this concern, we have made the following revisions:

      (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.

      (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the Inferior Colliculus’s (IC) role in sensory prediction, decision-making, and reward processing.

      (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.

      We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.

      (6) In the Discussion, there is excessive repetition of results, and further comparison with and discussion of potentially related work are very insufficient. For example, Metzger, R.R., et al. (J Neurosc, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.

      We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.

      To address these concerns, we have made the following revisions:

      (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.

      (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.

      We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play more important roles rather than simple sensory transmission. The authors revealed the climbing effect of neurons in IC during decision-making tasks, and tried to explore the reward effect in this condition.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchic structures no less complex than those areas in charge of complex functions. Meanwhile, IC receives projections from higher areas, such as auditory cortex, which implies IC is involved in complex behaviors. Experiments in behavioral monkeys are always time-consuming works with hardship, and this will offer more approximate knowledge of how the human brain works.

      We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.

      We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.

      We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      These findings are more about correlation but not causality of IC function in behaviors. And I have a few major concerns.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We acknowledge the importance of distinguishing between correlation and causality. As detailed in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the challenges of establishing direct causal links in electrophysiological studies involving behaving primates.

      We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.

      Comparing neurons' spike activities in different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but more exploration is required to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so the monkeys might remember the sounds by rhythm, which is a sort of learned auditory response. It would be interesting to know the monkeys' responses and the neurons' activities if the inter-stimulus interval were variable. Second, the task only asked the monkeys to press one button, and the reward ratio (the ratio of correct response trials) was around 78% (based on the number from Line 302). In the sessions with reward, the monkeys therefore had a high expectation of reward; does this expectation cause the climbing effect?

      We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:

      (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in the 'climbing effect' for the following reason: The 'climbing effect' starts from the second sound in the block (Fig.2D and Fig.3B), before any rhythm or pattern could be fully established, as a rhythm generally requires at least three repetitions to form. Unfortunately, we did not explore variable ISIs in the current study, so we cannot directly address this concern with the data at hand.

      (2) Reward Expectation and Climbing Effect:

      The reviewer raises an important concern about whether the 'climbing effect' could be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise. In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100%), we did not observe a climbing effect in the auditory response. The presence of reward prediction error (Fig. 4D) further suggests that while the monkeys may form reward expectations, these expectations do not directly drive the climbing effect.

      To clarify this point, we have added sentences in the revised manuscript to explicitly discuss the relationship between reward expectation and the climbing effect, emphasizing that our findings indicate the climbing effect is not primarily due to reward expectation.

      We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and address the reviewer's concerns effectively. Thank you for these valuable suggestions.
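
      The "climbing effect" discussed above is a rise in firing rate with the position of a sound within a block. As a minimal sketch of how such a trend could be quantified (this is not the analysis used in the manuscript; the function name `climbing_slope` and the input layout are assumptions made for illustration), one can average the firing rate at each sound position across blocks and fit a straight line to the averages:

```python
import numpy as np

def climbing_slope(rates_by_position):
    """Quantify a rate-vs-position trend (hypothetical helper).

    rates_by_position: array of shape (n_blocks, n_positions), where entry
    [b, k] is the firing rate (spikes/s) evoked by the (k+1)-th sound of
    block b. Returns the mean rate per position and the fitted slope
    (spikes/s per position); a positive slope means the response climbs.
    """
    rates = np.asarray(rates_by_position, dtype=float)
    mean_rate = rates.mean(axis=0)
    positions = np.arange(1, rates.shape[1] + 1)
    # First-order least-squares fit; polyfit returns [slope, intercept].
    slope, _intercept = np.polyfit(positions, mean_rate, 1)
    return mean_rate, slope
```

      A slope computed this way only describes the trend; it does not by itself separate rhythm, sensory prediction, or reward expectation as explanations.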

      "Reward effect" on IC neurons' responses were showed in Fig. 4. Is this auditory response caused by physical reward action or not? In reward sessions, IC neurons have obvious response related to the onset of water reward. The electromagnetic valve is often used in water-rewarding system and will give out a loud click sound every time when the reward is triggered. IC neurons' responses may be simply caused by the click sound if the electromagnetic valve is used. It is important to find a way to rule out this simple possibility.

      We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.

      To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.

      Strengths:

      This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.

      We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.

      Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.

      We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      Major Comments:

      (1) Structural Clarity and Logic Flow:

      The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be well recognized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."

      We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.

      To address the reviewer's concerns, we have made the following revisions:

      (1) Reorganization of Figures and Results:

      We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.

      We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.

      (2) Revised Title:

      In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.

      (3) Improved Logic Flow:

      We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.

      We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.

      (2) Clarification of Data Analysis:

      Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.

      We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:

      (1) Detailed Explanation of Experimental Design:

      We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.

      (2) Cohesive Presentation of Data Analysis:

      Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.

      We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.

      (3) Enhanced Statistical Analysis Details:

      We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. These details are now presented in a clear and accessible format within the Methods section, with relevant information also highlighted in the Results section or the figure legends to facilitate understanding.

      We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.

      (3) Reward Prediction Analysis:

      The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.

      We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.

      In response to the reviewer's concerns, we have made the following revisions to strengthen our manuscript:

      (1) Inclusion of Population-Level Data for IC Neurons:

      In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses to isolate reward-related signals. However, most IC neurons exhibit motor responses during key presses (as indicated in Fig.7), which can complicate distinguishing between reward-related activity and motor responses. This complexity is why we initially presented neurons without motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.

      (2) Addition of Data on Key Press Errors and No-Response Trials:

      In response to the reviewer’s suggestion, we show below the peri-stimulus time histograms (PSTHs) of two example neurons during error trials, including incorrect key presses and no-response trials. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure). While we remain cautious about drawing definitive conclusions from this limited number of trials, we detected no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.

      When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons. Additionally, we observed that the reward prediction error in the IC appears to be signed, meaning that IC neurons showed robust responses to unexpected rewards but not to unexpected no-reward scenarios. However, the sign of the reward prediction error should be explored in greater depth with specifically designed experiments in future studies.

      Author response image 1.

      (A) PSTH of the neuron from Figure 6a during key press trials under the control condition. The number in parentheses in the legend represents the number of trials for the control condition. (B) PSTHs of the neuron from Figure 6a during non-key press trials under experimental conditions. The numbers in parentheses in the legend represent the number of trials for the experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 6b.
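
      For readers unfamiliar with the PSTHs referred to above, the following is a minimal sketch of how such a histogram is commonly computed: spike times are re-expressed relative to each event (for example, sound offset), binned, and averaged over trials. The function name `psth`, the window, and the bin width are illustrative assumptions and are not taken from the manuscript.

```python
import numpy as np

def psth(spike_times, event_times, window=(-0.2, 0.6), bin_width=0.02):
    """Peri-stimulus time histogram (illustrative sketch).

    spike_times, event_times: times in seconds on a common clock.
    window: (start, stop) in seconds relative to each event.
    Returns bin centers (s) and trial-averaged firing rate (spikes/s).
    """
    spike_times = np.asarray(spike_times, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    n_bins = int(round((window[1] - window[0]) / bin_width))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    counts = np.zeros(n_bins)
    for t0 in event_times:
        # Align spikes to this event; spikes outside the window are ignored.
        counts += np.histogram(spike_times - t0, bins=edges)[0]
    n_trials = max(len(event_times), 1)
    rate = counts / (n_trials * bin_width)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, rate
```

      With few trials, as in the error conditions described above, each bin rests on very few spikes, which is one reason to interpret such PSTHs cautiously.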

      We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly enhance our manuscript's conclusions regarding the complex functionalities of IC.

    1. Author Response:

      We would like to thank the reviewers for their constructive feedback and for acknowledging that our approach offers a simple yet powerful framework with the potential to serve as a comprehensive and intuitive tool for analyzing functional activity and connectivity.

      In response to the reviewers’ recommendations, we will aim to improve and clarify the following aspects of our work in an upcoming revision:

      Scope and limitations of the “fcHNN projection” (R#1 and R#2):

      Both reviewers have correctly noted that the interpretability and explanatory power of the simplistic, two-dimensional fcHNN-based projection is limited. In the revised manuscript, we will clarify that, indeed, attractors are in a close mathematical relationship with the principal components of the raw data (i.e., the eigenvectors of the connectome) within our framework. The fcHNN-projection was introduced solely to establish a link between the proposed framework and concepts with which the reader may be more familiar.

      We will enhance the presentation and discussion of our results to emphasize that, as the reviewers also kindly pointed out, the value of our approach lies in modelling how different facets of brain activity dynamically emerge from a common space of functional (ghost) attractors, rather than in studying the static attractor patterns themselves.
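
      The stated relationship between attractors and the eigenvectors of the connectome can be illustrated with a toy example. The sketch below is not the fcHNN implementation: it assumes a generic continuous Hopfield-style update, x <- tanh(beta * W x), applied to a synthetic symmetric weight matrix with one dominant pattern, and all names and parameter values (`relax`, `beta`, the matrix size, the noise level) are hypothetical. It compares the state the network relaxes to with the leading eigenvector of the weight matrix.

```python
import numpy as np

def relax(weights, state, beta=0.3, n_steps=500):
    """Iterate x <- tanh(beta * W x) for a fixed number of steps."""
    x = np.asarray(state, dtype=float)
    for _ in range(n_steps):
        x = np.tanh(beta * weights @ x)
    return x

rng = np.random.default_rng(0)
n = 20

# Synthetic symmetric "connectome": one dominant pattern plus weak noise.
pattern = rng.standard_normal(n)
pattern /= np.linalg.norm(pattern)
noise = rng.standard_normal((n, n))
noise = 0.05 * (noise + noise.T) / 2
weights = 5.0 * np.outer(pattern, pattern) + noise

# eigh returns eigenvalues in ascending order, so the last column
# of eigvecs is the leading eigenvector.
eigvals, eigvecs = np.linalg.eigh(weights)
leading = eigvecs[:, -1]

# Relax from a small random state and compare directions (sign is arbitrary).
attractor = relax(weights, 0.01 * rng.standard_normal(n))
similarity = abs(attractor @ leading) / (
    np.linalg.norm(attractor) * np.linalg.norm(leading)
)
print(f"cosine similarity to leading eigenvector: {similarity:.3f}")
```

      In this toy setting the attractor is a nonlinearly squashed version of the leading eigenvector, so the two are closely aligned but not identical, which is consistent with the point that a static projection adds little beyond a principal-component view.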

      Motivations and Rationale for Using the Functional Connectome (R#2):

      We agree with Reviewer #2 that a deeper mechanistic explanatory power could be achieved by modeling structure-function coupling, and that our framework is well-suited for this challenge. In our revision, we will highlight this as one of the promising future applications of our framework. We will, furthermore, clarify that the scope of the present work was deliberately restricted to functional connectivity to demonstrate that our framework also allows us to “bypass” the significant challenge of structure-function coupling. This enables us to focus on understanding the origins of canonical resting-state networks, the dynamic responses of the system to perturbations and the complex relationship between task-induced activity and resting-state connectivity, without first solving the structure-function coupling problem.

      Additionally, we will mathematically justify the use of linear measures of the functional connectome to reconstruct the underlying non-linear dynamic system, thereby clearly delineating which results can and cannot be considered circular when starting from the functional connectome.

      Improvements in Overall Clarity of Presentation (R#1):

      In line with the above points and in general, we will strive to enhance the overall clarity of the presentation of our results, including figures, wording, and statistical analysis.

    1. Author response:

      Reviewer #2 (Public Review):

      In this manuscript, Kafri and colleagues assess the contribution of protein degradation to the cell size-dependent accumulation of total protein. This is an interesting line of research that has not previously been explored. Most of the focus on the size-dependence of protein accumulation has been on the synthesis part of the equation. As cells get too big, the efficiency of cell growth (mass accumulation per unit mass) decreases. It is argued that this is not due to the loss of the efficiency in protein synthesis, but rather is due to the increased protein degradation in larger cells. It is an interesting hypothesis, that might well be true, but there are some issues with key aspects of the data and other supporting data are quite indirect. More work needs to be done to support the central claims.

      We thank the reviewer for appreciating that the work is interesting and previously unexplored.

      The major issue is that the data supporting the proportional increase in protein synthesis with cell size need to be strengthened. Protein synthesis is measured by the amount of a methionine analog that is incorporated in a fixed amount of time. Fig. 2 then plots this amount as a function of cell size, which is presumably measured using a total protein dye (this information is not included; incidentally the axis labels should note what the measurement is 'total protein' or 'forward scatter' rather than the more ambiguous 'cell size'). In any case, something is wrong with the cell size measurements in Figure 2 because many cells basically have almost negligible size (near 0) while others have sizes up to 5 or 6 arbitrary units. It makes no sense that there should be a 10-fold or even 100-fold range in cell sizes. For this reason, I can't interpret the data in Figure 2, which is unfortunate since that is a crucial figure for the authors' argument.

      The data supporting higher rates of protein degradation per unit mass in large cells suffers from a similar problem as Figure 3E has the same issue as Figure 2 with too many tiny 'cells'.

      Yes, the reviewer is correct that we are using a total protein dye (Alexa fluorophore-conjugated succinimidyl ester, abbreviated as SE) to measure cell size. We have included details regarding the methods of cell size (total protein content) measurement in both the Methods (lines 463-466) and Results (lines 100-102) sections.

      Regarding the reviewer’s concern on the cell size range, we apologize for the confusion the figures may have caused. These cell size measurements are within reasonable range and not 10-fold or 100-fold. Please refer to our detailed response above to essential point #1.

      Moreover, the reliance on cycloheximide to treat cells and measure reduction in mass isn't ideal since shutting off all protein synthesis is a pretty drastic perturbation. It would have been better to shut off synthesis of a specific protein and measure its degradation in large and small cells while keeping the cells otherwise intact.

      We acknowledge that relying on cycloheximide to measure changes in mass has limitations, as acute inhibition of protein synthesis is a significant perturbation. Ideally, we would measure the degradation of specific proteins in large and small cells while keeping the rest of the cellular processes intact. However, this presents considerable technological challenges. While our evidence clearly shows increased protein degradation and compensatory growth slowdown in large cells, we have not yet identified the specific proteins/genes being targeted. Implementing the reviewer's suggestion would require first screening for a suitable protein/gene to serve as a reporter for compensatory degradation. A significant proteomics screen may allow identification of potential targets, but further validation would necessitate substantial effort, including the generation and validation of a reporter system. We agree that this is a valuable experiment to pursue, but it will likely be part of a follow-up study focused on characterizing the specific protein targets and E3 ligases involved in these processes. In the revised manuscript, we discuss these open questions and future directions in lines 380-410.

      Reviewer #3 (Public Review):

      The authors report a previously undocumented role for UPS-mediated protein turnover in size control in human cells. The study builds on previous observations made by the Kafri group that large cells undergo size compensation by reducing their rate of growth. In particular, recent published work by Ginzberg et al showed that CDK2 inhibition is accompanied by long term size compensation in the form of reduced cell growth whereas CDK6 inhibition is not. The authors investigate the basis for this effect and demonstrate in both unperturbed and perturbed growth/division contexts, using both fixed cells and time lapse microscopy, that the rate of protein synthesis increases proportionately in large cells that undergo size compensation even though mass accumulation is attenuated. The authors show that this effect appears to be mediated by increased proteasomal activity, as demonstrated by proteasome-dependent K48-ubiquitin chain turnover. Intriguingly, this degradation-mediated size compensation mechanism appears to be most active at the G1/S transition, the primary point at which size control operates. The experiments are well controlled, and the conclusions of the study are in general well supported by the data. The authors present an interesting set of discussion points that relate their observations to size control mechanisms in dividing and non-dividing cells. While specific mechanisms are not pursued, this study nevertheless adds an important new insight into the still unsolved problem of size control.

      We thank the reviewer for appreciating the novelty of the work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper conducted a GWAS meta-analysis for COVID-19 hospitalization among admixed American populations. The authors identified four genome-wide significant associations, including two novel loci (BAZ2B and DDIAS), and an additional risk locus near CREBBP using cross-ancestry meta-analysis. They utilized multiple strategies to prioritize risk variants and target genes. Finally, they constructed and assessed a polygenic risk score model with 49 variants associated with critical COVID-19 conditions.

      Strengths:

      Given that most of the previous studies were done in European ancestries, this study provides unique findings about the genetics of COVID-19 in admixed American populations. The GWAS data would be a valuable resource for the community. The authors conducted comprehensive analyses using multiple different strategies, including Bayesian fine mapping, colocalization, TWAS, etc., to prioritize risk variants and target genes. The polygenic risk score (PGS) result demonstrated the ability of the cross-population PGS model for COVID-19 risk stratification.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      (1) One of the major limitations of this study is that the GWAS sample size is relatively small, which limits its power.

      (2) The fine mapping section is unclear and there is a lack of information. The authors assumed one causal signal per locus, and only provided credible sets, but did not provide posterior inclusion probabilities (PIP) for the variants to be causal.

      (3) Colocalization and TWAS used eQTL data from GTEx data, which are mainly from European ancestries. It is unclear how much impact the ancestry mismatch would have on the result. The readers should be cautious when interpreting the results and designing follow-up studies.

      We agree that the sample size is relatively small. Despite that, it was sufficient to reveal novel risk loci, supporting the robustness of the main findings. We have indicated this limitation at the end of the discussion section.

      Thank you for raising this point. As suggested, we have also used SuSiE, which allows more than one causal signal per locus to be assumed. However, in this case the results were not different from those obtained with the original Bayesian colocalization performed with corrcoverage. Regarding the PIP, at the fine mapping stage we are inclined to put more weight on the functional annotations of the variants in the credible set than on the statistical contributions to the signal. This is the reason why we prefer not to put weight on the PIP of the variants but to prioritize variants that were enriched in functional annotations.

      This is a good point regarding the lack of diversity in GTEx data. We have also used data from AMR populations (GALA II-SAGE models), although it was only available for blood tissue. Regarding the ancestry mismatch between datasets, several studies have attempted to explore the impact. Gay et al. (PMID: 32912333) studied local ancestry effects on eQTLs from the GTEx consortium and concluded that adjustment of eQTLs by local ancestry only yields modest improvement over using global ancestry (as done in GTEx). Moreover, the colocalization results between adjusting by Local Ancestry and Global Ancestry were not significantly different. Besides, Mogil et al. (PMID: 30096133) observed that genes with higher heritability share genetic architecture between populations. Nevertheless, both studies have evidenced decreased power and poorer predictive performances regarding gene expression because of reduced diversity in eQTL analyses. As a consequence of the ancestry mismatch, we now warn the readers that this may compromise signal detection (Discussion, lines 531-533).

      Reviewer #2 (Public Review):

      This is a genome-wide association study of COVID-19 in individuals of admixed American ancestry (AMR) recruited from Brazil, Colombia, Ecuador, Mexico, Paraguay, and Spain. After quality control and admixture analysis, a total of 3,512 individuals were interrogated for 10,671,028 genetic variants (genotyped + imputed). The genetic association results for these cohorts were meta-analyzed with the results from The Host Genetics Initiative (HGI), involving 3,077 cases and 66,686 controls. The authors found two novel genetic loci associated with COVID-19 at 2q24.2 (rs13003835) and 11q14.1 (rs77599934), and other two independent signals at 3p21.31 (rs35731912) and 6p21.1 (rs2477820) already reported as associated with COVID-19 in previous GWASs. Additional meta-analysis with other HGI studies also suggested risk variants near CREBBP, ZBTB7A, and CASC20 genes.

      Strengths:

      These findings rely on state-of-the-art methods in the field of Statistical Genomics and help to address the issue of a low number of GWASs in non-European populations, ultimately contributing to reducing health inequalities across the globe.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      There is no replication cohort, as acknowledged by the authors (page 29, line 587), and no experimental validation to assess the biological effect of putative causal variants/genes. Thus, the study provides good evidence of association, rather than causation, between the genetic variants and COVID-19. Lastly, I consider it crucial to report the results for the SCOURGE Latin American GWAS, in addition to its meta-analysis with HGI results, since HGI data has a different phenotype scheme (Hospitalized COVID vs Population) compared to SCOURGE (Hospitalized COVID vs Non-hospitalized COVID).

      We essentially agree with the reviewer that one of the main limitations of the study is the lack of a replication stage, because all available datasets were used in a one-stage analysis. To contribute to the interpretation of the findings in the absence of a replication stage, we have now assessed the replicability of the novel loci using the Meta-Analysis Model-based Assessment of replicability (MAMBA) approach (PMID: 33785739) and included the posterior probabilities of replication in Table 2. We also explored further the potential replicability of signals in other populations. We agree that the results should be interpreted in terms of associations given the lack of functional validation of main findings, so we have slightly modified the discussion.

      As suggested, the SCOURGE Latin American GWAS summary is now accessible by direct request to the Consortium GitHub repository (https://github.com/CIBERER/Scourge-COVID19) (lines 797-799). We have also included the results from the SCOURGE GWAS analysis for the replication of the 40 lead variants in Supplementary Table 12. Results from the SCOURGE GWAS for the lead variants in the AMR meta-analysis with HGI were already included in Supplementary Table 2. Of note, we have not been able to conduct the meta-analysis with the same hospitalization scheme as in the HGI study since the population-specific results for those analyses were not publicly released. However, sensitivity analyses included within the supplementary material from the COVID-19 Host Genetics Initiative (2021) stated that there were no significant differences in effects (Odds Ratios) between analyses using population controls or just non-hospitalized COVID-19 patients.

      Reviewer #3 (Public Review):

      Summary:

      In the context of the SCOURGE consortium's research, the authors conduct a GWAS meta-analysis on 4,702 hospitalized individuals of admixed American descent suffering from COVID-19. This study identified four significant genetic associations, including two loci initially discovered in Latin American cohorts. Furthermore, a trans-ethnic meta-analysis highlighted an additional novel risk locus in the CREBBP gene, underscoring the critical role of genetic diversity in understanding the pathogenesis of COVID-19.

      Strengths:

      (1) The study identified two novel severe COVID-19 loci (BAZ2B and DDIAS) by the largest GWAS meta-analysis for COVID-19 hospitalization in admixed Americans.

      (2) With a trans-ethnic meta-analysis, an additional risk locus near CREBBP was identified.

      Thank you very much for the positive comments and the willingness to revise this manuscript.

      Weaknesses:

      (1) The GWAS power is limited due to the relatively small number of cases.

      (2) There is no replication study for the novel severe COVID-19 loci, which may lead to false positive findings.

      We agree that the sample size is relatively small. Despite that, it was sufficient to reveal novel risk loci, supporting the robustness of the main findings. We have indicated this limitation at the end of the discussion section.

      Regarding the lack of a replication study, we now assessed the replicability of the novel loci using the Meta-Analysis Model-based Assessment of replicability (MAMBA) approach (PMID: 33785739). We have included the posterior probabilities of replication in Table 2.

      (3) Significant differences exist in the ages between cases and controls, which could potentially introduce biased confounders. I'm curious about how the authors treated age as a covariate. For instance, did they use ten-year intervals? This needs clarification for reproducibility.

      Thank you for raising this point. Age was included as a continuous variable. This is now indicated in line 667 (within Material and Methods).

      (4) "Those in the top PGS decile exhibited a 5.90-fold (95% CI=3.29-10.60, p=2.79x10^-9) greater risk compared to individuals in the lowest decile". I would recommend comparing with the 40-60% PGS decile rather than the lowest decile, as the lowest PGS decile does not represent 'normal controls'.

      Thank you. In the revised version, the PGS categories were compared following the recommendation (lines 461-463).
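
      The comparison recommended in point (4) can be illustrated with a minimal sketch. It is not the authors' analysis: it computes an unadjusted odds ratio with a Wald confidence interval, whereas the manuscript's estimate presumably comes from a covariate-adjusted model. The function names (`rank_bands`, `odds_ratio_ci`) and any counts passed to them are hypothetical.

```python
import math


def rank_bands(scores):
    """Split individuals by PGS rank into the top decile and the 40-60% band.

    Returns two lists of indices into `scores`: (top_decile, middle_band).
    Integer arithmetic is used so that band edges are exact.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    top, middle = [], []
    for rank, i in enumerate(order):
        if rank * 10 >= 9 * n:              # highest 10% of ranks
            top.append(i)
        elif 4 * n <= rank * 10 < 6 * n:    # ranks in the 40-60% band
            middle.append(i)
    return top, middle


def odds_ratio_ci(cases_top, controls_top, cases_ref, controls_ref, z=1.96):
    """Unadjusted odds ratio (top band vs reference band) with a Wald CI.

    All four counts must be positive. The default z = 1.96 gives an
    approximate 95% interval.
    """
    odds_ratio = (cases_top * controls_ref) / (controls_top * cases_ref)
    se_log_or = math.sqrt(
        1 / cases_top + 1 / controls_top + 1 / cases_ref + 1 / controls_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

      Using the 40-60% band as the reference makes the odds ratio describe risk relative to individuals with an average genetic burden, rather than relative to the least-burdened tail of the distribution.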

      (5) In the field of PGS, it's common to require an independent dataset for training and testing the PGS model. Here, there seems to be an overfitting issue due to using the same subjects for both training and testing the variants.

      We are sorry for the misunderstanding. In fact, we have followed the standard practice to avoid overfitting of the PGS model and have used different training and testing datasets. The training data (GWAS) was the HGI-B2 ALL meta-analysis, in which our AMR GWAS was not included. The PRS model was then tested in the SCOURGE AMR cohort. However, it is true that we did test the combination of the PRS with the newly discovered variants in the SCOURGE cohort. To avoid potential overfitting by adding the new loci, we have excluded from the manuscript the results in which we included the newly discovered variants.

      (6) The variants selected for the PGS appear arbitrary and may not leverage the GWAS findings without an independent training dataset.

      Again, we are sorry for the misunderstanding. The PGS model was built with 43 variants associated with hospitalization or severity within the HGI v7 results and 7 which were discovered by the GenOMICC consortium in their latest study and were not in the latest HGI release. The variants are included within the Supplementary Table 14, but we have now annotated the discovery GWAS.

      (7) The TWAS models were predominantly trained on European samples, and there is no replication study for the findings as well.

      This is a good point regarding the lack of diversity in GTEx data. We have also used data from AMR populations (GALA II-SAGE models), although it was only available for blood tissue. Regarding the ancestry mismatch between datasets, several studies have attempted to explore the impact. Gay et al. (PMID: 32912333) studied local ancestry effects on eQTLs from the GTEx consortium and concluded that adjustment of eQTLs by local ancestry only yields modest improvement over using global ancestry (as done in GTEx). Moreover, the colocalization results between adjusting by Local Ancestry and Global Ancestry were not significantly different. Besides, Mogil et al. (PMID: 30096133) observed that genes with higher heritability share genetic architecture between populations. Nevertheless, both studies have evidenced decreased power and poorer predictive performances regarding gene expression because of reduced diversity in eQTL analyses. As a consequence of the ancestry mismatch, we now warn the readers that this may compromise signal detection (Discussion, lines 531-533).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors mentioned the fine mapping method did not converge for the locus in chr 11. I would consider trying a different fine-mapping method (such as SuSiE or FINEMAP). It would be helpful to provide posterior inclusion probabilities (PIP) for the variants in fine mapping results and plot the PIP values in the regional association plots.

      As suggested, we have also used SuSiE, which allows more than one causal signal per locus to be assumed. However, in this case the results were not different from those obtained with the original Bayesian colocalization performed with corrcoverage. SuSiE's fine-mapping for chromosome 11 prioritized a single variant, which is likely due to its low frequency. Thus, we have maintained the fine-mapping as it was originally indicated in the previous version of the manuscript but have now included the credible set in Supplementary Table 6.

      Regarding the PIP, at the fine mapping stage we are inclined to put more weight on the functional annotations of the variants in the credible set than on the statistical contributions to the signal. This is the reason why we prefer not to put weight on the PIP of the variants but to prioritize variants that were enriched in functional annotations.
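
      For readers unfamiliar with the quantities discussed in this exchange, the sketch below shows how posterior inclusion probabilities and a credible set relate under the single-causal-variant assumption. It is a generic illustration, not the corrcoverage or SuSiE implementation: it uses Wakefield approximate Bayes factors, and the prior variance of 0.04 on the log odds ratio is an assumed default. All function names are hypothetical.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor in favour of association.

    beta and se are the GWAS effect estimate and its standard error;
    prior_var is the assumed prior variance of the true effect.
    """
    v = se * se
    r = prior_var / (v + prior_var)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_inclusion_probabilities(betas, ses, prior_var=0.04):
    """PIPs assuming exactly one causal variant in the locus.

    Each PIP is the variant's Bayes factor divided by the sum over the
    locus; the computation is done in log space for numerical stability.
    """
    logs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    largest = max(logs)
    weights = [math.exp(value - largest) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose summed PIP reaches `coverage`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen
```

      A credible set reported without its PIPs tells the reader which variants are candidates but not how the posterior mass is divided among them, which is the information requested in point (1).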

      (2) Please provide more detailed information about the VEP and V2G analysis and how to interpret those results. My understanding of V2G is that it includes different sources of information (such as molecular QTLs and chromatin interactions from different tissues/cell types, etc.). It is unclear what sources of information and weight settings were used in the V2G model.

      Thank you for raising this point. As suggested, we have clarified the basis for VEP and V2G and the interpretation (lines 732-743).

      (3) The authors identified multiple genes with different strategies, e.g. FUMA, V2G, COLOC, TWAS, etc. How many genes were found/supported by evidence provided by multiple methods? It could be helpful to have a table summarizing the risk genes found by different strategies, and the evidence supporting the genes. e.g. which genes are found by which methods, and the biological functions of the genes, etc.

      Thank you for raising this point. As suggested, we have now added a new figure (Figure 5) to summarize the findings with the multiple methods used.

      (4) It would be helpful to make the code/scripts available for reproducibility.

      As suggested, the SCOURGE Latin American GWAS summary and the analysis scripts (https://github.com/CIBERER/Scourge-COVID19/tree/main/scripts/novel-risk-hosp-AMR-2024) are now accessible in the Consortium GitHub repository (https://github.com/CIBERER/Scourge-COVID19) (lines 806-807).

      (5) The fonts in some of the figures (e.g. Figure 2) are hard to read.

      Thank you. We have now included the figures as SVG files.

      Reviewer #2 (Recommendations For The Authors):

      - The abstract lacks a conclusion sentence.

      Thank you. As suggested, we have included two additional sentences with broad conclusions from the study. We preferred to avoid relying on conclusions related to known or new biological links of the prioritized genes given the lack of functional validation of main findings.

      - Regarding the association analysis (page 27, line 677), I wonder if some of the 10 principal components (PCs) are capturing information about the recruitment areas (countries). It may be relevant to test for multicollinearity among these variables.

      Because some, but not all, of the recruitment categories might be correlated with a given PC, we have calculated generalized variance inflation factor (GVIF) values for the main variables to assess the categorical variable as a single entity. The scaled GVIF^(1/(2*Df)) value for the categorical variable is 1.52. Squaring this value gives 2.31, which can then be compared against the usual rule-of-thumb thresholds for VIF values.
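
      The arithmetic in this reply can be reproduced with a short sketch. It assumes the standard definition of the generalized variance inflation factor for a model term, det(R11) * det(R22) / det(R), where R is the correlation matrix of the predictor columns, R11 is the block for the term of interest and R22 is the block for the remaining predictors. The helper names are hypothetical, and the authors' own computation was presumably done in a statistics package.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result


def submatrix(matrix, indices):
    """Square sub-block of `matrix` for the given row/column indices."""
    return [[matrix[i][j] for j in indices] for i in indices]


def gvif(corr, term_indices):
    """GVIF of the term whose design columns are `term_indices`."""
    others = [i for i in range(len(corr)) if i not in term_indices]
    return (
        det(submatrix(corr, term_indices))
        * det(submatrix(corr, others))
        / det(corr)
    )


def scaled_gvif(gvif_value, df):
    """GVIF^(1/(2*Df)); its square is comparable to an ordinary VIF."""
    return gvif_value ** (1.0 / (2 * df))


# The reply reports a scaled value of 1.52 for the categorical variable.
squared = round(1.52 ** 2, 2)
```

      For a term with a single degree of freedom the GVIF reduces to the ordinary VIF, which is why squaring the scaled value puts a multi-level categorical variable on the same footing as the other covariates.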

      - Still on the topic of association analysis, did the authors adjust the logistic model for comorbidities variables from Table 1? Given these comorbidities also have a genetic component and their distribution differs between non-hospitalized vs hospitalized, I am concerned that comorbidities might be confounding the association between genetic variants and COVID.

      We did not adjust for comorbidities since HGI studies were not adjusted either and we aimed to be as aligned as possible with HGI. However, as suggested, we have now tested the association between each of the comorbidities in Table 1 and each of the variants in Table 2, using the comorbidities as dependent variables and adjusting for the main covariates (age, sex, PCs and country of recruitment). None of the variants were significantly associated with the comorbidities (line 333).

      - If I understood correctly, the 49 genetic variants used to develop the polygenic risk score model (PRS) were based on the HGI total sample size (data release 7), which is predominantly of European ancestry. I am concerned about the prediction accuracy in the AMR population (PRS transferability issue).

      We have explored literature in search of other PRS to compare the associated OR in our cohort with ORs calculated in European populations. Horowitz et al. (2022) reported an OR of 1.38 for the top 10% with respect to hospitalization risk in European individuals using a GRS with 12 variants.

      We acknowledge that this might be an issue, and it is now explained in the discussion of the revised version (lines 561-568). However, as this is the first time a PRS for COVID-19 is applied to a relatively large AMR cohort, we believe that this analysis will be of value for further analyses regarding PRS transferability, providing a source for comparison in further studies.

      - On page 23, line 579, the authors acknowledge their "GWAS is underpowered". This sentence requires a sample/power calculation, otherwise, I suggest using "is likely underpowered".

      Thanks for the input. We have modified the sentence as suggested.

      Reviewer #3 (Recommendations For The Authors):

      I wonder if the authors have an approximate date when the GWAS summary statistic will be available. I reviewed some manuscripts in the past, and the authors claimed they would deposit the data soon, but in fact it would not happen until 2 years later.

      The summary statistics are already available from the SCOURGE Consortium repository https://github.com/CIBERER/Scourge-COVID19 (lines 806-807).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      Major 

      (a) In the study the authors focus on the RALF1 peptide. But according to expression data and the study from Abarca et al., 2021, RALF1 is not the only peptide expressed in the root that has an impact on root growth. Similarly, looking at the primary sequence of RALF1, it does not differ much chemically from other RALFs such as RALF33, RALF23, RALF22, etc. So, does the cell wall pectin methylation status also have an impact on the effect of other RALFs on root growth or is that specific to RALF1?

      (b) In addition, is the internalization of FER depending only on RALF1 upon the methylation status of cell wall pectins? Or can other RALFs cause a similar effect potentially?

      (c) The authors propose that RALF1 associates with de-esterified pectin through electrostatic interactions. To do that they perform Biolayer interferometry assays using a buffer with pH 7.4. Is that a relevant pH at the cell wall? It is possible that the authors thought that this may not change the charges of R and K residues; however, it will affect the overall charge of the peptide given the fact that it contains quite some N and Q in the exposed surface. The authors may want to consider that.

      (d) Moreover, the authors do not use their peptide RALF1KR, suggested as a peptide not binding OGs, as a control in their OG binding assays. That biochemical experiment should also be included to validate their results and conclusions.

      We thank reviewer #1 for these comments. In this work, we focused on RALF1 but the majority of AtRALF peptides, when applied exogenously as synthetic peptides, induce RALF1-like effects in Arabidopsis (Abarca et al., 2021; PMID: 34608971). Moreover, all RALF peptides display clusters of R and K residues and are positively charged (Abarca et al., 2021; PMID: 34608971). In comparison to RALF1, we now also use RALF34 because it was suggested to also interact via the Catharanthus roseus receptor-like kinase 1-like (CrRLK1L) THESEUS1 (THE1). Notably, RALF34 also induced the internalization of FER-GFP. Moreover, the interference with PME also disrupted this activity of RALF34. Therefore, we assume that other RALF peptides display the same or similar signalling modalities. Nevertheless, it remains to be addressed if all RALF family members require PME activity.

      We appreciated these comments and incorporated this aspect in the revised version of the manuscript. The pH was chosen for technical reasons associated with the used BLI buffer. As requested, we also included the RALF1-KR peptide in our OG binding assays. Under these conditions, the mutated peptides were not able to interact with the OGs anymore. Accordingly, we conclude that the K and R residues in RALF1 are crucial for its binding to demethylesterified OGs.  

      (e) Another important aspect concerns the design of the RALF1KR mutant and its effect in planta. The authors report the following: "RALF1-KR peptides are not bioactive, because they did neither affect root growth, nor cell wall integrity, nor did they induce the ligand-induced endocytosis of FER in epidermal root cells (Figure 5D-I). These findings suggest that the positively charged residues in RALF1 are essential for its activity in roots." According to the structure published by Xiao et al., 2019, the R in the alpha helix from RALF peptides (YISYQSLKR... in RALF1 seq) is directly involved in the interaction with LLGs. So, a mutation in that R may impair the interaction of RALF1 with LLG and therefore the complex formation with FER. So, it is quite possible that the effect that the authors are seeing on FER signaling and endocytosis, using this peptide variant, may not be due to the impaired capacity of the peptide to bind de-esterified pectin but to its inability to be sensed by the membrane complex directly. To verify that, the authors should test, either biochemically or by CoIP in planta, that their RALF1KR variant can still be perceived by the LLG-FER complex.

      We agree with reviewer #1 and do not doubt that the positive charges in RALF1 likely interact with several entities. The respective sites were also covered in Liu et al., 2024 (Cell). It would be interesting to understand how the charge-dependent interaction with pectin modulates the RALF binding to the LLG-FER complex, but these experiments are beyond the scope of this manuscript. We confirmed that the positive charges in RALF1 are essential for OG binding as well as for its bioactivity. We do not, however, rule out that they have additional structural functions beyond pectin binding. We clarified this aspect in the revised version. It is conceivable that the pectin and receptor complex binding of RALF1 is molecularly and mechanistically related.

      (f) The authors propose in this study that this effect of RALF1-pectin mode of action on FER is independent from LRXs. That is a very interesting observation which also aligns with similar observations from other independent studies (Moussu et al., 2020; Schoenaers et al. Nat Plants, 2024; Franck et al., 2018). However, that seems to be in conflict with the previous mode of action that the authors had described in Dünser et al., 2019. In that study the authors had described that FER constitutively interacts with LRX proteins in a direct way to sense cell wall changes. In my view the authors do not critically elaborate to explain these two contradicting results, which are key to understanding the mode of action they are describing. This relevant aspect should be addressed in more depth by the authors in their discussion.

      Thank you for the comment. We do not see that our findings contradict our previous work (from Dünser et al., 2019). There we concluded that LRX and FER directly interact to sense cell wall characteristics. However, the loss of LRX function abolished the cell wall sensing mechanism, but the respective loss-of-function and dominant negative lines were still able to detect RALF peptides. We hence proposed that the LRX/FER function is at least partially independent of the FER function in RALF perception. This is in agreement with our current study where we conclude again that FER shows LRX-dependent but also -independent modes of action. 

      Minor

      (g) In the introduction (first page), the authors write the following sentence: "RALF peptides are involved in multiple physiological and developmental processes, ranging from organ growth and pollen tube guidance to modulation of immune responses (Stegmann et al., 2017; Abarca et al., 2021)". RALFs are not involved in pollen tube guidance but pollen tube growth.

      So, that should be changed in the Introduction sentence. The authors could also cite additional references here to support the sentence, such as Mecchia et al., 2017 or Ge et al., 2017.

      Thank you for pointing this out; we apologize for the error. We corrected the statement in the revised version of the manuscript and added the citations as requested.

      (h) The new study of Schoenaers et al. Nat Plants, 2024 should now be included in the revised version.

      Thank you. We implemented this reference in the revised manuscript.

      Reviewer #2 (Public Review):

      The genetic material used by the authors to strengthen the connection of RALF signalling and PME activity might not be as suitable as an acute inhibition of PME activity. The PMEI3ox line generated by Peaucelle et al., 2008 is alcohol-inducible. Was expression of the PMEI induced during the experiments? As ethanol inducible systems can be rather leaky, it would not be surprising if PME activity would be reduced even without induction, but maybe this would warrant testing whether PMEI3 is actually overexpressed and/or whether PME activity is decreased. On a similar note, the PMEI5ox plants do not appear to show the typical phenotype described for this line. I personally don't think these lines are necessary to support the study. Short-term interference with PME activity (such as with EGCG) might be more meaningful than life-long PMEI overexpression, in light of the numerous feedback pathways and their associated potential secondary effects. This might also explain why EGCG leads to an increase in pH, as one would expect from decreased PME activity, while PMEI expression (caveats from above apply) apparently does not (Fig 3A-D).

      We agree with reviewer #2. The PMEI3ox line from Peaucelle et al., 2008 is ethanol-inducible, but we observed a strong phenotype (at seedling and adult stage) without ethanol induction. We performed all experiments (root growth assays and confocal observations) with as well as without induction using ethanol, leading to similar results. We concluded from this that the line is either leaky or that overexpression of PMEI3 is already induced upon seed sterilisation with ethanol. Accordingly, we did not intend to use the lines for acute inhibition of PME but rather used the lines to genetically confirm our data derived from acute pharmacological inhibition. We do show in Figure 1G that the levels of de-methylesterified pectin are decreased in the PMEI3ox mutant compared to WT seedlings. It is exactly this alteration that we are exploiting to assess the necessity of charged pectin for RALF1 signalling. Since the apoplastic pH in the PMEI3ox line is not altered compared to WT, we can conclude that the observed effect on RALF1 signalling is entirely due to the altered pectin charge.

      We would like to note that the PMEI5ox line indeed shows the reported root-bending phenotype when grown on plates. We started to perform RALF application assays in liquid medium, because EGCG does not show activity on MS plates. Moreover, it allows us to perform the assays with low amounts of synthetic peptides. The seedling images in our root growth assay might be hence misleading since the assay was done in liquid MS medium and the seedlings were carefully straightened on MS plates before imaging. This transfer makes it difficult to observe the root-bending or -curling phenotype, which is typical for PMEI5ox. 

      At least at first sight, the observation that OGs are able to titrate RALF from pectin binding seems at odds with the idea of cooperative binding with low affinity, leading to high avidity oligomers. Perhaps the authors can provide a speculative conceptual model of these interactions?

      We added a high concentration of OGs in the media and observed a strong repression of RALF1 activity at the root surface. We assume the OGs form oligomers with RALF peptides in the media, preventing them from penetrating the roots.

      I could not find a description of the OG treatment/titration experiments, but I think it would be important to understand how these were performed with respect to OG concentration, timing of the application, etc.

      Thank you for pointing this out. The description of the OG RALF titration is added in the methods section.

      Reviewer #2 (Recommendations for the Authors):

      Page 3: "and can bind to extracellular pectin" Liu et al., 2024 should maybe also be cited here.

      Amended.

      I am not so sure about the use of "conceptualizing" in the last sentence of the abstract and elsewhere in the manuscript.

      I would suggest adding a few sentences that describe and differentiate what this study and other recently published works (e.g. Dünser, Liu, Moussu, Lin) have revealed about the pectin association of RALFs, LRXs, and FER to help the non-expert reader to navigate this increasingly complex area. May also be worth mentioning that the previously described pectin sensing function of FER is physically separated from the RALF binding domain (Gronnier et al., 2022).

      Thank you for your constructive comments. We followed your suggestions and further improved the discussion in the revised version of our manuscript.

      Reviewer #3 (Recommendations for the Authors): 

      (1) The authors claim that pectin is something like an extracellular signaling scaffold. In other fields, signalling scaffold refers to proteins that tether the signalling components and regulate/are involved in the signal transduction. Here, pectin is a cell wall structural component whose molecular status is sensed and perceived rather than a functional signaling component. To me, it is FERONIA to be called a signalling scaffold in this case. However, this is my view, and the authors may present their concept. 

      RALF peptides as well as FERONIA bind to de-methylesterified pectin, which is essential for its signalling output. Albeit not being a protein, we propose that pectin functions like a scaffold tethering both signalling components and thereby enabling signalling. FERONIA has been indeed also proposed to function as a scaffold when tethering other signalling components.

      (2) I have no problem with authors using the more general term pectin instead of homogalacturonan throughout the text. Still, authors should, at some point in the text, specify that by pectin, they mean homogalacturonan; the authors did not analyze other pectic types on binding. 

      We followed your suggestion.

      (3) The authors show that RALF1 binds to OGs with a high avidity. Given the fact that OGs released from homogalacturonan upon pathogen infection are Damage-Associated Molecular Patterns (DAMPs), this opens the possibility that this particular activity of RALF1 might actually function in modulation of immune response. I suggest that authors should not exclude this possibility. 

      We fully agree that this is a possibility for FER-dependent signalling.

      (4) Are there any indications that a similar mechanism can be extrapolated to other FERONIA homologs, such as THESEUS or HERCULES? Although it is not essential to comment, I think this could enrich the discussion.

      This is a highly interesting research question, which we may follow up in our upcoming studies. RALF34, which is considered a ligand for THESEUS, also induced FER internalization, which was also sensitive to PME inhibition. While this requires further investigation, this finding hints at a common mechanism for FER- and THE-dependent RALF peptides.

      (5) I suggest using the model scheme currently in the supplement as a main figure to provide an immediate accessible summary of the findings.

      Thank you for the suggestion to add the summary scheme in the main figures. We followed your suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Review #1:

      (1) It would be helpful to explain the criteria for choosing a given number of clusters and for accepting the final clustering solution more clearly. The quantitative results (silhouette plots, Rand index) in Supplementary Figure 2 should perhaps be included in the main figure to justify the parameter choices and acceptance of specific clustering solutions.

      We revised the text and added labels to the original Supplementary Figure 2 (now main Figure 4) to clarify how we arrived at the best settings for random-seed clustering. 
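As an illustration of the kind of criterion discussed here, the following minimal sketch computes a mean silhouette coefficient for a candidate clustering. It is not the authors' analysis code: the toy points, the Euclidean distance, and the function names are assumptions made for this example, and the manuscript's features and distance measure may differ.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D points: two well-separated pairs.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # labels follow the pairs
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels cut across the pairs
```

A clustering that respects the structure of the data scores close to 1, whereas one that cuts across it scores near or below 0, so comparing this value across candidate cluster numbers is one way to justify a setting.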

      (2) It would be helpful to show how the activity profiles in Figure 3 would look like for 3 or 5 (or 6) clusters, to give the reader an impression of how activity profiles recovered using different numbers of clusters would differ.

      We added a new figure (Supplementary Figure 4) that shows 5- and 6-cluster results. Note that the same three subpopulations in Figure 3 were reliably identified as distinct clusters even with alternative settings, corroborating the results in the tSNE space (Supplementary Figure 3). 

      (3) The authors attempt to link the microstimulation effects to the presence of functional neuron clusters at the stimulation site. How can you rule out that there were other, session-specific factors (e.g., related to the animal's motivation) that affected both neuronal activity and behavior? For example, could you incorporate aspects of the monkey's baseline performance (mean reaction time, fixation breaks, error trials) into the analysis?

      We tested the potential influences of monkeys’ motivational states on our observations using two sets of analysis. First, we examined whether motivational state modulated the likelihood of observing a specific type of neural activity in STN. We focused on three measurements of motivational states: the rate of fixation break, the overall error rate, and mean RT. We found that none of these measurements differed significantly among sessions when we encountered different subpopulations (new Supplemental Figure 7), suggesting that motivational state alone cannot explain the differences in activity patterns of the four subpopulations. 

      Second, we examined how motivational state may be reflected in the microstimulation results. To clarify, because we interleaved trials with and without microstimulation, the microstimulation effects cannot be solely explained by session-specific factors. However, it is possible that motivational state can modulate the magnitude of microstimulation effects. We performed correlation analysis between microstimulation effects (difference in each fitted DDM parameter between trials with and without microstimulation) and motivational state (fixation break, error rate, mean RT on trials without microstimulation). We did not find significant correlation for any combination (Supplemental Table 1). These results suggest that the motivational state of the monkey had little influence on our recording and microstimulation results. However, because our monkeys operated within a narrow range of strong engagement on the task, we cannot rule out the possibility that STN activity or microstimulation effects could change significantly if the monkeys were not as engaged. We have added these results in a new section titled “Heterogeneous activity patterns and microstimulation effects cannot be explained by variations in motivational state”. 
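The correlation analysis described above can be sketched as follows. This is only an illustration: the response does not state which correlation statistic was used, so Pearson's r is an assumption, and all numbers and variable names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def stimulation_effect(param_with_stim, param_without_stim):
    """Per-session effect: fitted parameter with minus without microstimulation."""
    return [w - wo for w, wo in zip(param_with_stim, param_without_stim)]

# Hypothetical per-session values of one fitted DDM parameter.
bound_with = [1.10, 0.95, 1.20, 1.00, 0.90]
bound_without = [1.00, 1.00, 1.00, 1.00, 1.00]
effect = stimulation_effect(bound_with, bound_without)

# Hypothetical motivational measure: mean RT (s) on trials without stimulation.
mean_rt_no_stim = [0.62, 0.58, 0.65, 0.60, 0.61]
r = pearson_r(effect, mean_rt_no_stim)
```

The same function would be applied to every pairing of a DDM parameter difference with a motivational measure (fixation break rate, error rate, mean RT).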

      (4) Line 84: What was the rationale for not including both coherence and reaction time in one multiple regression model?

      On the task we used, RT depends strongly on coherence in a nonlinear fashion (e.g., example behavior in now Figure 5). We thus performed regressions using coherence and RT separately. We revised the text in Methods to clarify our rationale (lines 470-473):

      “To quantitatively measure each neuron’s task-related modulation, we performed two multiple linear regressions for each running window, separately for coherence and RT because monkeys’ RT strongly depends on coherence on our task:”

      Review #2:

      The interpretation of the results, and specifically, the degree to which the identified clusters support each model, is largely dependent on whether the artificial vectors used as model-based clustering seeds adequately capture the expected behavior under each theoretical model. The manuscript would benefit from providing further justification for the specific model predictions summarized in Figure 1B.

      We added information on the original figure/equations that were the basis of the artificial vectors we constructed for clustering analysis and their abbreviated summary in Figure 1B (first paragraph in section “STN subpopulations can support previously theorized functions”). These vectors were meant to capture prominent features of the predicted activity patterns, in the forms of choice, time, and motion strength dependencies. We also emphasize that we obtained very similar results using random clustering seeds.

      Further, although each cluster's activity can be described in the context of the discussed models, these same neural dynamics could also reflect other processes not specific to the models. That is, while a model attributing the STN's role to assessing evidence accumulation may predict a ramping up of neural activity, activity ramping is not a selective correlate of evidence accumulation and could be indicative of a number of processes, e.g., uncertainty, the passage of time, etc. This lack of specificity makes it challenging to infer the functional relevance of cluster activity and should be acknowledged in the discussion.

      We thank the reviewer for pointing out the alternative interpretation of these modulation patterns. We have added this caveat in the Discussion (lines 398-401): “It is also possible that the ramping activity reflects alternative roles for the STN in the evaluation of the decision process, the tracking of elapsed time, or both. How these possible roles relate to those of caudate neurons awaits further investigation (Fan et al., 2024)”. 

      Additionally, although the effects of STN microstimulation on behavior provide important causal evidence linking the STN to decision processes, the stimulation results are highly variable and difficult to interpret. The authors provide a reasonable explanation for the variability, showing that neurons from unique clusters are anatomically intermingled such that stimulation likely affects neurons across several clusters. It is worth noting, however, that a substantial body of literature suggests that neural populations in the STN are topographically organized in a manner that is crucial for its role in action selection, providing "channels" that guide action execution. The authors should comment on how the current results, indicative of little anatomical clustering amongst the functional clusters, relate to other reports showing topographical organization.

      We thank the reviewer for raising this important point. We have added the following text in the Discussion:

      “The intermingled subpopulations may appear at odds with the conventional idea of topography in how the STN is organized. For example, the “tripartite model” suggests that STN is segregated by motor, associative, and limbic functions (Parent and Hazrati, 1995); afferents from motor cortices and neurons related to different types of movements are largely somatotopically organized in the STN (DeLong et al., 1985; Nambu et al., 1996); and certain molecular markers are expressed in an orderly pattern in the STN (reviewed in Prasad and Wallén-Mackenzie, 2024). Because we focused on STN neurons that were responsive on a single oculomotor decision task, our sampling was likely biased toward STN subdivisions related to associative function and oculomotor movements. As such, our results do not preclude the presence of topography at a larger scale. Rather, our results underscore the importance of activity pattern-based analysis, in addition to anatomy-based analysis, for understanding the functional organization of the STN.”

      Figure 3 is referenced when describing which cluster activity is choice/coherence dependent, yet it is unclear what specific criteria and measures are being used to determine whether activity is choice/coherence "dependent." Visually, coherence activity seems to largely overlap in panel B (top row). Is there a statistically significant distinction between low and high coherence in this plot? The interpretation of these plots and the methods used to determine choice/coherence "dependence" needs further explanation.

      We added a new figure (Sup Figure 3) that shows the summary of choice and coherence modulation, based on multiple linear regression analysis, for each subpopulation separately. We also updated the description of these activity patterns in Results (lines 122-130):

      In general, the association between cluster activity and each model could be more directly tested. At least two of the models assume coordination with other brain regions. Does the current dataset include recordings from any of these regions (e.g., mPFC or GPe) that could be used to bolster claims about the functional relevance of specific subpopulations? For example, one would expect coordinated activity between neural activity in mPFC and Cluster 2 according to the Ratcliff and Frank model.

      We agree completely that simultaneous recordings of STN and its afferent/efferent regions (such as mPFC, GPe, SNr, and GPi) would provide valuable insights into the specific roles of STN and the basal ganglia as a whole. Such recordings are outside the scope of the current study but are in our future plans. 

      Additionally, the reported drift-diffusion model (DDM) results are difficult to interpret as microstimulation appears to have broad and varied effects across almost all the DDM model parameters. The DDM framework could, however, be used to more specifically test the relationships between each neural cluster and specific decision functions described in each model. Several studies have successfully shown that neural activity tracks specific latent decision parameters estimated by the DDM by including neural activity as a predictor in the model. Using this approach, the current study could examine whether each cluster's activity is predictive of specific decision parameters (e.g., evidence accumulation, decision thresholds, etc.). For example, according to the Ratcliff and Frank model, activity in cluster 2 might track decision thresholds.

      We thank the reviewer for the suggested analysis. Because including the neural activity in the model substantially increases model fitting time, we performed a preliminary round of model fitting for 15 neurons (5 neurons closest to each of the cluster centroids). For each neuron, we measured the average firing rates in three windows: 1) a 350 ms window starting from dots onset (“Dots”), 2) a 350 ms window ending at saccade onset (“Presac”), and 3) a variable window starting from dots onset and ending at 100 ms before saccade onset (“Fullview”). For each window, the firing rates were z-scored across trials.  We incorporated the firing rates into two model types. In the “DV” type, the firing rates were assumed to influence three DDM parameters related to evidence accumulation: k, me, and z. In the “Bound” type, the firing rates were assumed to influence three DDM parameters related to decision bound: a, B_alpha, and B_d. In total, we fitted six combinations of firing rates and model types to each neuron. For comparison, we also fitted the standard model without incorporating firing rates. 
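The z-scoring step described above can be sketched as below. This is a minimal illustration with hypothetical firing rates; the choice of population rather than sample standard deviation, and the linear way the z-scored rate enters a DDM parameter, are assumptions that the response does not specify.

```python
import statistics

def zscore_across_trials(rates):
    """Z-score one neuron's firing rates (one value per trial) within one window."""
    mean = statistics.fmean(rates)
    sd = statistics.pstdev(rates)  # population SD; sample SD is an equally plausible choice
    if sd == 0:
        return [0.0 for _ in rates]
    return [(r - mean) / sd for r in rates]

def trial_parameter(base, coef, z):
    """Assumed linear dependence of a DDM parameter on the z-scored rate."""
    return base + coef * z

# Hypothetical firing rates (spikes/s) on five trials in the "Dots" window.
dots_rates = [12.0, 15.0, 9.0, 18.0, 6.0]
dots_z = zscore_across_trials(dots_rates)

# Hypothetical drift-rate scale k on each trial, with a small fitted coefficient.
k_per_trial = [trial_parameter(base=10.0, coef=0.01, z=z) for z in dots_z]
```

With a coefficient that is orders of magnitude smaller than the base value, the per-trial parameter barely moves, which is the pattern the response reports for the fitted firing-rate terms.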

      As shown in Author response image 1, firing rates of single STN neurons had minimal contributions to the fits. With the exception of one neuron, AIC values were greater for model variants including firing rates than the standard model (Author response image 1A), indicating that including firing rate did not improve the fits. For all neurons, the actual fitted coefficients for firing rates were several orders of magnitude smaller than the corresponding DDM parameter (Author response image 1B; note the range of y axis), indicating that the trial-by-trial variation in firing rate had little influence on the evidence accumulation- or decision bound-related parameters. Based on these preliminary fitting results, we believe that a single STN neuron does not have strong enough influence on the overall evidence accumulation or decision bound to be detected with the model fitting method. We therefore did not expand the fitting analysis to all neurons.

      Author response image 1.

      Firing rates of a single STN neuron did not substantially influence decision-related DDM parameters. A, Differences in AIC between DDM variants that included firing rate-dependent terms and the standard DDM. Red dashed line: difference = -3. Each column represents results from one unit. B, Fitted coefficients for firing rate-related terms were near zero. Note the range of y axis. Values for the top and bottom panels were obtained from "DV"- and "Bound"-type models, respectively. See text for more details.
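The AIC comparison summarised in panel A can be illustrated with the standard definition AIC = 2k - 2 ln L. The log-likelihoods and parameter counts below are hypothetical, and treating the -3 line in the figure as a decision criterion is an assumption made for this sketch.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits for one neuron: the firing-rate variant adds three coefficients.
standard = aic(log_likelihood=-1040.0, n_params=9)
with_firing_rate = aic(log_likelihood=-1039.5, n_params=12)

# Negative differences favour the firing-rate variant; positive ones the standard DDM.
delta_aic = with_firing_rate - standard
firing_rate_improves_fit = delta_aic < -3  # assumed reading of the dashed line
```

Here the small gain in log-likelihood does not offset the penalty for three extra parameters, so the difference is positive and the standard model is preferred.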

      We emphasize, however, that the apparent negative results do not necessarily argue against a causal role of the STN in decision making; rather, these results more likely reflect a methodological limitation: because we used a single task context, the monkeys’ natural trial-by-trial variations in the DDM components may be too small. A better design would be to manipulate task contexts to induce larger changes in evidence accumulation or decision bounds and then test for a correlation between single-neuron firing rates and these changes. We are currently using such a design in a follow-up study.

      The table in Figure 1B nicely outlines the specific neural predictions for each theoretical model but it would help guide the reader if the heading for each column also included a few summary words to remind the reader of the crux of each theory, e.g. "Ratcliff+Frank 2012 (adjusted decision-bounds)"

      We thank the reviewer for this suggestion. We considered implementing this but eventually decided not to add more headings to the column, because the predicted STN functions of the three models cannot all be succinctly summarized. We thus prefer to include more detailed descriptions in the main text, instead of in the figure. 

      The authors frequently refer to contralateral vs. ipsilateral decisions but never explicitly state what this refers to, i.e. contralateral relative to what (visual field, target direction, recording site, etc.)? The reader can eventually deduce that this means contralateral to the recording site but this should be explicitly stated for clarity.

      We added in Methods: 

      Line 483: “Contralateral/ipsilateral choices refer to saccades toward the targets contralateral/ipsilateral to the recording sites, respectively.” 

      Line 535: “Contralateral/ipsilateral choices refer to saccades toward the targets contralateral/ipsilateral to the microstimulation sites, respectively.”

      Again, for clarity, it would be helpful to explicitly define what the authors mean by "sensitive to choice" when referring to Figure 1B as this could be interpreted to mean left/right or ipsilateral/contralateral.

      In the context of Figure 1B, “sensitive to choice” means showing different responses for the two choices in our 2AFC task, regardless of the task geometry. We added explanation in the figure caption.

      Color bar labels would be helpful to include in all figures that include plots with color bars.

      We apologize for omitting the labels. They are added to Figure 2B and C, Supplemental Fig. 1.  

      The authors should briefly note what a "lapse term" is when describing the logistic function results.

      We revised the text in Results (lines 184-186) and Methods (line 527) to clarify that lapse terms were used to capture errors independent of motion strength.
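To make the role of a lapse term concrete, here is a minimal sketch of one common parameterisation of a logistic choice function with symmetric lapses. The exact form used in the manuscript is not given in this response, so the equation, the signed-coherence convention, and the parameter names are assumptions.

```python
import math

def p_choice(coherence, bias, slope, lapse):
    """Probability of one choice as a function of signed motion coherence.

    The lapse rate sets floor and ceiling asymptotes (lapse and 1 - lapse),
    capturing errors that do not depend on motion strength.
    """
    core = 1.0 / (1.0 + math.exp(-(bias + slope * coherence)))
    return lapse + (1.0 - 2.0 * lapse) * core

# With a 2% lapse rate, performance saturates at 98% even for the strongest motion.
p_strong = p_choice(coherence=1.0, bias=0.0, slope=20.0, lapse=0.02)
p_zero = p_choice(coherence=0.0, bias=0.0, slope=20.0, lapse=0.02)
```

Without the lapse term the function would approach 1 at high coherence, so occasional errors on easy trials would distort the fitted slope and bias.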

      Are the 3 example sessions in Figure 4 stimulating the same STN site and/or the same monkey? This information should be noted in the caption or main text.

      We revised the caption: “A-C, Monkey’s choice (top) and RT (bottom) performance for trials with (red) and without (black) microstimulation for three example sessions (A,B: two sites in monkey C; C: monkey F).”

      Figure 3B the authors note that "the last cluster shows little task-related modulation" - what criteria are they using to make this conclusion? By eye, the last cluster and cluster 1 seem to show a similar degree of modulation when locked to motion onset.

      We added a new figure (Suppl Figure 2) that shows the summary of choice and coherence modulation, based on multiple linear regression analysis, for each subpopulation separately. 

      Reviewer #3:

      We have grouped the reviewer’s public and specific comments by content. 

      First, the interpretation of the neural subpopulations' activity patterns in relation to the computational models should be clarified, as the observed patterns may not directly correspond to the specific signals predicted by the models. The authors claim that the first subpopulation of STN neurons reflects the normalization signal predicted by the model of Bogacz and Gurney (2007). However, the observed activity patterns only show choice- and coherence-dependent activity, which may represent the input to the normalization computation rather than its output. The authors should clarify this point and discuss the limitations of their interpretation. 

      We agree with the reviewer that the choice- and coherence-dependent activity pattern does not sufficiently indicate a normalization computation. We interpreted such activity as satisfying a necessary condition for, and therefore consistent with, the theoretical model proposed by Bogacz and Gurney. We have reviewed the text to ensure that we never made the claim that the first subpopulation mediates the normalization.   

      Second, the authors could consider using a supervised learning method to more explicitly model the pattern correlations between the three profiles. The authors used k-means clustering to identify STN subpopulations. Given the clear distinction between the three types of neural firing patterns, a supervised learning method (e.g., a generalized linear model) could be used as a more explicit encoding model to account for the pattern correlations between the three profiles.

      We used two approaches to examine the different response profiles. The “random-seed” approach used non-supervised clustering to probe the functional organization of STN neurons, with no a priori assumption about how many subpopulations may be present. The “model-seed” approach is similar in spirit to what the reviewer suggested: we defined artificial vectors, akin to regressors in a generalized linear model, that showed key modulation features as predicted by previous theoretical models. We then projected the neurons’ activity profiles onto these vectors, akin to performing a regression analysis.   
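The idea of projecting activity profiles onto model-derived vectors can be sketched as follows. This is only an illustration of the general technique: the template shapes, their names, and the use of cosine similarity are assumptions, not the artificial vectors or the procedure used in the manuscript.

```python
import math

def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def project(profile, templates):
    """Cosine similarity between one activity profile and each template vector."""
    p = unit(profile)
    return {
        name: sum(a * b for a, b in zip(p, unit(template)))
        for name, template in templates.items()
    }

def best_match(profile, templates):
    """Name of the template with the largest projection."""
    scores = project(profile, templates)
    return max(scores, key=scores.get)

# Hypothetical templates over four time bins (not the manuscript's vectors).
templates = {
    "ramping": [0.0, 1.0, 2.0, 3.0],
    "transient": [0.0, 3.0, 1.0, 0.0],
    "flat": [1.0, 1.0, 1.0, 1.0],
}

# Hypothetical normalised firing-rate profile of one neuron.
neuron_profile = [0.2, 1.1, 1.9, 3.2]
scores = project(neuron_profile, templates)
label = best_match(neuron_profile, templates)
```

Each projection plays a role similar to a regression weight on the corresponding template, which is the analogy to a generalized linear model drawn in the response.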

      Third, a neural population model could be employed to better understand how the STN population jointly contributes to decision-making dynamics. The single-neuron encoding analysis reveals mixed effects from multiple decision-related functions. To better understand how the STN population jointly contributes to the decision-making process, the authors could consider using a neural population model (e.g., Wang et al., 2023) to quantify the population dynamics.

      We agree with the reviewer that a neural population model would be helpful for testing our understanding of the roles of STN. However, we believe that this is premature at the moment because we have no knowledge about how these different subpopulations interact with each other within STN, nor how they interact with other basal ganglia nuclei. We hope our results provide a foundation for future experiments that can provide more specific insights in the roles of each subpopulation, which can then be tested in a neural population model as the reviewer suggested.  

      Finally, the added value of the microstimulation experiments should be more directly addressed in the Results section, as the changes in firing patterns compared to the original patterns are not clearly evident. The microstimulation results (Figure 7A) do not show significant changes in firing patterns compared to the original patterns (Figure 3B). As microstimulation is used to identify the hypothetical role of the STN beyond the correlational analysis, the authors should more directly address the added value of these experiments in the Results section.

      We apologize for the confusion. The average firing rates at the top of original Figure 7A (now Figure 8A) were obtained in recordings just before microstimulation, to document which neuron subpopulation was near the stimulation electrode. We were not able to obtain recordings from the same neurons during microstimulation.  

      The ordering of the three hypotheses in the Introduction (1) adjusting decision bounds, (2) computing a normalization signal, (3) implementing a nonlinear computation to improve decision bound adjustment, is inconsistent with the order in which they are addressed in the Results section (2, 1, 3). To improve clarity and readability, the authors should consider presenting the hypotheses and their corresponding results in a consistent order throughout the manuscript.

      We thank the reviewer for this suggestion. We have reordered the text in Introduction to be consistent.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present the high resolution cryoEM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood. 

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results in similar mutants in different channels from Kv1.2 that indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore blocking toxin Dendrotoxin 1. The results shown in the revised version indicate that the mechanism of block is similar to that of related blocking-toxins, in which a lysine residue penetrates in the pore. Surprisingly, in these new structures, the bound toxin results in a pore with empty external potassium binding sites. 

      The quality of the structural data presented in this revised manuscript is very high and allows for unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. In the revised version, the authors have addressed my previous specific comments, which are appended below. 

      (1) In the main text's reference to Figure 2d residues W18' and S22' are mentioned but are not labeled in the insets. 

      This has been fixed: line 229, p. 9.

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages. 

      This is noted in the discussion Lines 497-500, p. 18

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter; however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is shown at such an angle that it looks as if the carbonyl distances are increased; perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits?

      We have changed Fig. 4B to show the same view as in Fig. 4A. In the legend we explain that opposing subunits are shown. We no longer give distances, in view of the lack of detectable carbonyl densities.

      (4) It would be really interesting to know the authors' opinion on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate that a local conformational change is the trigger for the main twisting events proposed here.

      We address this in the Discussion, line 506-523, pp. 18-19.

      Reviewer #2 (Public Review)

      Cryo-EM structures of the Kv1.2 channel in the open, inactivated, toxin complex and in Na+ are reported. The structures of the open and inactivated channels are merely confirmatory of previous reports. The structures of the dendrotoxin-bound Kv1.2 and the channel in Na+ are new findings that will be of interest to the general channel community.

      Review of the resubmission: 

      I thank the authors for making the changes in their manuscript as suggested in the previous review. The changes in the figures and the additions to the text do improve the manuscript. The new findings from a further analysis of the toxin channel complex are welcome information on the mode of the binding of dendrotoxin. 

      A few minor concerns: 

      (1) Lines 93-96, 352: I am not sure what the authors are referring to when they say NaK2P. It is either NaK or NaK2K. I don't think that it has been shown in the reference suggested that either of these channels changes conformation based on the K+ concentration. Please check if there is a mistake and that the Nichols et al. reference is what is being referred to.

      Thank you for noticing the error. We meant NaK2K and we have changed this throughout.

      (2) Line 365: In the study by Cabral et al., Rb+ ions were observed by crystallography in the S1, S3 and S4 sites, not the S2 site. Please correct.

      Thank you. We have re-written this section, lines 364-381, pp. 13-14.

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a plethora of structural work, and the authors are commended on the breadth of the studies. The structural studies are well-executed. Although the findings are mostly confirmatory, they do add to the body of work on this and related channels. Notably, the authors present structures of DTx-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (which may contain sodium ions bound within the selectivity filter). These two structures add considerable new information. The DTx structure has been markedly improved in the revised version and the authors arrive at well-founded conclusions regarding its mechanism of block. Regarding the Na+ structure, the authors claim that the structure with sodium has "zero" potassium - I caution them against making this claim. It is likely that some K+ persists in their sample and that some of the density in the "zero potassium" structure may be due to K+ rather than Na+. This can be clarified by revisions to the text and discussion. I do not think that any additional experiments are needed. Overall, the manuscript is well-written, a nice addition to the field, and a crowning achievement for the Sigworth lab.

      Most of this reviewer's initial comments have been addressed in the revised manuscript. Some comments remain that could be addressed by revisions of the text. 

      Specific comments on the revised version: 

      Quotations indicate text in the manuscript. 

      (1) "While the VSD helices in Kv1.2s and the inactivated Kv1.2s-W17'F superimpose very well at the top (including the S4-S5 interface described above), there is a general twist of the helix bundle that yields an overall rotation of about 3° at the bottom of the VSD."

      Comment: This seemed a bit confusing. I assume the authors aligned the complete structures - the differences they indicate seem to be slight VSD repositioning relative to the pore rather than differences between the VSD conformations themselves. The authors may wish to clarify. As they point out in the subsequent paragraph, the VSDs are known to be loosely associated with the pore. 

      We aligned the VSDs alone, and it is a twist of the VSD helix bundle.

      This is now clarified in lines 269-273, p. 10.

      (2) Comment: The modeling of DTx into the density is a major improvement in the revision. Figure 3 displays some interactions between the toxin and Kv1.2 - additional side views of the toxin and the channel might allow the reader to appreciate the interactions more fully. The overall fit of the toxin structure into the density is somewhat difficult to assess from the figure. (The authors might consider using ChimeraX to display density and model in this figure.) 

      We have added new panels, and stereo pairs, to Figure 3.

      (3) "We obtained the structure of Kv1.2s in a zero K+ solution, with all potassium replaced with sodium, and were surprised to find that it is little changed from the K+ bound structure, with an essentially identical selectivity filter conformation (Figure 4B and Figure 4-figure supplement 1)." 

      Comment: It should be noted in the manuscript that K+ and Na+ ions cannot be distinguished by the cryo-EM studies - the densities are indistinguishable. The authors are inferring that the observed density corresponds to Na+ because the protein was exchanged from K+ into Na+ on a gel filtration (SEC) column. It is likely that a small amount of K+ remains in the protein sample following SEC. I caution the authors against claiming that there is zero K+ in solution without measuring the K+ content of the protein sample. Additionally, it should be considered that K+ may be present in the blotting paper used for cryo-EM grid preparation (our laboratory has noted, for example, a substantial amount of Ca2+ in blotting paper). The affinity of Kv1.2 for K+ has not been determined, to my knowledge - the authors note in the Discussion that the Shaker channel has "tight" binding for K+. It seems possible that some portion of the density in the selectivity filter could be due to residual K+. This caveat should be clearly stated in the main text and discussion. More extensive exchange into Na+, such as performing the entire protein purification in NaCl, or by dialysis (as performed for obtaining the structure of KcsA in low K+ by Y. Zhou et al. & MacKinnon 2001), would provide more convincing removal of K+, but I suspect that the Kv1.2 protein would not have sufficient biochemical stability without K+ to endure this treatment. One might argue that reduced biochemical stability in NaCl could be an indication that there was a meaningful amount of K+ in the final sample used for cryo-EM (or in the particles that were selected to yield the final high-resolution structure).

      We now explain in more detail, in the Methods section, the steps taken to avoid any residual Na+ contamination during purification (lines 683-687, pp. 24-25). We have changed the text to point out that the ion species cannot be distinguished in the maps, and note results in NaK2K and KcsA (lines 368-381, pp. 13-14).

      We note that the same procedures to remove K+ were used for the Kv1.2sW17’F structure (line 385, p. 14). We qualify the ion replacement to say that Na+ replaces “essentially” all K+ (line 607, p. 21).

      (4) Referring to the structure obtained in NaCl: "The ion occupancy is also similar, and we presume that Kv1.2 is a conducting channel in sodium solution." 

      Comment: Stating that "Kv1.2 is a conducting channel in sodium solution" and implying that conduction of Na+ is achieved by an analogous distribution of ion binding sites as observed for K+ are strong statements to make - and not justified by the experiments provided. Electrophysiology would be required to demonstrate that the channel conducts sodium in the absence of K+. More complete ionic exchange, better control of the ionic conditions (Na+ vs K+), and affinity measurements for K+ would be needed to determine the distribution of Na+ in the filter (as mentioned above). At minimum, the authors should revise and clarify what the intended meaning of the statement "we presume that Kv1.2 is a conducting channel in sodium solution" is. As mentioned above, it seems possible/likely that a portion of the density in the filter may be due to K+.

      We now present a more detailed argument (lines 376 to 381, pp. 13-14.)

      Recommendations for the authors: 

      Reviewing Editor: 

      After consultation, the reviewers agree that, although the authors have answered most of the comments raised in the previous review, there remains a concern about the structure obtained in the presence of Na+. Given that Kv1.2 is more reluctant to undergo slow inactivation, the conducting structure in Na+ could be due to this fact or to the channel really having a higher affinity for K+ than for Na+. In the presence of even a small contamination by K+, this ion could thus occupy the selectivity filter, resulting in an open conformation. The authors should clearly state the steps taken to ensure no contamination by K+. It is also possible that the open structure indeed occurs even in the presence of Na+ in the selectivity filter. This should also be discussed, given that this has been observed in other potassium channel structures.

      Reviewer #1 (Recommendations For The Authors): 

      In this revised version of the manuscript, the authors have adequately addressed my previous points and improved the clarity and readability of the manuscript. This is a compelling work that shows inactivated structures of the Kv1.2 potassium channel. Especially interesting is a structure in the absence of extracellular potassium ions, which can help us understand how a reduction in the availability of these ions speeds up entrance into the inactivated state in these ion channels.

      I would just recommend that, in the absence of functional data (current recordings) when potassium is removed, the authors use caution in ascribing this structure to an inactivated state. Also, it should be mentioned that the ion densities observed in the pore cannot unambiguously be identified as sodium ions.

      Reviewer #3 (Recommendations For The Authors): 

      (1)  "The nearby Leu9 is also important as its substitution by alanine also decreases affinity 1000-fold, but we observe no contacts between this residue and residues of the Kv1.2s channel." 

      Comment: It seems early in the text to mention the potential interaction of Leu9 to the channel structure. The authors may wish to discuss Leu9 later in the manuscript - a figure showing the location of Leu9 would strengthen the statement. Any hypothesis on why mutation of it has such a profound effect? 

      Add a figure panel showing Leu9 position.

      We have rewritten the text as suggested, and have identified Leu9 in several panels of Fig. 3.

      (2)  "The X-ray structure of α-DTX (Figure 3A)"

      Comment: The authors may wish to cite a reference to this X-ray structure. 

      We now cite Skarzynski (1992) on line 321, p. 12.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors assess the accuracy of short variant calling (SNPs and indels) in bacterial genomes using Oxford Nanopore reads generated on R10.4 flow cells from a very similar genome (99.5% ANI), examining the impact of variant caller choice (three traditional variant callers: BCFtools, FreeBayes, and Longshot, and three deep learning-based variant callers: Clair3, DeepVariant, and NanoCaller), basecalling model (fast, hac and sup) and read depth (using both simplex and duplex reads).

      Strengths:

      Given the stated goal (analysis of variant calling for reads drawn from genomes very similar to the reference), the analysis is largely complete and results are compelling. The authors make the code and data used in their analysis available for re-use using current best practices (a computational workflow and data archived in INSDC databases or Zenodo as appropriate).

      Weaknesses:

      While the medaka variant caller is now deprecated for diploid calling, it is still widely used for haploid variant calling and should at least be mentioned (even if the mention is only to explain its exclusion from the analysis). 

      We have now added Medaka haploid caller to the benchmark. It performs quite well overall (better than the traditional methods), but not as good as Clair3 or DeepVariant.

      Appraisal:

      The experiments the authors engaged in are well structured and the results are convincing. I expect that these results will be incorporated into "best practice" bacterial variant calling workflows in the future. 

      Thank you for the positive appraisal.

      Reviewer #2 (Public Review):

      Summary:

      Hall et al describe the superiority of ONT sequencing and deep learning-based variant callers to deliver higher SNP and Indel accuracy compared to previous gold-standard Illumina short-read sequencing. Furthermore, they provide recommendations for read sequencing depth and computational requirements when performing variant calling.

      Strengths:

      The study describes compelling data showing ONT superiority when using deep learning-based variant callers, such as Clair3, compared to Illumina sequencing. This challenges the paradigm that Illumina sequencing is the gold standard for variant calling in bacterial genomes. The authors provide evidence that homopolymeric regions, a systematic and problematic issue with ONT data, are no longer a concern in ONT sequencing.

      Weaknesses:

      (1) The inclusion of a larger number of reference genomes would have strengthened the study to accommodate larger variability (a limitation mentioned by the authors). 

      Our strategic selection of 14 genomes—spanning a variety of bacterial genera and species, diverse GC content, and both gram-negative and gram-positive species (including M. tuberculosis, which is neither)—was designed to robustly address potential variability in our results. Moreover, all our genome assemblies underwent rigorous manual inspection as the quality of the true genome sequences is the foundation this research is built upon. Given this, the fundamental conclusions regarding the accuracy of variant calls would likely remain unchanged with the addition of more genomes.  However, we do acknowledge that a substantially larger sample size, which is beyond the scope of this study, would enable more fine-grained analysis of species differences in error rates.

      (2) In Figure 2, there are clearly one or two samples that perform worse than others in all combinations (are always below the box plots). No information about species-specific variant calls is provided by the authors but one would like to know if those are recurrently associated with one or two species. Species-specific recommendations could also help the scientific community to choose the best sequencing/variant calling approaches.

      Thank you for highlighting this observation. The precision, recall, and F1 scores for each sample and condition can be found in Supplementary Table S4.

      Upon investigation of the outliers in Figure 2 we discovered three things. First, a parameter we were using in Longshot automatically capped coverage and led to a number of false negatives, which produced its outlier. This has now been rectified and the figure is updated accordingly. Second, the outlier in the simplex sup SNP panel (top left) was the same E. coli sample for most variant callers (though Medaka had no issues). The reason for this was a variant-dense repetitive region. We have added an in-depth explanation of this, along with figures illustrating the issue, in Supplementary Section S2, with a brief statement in the main text. Third, the outlier in the duplex sup SNP panel (top right) is due to a very low (duplex) depth sample. This has also been added briefly to the main text and fully in Section S2.

      We have now included a species-segregated version of Figure 2 (Suppl. Figures S5-7) for Clair3 with the sup model (best performer) for a clearer interpretation of how each species performs.

      (3) The authors support that a read depth of 10x is sufficient to achieve variant calls that match or exceed Illumina sequencing. However, the standard here should be the optimal discriminatory power for clinical and public health utility (namely outbreak analysis). In such scenarios, the highest discriminatory power is always desirable and as such an F1 score, Recall and Precision that is as close to 100% as possible should be maintained (which changes the minimum read sequencing depth to at least 25x, which is the inflection point).

      We agree that the highest discriminatory power is always desirable for clinical or public health applications. In which case, 25x is probably a better minimum recommendation. However, we are also aware that there are resource-limited settings where parity with Illumina is sufficient. In these cases, 10x depth from ONT would provide enough data.

      The manuscript previously emphasised the latter scenario, but we have revised the text (Discussion) to clearly recommend 25x depth as a conservative aim in settings where resources are not a constraint, ensuring the highest possible discriminatory power.

      (4) The sequencing of the samples was not performed with the same Illumina and ONT method/equipment, which could have introduced specific equipment/preparation artefacts that were not considered in the study. See for example https://academic.oup.com/nargab/article/3/1/lqab019/6193612.

      To our knowledge, there is no evidence that sequencing on different ONT machines or barcoding kits leads to a difference in read characteristics or accuracy. To ensure consistency and minimise potential variability, we used the same ONT flowcells for all samples and performed basecalling on the same Nvidia A100 GPU. We have updated the methods to emphasise this.

      For Illumina and ONT, the exact machines and kits used for each sample have been added as Supplementary Table S9. We have also added a short paragraph about possible Illumina error rate differences in the ‘Limitations’ section of the Discussion.

      The third limitation is that Illumina sequencing was performed on different models: three samples on the NextSeq 500 and the rest on the NextSeq 2000. While differences in error rates exist between Illumina instruments, no specific assessment has been made between these NextSeq models [42]. However, the absolute differences in error rates are minor and unlikely to impact our study significantly. This is particularly relevant since Illumina's lower F1 score compared to ONT was due to missed calls rather than erroneous ones.

      In summary, while there may be specific equipment or preparation artifacts to consider, we took steps to minimise these effects and maintain consistency across our sequencing methods.
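
      The distinction drawn above between missed calls and erroneous calls maps onto recall and precision, the two components of the F1 score. The sketch below is only an illustration of that relationship: the function name and the counts are hypothetical and are not taken from the study or its supplementary tables.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of true positives (tp),
    false positives (fp, erroneous calls) and false negatives (fn, missed calls)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (NOT data from the study).
# Case A: few erroneous calls but many missed calls -> recall drives F1 down.
missed = precision_recall_f1(tp=900, fp=5, fn=100)
# Case B: few missed calls but many erroneous calls -> precision drives F1 down.
erroneous = precision_recall_f1(tp=995, fp=100, fn=5)

for label, (p, r, f1) in (("missed calls", missed), ("erroneous calls", erroneous)):
    print(f"{label}: precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

      In case A precision stays high while recall falls, which is the pattern the response describes for Illumina: a lower F1 score caused by missed calls rather than by erroneous ones.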

      Reviewer #3 (Public Review):

      Hall et al. benchmarked different variant calling methods on Nanopore reads of bacterial samples and compared the performance of Nanopore to short reads produced with Illumina sequencing. To establish a common ground for comparison, the authors first generated a variant truth set for each sample and then projected this set to the reference sequence of the sample to obtain a mutated reference. Subsequently, Hall et al. called SNPs and small indels using commonly used deep learning and conventional variant callers and compared the precision and accuracy from reads produced with simplex and duplex Nanopore sequencing to Illumina data. The authors did not investigate large structural variation, which is a major limitation of the current manuscript. It will be very interesting to see a follow-up study covering this much more challenging type of variation. 

      We fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      In their comprehensive comparison of SNPs and small indels, the authors observed superior performance of deep learning over conventional variant callers when Nanopore reads were basecalled with the most accurate (but also computationally very expensive) model, even exceeding Illumina in some cases. Not surprisingly, Nanopore underperformed compared to Illumina when basecalled with the fastest (but computationally much less demanding) method with the lowest accuracy. The authors then investigated the surprisingly higher performance of Nanopore data in some cases and identified lower recall with Illumina short read data, particularly from repetitive regions and regions with high variant density, as the driver. Combining the most accurate Nanopore basecalling method with a deep learning variant caller resulted in low error rates in homopolymer regions, similar to Illumina data. This is remarkable, as homopolymer regions are (or, were) traditionally challenging for Nanopore sequencing.

      Lastly, Hall et al. provided useful information on the required Nanopore read depth, which is surprisingly low, and the computational resources for variant calling with deep learning callers. With that, the authors established a new state-of-the-art for Nanopore-only variant calling on bacterial sequencing data. Most likely these findings will be transferred to other organisms as well or at least provide a proof-of-concept that can be built upon.

      As the authors mention multiple times throughout the manuscript, Nanopore can provide sequencing data in nearly real-time and in remote regions, therefore opening up a ton of new possibilities, for example for infectious disease surveillance.

      However, the high-performing variant calling method as established in this study requires the computationally very expensive sup and/or duplex Nanopore basecalling, whereas the least computationally demanding method underperforms. Here, the manuscript would greatly benefit from extending the last section on computational requirements, as the authors determine the resources for the variant calling but do not cover the entire picture. This could even be misleading for less experienced researchers who want to perform bacterial sequencing at high performance but with low resources. The authors mention it in the discussion but do not make clear enough that the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required. 

      We have provided runtime benchmarks for basecalling in Supplementary Figure S23 and detailed these times in Supplementary Table S7. In addition, we state in the Results section (P9 L239-241) “Though we do note that if the person performing the variant calling has received the raw (pod5) ONT data, basecalling also needs to be accounted for, as depending on how much sequencing was done, this step can also be resource-intensive.”

      Even with super-accuracy basecalling considered, our analysis shows that variant calling remains the most resource-intensive step for Clair3, DeepVariant, FreeBayes, Medaka, and NanoCaller. Therefore, the statement “the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required”, is incorrect. However, we have made this more prominent in the Results and Discussion.

      In the results section we added the underlined section:

      “… FreeBayes had the largest runtime variation, with a maximum of 597s/Mbp, equating to 2.75 days for the same genome. In contrast, basecalling with a single GPU using the super-accuracy model required a median runtime of 0.77s/Mbp, or just over 5 minutes for a 4Mbp genome at 100x depth. …”

      In the discussion we have added the following statement:

      “Basecalling is generally faster than variant calling, assuming GPU access, which is likely considered when acquiring ONT-related equipment.”
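
      The runtime figures quoted from the Results can be checked with simple arithmetic. The sketch below assumes that the per-Mbp rate applies to the total sequenced bases (genome size multiplied by read depth) and that "the same genome" refers to the 4 Mbp genome at 100x depth mentioned in the following sentence; both are assumptions read from the quoted numbers, and the function name is hypothetical.

```python
def runtime_seconds(rate_s_per_mbp: float, genome_mbp: float, depth: float) -> float:
    """Total runtime, assuming the rate is per Mbp of sequenced bases
    (genome size x read depth). This scaling is an assumption."""
    return rate_s_per_mbp * genome_mbp * depth


GENOME_MBP = 4   # genome size quoted in the response
DEPTH = 100      # read depth quoted in the response

# FreeBayes maximum of 597 s/Mbp, expressed in days.
freebayes_days = runtime_seconds(597, GENOME_MBP, DEPTH) / 86_400
# Super-accuracy basecalling median of 0.77 s/Mbp, expressed in minutes.
basecall_minutes = runtime_seconds(0.77, GENOME_MBP, DEPTH) / 60

print(f"FreeBayes maximum: {freebayes_days:.2f} days")
print(f"sup basecalling median: {basecall_minutes:.2f} minutes")
```

      Under these assumptions both results land close to the quoted values (about 2.75 days and just over 5 minutes), which supports reading the rates as per sequenced megabase.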

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The colour choices in Figure 3 and Figure 4 c made the illustrations somewhat difficult to read. More substantially, a deeper investigation of the causes of non-homopolymeric-related mistaken indel calls would be useful. 

      We have updated Figure 3 so that each line has a different style to aid in discriminating between colours. The colour scheme for Figure 4c has also been updated.

      In terms of non-homopolymeric false positive (FP) indel calls, we did an investigation of these for Clair3 and DeepVariant on the simplex sup data as these are the two best performing variant callers and deal the best with homopolymers. For Clair3, there were eight FPs across all samples. Five of these were homopolymers. The remaining three occurred within one or two bases of another insertion which inserted a similar sequence to the FP. For DeepVariant, it was much the same story, with 8/11 FP indels being in homopolymers, and the remaining three being within one or two bases of another insertion with a similar sequence. We have added a couple of sentences to the results explaining this finding.

      Reviewer #2 (Recommendations For The Authors):

      The paper is well-written and provides evidence for the conclusions. Some issues should be addressed.

      Include a section in the Results describing species-specific observations, namely if some samples had recurrently lower SNP and INDEL F1 scores (as observed in Figure 2). 

      Please see our response in your second point in the ‘Weaknesses’ section of the public review.

      Please provide more details on how the samples were sequenced. Section "Sequencing" in the methods is confusing and not clear enough to be reproduced (provide a supplementary table/figure with the workflow for each sample). Add information about how many samples were multiplexed in each run and what was the output achieved in each.

      We have now added a Supplementary Table S9 which outlines which instruments, kits, and multiplexing strategies were used for each sample. In addition, the raw pod5 data that we make available has been segregated by sample, so knowledge of the multiplexing strategy is not necessary for someone attempting to reproduce our results.

      The authors acknowledge that structural variation was not evaluated in this manuscript. Since ONT sequencing is often used to reconstruct the sequence of plasmids for outbreak/epidemiology analysis, perhaps they could undertake this analysis on a plasmids dataset (which suffers from constant structural variation).

      As noted in our response to Reviewer 3’s public review, we fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is well organized. However, some sections are a bit long and would benefit from being more concise.

      Thank you for your valuable feedback and for acknowledging the organisation of our manuscript. We appreciate your suggestion regarding the length of certain sections. We have gone back through and made the manuscript more concise.

      Figure 1: Is the Qscore really the same as identity? Isn't the determination of identity only possible after alignment? 

      When we say Qscore we are referring to the Phred-scaled version of the read identity, which is alignment based, not the Qscores of the individual bases in the FASTQ file. We have updated the text and figure legend to make this clearer. “The Qscore is the logarithmic transformation of the read identity, $Q = -10\log_{10}(1 - P)$, where $P$ is the read identity.” We also now explicitly state that read identity is alignment-based.
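
      A minimal sketch of the Phred-scaled read identity described above, assuming the standard Phred transformation Q = -10 log10(1 - P) with P expressed as a fraction between 0 and 1. The function name is hypothetical and the identities used are illustrative values, not measurements from the study.

```python
import math


def identity_to_qscore(identity: float) -> float:
    """Phred-scale an alignment-based read identity given as a fraction (0-1).

    Assumes the standard Phred transformation Q = -10 * log10(1 - P).
    A perfect identity of 1.0 has no finite Qscore, so infinity is returned.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    if identity == 1.0:
        return math.inf
    return -10 * math.log10(1 - identity)


# Illustrative identities only: 90%, 99% and 99.9% identical reads.
for p in (0.9, 0.99, 0.999):
    print(f"identity={p} Qscore={identity_to_qscore(p):.1f}")
```

      Each additional "nine" of read identity adds ten to the Qscore, which is why the scale separates highly accurate reads more clearly than raw percentages do.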

      Abbreviations/terms mentioned but not introduced:

      - kmers, P2L57

      - ANI, P3L93 

      We have updated the text to better introduce these terms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      My main point of concern is the precision of dissection. The authors distinguish cells isolated from the tailbud and different areas in the PSM. They suggest that the cell-autonomous timer is initiated, as cells exit the tailbud.

      This is also relevant for the comparison of single cells isolated from the embryo and cells within the embryo. The dissection will always be less precise and cells within the PSM4 region could contain tailbud cells (as also indicated in Figure 1A), while in the analysis of live imaging data cells can be selected more precisely based on their location. This could therefore contribute to the difference in noise between isolated single cells and cells in the embryo. This could also explain why there are "on average more peaks" in isolated cells (p. 6, l. 7).

      This aspect should be considered in the interpretation of the data and mentioned at least in the discussion. (It does not contradict their finding that more anterior cells oscillate less often and differentiate earlier than more posterior ones.)

      Reviewer #1 rightly points out that selecting cells in a timelapse is more precise than manual dissection. Manual dissection is inherently variable but we believe in general it is not a major source of noise in our experiments. To control for this, we compared the results of 11 manual dissections of the posterior quarter of the PSM (PSM4) with those of the pooled PSM4 data. In general, we did not see large differences in the distributions of peak number or arrest timing that would markedly increase the variability of the pooled data above that of the individual dissections (Figure 1 – supplement figure 7). We have edited the text in the Results to highlight this control experiment (page 6, lines 13-17).

      It is of course possible that we picked up adjacent TB cells when dissecting PSM4; however, the reviewer’s assertion that inclusion of TB cells “could also explain why there are "on average more peaks" in isolated cells” is incorrect. Later in the paper we show that cells from the TB have almost identical distributions to PSM4 (mean ± SD, PSM4 4.36 ± 1.44; TB 4.26 ± 1.35; Figure 4 – supplement 1). Thus, inadvertent inclusion of TB cells while dissecting would in fact not increase the number of peaks.

      Here, the authors focus on the question of how cells differentiate. The reverse question is not addressed at all. How do cells maintain their oscillatory state in the tailbud? One possibility is that cells need external signals to maintain that as indicated in Hubaud et al. 2014. In this regard, the definition of tailbud is also very vague. What is the role of neuromesodermal progenitors? The proposal that the timer is started when cells exit the tailbud is at this point a correlation and there is no functional proof, as long as we do not understand how cells maintain the tailbud state. These are points that should be considered in the discussion.

      The reviewer asks “How do cells maintain their oscillatory state in the tailbud?”. This is a very interesting question, but as recognized by the reviewer, beyond the scope of our current paper.

      We now further emphasize the point “One possibility is that cells need external signals to maintain … as indicated in Hubaud et al. 2014” in the Discussion and added a reference to the review Hubaud and Pourquié 2014 (Signalling dynamics in vertebrate segmentation. Nat Rev Mol Cell Biol 15, 709–721 (2014). https://doi.org/10.1038/nrm3891) (page 18, lines 19-22).

      To clarify the definition of the TB, we have stated more clearly in the results (page 15, lines 8-12) that we defined TB cells as all cells posterior to the notochord (minus skin) and analyzed those that survived >5 hours post-dissociation, did not divide, and showed transient Her1-YFP dynamics.

      The reviewer asks: What is the role of neuromesodermal progenitors? In responding to this, we refer to Attardi et al., 2018 (Neuromesodermal progenitors are a conserved source of spinal cord with divergent growth dynamics. Development. 2018 Nov 9;145(21):dev166728. doi: 10.1242/dev.166728).

      Around the stage of dissection in zebrafish in our work, there is a small remaining group of cells characterized as NMPs (Sox2+, Tbxta+ expression) in the dorsal-posterior wall of the TB. These NMPs rarely divide and are not thought to act as a bipotential pool of progenitors for the elongating axis, as is the case in amniotes, rather contributing to the developing spinal cord. How this particular group of cells behaves in culture is unclear as we did not subdivide the TB tissue before culturing. It would be possible to specifically investigate these NMPs regarding a role in TB oscillations, but given the results of Attardi et al., 2018 (small number of cells, low bipotentiality), we argue that it would not be significant for the conclusions of the current work. To indicate this, we included a sentence and a citation of this paper in the results towards the beginning of the section on the tail bud (page 15, lines 8-12).

      The authors observe that the number of oscillations in single cells ex vivo is more variable than in the embryo. This is presumably due to synchronization between neighbouring cells via Notch signalling in the embryo. Would it be possible to add low doses of Notch inhibitor to interfere with efficient synchronization, while at the same time keeping single cell oscillations high enough to be able to quantify them?

      It is a formal possibility that Delta-Notch signaling may have some impact on the variability in the number of oscillations. However, we argue that the significant amount of cell tracking work required to carry out the suggested experiments would not be justified, considering what has been previously shown in the literature. If Delta-Notch signaling was a major factor controlling the variability of the intrinsic program that we describe, then we would expect that in Delta-Notch mutants the anterior-posterior limits of cyclic gene expression in the PSM would extend beyond those seen in wildtype embryos. Specifically, we might expect to see her1 expression extending more anteriorly in mutants to account for the dramatic increase in the number of cells that have 5, 6, 7 and 8 cycles in culture (Fig. 1E versus Fig. 1I). However, as shown in Holley et al., 2002 (Fig. 5A, B; her1 and the notch pathway function within the oscillator mechanism that regulates zebrafish somitogenesis. Development. 2002 Mar;129(5):1175-83. doi: 10.1242/dev.129.5.1175), the anterior limit of her1 expression in the PSM in DeltaD mutants (aei) is not different to WT. Thus, Delta-Notch signaling may exert a limited control over the number of oscillations, but likely not in excess of one cycle difference.

      In the same direction, it would be interesting to test if variation is decreased, when the number of isolated cells is increased, i.e. if cells are cultured in groups of 2, 3 or 4 cells, for instance.

      This is a great proposal – however the culture setup used here is a wide-field system that doesn’t allow us to accurately follow more than one cell at a time. Cells that adhere to each other tend to crawl over each other, blurring their identity in Z. This is also why we excluded dividing cells in culture from the analysis. Experiments carried out with a customized optical setup will be needed to investigate this in the future.

      It seems that the initiation of Mesp2 expression is rather reproducible and less noisy (+/- 2 oscillation cycles), while the number of oscillations varies considerably (and the number of cells continuing to oscillate after Mesp2 expression is too low to account for that). How can the authors explain this apparent discrepancy?

      The observed tight linkage of Mesp onset and Her1 arrest argues for a single timing mechanism that is upstream of both gene expression events; indeed, this is one of the key implications of the paper. However, the infrequent dissociation of these events observed in FGF-treated cells suggests that more than one timing pathway could be involved, although there are other interpretations. We have added more discussion in the text on one vs. multiple timers (page 17, lines 19-23; page 18, lines 1-8); see also the next point.

      The observation that some cells continue oscillating despite the upregulation of Mesp2 should be discussed further and potential mechanism described, such as incomplete differentiation.

      This is an infrequent (5 out of 54 cells) and interesting feature of PSM4 cells in the presence of FGF. We imagine that this dissociation of clock arrest from mesp onset timing could be the result of alterations in the thresholds of the sensing mechanisms controlling these two processes. Alternatively, as reviewer 2 argues, it might reflect multiple timing mechanisms at work. We have added a discussion of these alternative interpretations (page 17, lines 19-23; page 18, lines 1-8).

      Fig. 3 supplement 3 B missing

      It’s there in the bioRxiv downloadable PDF and full text, but seems not to be included when previewing the PDF. Thanks for the heads up.

      Reviewer #2 (Public Review):

      The authors demonstrate convincingly the potential of single mesodermal cells, removed from zebrafish embryos, to show cell-autonomous oscillatory signaling dynamics and differentiation. Their main conclusion is that a cell-autonomous timer operates in these cells and that additional external signals are integrated to tune cellular dynamics. Combined, this is underlying the precision required for proper embryonic segmentation, in vivo. I think this work stands out for its very thorough, quantitative, single-cell real-time imaging approach, both in vitro and also in vivo. A very significant progress and investment in method development, at the level of the imaging setup and also image analysis, was required to achieve this highly demanding task. This work provides new insight into the biology underlying embryo axis segmentation.

      The work is very well presented and accessible. I think most of the conclusions are well supported. Here are my comments and suggestions:

      The authors state that "We compare their cell-autonomous oscillatory and arrest dynamics to those we observe in the embryo at cellular resolution, finding remarkable agreement."

      I think this statement needs to be better placed in context. In absolute terms, the period of oscillations and the timing of differentiation are actually very different in vitro, compared to in vivo. While oscillations have a period of ~30 minutes in vivo, oscillations take twice as long in vitro. Likewise, while the last oscillation is seen after 143 minutes in vivo, the timing of differentiation is very significantly prolonged, i.e. more than doubled, to 373 min in vitro (Supplementary Figure 1-9). I understand what the authors mean with 'remarkable agreement', but this statement is at risk of being misleading. I think the in vitro to in vivo differences (in absolute time scales) need to be stated more explicitly. In fact, the drastic change in absolute timescales, while preserving the relative ones, i.e. the number of oscillations a cell shows before the onset of differentiation remains relatively invariant, is a remarkable finding that I think merits more consideration (see below).

      We have changed the text in the abstract (page 1, line 28) to clarify that the agreement is in the relative slowing, intensity increases and peak numbers.

      One timer vs. many timers

      The authors show that the oscillation clock slowing down and the timing of differentiation, i.e. the time it takes to activate the gene mesp, are in principle dissociable processes. In physiological conditions, these are however linked. We are hence dealing with several processes, each controlled in time (and thereby space). Rather than suggesting the presence of ‘a timer’, I think the presence of multiple timing mechanisms would reflect the phenomenology better. I would hence suggest separating the questions more consistently, for instance into the following three:

      a.  what underlies the slowing down of oscillations?

      b.  what controls the timing of onset of differentiation?

      c.  and finally, how are these processes linked?

      Currently, these are discussed somewhat interchangeably, for instance here: “Other models posit that the slowing of Her oscillations arise due to an increase of time-delays in the negative feedback loop of the core clock circuit (Yabe, Uriu, and Takada 2023; Ay et al. 2014), suggesting that factors influencing the duration of pre-mRNA splicing, translation, or nuclear transport may be relevant. Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock.”(page 14). In the first part, the slowing down of oscillations is discussed and then the authors conclude on 'the timer', which however is the one timing differentiation, not the slowing down. I think this could be somewhat misleading.

      To help distinguish the clock’s slowing & arrest from differentiation, we have clarified the text in how we describe our experiments using her1-/-; her7-/- cells (page 10, lines 9-20).

      From this and previous studies, we learn/know that without clock oscillations, the onset of differentiation still occurs. For instance in clock mutant embryos (mouse, zebrafish), mesp onset is still occurring, albeit slightly delayed and not in a periodic but smooth progression. This timing of differentiation can occur without a clock and it is this timer the authors refer to "Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock." (page 14). This 'timer' is related to what has been previously termed 'the wavefront' in the classic Clock and Wavefront model from 1976, i.e. a 'timing gradient' and smooth progression of cellular change. The experimental evidence showing it is cell-autonomous by the time it has been laid down, using single cell measurements, is an important finding, and I would suggest connecting it more clearly to the concept of a wavefront, as per the model from 1976.

      We have been explicit about the connection to the clock & wavefront in the discussion (page 17, line 12-17).

      Regarding question a., clearly, the timer for the slowing down of oscillations is operating in single cells, an important finding of this study. It is remarkable to note in this context that while the overall, absolute timescale of slowing down is entirely changed by going from in vivo to in vitro, the relative slowing down of oscillations, per cycle, is very much comparable, both in vivo and in vitro.

      We have now pointed out the relative nature of this phenomenon in the abstract, page 1, line 28.

      To me, while this study does not address the nature of this timer directly, the findings imply that the cell-autonomous timer that controls slowing down is, in fact, linked to the oscillations themselves. We have previously discussed such a timer, i.e. a 'self-referential oscillator' mechanism (in mouse embryos, see Lauschke et al., 2013) and it seems the new exciting findings shown here in zebrafish provide important additional evidence in this direction. I would suggest commenting on this potential conceptual link, especially for those readers interested to see general patterns.

      While we posit that the timer provides positional information to the clock to slow oscillations and instruct its arrest, we do not believe that “the findings imply that the cell-autonomous timer that controls slowing down is, in fact, linked to [i.e., governed by] the oscillations themselves.” As we show, in her1-/-; her7-/- embryos lacking oscillations, the timing/positional information across the PSM still exists, as read out by Mesp expression. Is this different positional information than that used by the clock? Possibly, but given the tight linkage between Mesp onset and the timing/positioning of clock arrest, both cell-autonomously and in the embryo, we argue that the simplest explanation is that the timing/positional information used by the clock and differentiation are the same. Please see page 10, lines 9-20, as well as the discussion (page 17, lines 19-23; page 18, lines 1-8).

      We agree that the timer must communicate to the clock, but this does not mean it is dependent on the clock for positional information.

      Regarding question c., i.e. how the two timing mechanisms are functionally linked, I think concluding that "Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock." (page 14), might be a bit of an oversimplification. It is correct that the timer of differentiation is operating without a clock, however, physiologically, the link to the clock (and hence the dependence of the timescale of clock slowing down), is also evident. As the author states, without clock input, the precision of when and where differentiation occurs is impacted. I would hence emphasize the need to answer question c., more clearly, not to give the impression that the timing of differentiation does not integrate the clock, which above statement could be interpreted to say.

      As far as we can tell, we do not state that “without clock input, the precision of when and where differentiation occurs is impacted”, and we certainly do not want to give this impression. In contrast, as mentioned above, the her1-/-; her7-/- mutant embryo studies indicate that the lack of a clock signal does not change the differentiation timing, i.e. it does not integrate the clock. Of course, in the formation of a real somite in the embryo, the clock’s input might be expected to cause a given cell to differentiate a little earlier or later so as to be coordinated with its neighbors, for example, along a boundary. But this magnitude of timing difference is within one clock cycle at most, and does not match the large variation seen in the cultured cells that spans over many clock cycles.

      A very interesting finding presented here is that in some rare examples, the arrest of oscillations and onset of differentiation (i.e. mesp) can become dissociated. Again, this shows we deal here with interacting, but independent modules. Just as a comment, there is an interesting medaka mutant, called doppelkorn (Elmasri et al. 2004), which shows a reminiscent phenotype "the Medaka dpk mutant shows an expansion of the her7 expression domain, with apparently normal mesp expression levels in the anterior PSM.". The authors might want to refer to this potential in vivo analogue to their single cell phenotype.

      Thank you, we had forgotten this result. Although we do not agree that this result necessarily means there are two interacting modules, we have included a citation to the paper, along with a discussion of alternative explanations for the dissociation (page 18, lines 2-14).

      One strength of the presented in vitro system is that it enables precise control and experimental perturbations. A very informative set of experiments would be to test the dependence of the cell-autonomous timing mechanisms (plural) seen in isolated cells on ongoing signalling cues, for instance via Fgf and Wnt signaling. The inhibition of these pathways with well-characterised inhibitors, in single cells, would provide important additional insight into the nature of the timing mechanisms, their dependence on signaling and potentially even into how these timers are functionally interdependent.

      We agree and in future experiments we are taking advantage of this in vitro system to directly investigate the effect of signaling cues on the intrinsic timing mechanism.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to identify potential biomarkers for acute myocardial infarction (AMI) through blood metabolomics and fecal microbiome analysis. They found that long chain fatty acids (LCFAs) could serve as biomarkers for AMI and demonstrated a correlation between LCFAs and the gut microbiome. Additionally, in silico molecular docking and in vitro thrombogenic assays showed that these LCFAs can induce platelet aggregation.

      Strengths:

      The study utilized a comprehensive approach combining blood metabolomics and fecal microbiome analysis.

      The findings suggest a novel use of LCFAs as biomarkers for AMI.

      The correlation between LCFAs and the gut microbiome is a significant contribution to understanding the interplay between gut health and heart disease.

      The use of in silico and in vitro assays provides mechanistic insights into how LCFAs may influence platelet aggregation.

      Weaknesses:

      The evidence is incomplete as it does not definitively prove that gut dysbiosis contributes to fatty acid dysmetabolism.

      We appreciate this reviewer’s insightful comment regarding the causal relationship between gut dysbiosis and fatty acid dysmetabolism. We acknowledge that our study primarily demonstrates a strong association rather than causation. While establishing causality was beyond the scope of the current study, we recognize the importance of addressing this point. In our revised manuscript, we will emphasize the observational nature of our findings and discuss the need for future research, including longitudinal studies and interventional trials, to explore the causal links between gut dysbiosis and fatty acid dysmetabolism. We believe that this clarification strengthens the interpretation of our results and aligns with the reviewer's concern.

      The study primarily shows an association between the gut microbiome and fatty acid metabolism without establishing causation.

      We agree with the reviewer that our study presents an association rather than definitive proof of causation between the gut microbiome and fatty acid metabolism. To address this, we plan to expand the discussion section to more clearly outline the limitations of our study in establishing causality. We will also propose future research directions, such as the use of animal models and longitudinal human studies, which could help elucidate the causal pathways. By clarifying this aspect, we aim to provide a more balanced perspective on our findings.

      Reviewer #2 (Public Review):

      Summary:

      Fan et al. investigated the relationship between early acute myocardial infarction (eAMI) and disturbances in the gut microbiome using metabolomics and metagenomics analyses. They studied 30 eAMI patients and 26 healthy controls, finding elevated levels of long-chain fatty acids (LCFA) in the plasma of eAMI patients.

      Strengths:

      The research attributed a substantial portion of LCFA variance in eAMI to changes in the gut microbiome, as indicated by omics analyses. Computational profiling of gut bacteria suggested structural variations linked to LCFA variance. The authors also conducted molecular docking simulations and platelet assays, revealing that eAMI-associated LCFAs may enhance platelet aggregation.

      Weaknesses:

      The results should be validated using different assays, and animal models should be considered to explore the mechanisms of action.

      We appreciate the reviewer’s suggestion to validate our findings using additional assays and animal models. We agree that further validation is crucial to confirm the robustness of our results and to explore the underlying mechanisms in greater detail. While our current study focused on human subjects and in vitro assays to establish initial findings, we acknowledge that additional experimental approaches are necessary. In the revised manuscript, we plan to include a discussion on the potential use of different assays (e.g., advanced metabolomics techniques, multi-omics integration) and animal models to validate and expand upon our findings. Moreover, we are planning to undertake these experiments in future studies to build upon the foundational work presented here.

      We believe that our revised responses and the planned manuscript revisions will address the reviewers’ concerns effectively. We are confident that these changes will enhance the overall contribution of our study to the field. Thank you again for your valuable feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Joint Public Review: 

      The molecular mechanisms that mediate the regulated exocytosis of neuropeptides and neurotrophins from neurons via large dense-core vesicles (LDCVs) are still incompletely understood. Motivated by their earlier discovery that the Rab3-RIM1 pathway is essential for neuronal LDCV exocytosis, the authors now examined the role of the Rab3 effector Rabphilin-3A in neuronal LDCV secretion. Based on multiple live and confocal imaging approaches, the authors provide evidence for a synaptic enrichment of Rabphilin-3A and for independent trafficking of Rabphilin-3A and LDCVs. Using an elegant NPY-pHluorin imaging approach, they show that genetic deletion of Rabphilin-3A causes an increase in electrically triggered LDCV fusion events and increased neurite length. Finally, knock-out-replacement studies, involving Rabphilin-3A mutants deficient in either Rab3- or SNAP25-binding, indicate that the synaptic enrichment of Rabphilin-3A depends on its Rab3 binding ability, while its ability to bind to SNAP25 is required for its effects on LDCV secretion and neurite development. The authors conclude that Rabphilin-3A negatively regulates LDCV exocytosis and propose that this mechanism also affects neurite growth, e.g. by limiting neurotrophin secretion. These are important findings that advance our mechanistic understanding of neuronal large dense-core vesicle (LDCV) secretion. 

      The major strengths of the present paper are: 

      (i) The use of a powerful Rabphilin-3A KO mouse model. 

      (ii) Stringent lentiviral expression and rescue approaches as a strong genetic foundation of the study. 

      (iii) An elegant FRAP imaging approach. 

      (iv) A cutting-edge NPY-pHluorin-based imaging approach to detect LDCV fusion events. 

      We thank the reviewers for their positive evaluation of our manuscript.

      Weaknesses that somewhat limit the convincingness of the evidence provided and the corresponding conclusions include the following: 

      (i) The limited resolution of the various imaging approaches introduces ambiguity to several parameters (e.g. LDCV counts, definition of synaptic localization, Rabphilin-3A-LDCV colocalization, subcellular and subsynaptic localization of expressed proteins, AZ proximity of Rabphilin-3A and LDCVs) and thereby limits the reliability of corresponding conclusions. Super-resolution approaches may be required here. 

      We thank the reviewer for their constructive suggestion. We fully agree that super-resolution imaging would produce a more precise localization of RPH3A and co-localization with DCVs. We have now repeated our (co)-localization experiments with STED microscopy. We find that RPH3A colocalized with the pre-synaptic marker Synapsin1 and, to a lesser extent, with the post synaptic marker Homer and DCV marker chromogranin B (new Figure 1). This indicates that RPH3A is highly enriched in synapses, mostly the pre-synapse, and that RPH3A partly co-localizes with DCVs.  

      (ii) The description of the experimental approaches lacks detail in several places, thus complicating a stringent assessment. 

      We apologize for the lack of detail in explaining the experimental approaches. We have included a more detailed description in the revised manuscript. 

      (iii) Further analyses of the LDCV secretion data (e.g. latency, release time course) would be important in order to help pinpoint the secretory step affected by Rabphilin-3A. 

      We agree. To address this comment, we have now included the duration of the fusion events (new Figure S2D-F). The start times of the fusion events are shown in the cumulative plots (now Figure 3F and I). The kinetics are normal in the RPH3A KO neurons.

      (iv) It remains unclear why a process that affects a general synaptic SNARE fusion protein - SNAP25 - would specifically affect LDCV but not synaptic vesicle fusion. 

      We agree that we have not addressed this issue systematically enough in the original manuscript. We have now added a short discussion on this topic in the Discussion of the revised manuscript (p 15, line 380-386). In brief, we do not claim full selectivity for the DCV pathway. Some effects of RPH3A deficiency on the synaptic vesicle cycle have been observed. Furthermore, because DCVs typically do not mix in the synaptic vesicle cluster and fuse outside the active zone (and outside the synapse), DCVs might be more accessible to RPH3A regulation.

      (v) The mechanistic links between Rabphilin-3A function, LDCV density in neurites, neurite outgrowth, and the proposed underlying mechanisms involving trophic factor release remain unclear. 

      We agree that we have not addressed all these links systematically enough in the original manuscript, although we feel that we have at least postulated the best possible working model to link RPH3A function to DCV exocytosis/neurotrophic factor release and neurite outgrowth (p 15-16, line 396-400). Of course, a single study cannot support all these links with sufficient experimental evidence. We have now added a short text on what we can conclude exactly based on our experiments and how we see the links between RPH3A function, DCV exocytosis/neurotrophic factor release, neurite outgrowth and DCV density in neurites (p 13-14, line 317-325).

      Reviewer #1 (Public Review): 

      Summary:

      The manuscript by Hoogstraaten et al. investigates the effect of constitutive Rabphilin 3A (RPH3A) ko on the exocytosis of dense core vesicles (DCV) in cultured mouse hippocampal neurons. Using mCherry- or pHluorin-tagged NPY expression and EGFP- or mCherry-tagged RPH3A, the authors first analyse the colocalization of DCVs and RPH3A. Using FRAP, the authors next analyse the mobility of DCVs and RAB3A in neurites. The authors go on to determine the number of exocytotic events of DCVs in response to high-frequency electrical stimulation and find that RPH3A ko increases the number of exocytotic events by a factor of 2-3, but not the fraction of released DCVs in a given cell (8x 50Hz stim). In contrast, the release fraction is also increased in RPH3A KOs when doubling the stimulation number (16x 50Hz). They further observe that RPH3A ko increases dendrite and axon length and the overall number of ChgrB-positive DCVs. However, the overall number of DCVs and dendritic length in ko cells directly correlate, indicating that the number of vesicles per dendritic length remains unaffected in the RPH3A KOs. Lentiviral co-expression of tetanus toxin (TeNT) showed a non-significant trend to reduce axon and dendrite length in RPH3A KOs. Finally, the authors use co-expression of RAB3A and SNAP25 constructs to show that RAB3A but not SNAP25 interaction is required to allow the exocytosis-enhancing effect in RPH3A KOs.

      While the authors' methodology is sound and the microscopy experiments are performed well and analyzed appropriately, their results in large part do not sufficiently support their conclusions. Moreover, the experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims.

      Overall, I thus feel that the manuscript does not provide a sufficient advance in knowledge. 

      Strengths: 

      - The authors' methodology is sound, and the microscopy results are performed well and analyzed appropriately. 

      - Figure 2: The exocytosis imaging is elegant and potentially very insightful. The effect in the RPH3A KOs is convincing. 

      - Figure 4: the logic of this experiment is elegant. It shows that the increased number of DCV fusion events in RPH3A KOs is related to the interaction of RPH3A with RAB3A but not with SNAP25. 

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses: 

      - The results in larger parts do not sufficiently support the conclusions. 

      - The experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims. 

      - Not of sufficient advance in knowledge for this journal 

      - The significance of differences in control experiments (WT vs. KO) varies between experiments shown in different figures.

      - Axons and dendrites were not analyzed separately in Figures 1 and 2. 

      - The colocalization study in Figure 1 would require super-resolution microscopy. 

      To address the reviewers’ comments, we have provided a more detailed explanation of our analysis (p 19-20, line 521-542). In addition, we have repeated our colocalization experiments using STED microscopy, see Joint Public Review item (i).  

      Reviewer #2 (Public Review): 

      Summary: 

      Hoogstraaten et al. investigated the involvement of rabphilin-3A (RPH3A) in DCV fusion in neurons during calcium-triggered exocytosis at the synapse and during neurite elongation. They suggest that RPH3A acts as an inhibitory factor for LDCV fusion and this is mediated partially via its interaction with SNAP25 and not Rab3A/Rab27. It is a very elegant study although several questions remain to be clarified.

      Strengths: 

      The authors use state-of-the-art techniques like tracking NPY-pHluorin exocytosis and FRAP experiments to quantify these processes, providing novel insight into LDCV exocytosis and the involvement of RPH3A.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses: 

      At the current state of the manuscript, further supportive experiments are necessary to fully support the authors' conclusions. 

      We thank the reviewer for their comments and suggestions. We have performed additional experiments to support our conclusions; see Joint Public Review items (i)-(iv).

      Reviewer #3 (Public Review): 

      Summary: 

      The molecular mechanism of regulated exocytosis has been extensively studied in the context of synaptic transmission. However, in addition to neurotransmitters, neurons also secrete neuropeptides and neurotrophins, which are stored in dense core vesicles (DCVs). These factors play a crucial role in cell survival, growth, and shaping the excitability of neurons. The mechanism of release for DCVs is similar, but not identical, to that used for SV exocytosis. This results in slow kinetics and low release probabilities for DCV compared to SV exocytosis. There is a limited understanding of the molecular mechanisms that underlie these differences. By investigating the role of rabphilin-3A (RPH3A), Hoogstraaten et al. uncovered for the first time a protein that inhibits DCV exocytosis in neurons.

      Strengths: 

      In the current work, Hoogstraaten et al. investigate the function of rabphilin-3A (RPH3A) in DCV exocytosis. This RAB3 effector protein has been shown to possess a Ca2+ binding site and an independent SNAP25 binding site. Using colocalization analysis of confocal imaging, the authors show that in hippocampal neurons RPH3A is enriched at pre- and post-synaptic sites and associates specifically with immobile DCVs. Using site-specific RPH3A mutants they found that the synaptic location was due to its RAB3 interaction site. They further could show that RPH3A inhibits DCV exocytosis due to its interaction with SNAP25. They came to that conclusion by comparing NPY-pHluorin release in WT and RPH3A KO cells and by performing rescue experiments with RPH3A mutants. Finally, the authors showed that by inhibiting stimulated DCV release, RPH3A controlled the axon and dendrite length, possibly through the reduced release of neurotrophins. Thereby, they pinpoint how the proper regulation of DCV exocytosis affects neuron physiology. 

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses: 

      Data context 

      One of the findings is that RPH3A accumulates at synapses and is mainly associated with immobile DCVs.

      However, Farina et al. (2015) showed that 66% of all DCVs are secreted at synapses and that these DCVs are immobile prior to secretion. To provide additional context to the data, it would be valuable to determine if RPH3A KO specifically enhances secretion at synapses. Additionally, the authors propose that RPH3A decreases DCV exocytosis by sequestering SNAP25 availability. At first glance, this hypothesis appears suitable. However, due to RPH3A synaptic localization, it should also limit SV exocytosis, which it does not. In this context, the only explanation for RPH3A's specific inhibition of DCV exocytosis is that RPH3A is located at a synapse site remote from the active zone, thus protecting the pool of SNAP25 involved in SV exocytosis from binding to RPH3A. This hypothesis could be tested using super-resolution microscopy. 

      We thank the reviewer for their suggestion. We have now performed super-resolution microscopy, see Joint Public Review item (i). However, these new data do not necessarily explain the stronger effect of RPH3A deficiency on DCV exocytosis, relative to SV exocytosis. We have added a short discussion on this topic to the revised manuscript, see Joint Public Review item (iv).

      Technical weakness 

      One technical weakness of this work consists in the proper counting of labeled DCVs. This is significant since most findings in this manuscript rely on this analysis. Since the data was acquired with epi-fluorescence or confocal microscopy, it doesn't provide the resolution to visualize individual DCVs when they are clumped. The authors use a proxy to count the number of DCVs by measuring the total fluorescence of individual large spots and dividing it by the fluorescence intensity of discrete spots assuming that these correspond to individual DCVs. This is an appropriate method but it heavily depends on the assumption that all DCVs are loaded with the same amount of NPY-pHluorin or chromogranin B (ChgB). Due to the importance of this analysis for this manuscript, I suggest that the authors show that the number of DCVs per µm2 is indeed affected by RPH3A KO using super-resolution techniques such as dSTORM, STED, SIM, or SRRF. 

      The reviewer is correct that this is a crucial issue that we have not addressed optimally until now. We devoted a large part of a previous manuscript to this issue but did not refer to that work clearly enough. We have now clarified this (p 7, line 187-190). In brief, we previously quantified the ratio between the fluorescence intensity of ChgB and NPY-pHluorin in confocal microscopy and the number of dSTORM puncta in sparse areas of WT mouse hippocampal neurons (Persoon et al., 2018). This quantification yielded a unitary fluorescence intensity per vesicle that was very stable across different neurons. Although confocal microscopy might underestimate the total number of DCVs to some extent, Persoon et al. (2018) demonstrated that these parameters correlate well and that the estimates are accurate. Considering that the ΔF/F0 is similar in RPH3A WT and KO neurons (now Figure S2I), meaning that the NPY-pHluorin intensity of a single fusion event is comparable, we can presume that this correlation also applies to RPH3A KO neurons.
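
The unitary-intensity estimate described in this response can be sketched as follows. This is only a minimal illustration of the arithmetic, not the analysis pipeline used in this study or in Persoon et al. (2018); the function name and all numbers are hypothetical.

```python
def estimate_dcv_count(total_intensity, unitary_intensity):
    """Estimate the number of vesicles in one punctum.

    Assumes every vesicle carries roughly the same reporter load, so
    total fluorescence divided by the single-vesicle (unitary)
    fluorescence approximates the vesicle count.
    """
    if unitary_intensity <= 0:
        raise ValueError("unitary intensity must be positive")
    return total_intensity / unitary_intensity


# Hypothetical values in arbitrary fluorescence units (not study data).
unitary = 120.0
puncta = [118.0, 245.0, 610.0]

# Round each estimate to the nearest whole vesicle.
counts = [round(estimate_dcv_count(p, unitary)) for p in puncta]
print(counts)
```

The estimate is only as good as the assumption of equal loading per vesicle, which is the point the reviewer raises.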

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) The authors perform an extensive analysis regarding the colocalization of RPH3A and DCVs (Figure 1 upper part). This analysis is hampered by the fact that the recorded data has in relation to vesicle size limited resolution (> 1 µm) to allow making strong claims here. In my view, super-resolution microscopy would be required for the co-localization studies shown in Figure 1. 

      We fully agree and have now performed super-resolution microscopy, see Joint Public Review item (i).

      (2) The FRAP experiments (Figure 1 lower part) cannot be sufficiently understood from what is presented. The methods say that both laser channels were activated during bleaching but NPY-pHluorin is not bleached in Fig.1E. Explanation of the bleaching is not very circumspect. In 1D, it is rather EGFP-RPH3A that is entering the bleached area than the NPY vesicles. These experiments require a more careful explanation of methodology, observed results, and their interpretation. Overall, the observed effects in the original kymograph traces require a better explanation. 

      We acknowledge that NPY-pHluorin in Figure 1E (now Figure 2C) is not completely bleached. NPY-pHluorin appeared to be more difficult to bleach than NPY-mCherry. However, it is important to clarify that we merely bleached the neurites to remove the stationary puncta and facilitate our analysis of DCV/RPH3A dynamics. This bleaching step does not affect the interpretation of our results. We apologize that this was not clearly stated in the text and have made the necessary adjustments in the legend, Results, and Methods sections (p 6-7, line 162-163; p 5, line 140-142 and p 19, line 508-513). Additionally, we apologize for the accidental switch of the kymographs for NPY-mCherry and EGFP-RPH3A in Figure 1D (now Figure 2B, C). We greatly appreciate the reviewer identifying this error.

      (3) Figure 1: The authors need to mention whether axons, dendrites, or both were analyzed throughout the different panels and how they were identified. Is it possible that axons were wrapping around dendrites in their cultures (compare e.g. Shimojo et al., 2015)? Given the limited spatial resolution and because of this wrapping, interpretation of results could be affected. 

      We completely agree with the reviewer’s assessment and conclusion. We are unable to distinguish axons from dendrites using this experimental design. We have made sure to specify in the text that our observation that RPH3A does not co-travel with DCVs is true for both dendrites and axons, (p 5, line 150).

      (4) Figure 2: The exocytosis imaging is elegant and potentially very insightful. The effect in the RPH3A KOs is convincing. However, the authors determine the efficacy of exocytosis from NPY-pHluorin unquenching of DCVs only. This is only one of several possible parameters to read out the efficiency of exocytosis. Kinetics like e.g. delay between stimulation and start of exocytosis events or release time course of NPY after DCV fusion were not determined. Such analysis could give a better insight into what process before or after the fusion of DCVs is affected by RPH3A ko. 

      We fully agree with the reviewer. We have now included the duration of the fusion events (new Figure S2D-F). The start times of the fusion events are shown in the cumulative plots in what are now Figure 3F and I. The kinetics are normal in the RPH3A KO neurons.

      Moreover, it needs to be mentioned whether 2C and D are from WT or ko cultures. It would be best to show representative examples from both genotypes. 

      We have now adjusted this in the new figure (now Figure 3C, D).

      The number of fusion events is much increased but the release fraction is not significantly changed. While this is consistent with results in Figure 4C it is at variance with 4F. This raises questions about the reliability of the effects in RPH3A KOs. 

      The released fraction indicates the number of fusion events normalized to the total DCV pool. In Figure 4D, we observed a slightly bigger pool size, which explains the lack of significance when analyzing the released fraction. In Figure 4G, however, DCV pool sizes are similar between KO and WT, leading to a statistically significant effect on the released fraction in KO neurons. Furthermore, Figures 4B and E distinctly show a substantial increase in fusion events in RPH3A KO neurons. The observed variability in pool size could potentially be attributed to variation between cultures or inherent biological variability.
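
The effect of pool size on the released fraction can be illustrated with a short sketch. The function name and the counts below are hypothetical and chosen only to show the arithmetic; they are not data from the study.

```python
def released_fraction(fusion_events, total_pool):
    """Fusion events normalized to the total DCV pool."""
    if total_pool <= 0:
        raise ValueError("total pool must be positive")
    return fusion_events / total_pool


# Hypothetical counts per neuron (not data from the study).
wt = released_fraction(40, 2000)
ko_larger_pool = released_fraction(60, 2800)  # more events, larger pool
ko_same_pool = released_fraction(60, 2000)    # more events, same pool

print(f"WT {wt:.3f}, KO larger pool {ko_larger_pool:.3f}, "
      f"KO same pool {ko_same_pool:.3f}")
```

With these illustrative numbers, the same increase in fusion events barely changes the released fraction when the pool is also larger, but raises it clearly when the pool is unchanged.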

      Given the increased number of ChgrB-positive DCVs in RPH3A KOs (shown in Figure 2) and that only the cumulative number of exocytosis events were analysed, how can the authors exclude that the RPH3A ko only affects vesicle number but not release, if the % change in released vesicles is not different to WT? Kinetics of release don't seem to be affected. Importantly, what was the density of NPY-pHluorin vesicles in WT vs. ko? 

      In Figure 2 (now Figure 5) we show that RPH3A KO neurons are larger and contain more endogenous ChgB+ puncta than WT neurons. This increased number of ChgB+ puncta scales with neuron size, as puncta density is not increased. A previous study (Persoon et al., 2018) has demonstrated a strong correlation between DCV number and neuron size. Our data show that RPH3A deficiency increased DCV exocytosis, but the released fraction of vesicles depends on the total number of DCVs, which we determined during live recording by dequenching NPY-pHluorin using NH4+. Considering that this is an overexpression of a heterologous DCV-fusion reporter, and not endogenous staining of DCVs, as in the case of ChgB+ puncta, some variability is not unexpected.

      Also in these experiments, the question arises of whether the authors analyse axons, dendrites, or both throughout the different panels and how they were identified. 

      In our experimental design we record all fusion events per cell, including both axons and dendrites but excluding the cell soma. We have clarified this in the method section, (p 19, line 508 and p 19, line 521-522).

      (5) Figure 3: in D the authors show that ChgrB-pos. DCV density is slightly increased in KOs. How does this relate to the density of NPY-pHluorin DCVS in Figure 2? 

      We do not observe a difference in NPY-pHluorin density (see Author response image 1). However, it is important to note that we relied on tracing neurites in live recording images to determine the neuronal size. In contrast, the ChgB density was based on dendritic length only, determined using (post-hoc) MAP2 staining. In addition, ChgB+ puncta represent an endogenous DCV staining, whereas NPY-pHluorin quantification is based on overexpression of a heterologous DCV-fusion reporter. These two factors likely contribute some variability.

      Author response image 1.

      The authors show a non-significant trend of TeNT coexpression to reduce axon and dendrite lengths in RPH3A KOs. While this trend is visible, I think one cannot draw conclusions from that when not reaching significance. The argument of the authors that the increased axon and dendrite lengths are created by growth factor peptide release from DCV during culture time is interesting. However, the fact that TeNT expression shows a trend toward reducing this effect on axons/dendrites is not sufficient to prove the release of such growth factors. 

      We agree. We have toned down this speculation in the revised manuscript, (p 15-16, line 395-400).

      Lastly, the authors don't provide insight into the mechanisms, of how RPH3A ko increases the number of DCVs per µm dendritic length in the neurons. In my view, there are too many loose ends in this story of how RPH3A ko first increases spontaneous release of DCVs and then enhances neurite growth and DCV density. Did the authors e.g. measure the spontaneous release of DCVs in their cultures? 

      We measured spontaneous release of DCVs during the 30s baseline recording prior to stimulation. We observed no difference in spontaneous release between WT and KO neurons (now Figure S2H). However, baseline recording lasted only 30 seconds. It is possible that this was too short to detect subtle effects.

      Other points: 

      (1) Figure 4: the logic of this experiment is elegant. It shows that the increased number of DCV fusion events in RPH3A KOs is related to the interaction of RPH3A with RAB3A but not with SNAP25. As mentioned above, it is irritating that the reduction of fusion events in KOs and on the release fraction is sometimes reaching significance, but sometimes it does not. Likewise, the absence of significant effects on DCV numbers is not consistent with the results shown in Figures 3C and D. 

      DCV numbers in Figure 3 (now Figure 5) are determined by staining for endogenous ChgB, whereas in Figure 4D and G DCV numbers are determined by overexpressing NPY-pHluorin and counting the dequenched puncta following a NH4+ puff.

      (2) Figure 1B: truncation of the y-axis needs to be clearly indicated. 

      We have replaced this figure with new Figure 1 and have indicated truncations of the y-axis when needed (new Figure 1E). 

      (3) Page 10: "Given that neuropeptides are key modulators of adult neurogenesis (Mu et al., 2010), and that RPH3A depletion leads to increased DCV exocytosis, it is coherent that we observed longer neurites in RPH3A KO neurons." I cannot follow the argument of the authors here: what has neurogenesis to do with neurite length? 

      We apologize for the confusion. We have clarified this in the revised text, (p 16, line 398-400).

      Minor point: 

      There are some typos in the manuscript. e.g., page 8: "... may partially dependent on regulated secretion...); page 6: "...to dequence all...". 

      Thank you for noticing; we have corrected the typos.

      Reviewer #2 (Recommendations For The Authors): 

      (1) Supplementary Figure S1A, in my opinion, should be in Figure 1A as it illustrates all the constructs used in this study and helps the reader to follow it up. 

      We thank the reviewer for their suggestion. However, we feel that with the adjustments we have made in Figure 1, the illustrations of the constructs fit better in Figure S1, since new Figure 1 shows the localization of endogenous RPH3A and not that of the constructs.  

      (2) One of the conclusions of the manuscript is the synaptic localization of the different RPH3A mutants. The threshold for defining synaptic localization is clear neither from the images nor from the analysis: for example, the Manders coefficient for VGLUT1-Syn1, which is used as a positive control, ranges from 0.65-0.95 and that of RPH3A and Syn1 ranges from 0.5-0.95. These values should be compared to all mutants and the conclusions should be based on such comparison. 

      We agree. We have now repeated our initial co-localization experiment with all the RPH3A mutants (now Figure S1D-F).  

      (3) Strengthening this figure with STED/SIM/dSTORM microscopy can verify and add a new understanding of the subtle changes of RPH3A localization. 

      We fully agree and have now added super-resolution microscopy data, see Joint Public Review item (i).

      (4) As RAB3A/RAB27A (ΔRAB3A/RAB27A) loses the punctate distribution, please clarify how it can function at the synapse and not act as a KO. Is it sorted to the synapse, and if so, how? 

      We used lentiviral delivery to introduce our constructs, resulting in the overexpression of ΔRAB3A/RAB27A mutant RPH3A. This overexpression likely compensates for the loss of the punctate distribution of RPH3A, thereby maintaining its limiting effect on DCV exocytosis. It is plausible that under physiological conditions, the mislocalization of RPH3A would lead to increased exocytosis, similar to what we observed in the KO. 

      (5) Is RPH3A expressed in both excitatory and inhibitory neurons? 

      We agree this is an important question. Single cell RNA-seq already suggests the protein is expressed in both, but we nevertheless decided to test expression of RPH3A protein in excitatory and inhibitory neurons, using immunocytochemistry with VGAT and VGLUT as markers in hippocampal and striatal WT neurons. We found that RPH3A is expressed in both VGLUT+ hippocampal neurons and VGAT+ striatal neurons (new Figure S1A, B).  

      (6) The differential use of ChgB and NPY as markers for DCVs should be clarified and compared as these are used at different stages of the manuscript. 

      We have previously addressed the comparison between ChgB and NPY-pHluorin (Persoon et al., 2018). We made sure to indicate this more clearly throughout the manuscript to clarify the use of the two markers. 

      (7) FRAP experiments- A graph describing NPY recovery should be added as a reference to 2H and discussed. 

      We agree. We have made the necessary adjustments (new Figure 2G).

      (8) Figure 2E shows some degree of "facilitation" between the 2 8x50 pulses RPH3A KO neurons. Can the author comment on that? What was the reason for using this dual stimulation protocol? 

      There is indeed some facilitation between the two 8 x 50 pulses in KO neurons and to a lesser extent also in the WT neurons, which we have observed before in WT neurons (Baginska et al., 2023). Baginska et al. (2023) showed recently that different stimulation protocols can influence certain fusion dynamics, like the ratio of persistent and transient events and event duration. We used two different stimulation protocols to thoroughly investigate the effect of RPH3A on exocytosis and to assess the robustness of our findings regarding the number of fusion events. Fusion kinetics were similar in WT and KO neurons for both stimulation protocols (new Figure 2D-F).

      (9) Figure 3 quantifies dendrites length and then moves to quantify both axon and dendrites for the Tetanus toxin experiment. What are the effects of KO on axon length? In the main figures, it is not mentioned but in S3 it seems not to be affected. How does it reconcile with the main conclusion on neurite length? 

      Figure 3H (now Figure 6C) shows the effect of the KO on axon length: the axon length is increased in RPH3A KO neurons compared to WT, similar to dendrite length. Re-expressing RPH3A in KO neurons rescues axonal length to WT levels. In Figure S3, we observe a similar trend as in main Figure 3 (new Figure 6), yet this effect did not reach significance. Based on this, we concluded that neurite length is increased upon RPH3A depletion.

      (10) For lay readers, please explain the total pool and how you measured it. However, see the next comment. 

      We agree. We have now defined this better in the revised manuscript, (p 19, line 524-527 and p 20, line 535-539).

      (11) It is a bit hard to understand if the total number of DCV was increased in the KO and if the pool size was increased and in which figure it is quantified. Some sentences like: "A trend towards a larger intracellular DCV pool in KO compared to WT neurons was observed" do not fit with "No difference in DCV pool size was observed between WT and KO neurons (Figure S2D)" or with "During stronger stimulation (16 bursts of 50 APs at 50 Hz), the total fusion and released fraction of DCVs were increased in KO neurons compared to WT". They are not directly supported, or not related to specific figures. Please indicate if the total DCVs pool, as measured by NH4, was increased and based on that, the fraction of the releasable DCVs following the long stimulation. From Figure 2H, the conclusion is an increase in fusion events. In general, NH4 is not quantified clearly- is it quantified in Figure S2C? And if it is a trend, how can it become significant in Figure 3? 

      We agree there has been some inconsistency in the way we describe the data on the total number of DCVs. We have addressed this in the revised text to ensure better clarity. The total DCV pool measured by NPY-pHluorin was not significantly increased in KO neurons. We see a trend towards a bigger DCV pool in the 2x8 50 Hz stimulation paradigm (now Figure S2C); therefore, the released fraction of vesicles is not increased in Figure 1G (now Figure 3G). The number of DCVs in Figure 3 (now Figure 5) is based on endogenous ChgB staining and not on overexpression, unlike the DCV pool measured by NPY-pHluorin. In Figure 3 (now Figure 5) we show that RPH3A KO neurons have slightly more ChgB+ puncta compared to WT.

      (12) In Figure 3, the quantification is not clear, discrete puncta are not visible but rather a smear of chromogranin staining. How was it quantified? An independent method to count DCV number, size, and distribution like EM is necessary to support and add further understanding. 

      We acknowledge that discrete ChgB puncta are not completely visible in Figure 3 (now Figure 5). Besides the inherent limitation in resolution with confocal imaging, we believe that this is due to ChgB accumulation in the KO neurons, as shown in now Figure 5D. Nonetheless, to address this concern of the reviewer, we have selected other images that represent our dataset (now Figure 5A). Furthermore, the number of ChgB+ DCVs was calculated using SynD software (Schmitz et al., 2011; van de Bospoort et al., 2012) (see previous reply). EM would offer valuable independent confirmation on the total DCV number, size and distribution. However, with the current method we already know that vesicle numbers are at least similar. Does that justify the (major) investment in a quantitative EM study? Moreover, this issue does not affect the central message of the current study.

      (13) Can the author discuss if the source of DCVs that are released at the synapse is similar or different from the source of DCVs fused while neurites elongate? 

      With our current experimental design, we are unable to draw conclusions regarding this aspect. We are not sure how experiments to identify this source (probably the Golgi?) would be crucial to sustain the central message of our study.

      (14) An interesting and related question: what are the expression levels of RPH3A during development and neuronal growth during the nervous system development? 

      While we have not specifically examined the expression levels of RPH3A over development, public databases show that RPH3A expression increases over time in mice, consistent with other synaptic proteins (Blake et al., 2021; Baldarelli et al., 2021; Krupke et al., 2017). We have now added this to the revised manuscript (p 2, line 55-56).

      (15) The conclusion from Figure 4 about the contribution of SNAP25 interaction to RPH3A inhibitory effect is not convincing. The data are scattered and in many neurons, high levels of fusion events were detected. Further or independent experiments are needed to support this conclusion. For example, is the interaction with SNAP25 important for its inhibitory activity in other DCV-releasing systems like adrenal medulla chromaffin cells? 

      We agree that further studies in other DCV-releasing systems like chromaffin cells would provide valuable insight into the role of SNAP25 interaction in RPH3A’s inhibitory effect on exocytosis. However, we believe that starting new series of experiments in another model system is outside of the scope of our current study.

      (16) Furthermore, the number of DCVs in the KO is similar in this experiment, raising some more questions about the quantification of the number of vesicles, that differ, in different sections of the manuscript (points # 10,11). 

      The total DCV pool in the fusion experiments is measured by overexpressing NPY-pHluorin; this cannot be directly compared to the number of endogenous ChgB+ DCVs in Figure 3 (now Figure 5), see also item (11).

      (17) The statement - "RPH3A is the only negative regulator of DCV" is not completely accurate as other DCV inhibitors like tomosyn were described before. 

      We agree. By this statement, we intend to convey that RPH3A is the only negative regulator of DCVs without substantial impact on synaptic vesicle exocytosis, unlike Tomosyns. We have clarified this in the revised text, (p 15, line 366-367).

      (18) The support for the effect of KO on the "clustering of DCVs" is not convincing. 

      The intensity of endogenous ChgB puncta was decreased in RPH3A KO neurons (now Figure 5E). However, the peak intensity induced by single NPY-pHluorin labeled DCV fusion events (quanta) was unchanged (now Figure S2I). This indicates that the decrease in ChgB puncta intensity must be due to a reduced number of DCVs (quanta) in this specific location. We have interpreted that as ‘clustering’, or maybe ‘accumulation’. However, we only put forward this possibility. We are now more careful in our speculations within the text, (p 11 line 271-277).

      (19) Final sentence: "where RPH3A binds available SNAP25, consequently restricting the assembly of SNARE complexes" should be either demonstrated or rephrased as no effect of trans or general SNARE complex formation is shown. 

      We agree. We have made the necessary adjustments in the text, (p 15, line 387-389).   

      (20) A scheme summarizing RPH3A's interaction with synaptic proteins and its effects on DCVs release, maybe even versus its effects on SVs release, should be considered as a figure or graphic abstract. 

      We have included a working model in Figure 7.  

      (21) Figure 4 logically should come after Figure 2 to summarize the fusion-related chapter before moving to neurite elongation. 

      We have placed Figure 4 after Figure 2 (now Figure 3).

      Reviewer #3 (Recommendations For The Authors): 

      One important finding of this study is that RPH3A downregulates neuron size, possibly by inhibiting DCV release. Additionally, the authors demonstrated that the number of DCVs is directly proportional to the number of DCVs per µm2, and that RPH3A KO reduces DCV clustering. This conclusion was drawn by comparing ChgB with NPY-pHluorin loading of the DCVs. However, this comparison is not valid as ChgB is expressed at an endogenous level and NPY-pHluorin is over-expressed. In the KO situation where DCV exocytosis is enhanced, the available endogenous ChgB may be depleted faster than the overexpressed NPY-pHluorin. Hoogstraaten et al. should either perform a study in which ChgB is overexpressed to test whether the difference in DCV remains or at least provides an alternative interpretation of their data. 

      We thank the reviewer for this comment. The reviewer challenges one or two conclusions in our original manuscript (it is not entirely clear to what exactly “This conclusion” refers): (a) “the number of DCVs is directly proportional to the number of DCVs per µm2”, and (b) “that RPH3A KO reduces DCV clustering”. For (a), the reviewer probably means that the number of DCVs per neuron is directly proportional to the size of the neuron. The reviewer states that this (these) conclusion(s) are “not valid as ChgB is expressed at an endogenous level and NPY-pHluorin is over-expressed” because “endogenous ChgB may be depleted faster than the overexpressed NPY-pHluorin”. We have three arguments for concluding that faster depletion of ChgB cannot affect these two conclusions: (1) DCVs bud off from the Golgi with newly synthesized (fresh) ChgB. Whether or not a larger fraction of DCVs is released does not influence this initial ChgB loading into DCVs (together with over-expressed NPY-pHluorin); (2) in hippocampal neurons, merely 1-6% of the total DCV pool undergoes exocytosis (the current study and also extensively demonstrated in Persoon et al., 2018). RPH3A KO neurons release a few percent more of the total DCV pool. Hence, “depletion of ChgB” is only marginally different between experimental groups; and (3) the proposed experiment overexpressing ChgB will not help scrutinize our current conclusions, as ChgB overexpression is known to affect DCV biogenesis and the total DCV pool, most likely much more than a few percent more release by RPH3A deficiency.

      Hoogstraaten et al. conducted a thorough analysis of the impact of RPH3A KO and its rescue using various mutants on dendrite and axon length (see Supplementary Figure 3). However, they did not test the effect of the ΔSNAP25 mutant. The authors demonstrated that this mutant is the least efficient in rescuing DCV exocytosis (Figure 4E). Hence the neurons expressing this mutant should have a similar size to the KO neurons. This finding would strongly support the argument that DCV exocytosis regulates neuron size. Otherwise, it would suggest that RPH3A may have a function in regulating exocytosis at the growth cones that is independent of SNAP25. Since the authors most probably have the data that allows them to measure the neuron size (acquired for Supplementary Figure 2), I suggest that they perform the required analysis. 

      We agree this is important and performed new experiments to determine the dendrite length of RPH3A WT, KO and KO neurons expressing the ΔSNAP25 mutant. We observed that the dendrite length of RPH3A KO neurons expressing the ΔSNAP25 mutant is indeed similar to that of KO neurons (new Figure S3C). Although not significant, we observe a clear trend towards bigger neurons compared to WT. This strengthens our conclusion that increased DCV exocytosis contributes to the observed increased neuronal size.

      The authors displayed the result of DCV exocytosis in two ways. One is by showing the number of exocytosis events the other is to display the proportion of DCVs that were secreted. They do the latter by dividing the secreted DCV by the total number of DCVs. These are visualized at the end of the experiment through NH4+ application. While this method works well for synaptic secretion as the marker of SV is localized to the SV membrane and remains at the synapse upon SV exocytosis, it cannot be applied in the same manner when it is the DCV content that is labeled as it is released upon secretion. Hence, the total pool of vesicles should be the number of DCV counted upon NH4+ application in addition to those that are secreted. This way of analyzing the total pool of DCV might also explain the difference in this pool size between KO neurons stimulated two times with 8 stimuli instead of one time with 16 stimuli (Sup Fig 2 C and D). This is an important point as it affects the conclusions drawn from Figure 2. 

      We thank the reviewer for this comment. We agree, and we have made the necessary adjustments throughout the manuscript. 
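
The reviewer's point about the denominator can be made concrete with a small sketch. It assumes that vesicles that fused during stimulation have released their cargo and therefore no longer appear in the NH4+ count; the function name and the numbers are hypothetical and are not the authors' analysis code.

```python
def released_fraction_content_reporter(fusion_events, nh4_puncta):
    """Released fraction for a cargo (content) reporter.

    Assumption: fused vesicles have lost their reporter, so the total
    pool is the NH4+ count plus the events observed before NH4+.
    """
    total_pool = nh4_puncta + fusion_events
    if total_pool <= 0:
        raise ValueError("total pool must be positive")
    return fusion_events / total_pool


# Hypothetical counts for one neuron (not data from the study).
events, remaining = 50, 950

corrected = released_fraction_content_reporter(events, remaining)
uncorrected = events / remaining  # denominator omits the fused vesicles

print(f"corrected {corrected:.4f}, uncorrected {uncorrected:.4f}")
```

Leaving the fused vesicles out of the denominator overestimates the released fraction, and the bias grows with the number of fusion events.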

      The kymogram of DCV exocytic events displayed in Figure 2D shows a majority of persistent (>20s long) events. This is strange as NPY-pHluorin corresponds to the released cargo. Previous work using the same labeling and stimulation technique showed that content release occurs in less than 10s (Baginska et al. 2023). The authors should comment on that difference. 

      In Baginska et al. (2023), the authors distinguished between persistent and transient events. The transient events are shorter than 10s for the 2x8 and 16x stimulation paradigms, whereas persistent events can last for more than 10s. In our study we did not make this distinction. However, in response to this reviewer, we have now quantified the fusion duration per cell. These new data show that the mean duration is similar between genotypes for both stimulation paradigms. We have added these new data (new Figure S2D-F).

      In Figures 1D and E, some puncta in the kymogram appeared to persist after bleaching. This raises questions about the effectiveness of the bleaching procedure for the FRAP experiment. 

      The reviewer is correct that NPY-pHluorin in Figure 1E (now Figure 2C) is not fully bleached. NPY-pHluorin was more resistant to bleaching than NPY-mCherry. However, we merely bleached the neurites to facilitate our analysis by reducing fluorescence of the stationary puncta without causing phototoxicity. Some remaining fluorescence after bleaching does not affect our conclusions in any way.

      In the discussion, the paragraph titled "RPH3A does not travel with DCVs in hippocampal neurons" is quite confusing and would benefit from a streamlined explanation. 

      We thank the reviewer for this comment. We made the necessary adjustments to make this paragraph clearer (p 14, lines 339-351).

      First paragraph of page 8 "TeNT expression in KO neurons restored neurite length to WT levels. When compared to KO neurons without TeNT, neurite length was not significantly decreased but displayed a trend towards WT levels (Figure 3G, H)." These two sentences are confusing as they seem contradictory. 

      We agree that this conclusion was too strong. However, we do not see a contradiction. The significant effect between KO and control neurons on both axon and dendrite length is lost upon TeNT expression (which forms the basis for our conclusions cited by the reviewer, now Figure 6B, C). While the difference between KO neurons +/- TeNT did not reach statistical significance, the (strong) trend is clearly in the same direction. We have refined our original conclusion in the revised manuscript (p 12, lines 304-306).

      The data availability statement is missing. 

      We have added the data availability statement (p 21, lines 571-572).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      O'Leary and colleagues present data identifying several procedures that alter discrimination between novel and familiar objects, including time, environmental enrichment, Rac-1, context reexposure, and brief reminders of the familiar object. This is complemented with an engram approach to quantify cells that are active during learning to examine how their activation is impacted following each of the above procedures at test. With this behavioral data, authors apply a modeling approach to understand the factors that contribute to good and poor object memory recall.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and indeed for recognizing our efforts. We engage below with the Reviewer’s specific criticisms.

      Strengths:

      Authors systematically test several factors that contribute to poor discrimination between novel and familiar objects. These results are extremely interesting and outline essential boundaries of incidental, nonaversive memory. These results are further supported by engram-focused approaches to examine engram cells that are reactivated in states with poor and good object recognition recall.

      We thank the Reviewer for these positive comments.

      Weaknesses:

      For the environmental enrichment, authors seem to suggest objects in the homecage are similar to (or reminiscent of) the familiar object. Thus, the effect of improved memory may not be related to enrichment per se as much as it may be related to the preservation of an object's memory through multiple retrievals, not the enriching experiences of the environment itself. This would be consistent with the brief retrieval figure. Authors should include a more thorough discussion of this.

      This is one of the main issues highlighted by the Editor and the Reviewers. We agree that these results dovetail with the reminder experiments. We have included additional discussion, see lines 510-546.

      Authors should justify the marginally increased number of engram cells in the non-enrichment group that did not show object discrimination at test, especially relative to other figures. More specific cell counting criteria may be helpful for this. For example, was the DG region counted for engram and cfos cells or only a subsection?

      There was a marginal but non-significant increase in the number of labelled cells within the standard housed mice in Figure 3f. The cell counting criteria were the same across experimental groups and conditions, where the entire dorsal and ventral blade of the dorsal DG was counted for each animal. This non-statistically significant variance may be due to differences in surgery and viral spread between mice. We have clarified this in the manuscript, see lines 229-232.

      It is unclear why the authors chose a reactivation time point of 1hr prior to testing. While this may be outside of the effective time window for pharmacological interference with reconsolidation for most compounds, it is not necessarily outside of the structural and functional neuronal changes accompanied by reconsolidation-related manipulations.

      A control experiment was performed to demonstrate that a brief reminder exposure of 5 mins on its own was insufficient to induce new learning that formed a lasting memory (Supplementary Figure S4a). Mice given only a brief acquisition period of 5 mins exhibited no preference for the novel object when tested 1 hour after training, suggesting the absence of a lasting object memory (Supplementary Figure S4b & c). We therefore used the 1-hour time point for the brief reminder experiment in Figure 4a. We have clarified this within the manuscript and supplementary data, see lines 258-264.

      Figure 5: Levels of exploration at test are inconsistent between manipulations. This is problematic, as context-only reexposures seem to increase exploration for objects overall in a manner that I'm unsure resembles 'forgetting'. Instead, cross-group comparisons would likely reveal increased exploration time for familiar and novel objects. While I understand 'forgetting' may be accompanied by greater exploration towards objects, this is inconsistent across and within the same figure. Further, this effect is within the period of time that rodents should show intact recognition. Instead, context-only exposures may form a competing (empty context) memory for the familiar object in that particular context.

      The Reviewer raises an important question, and we agree with the Reviewer that there should be caution and qualification around interpreting these results as “forgetting”. Indeed, for the context-only re-exposures, cross-group comparisons show increased exploration time for familiar and novel objects. Because the mice exhibit relatively high exploration of both the novel and familiar objects, an alternative explanation would be that the mice have not truly forgotten the familiar object, but rather that, as the mouse has not seen the familiar object in the last 6 context-only sessions, its reappearance makes it somewhat novel again. The object’s reappearance therefore triggers the animal’s curiosity, which in turn drives exploration. In addition, the context-only exposures may form a competing memory for the familiar object in that particular context. We have highlighted this in the results and also included greater discussion. See lines 306-315.

      I am concerned at the interpretation that a memory is 'forgotten' across figures, especially considering the brief reminder experiments. Typically, if a reminder session can trigger the original memory or there is rapid reacquisition, then this implies there is some savings for the original content of the memory. For instance, multiple context retrievals in the absence of an object reminder may be more consistent with procedures that create a distinct memory and subsequently recruit a distinct engram.

      These findings raise an important question regarding the interpretation of ‘forgetting’. If a reminder trial or experience can trigger the original memory, or there is rapid reacquisition, then this would suggest there is a degree of savings for the original memory content (85, 86). Previous work has emphasized retrieval deficits as a key characteristic of memory impairment, supporting the idea that memory recall or accessibility may be driven by learning feedback from the environment (7, 8, 14–18). Within our behavioral paradigm, a lack of memory expression would still constitute forgetting due to the loss of learned behavioral response in the presence of natural retrieval cues. The changes in memory expression may therefore underlie the adaptive nature of forgetting. This is consistent with the idea that the engram is intact and available, but not accessible. Here we studied natural forgetting, and our data showing memory retrieval following optogenetic reactivation demonstrates that the original engram persists at a cellular level, otherwise activation of those cells would no longer trigger memory recall. We also agree with the reviewer that multiple context retrievals may indeed lead to the formation of a second distinct engram that competes with the original. Recent work suggests that retroactive interference emerges from the interplay of multiple engrams competing for accessibility (18). We have added clarification and included extra discussion of this interpretation. See lines 589-598.

      Authors state that spine density decreases over time. While that may be generally true, there is no evidence that mature mushroom spines are altered or that this is consistent across figures. Additionally, it's unclear if spine volume is consistently reduced in reactivated and non-reactivated engram cells across groups. This would provide evidence that there is a functionally distinct aspect of engram cells that is altered consistently in procedures resulting in poor recognition memory (e.g. increased spine density relative to spine density of non-reactivated engram cells and non-engram cells)

      We thank the Reviewer for their helpful comments on explaining our engram dendritic spine data. We agree with the Reviewer that an analysis of the changes in spine type, as well as of the differences between engram and non-engram spines and between reactivated and non-reactivated engram spines, would be interesting and may help to further illuminate the morphological changes of forgetting and memory retrieval. Indeed, future analysis could determine if spine density is reduced in reactivated and non-reactivated engram cells, or indeed across engram and non-engram cells, within different learning conditions. This avenue of investigation could determine if there is a functionally distinct aspect of engram cells that is altered following forgetting (67). However, such analysis is beyond the scope of this study. We have highlighted this limitation and included its discussion. See lines 493-499.

      Authors should discuss how the enrichment-neurogenesis results here are compatible with other neurogenesis work that supports forgetting.

      We validated the effectiveness of the enrichment paradigm to enhance neural plasticity by measuring adult hippocampal neurogenesis. The hippocampus has been identified as one of the only regions where postnatal neurogenesis continues throughout life (75). Levels of adult hippocampal neurogenesis do not remain constant throughout life and can be altered by experience (41–43, 57). In addition, adult born neurons have been shown to contribute to the process of forgetting (74, 78, 79). Although the contribution of adult born neurons to cognition and the memory engram is not fully understood (80, 81), Mishra et al. showed that immature neurons were actively recruited into the engram following a hippocampal-dependent task (67). Moreover, increasing the level of neurogenesis rescued memory deficits by restoring engram activity (67). Augmenting neurogenesis further rescued the deficits in spine density in both immature and mature engram neurons in a mouse model of Alzheimer’s disease (67). Whether neurogenesis alters spine density differentially for reactivated or non-reactivated engram cells remains to be investigated (67, 68). This avenue of research would help to illuminate the morphological changes following forgetting and provide evidence as to whether there is a functionally distinct aspect of engram cells that is altered in forgetting (67, 68). Our engram labelling strategy, which utilized c-fos-tTA transgenic mice combined with an AAV9-TRE-ChR2-eYFP virus, does not necessarily label sufficient immature neurons. Future work could utilize a different engram preparation, such as a genetic labelling strategy (TRAP2) or a different immediate early gene promoter such as Arc, to investigate the contribution of new-born neurons to the engram ensemble. We have added additional discussion of how our work fits with previous literature investigating neurogenesis and forgetting. See lines 547-565.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript examines an important question about how an inaccessible, natural forgotten memory can be retrieved through engram ensemble reactivation. It uses a variety of strategies including optogenetics, behavioral and pharmacological interventions to modulate engram accessibility. The data characterize the time course of natural forgetting using an object recognition task, in which animals can retrieve 1 day and 1 week after learning, but not 2 weeks later. Forgetting is correlated with lower levels of cell reactivation (c-fos expression during learning compared to retrieval) and reduction in spine density and volume in the engram cells. Artificial activation of the original engram was sufficient to induce recall of the forgotten object memory while artificial inhibition of the engram cells precluded memory retrieval. Mice housed in an enriched environment had a slower rate of forgetting, and a brief reminder before the retrieval session promoted retrieval of a forgotten memory. Repeated reintroduction to the training context in the absence of objects accelerated forgetting. Additionally, activation of Rac1-mediated plasticity mechanisms enhanced forgetting, while its inhibition prolonged memory retrieval. The authors also reproduce the behavioral findings using a computational model inspired by Rescorla-Wagner model. In essence, the model proposes that forgetting is a form of adaptive learning that can be updated based on prediction error rules in which engram relevancy is altered in response to environmental feedback.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and for recognizing our efforts. We engage below with the Reviewer’s specific criticisms of our interpretations.

      Strengths:

      (1) The data presented in the current paper are consistent with the authors claim that seemingly forgotten engrams sometimes remain accessible. This suggests that retrieval deficits can lead to memory impairments rather than a loss of the original engram (at least in some cases).

      We thank the Reviewer for their positive summary.

      (2) The experimental procedures and statistics are appropriate, and the behavioral effects appear to be very robust. Several key effects are replicated multiple times in the manuscript.

      We thank the Reviewer for their positive comments.

      Weaknesses:

      (1) My major issue with the paper is the forgetting model proposed in Figure 7. Prior work has shown that neutral stimuli become associated in a manner similar to conditioned and unconditioned stimuli. As a result, the Rescorla-Wagner model can be used to describe this learning (Todd & Homes, 2022). In the current experiments, the neutral context will become associated with the unpredicted objects during training (due to a positive prediction error). Consequently, the context will activate a memory for the objects during the test, which should facilitate performance. Conversely, any manipulation that degrades the association between the context and object should disrupt performance. An example of this can be found in Figure 5A. Exposing the mice to the context in the absence of the objects should violate their expectations and create a negative prediction error. According to the Rescorla-Wagner model, this error will create an inhibitory association between the context and the objects, which should make it harder for the former to activate a memory of the latter (Rescorla & Wagner, 1972). As a result, performance should be impaired, and this is what the authors find. However, if the cells encoding the context and objects were inhibited during the context-alone sessions (Figure 5D) then no prediction error should occur, and inhibitory associations would not be formed. As a result, performance should be intact, which is what the authors observe.

      What about forgetting of the objects that occurs over time? Bouton and others have demonstrated that retrieval failure is often due to contextual changes that occur with the passage of time (Bouton, 1993; Rosas & Bouton, 1997, Bouton, Nelson & Rosas, 1999). That is, both internal (e.g. state of the animal) and external (e.g. testing room, chambers, experimenter) contextual cues change over time. This shift makes it difficult for the context to activate memories with which it was once associated (in the current paper, objects). To overcome this deficit, one can simply re-expose animals to the original context, which facilitates memory retrieval (Bouton, 1993). In Figure 2D, the authors do something similar. They activate the engram cells encoding the original context and objects, which enhances retrieval.

      Therefore, the forgetting effects presented in the current paper can be explained by changes in the context and the associations it has formed with the objects (excitatory or inhibitory). The results are perfectly predicted by the Rescorla-Wagner model and the context-change findings of Bouton and others. As a result, the authors do not need to propose the existence of a new "forgetting" variable that is driven by negative prediction errors. This does not add anything novel to the paper as it is not necessary to explain the data (Figures 7 and 8).

      We thank the reviewer for clearly explaining their concern about our model. We are very sorry that we did not sufficiently explain that our model is, in fact, based on the classic Rescorla-Wagner model. The key equation of the model that updates “engram strength” is equivalent to the canonical Rescorla-Wagner model that is commonly used in research on reinforcement learning and decision-making (105). One potential minor difference is that we crucially assume different learning rates for positive and negative prediction errors. However, this variant of the Rescorla-Wagner model is common in the computational literature and is generally not regarded as a qualitatively different kind of model. In fact, it allows us to capture that establishing an object-context association (after a positive prediction error) is faster than the forgetting process (through negative errors).

      The other equations that are explained in detail in the Methods are necessary to simulate exploration behavior and render the model suitable for model fitting. Concerning exploration behavior, we use the softmax function, which is commonly used in combination with the Rescorla-Wagner model, in order to translate the learned quantity (in our case, engram strength) into behavior (here exploration). The other equations are necessary to fit the model to the data (learning rate α and behavioral variability in exploration behavior).
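
      The two ingredients described above, a Rescorla-Wagner update with separate learning rates for positive and negative prediction errors and a softmax that maps engram strength onto exploration, can be sketched as follows. This is an illustrative sketch only, not the fitted model from the Methods: the parameter values, the coding of outcomes as 1 (objects present) and 0 (context only), the two-option softmax, and all function names are assumptions made for the example.

```python
import math


def rw_update(strength, outcome, alpha_pos, alpha_neg):
    """One Rescorla-Wagner step with asymmetric learning rates (assumed form)."""
    delta = outcome - strength                        # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg    # faster learning than forgetting
    return strength + alpha * delta


def simulate(outcomes, strength=0.0, alpha_pos=0.5, alpha_neg=0.1, engram_active=True):
    """Run a sequence of sessions; no update occurs while the engram is inhibited."""
    for outcome in outcomes:
        if engram_active:
            strength = rw_update(strength, outcome, alpha_pos, alpha_neg)
    return strength


def p_explore_novel(strength, beta=3.0):
    """Two-option softmax: probability of exploring the novel object.

    Assumes the novel object has value `strength` and the familiar one 0,
    so zero engram strength gives chance-level (0.5) exploration.
    """
    return 1.0 / (1.0 + math.exp(-beta * strength))


# Hypothetical schedule: 3 training sessions with objects, then 6 context-only sessions.
trained = simulate([1, 1, 1])
context_only = simulate([0] * 6, strength=trained)
inhibited = simulate([0] * 6, strength=trained, engram_active=False)

for label, s in [("trained", trained), ("context only", context_only), ("engram inhibited", inhibited)]:
    print(f"{label}: strength={s:.3f}, p(novel)={p_explore_novel(s):.3f}")
```

      Under these assumptions, context-only sessions generate negative prediction errors that lower engram strength and pull exploration back toward chance, whereas skipping the update during engram inhibition leaves the strength unchanged, in line with the pattern the reviewer describes for Figure 5.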

      Therefore, we fully agree with the reviewer that the Rescorla-Wagner model can explain our empirical results, in particular by assuming that the different manipulations affect the strength of object-context associations, which, in turn, governs forgetting as behaviorally observed. 

      In our previous version of the manuscript, we only referred to the Rescorla-Wagner model directly in the Methods. But to make this important point clearer, we now refer to the origin of the model multiple times in the Results section as well. See lines 81, 386-393.

      We also agree with the reviewer that the learning/forgetting process can be described in terms of changes in object-context associations (e.g., inhibitory associations after a negative prediction error). Therefore, we now explicitly refer to the relationship between updated object-context associations and forgetting and highlight that we believe that stronger associations signal higher engram “relevancy”. See lines 386-393.

      We have extended Figure 7 (new panels a and b), where we illustrate the idea that (a) object-context associations govern forgetting and (b) show the key Rescorla-Wagner equation, including a simple explanation of the main terms (engram strength, prediction error, and learning rate). Finally, we have also extended our discussion of the model, where we now directly state that the Rescorla-Wagner model captures the key results of our experiments. See lines 573-580.

      In order to further support a link between our empirical data and computational modeling, we also added extra experiments that showed the modulation of engram cells within the dentate gyrus can regulate these object-context associations. See Supplementary Figure 12a-f and lines 401-404.

      To summarize our reply, we agree with the reviewer’s comment and hope that we have clarified the direct relationship to the Rescorla-Wagner model.

      (2) I also have an issue with the conclusions drawn from the enriched environment experiment (Figure 3). The authors hypothesize that this manipulation alleviates forgetting because "Experiencing extra toys and objects during environmental enrichment that are reminiscent of the previously learned familiar object might help maintain or nudge mice to infer a higher engram relevancy that is more robust against forgetting.". This statement is completely speculative. A much simpler explanation (based on the existing literature) is that enrichment enhances synaptic plasticity, spine growth, etc., which in turn reduces forgetting. If the authors want to make their claim, then they need to test it experimentally. For example, the enriched environment could be filled with objects that are similar or dissimilar to those used in the memory experiments. If their hypothesis is correct, only the similar condition should prevent forgetting.

      We thank the Reviewer for this alternative perspective on our findings. First of all, we agree that this statement is speculative. The effects of enrichment on neural plasticity are well established and it likely contributes to the enhanced memory recall. It is important to emphasize that this process of updating is not necessarily separate from enrichment-induced plasticity at an implementational level, but part of the learning experience within an environment containing multiple objects. The enrichment or, more generally, experience, may therefore enhance memory through the modification of activity of specific engram ensembles. The idea of enrichment facilitating memory updating is consistent with the results obtained by the reminder experiments and further supported by our analysis with the Rescorla-Wagner computational model, where experience updates the accessibility of existing memories, possibly through reactivation of the original engram ensemble.

      We would like to further clarify that our explanation concerns the algorithmic level, in contrast to the neural level. Based on the computational analyses using the Rescorla-Wagner model and in line with the reviewer’s previous comment on the model, we believe that forgetting is governed by the strength of object-context associations (or engram relevancy). Our interpretation is that stronger associations signal that the memory or engram representation is important ("relevant") and should not be forgotten. Accordingly, due to a vast majority of experiences with extra cage objects in the enriched environment, mice might generally learn that such objects are common in their environment and potentially relevant in the future (i.e., the object-context association is strong, preventing forgetting). Our speculative interpretation of these results is intended to help unify our empirical data with the computational model.

      We believe that the Reviewer's alternative explanation in terms of synaptic plasticity and spine growth is not mutually exclusive with the modelling work. It is possible that the computational mechanisms that we explore based on the Rescorla-Wagner model are neuronally related to the biological mechanisms that the reviewer suggests at the implementational level. Therefore, ultimately, the two perspectives might even complement each other. We have included additional discussion to clarify this point. See lines 510-546.

      (3) It is well-known that updating can both weaken or strengthen memory. The authors suggest that memory is updated when animals are exposed to the context in the absence of the objects. If the engram is artificially inhibited (opto) during context-only re-exposures, memory cannot be updated. To further support this updating idea, it would be good to run experiments that investigate whether multiple short re-exposures to the training context (in the presence of the objects or during optogenetic activation of the engram) could prevent forgetting. It would also be good to know the levels of neuronal reactivation during multiple re-exposures to the context in the absence versus context in the presence of the objects.

      We thank the Reviewer for their comments. We agree that additional experiments would be helpful to further support the idea of updating. We have performed additional experiments to test the idea that multiple short re-exposures to the training context, in the presence of objects, prevent forgetting. In this paradigm, mice were repeatedly exposed to the original object pair (Supplementary Figure S5a). The results indicate that repeated reminder trials facilitate object memory recall (Supplementary Figure S5b & c). These data indicate that subsequent object reminders over time facilitate the transition of a forgotten memory to an accessible memory. See Supplementary Figure S5 and lines 279-287.

      (4) There are a number of studies that show boundary conditions for memory destabilization/reconsolidation. Is there any evidence that similar boundary conditions exist to make an inaccessible engram accessible?

      The Reviewer asks an interesting question about boundary conditions and engram accessibility. Boundary conditions could indeed affect the degree of destabilization and reconsolidation, the salience or strength of the memory, as well as the timing of retrieval cues. Future models could focus on understanding the specific boundary conditions in which a memory becomes retrievable and the degree to which it is sufficiently destabilized and liable for updating and forgetting. We have included additional discussion on the potential role of boundary conditions for engram accessibility. See lines 661-666.

      (5) More details about how the quantification of immunohistochemistry (c-fos, BrdU, DAPI) was performed should be provided (which software and parameters were used to consider a fos positive neurons, for example).

      We have added additional information for the parameters of quantification of immunohistochemistry. See lines 796-809.

      (6) Duration of the enrichment environment was not detailed.

      We have highlighted the details for the environmental enrichment duration. See line 756.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ryan and colleagues uses a well-established object recognition task to examine memory retrieval and forgetting. They show that memory retrieval requires activation of the acquisition engram in the dentate gyrus and failure to do so leads to forgetting. Using a variety of clever behavioural methods, the authors show that memories can be maintained and retrieval slowed when animals are reared in environmental enrichment and that normally retrieved memories can be forgotten if exposed to the environment in which the expected objects are no longer presented. Using a series of neural methods, the authors also show that activation or inhibition of the acquisition engram is key to memory expression and that forgetting is due to Rac1.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and indeed for recognizing our efforts. We engage below with the Reviewer’s specific criticisms of our interpretations.

      Strengths:

      This is an exemplary examination of different conditions that affect successful retrieval vs forgetting of object memory. Furthermore, the computational modelling that captures in a formal way how certain parameters may influence memory provides an important and testable approach to understanding forgetting.

      The use of the Rescorla-Wagner model in the context of object recognition and the idea of relevance being captured in negative prediction error are novel (but see below).

      The use of gain and loss of function approaches are a considerable strength and the dissociable effects on behaviour eliminate the possibility of extraneous variables such as light artifacts as potential explanations for the effects.

      We thank the Reviewer for their positive comments.

      Weaknesses:

      Knowing what process (object retrieval vs familiarity) governed the behavioural effect in the present investigation would have been of even greater significance.

      The Reviewer touches on an important issue of the object recognition task. Understanding how experience alters object familiarity versus object retrieval and its impact on learning would help to develop better models of object memory and forgetting. We have added additional discussion. See lines 666-669.

      The impact of the paper is somewhat limited by the use of only one sex.

      We agree that using only male mice limits the impact of the paper. Indeed, the field of behavioural neuroscience is moving to include sex as a variable. Future experiments should include both male and female mice.

      While relevance is an interesting concept that has been operationalized in the paper, it is unclear how distinct it is from extinction. Specifically, in the case where the animals are exposed to the context in the absence of the object, the paper currently expresses this as a process of relevance - the objects are no longer relevant in that context. Another way to think about this is in terms of extinction - the association between the context and the objects is reduced, resulting in a disrupted ability of the context to activate the object engram.

      We thank the reviewer for their insightful comment on the connection between engram relevance and memory extinction. Lacagnina et al. demonstrated that extinction training suppressed the reactivation of a fear engram, while activating a second putative extinction ensemble (59). In another study, these extinction engram cells and reward cells were shown to be functionally interchangeable (92). Moreover, in a study conducted by Lay et al., the balance between extinction and acquisition was disrupted by inhibiting the extinction recruited neurons in the BLA and CN (93). These results suggested that decision making after extinction can be governed by a balance between acquisition and extinction specific ensembles (93). Together, this may suggest that in the present study, when mice are repeatedly exposed to the training context, the association between the context and the objects is reduced, resulting in a disrupted ability of the context to activate the object engram. Therefore, memory relevance and extinction may operate similarly to affect engram accessibility, and in essence ‘forgetting’ of object memories may be due to neurobiological mechanisms similar to that of extinction learning (4). We have included additional discussion on the link between our results and the extinction literature. See lines 642-654.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional measures that may help interpretation of and clarify data are:

      A minute-by-minute analysis for training and testing may provide insight about the learning rate and testing temporal dynamics that may shed light substantially on differential levels of exploration. This should be applied across figures and would support conclusions from models in Figures 7-8 as well.

      Locomotion/distance travelled measures.

      We have included an additional minute-by-minute analysis of training and testing for the object memory test at 24 hr and 2 weeks, as well as under the standard housing and enrichment conditions. The results further support the initial finding that novel object recognition is increased in mice that recall the object at 24 hr. Similarly, mice housed in the enriched housing initially explore the novel object more than the familiar object. See Supplementary Figures 1 and 2, as well as lines 103-105 and 211-213.

      The appropriate control for the context exposure figure would be to expose to a novel context in one group and the acquisition/testing context for the other.

      We agree with the reviewer that an additional control of a novel context would further support our findings. Indeed, this line of investigation may dovetail with the other reviewer comments on the role of competing engrams and interference. Future work could investigate the degree to which novel contexts and multiple memories can affect the rate of forgetting through engram updating. We have included additional discussion. See lines 643 and 655. However, in our experience it is necessary to pre-expose mice to different contexts before object exposure (e.g. Autore et al ’23), in order to form discriminate object/context associations. Establishing such a paradigm for this study would be at odds with the established paradigms and schedules in the current study. Moreover, whether or not the effect of object displacement on forgetting requires the familiar context does not impact the main conclusions of this study. However, we agree that it is a point for expansion in the future.

      A control virus+light group vs simply a no-light condition.

      For optogenetic experiments, control mice underwent the same surgery procedure with virus and optic fibre implantation. However, no light was delivered to excite or inhibit the respective opsin. Previous papers have shown laser light delivered to tissue expressing an AAV-TRE-EYFP lacking a light-sensitive opsin does cause cellular excitation. We have clarified this in the text. See lines 726-729.

      Reviewer #2 (Recommendations For The Authors):

      Minor details:

      (1) In the pharmacological modification of Rac 1, please specify what percentage of DMSO was used to dissolve Rac1 inhibitor and correct the typo 'DSMO'

      Rac1 inhibitor (Ehop016) was reconstituted and prepared in PBS with 1% Tween-80, 1% DMSO and 30% PEG. We have clarified this in the text and corrected the typo, thank you. See line 767.

      (2) In the penultimate paragraph there is a typo 'predication error'

      This is now corrected. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find information on what the No Light group consisted of. Was there a control virus infused, were the animals implanted with optical fibres (in the presence or absence of a virus), were they surgical controls, etc?

      For optogenetic experiments, No Light control mice underwent the same surgery procedure with virus and optic fibre implantation. However, no light was delivered to excite or inhibit the respective opsin. We have clarified this in the text. See lines 726-729.

      The discussion lacked specificity in places. For example, the idea of alluding to 'other variables' is somewhat vague (p. 21, middle paragraph). Some examples of what other variables could be relevant would be helpful in capturing what direction or relevance the model may have going forward.

      We have expanded the discussion of other variables which might impact engram relevance and how the model might be developed moving forward. These may include boundary conditions of destabilization and reconsolidation, the salience or strength of the memory, as well as the timing of retrieval cues or updating experience. Future models could focus on understanding the specific boundary conditions in which a memory becomes retrievable and the degree to which it is sufficiently destabilized and liable for updating and forgetting. The role of perceptual learning in memory retrieval and forgetting may also be an avenue of future investigation. Understanding how experience alters object familiarity versus object retrieval and its impact on learning would also help to develop better models of object memory and forgetting. In the current study, only male mice were utilized. Therefore, future work could also include sex as a variable to fully elucidate the impact of experience on the processes of forgetting. See lines 642-669.

      In the same paragraph (p. 21, middle paragraph) there is mention of multiple engrams and how they can compete. The authors reference Autore et al (2023), but I thought Lacagnina did this really beautifully also in an experimental setting. This idea is also expressed in Lay et al. (2022). So additional references would further strengthen the authors' argument here.

      We thank the reviewer for the additional references for discussing engram competition. We have included these papers in the discussion. See lines 642-654.

      Relatedly, environmental enrichment was considered in terms of object relevance. I wonder if the authors may want to consider thinking about their results in terms of effects on perceptual learning.

      Indeed, perceptual learning may be playing a role in environmental enrichment. We have included additional discussion. See lines 666-669.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors present a model for multisensory correlation detection that is based on the neurobiologically plausible Hassenstein-Reichardt detector. It modifies their previously reported model (Parise & Ernst, 2016) in two ways: a bandpass (rather than lowpass) filter is initially applied and the filtered signals are then squared. The study shows that this model can account for synchrony judgement, temporal order judgement, etc. in two new data sets (acquired in this study) and a range of previous data sets.

      Strengths:

      (1) The model goes beyond descriptive models such as cumulative Gaussians for TOJ and differences in cumulative Gaussians for SJ tasks by providing a mechanism that builds on the neurobiologically plausible Hassenstein-Reichardt detector.

      (2) This modified model can account for results from two new experiments that focus on the detection of correlated transients and frequency doubling. The model also accounts for several behavioural results from experiments including stochastic sequences of A/V events and sine wave modulations.

      Additional thoughts:

      (1) The model introduces two changes: bandpass filtering and squaring of the inputs. The authors emphasize that these changes allow the model to focus selectively on transient rather than sustained channels. But shouldn't the two changes be introduced separately? Transients may also be detected for signed signals.

      We updated the original model because our new psychophysical evidence demonstrates the fundamental role of unsigned transients for multisensory perception. While the original model received input from sustained unimodal channels (low-pass filters), the new version receives input from unsigned unimodal transient channels. Transient channels are normally modelled through bandpass filters (to remove the DC and high-frequency signal components) and squaring (to remove the sign). While these may appear as two separate changes in the model, they are, in fact, a single one: the substitution of sustained with unsigned transient channels (for a similar approach, see Stigliani et al. 2017, PNAS). Either change alone would not be sufficient to implement a transient channel that accounts for the present results.

      That said, we were also concerned with introducing too many changes in the model at once. Indeed, we simply modelled the unimodal transient channels as a single band-pass filter followed by squaring. This is already a stripped-down version of the unsigned transient detectors proposed by Adelson and Bergen in their classic Motion Energy model. The original model consisted of two biphasic temporal filters 90 degrees out of phase (i.e., quadrature filters), whose outputs are later combined. While a simpler implementation of the transient channels was sufficient in the present study, the full model may be necessary for other classes of stimuli (including speech; Parise, 2024, bioRxiv). Therefore, for completeness, we now include in the Supplementary Information a formal description of the full model, and validate it by simulating our two novel psychophysical studies. See Supplementary Information “The quadrature MCD model” section and Supplementary Figure S8.
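      The "band-pass filter followed by squaring" channel described above can be illustrated with a minimal sketch. It is not the authors' implementation: the band-pass is assumed here to be a difference of two exponential low-pass filters, and the time constants (`tau_fast`, `tau_slow`), the step stimuli, and all function names are hypothetical choices made only for illustration. The sketch shows the two properties the response relies on: the DC (sustained) component is removed, and onset and offset steps yield the same unsigned response.

```python
import numpy as np

def lowpass(signal, tau, dt):
    """Causal first-order low-pass: convolve with a unit-area exponential kernel."""
    t = np.arange(0.0, 10.0 * tau, dt)
    kernel = np.exp(-t / tau)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel)[: len(signal)]

def bandpass(signal, dt, tau_fast=0.05, tau_slow=0.15):
    """Band-pass as a difference of two low-pass filters (assumed form, illustrative taus)."""
    return lowpass(signal, tau_fast, dt) - lowpass(signal, tau_slow, dt)

def unsigned_transient(signal, dt):
    """Transient channel: band-pass removes DC, squaring removes the sign."""
    return bandpass(signal, dt) ** 2

dt = 0.001                            # 1 ms time step
t = np.arange(0.0, 4.0, dt)
onset = (t >= 2.0).astype(float)      # intensity steps up at 2 s
offset = 1.0 - onset                  # intensity steps down at 2 s

# Ignore the first 1.6 s, where the filters are still settling from the start-up edge.
settled = t >= 1.6
signed_on = bandpass(onset, dt)[settled]
signed_off = bandpass(offset, dt)[settled]
unsigned_on = unsigned_transient(onset, dt)[settled]
unsigned_off = unsigned_transient(offset, dt)[settled]

print("signed responses are mirror images:", np.allclose(signed_on, -signed_off))
print("unsigned responses are identical:  ", np.allclose(unsigned_on, unsigned_off))
print("response to the sustained part:    ", unsigned_on[-1])
```

      With band-pass filtering alone the onset and offset responses differ in sign, and with squaring alone the sustained level would still pass through, which is consistent with the statement that either change alone is not sufficient.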

      (2) Because the model is applied only to rather simple artificial signals, it remains unclear to what extent it can account for AV correlation detection for naturalistic signals. In particular, speech appears to rely on correlation detection of signed signals. Can this modified model account for SJ or TOJ judgments for naturalistic signals?

      It can. In a recent series of studies we have demonstrated that a population of spatially-tuned MCD units can account for audiovisual correlation detection for naturalistic stimuli, including speech (e.g. the McGurk Illusion). Once again, unsigned transients were sufficient to replicate a variety of previous findings. We have now extended the discussion to cover this recent research: Parise, C. V. (2024). Spatiotemporal models for multisensory integration. bioRxiv, 2023-12.

      Even Nidiffer et al. (2018) which is explicitly modelled by the authors report a significant difference in performance for correlated and anti-correlated signals. This seems to disagree with the results of study 1 reported in the current paper and the model's predictions. How can these contradicting results be explained? If the brain detects correlation on signed and unsigned signals, is a more complex mechanism needed to arbitrate between those two?

      We believe the reviewer here refers to our Experiment 2 (where, like Nidiffer et al. (2018), we used periodic stimuli), not Experiment 1, which consisted of step stimuli. We were also puzzled by the difference between our Experiment 2 and Nidiffer et al. (2018): we induced frequency doubling, Nidiffer et al. did not. Based on quantitative simulations, we concluded that this difference could be attributed to the fact that while Nidiffer et al. included on each trial an intensity ramp in their periodic audiovisual stimuli, we did not. As a result, when considering the ramp (unlike in Nidiffer’s analyses), all audiovisual signals used by Nidiffer et al. were positively correlated (irrespective of frequency and phase offset), while our signals in Experiment 2 were sometimes correlated and other times not (depending on the phase offset). This important simulation is included in Supplementary Figure S7; we also have now updated the text to better highlight the role of the pedestal in determining the direction of the correlation.
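      The effect of a shared intensity ramp on the sign of the correlation can be checked with a toy calculation. The sketch below is only an illustration under assumed values: the modulation frequency, modulation depth, ramp shape, and trial duration are hypothetical and are not the stimulus parameters of Nidiffer et al. (2018) or of Experiment 2. It computes the Pearson correlation between two sinusoidally modulated signals at several phase offsets, with and without a common linear ramp.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equally sampled signals."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical stimulus parameters, chosen only for illustration.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # one 1-s trial
freq = 5.0                                        # modulation frequency (Hz)
depth = 0.2                                       # modulation depth
ramp = t                                          # linear intensity ramp shared by both signals
phases = np.linspace(0.0, 2.0 * np.pi, 9)         # phase offsets, including 180 degrees

no_ramp, with_ramp = [], []
for phi in phases:
    audio = depth * np.sin(2.0 * np.pi * freq * t)
    video = depth * np.sin(2.0 * np.pi * freq * t + phi)
    no_ramp.append(pearson(audio, video))
    with_ramp.append(pearson(ramp + audio, ramp + video))

for phi, r0, r1 in zip(phases, no_ramp, with_ramp):
    print(f"phase {np.degrees(phi):5.0f} deg   no ramp r = {r0:+.2f}   with ramp r = {r1:+.2f}")
```

      Without the ramp the correlation follows the cosine of the phase offset and becomes negative in antiphase; with a sufficiently strong shared ramp the slow common trend dominates the covariance, so the correlation stays positive at every phase offset.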

      (3) The number of parameters seems quite comparable for the authors' model and descriptive models (e.g. PSF models). This is because time constants require refitting (at least for some experimental data sets) and the correlation values need to be passed through a response model (i.e. probit function) to account for behavioural data. It remains unclear how the brain adjusts the time constants to different sensory signals.

      This is a deep question. For simplicity, here the temporal constants were fitted to the empirical psychometric functions. To avoid overfitting, whenever possible we fitted such parameters over some training datasets, while trying to predict others. However, in some cases, it was necessary to fit the temporal constants to specific datasets. This may suggest that the temporal tuning of those units is not crystalised to some pre-defined values, but is adjusted based on recent perceptual history (e.g., the sequence of trials and stimuli participants are exposed to during the various experiments).

      For transparency, here we show how varying the tuning of the temporal constants of the filters affects the goodness of fit of our new psychophysical experiments (Supplementary Figure S8). As can be readily appreciated, the relative temporal tuning of the unimodal transient detectors was critical, though their absolute values could vary over a range of about 15 to over 100 ms. The tuning of the low-pass filters of the correlation detector (not shown here) displayed much lower temporal sensitivity over a range from 0.1 s to over 1 s.

      This simulation shows the impact of temporal tuning in our simulations; however, the question remains as to how such a tuning gets selected in the first place. An appealing explanation relies on natural scene statistics: units are temporally tuned to the most common audiovisual stimuli. Although our current empirical evidence does not allow us to quantitatively address this question, in previous simulations (see Parise & Ernst, 2016, Supplementary Figure 8), by analogy with visual motion adaptation, we show how the temporal constants of our model can dynamically adjust and adapt to recent perceptual history. We hope these new and previous simulations address the question about the nature of the temporal tuning of the MCD units.

      (4) Fujisaki and Nishida (2005, 2006) proposed mechanisms for AV correlation detection based on the Hassenstein-Reichardt motion detector (though not formalized as a computational model).

      This is correct: Fujisaki and Nishida (2005, 2007) also hypothesized that AV synchrony could be detected using a mechanism analogous to motion detection. Interestingly, however, they ruled out such a hypothesis, as their “data do not support the existence of specialized low-level audio-visual synchrony detectors”. Yet, along with our previous work (Parise & Ernst, 2016, where we explicitly modelled the experiments of Fujisaki and Nishida), the present simulations quantitatively demonstrate that a low-level AV synchrony detector is instead sufficient to account for audiovisual synchrony perception and correlation detection. We now credit Fujisaki and Nishida in the modelling section for proposing that AV synchrony can be detected by a cross-correlator.

      Finally, we believe the reviewer is referring to the 2005 and 2007 studies of Fujisaki and Nishida (not 2006); here are the full references of the two articles we are referring to:

      Fujisaki, W., & Nishida, S. Y. (2005). Temporal frequency characteristics of synchrony–asynchrony discrimination of audio-visual signals. Experimental Brain Research, 166, 455-464.

      Fujisaki, W., & Nishida, S. Y. (2007). Feature-based processing of audio-visual synchrony perception revealed by random pulse trains. Vision Research, 47(8), 1075-1093.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written manuscript that seeks to detail the performance of two human psychophysical experiments designed to look at the relative contributions of transient and sustained components of a multisensory (i.e., audiovisual) stimulus to their integration. The work is framed within the context of a model previously developed by the authors and is now somewhat revised to better incorporate the experimental findings. The major takeaway from the paper is that transient signals carry the vast majority of the information related to the integration of auditory and visual cues, and that the Multisensory Correlation Detector (MCD) model not only captures the results of the current study but is also highly effective in capturing the results of prior studies focused on temporal and causal judgments.

      Strengths:

      Overall the experimental design is sound and the analyses are well performed. The extension of the MCD model to better capture transients makes a great deal of sense in the current context, and it is very nice to see the model applied to a variety of previous studies.

      Weaknesses:

      My one major issue with the paper revolves around its significance. In the context of a temporal task(s), is it in any way surprising that the important information is carried by stimulus transients? Stated a bit differently, isn't all of the important information needed to solve the task embedded in the temporal dimension? I think the authors need to better address this issue to punch up the significance of their work.

      In hindsight, it may appear unsurprising that transient signals carry most information for audiovisual integration. Yet, somewhat unexpectedly, this has never been investigated using perhaps the most diagnostic psychophysical tools for perceived crossmodal timing, namely temporal order and simultaneity judgments, along with carefully designed experiments with quantitative predictions for the effect of either channel. The fact that the results conform to intuitive expectations further supports the value of the present work: empirically grounding what is intuitively expected. This offers solid psychophysical evidence that one can build on for future advancements. Importantly, developing a model that builds on our new results and uses the same parameters to predict a variety of classic experiments in the field further supports the current approach.

      If “significance” is intended as shaking previous intuitions or theories, then no: this is not a significant contribution. If instead, by significance we intend building a solid empirical and theoretical ground for future work, then we believe this study is not merely significant: it is foundational. We hope that this work's significance is better captured in our discussion.

      On a side note, there is an intriguing factor around transient vs. sustained channels: what matters is the amount of change, not the absolute stimulus intensity. Previous studies, for example, have suggested a positive crossmodal mapping between auditory loudness and visual lightness or brightness [Odegaard et al., 2004]. This study, conversely, challenges this view and demonstrates that what matters for multisensory integration in time is not the intensity of a stimulus, but changes thereof.

      In a more minor comment, I think there also needs to be a bit more effort into articulating the biological plausibility/potential instantiations of this sustained versus transient dichotomy. As written, the paper suggests that these are different "channels" in sensory systems, when in reality many neurons (and neural circuits) carry both on the same lines.

      The reviewer is right, in our original manuscript we glossed over this aspect. We have now expanded the introduction to discuss their anatomical basis. However, we are not assuming any strict dichotomy between transient and sustained channels; rather, our results and simulations demonstrate that transient information is sufficient to account for audiovisual temporal integration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Related to point 2 of the public review, can the authors provide additional results showing that the model can also account for naturalistic signals and more complex stochastic signals?

      While working on this manuscript, we were also working in parallel on a project related to audiovisual integration of naturalistic signals. A pre-print is available online [Parise, 2024, bioRxiv], and the related study is now discussed in the conclusions.

      (2) As noted in the public review, Fujisaki and Nishida (2005, 2006) already proposed mechanisms for AV correlation detection based on the Hassenstein-Reichardt motion detector. Their work should be referenced and discussed.

      We have now acknowledged the contribution of Fujisaki and Nishida in the modelling section, when we first introduce the link between our model and the Hassenstein-Reichardt detectors.

      (3) Experimental parameters: Was the phase shift manipulated in blocks? If yes, what about temporal recalibration?

      To minimise the effect of temporal recalibration, the order of trials in our experiments was randomised. Nonetheless, we can directly assess potential short-term recalibration effects by plotting our psychophysical responses against both the current SOA, and that of the previous trials. The resulting (raw) psychometric surfaces below are averaged across observers (and conditions for Experiment 1). In all our experiments, responses are obviously dependent on the current SOA (x-axis). However, the SOA of the previous trials (y-axis) does not seem to meaningfully affect simultaneity and temporal order judgments. The psychometric curves above the heatmaps represent the average psychometric functions (marginalized over the SOA of the previous trial).

      All in all, the present analyses demonstrate negligible temporal recalibration across trials, likely induced by a random sequence of lags or phase shifts. Therefore, when estimating the temporal constants of the model, it seems reasonable to ignore the potential effects of temporal recalibration. To avoid increasing the complexity of the present manuscript, we would prefer not to include the present analyses in the revised version.

      Author response image 1.

      Effect of previous trial. Psychometric surfaces for Experiments 1 and 2 plotted against the lag in the current vs. the previous trial. While psychophysical responses are strongly modulated by the lag in the current trial (horizontal axis), they are relatively unaffected by the lag in the previous trial (vertical axis).
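      The previous-trial analysis described above (responses tabulated against both the current and the preceding lag, then marginalized over the preceding lag) can be sketched as follows. This is a hypothetical reconstruction, not the authors' analysis code: the function name, the simulated observer, the Gaussian-shaped synchrony window, and the set of lags are all assumptions made for illustration, and the simulated observer has no recalibration by construction.

```python
import numpy as np

def previous_trial_surface(soa, response):
    """Mean response for every (previous SOA, current SOA) pair.

    Rows index the SOA of the previous trial, columns the SOA of the current trial.
    """
    soa = np.asarray(soa, dtype=float)
    response = np.asarray(response, dtype=float)
    levels = np.unique(soa)
    prev, cur, resp = soa[:-1], soa[1:], response[1:]   # the first trial has no predecessor
    surface = np.full((levels.size, levels.size), np.nan)
    for i, p in enumerate(levels):
        for j, c in enumerate(levels):
            cell = (prev == p) & (cur == c)
            if cell.any():
                surface[i, j] = resp[cell].mean()
    return levels, surface

# Simulated observer (assumption): "synchronous" responses depend only on the current SOA.
rng = np.random.default_rng(0)
lags = np.linspace(-0.3, 0.3, 7)                  # SOAs in seconds, randomised across trials
soa = rng.choice(lags, size=20000)
p_sync = np.exp(-(soa / 0.15) ** 2)               # assumed synchrony window
response = rng.random(soa.size) < p_sync

levels, surface = previous_trial_surface(soa, response)
marginal = np.nanmean(surface, axis=0)            # psychometric curve, marginalized over previous SOA
spread_over_previous = np.nanstd(surface, axis=0) # variability attributable to the previous SOA

print("marginal curve:", np.round(marginal, 2))
print("largest spread across previous SOA:", round(float(spread_over_previous.max()), 3))
```

      In real data, a systematic tilt of the surface along the previous-SOA axis would indicate rapid recalibration; a surface whose columns are nearly constant, as in this simulation, corresponds to the negligible effect reported above.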

      (4) The model predicts no differences for experiment 1 and this is what is empirically observed. Can the authors support these null results with Bayes factors?

      This is a good suggestion: we have now added a Bayesian repeated measures ANOVA to the analyses of Experiment 1. As expected, these analyses provide further, though mild, evidence in support of the null hypothesis (see Table S2). For completeness, the new Bayesian analyses are presented alongside the previous frequentist ones in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This important work advances our understanding of sperm motility regulation during fertilization by uncovering the midpiece/mitochondria contraction associated with motility cessation and structural changes in the midpiece actin network as its mode of action. The evidence supporting the conclusion is solid, with rigorous live cell imaging using state-of-the-art microscopy, although more functional analysis of the midpiece/mitochondria contraction would have further strengthened the study. The work will be of broad interest to cell biologists working on the cytoskeleton, mitochondria, cell fusion, and fertilization.

      Strengths:

      The authors demonstrate that structural changes in the flagellar midpiece F-actin network are concomitant to midpiece/mitochondrial contraction and motility arrest during sperm-egg fusion by rigorous live cell imaging using state-of-the-art microscopy.

      Response P1.1: We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      Many interesting observations are listed as correlated or in time series but do not necessarily demonstrate the causality and it remains to be further tested whether the sperm undergoing midpiece contraction are those that fertilize or those that are not selected. Further elaboration of the function of the midpiece contraction associated with motility cessation (a major key discovery of the manuscript) would benefit from a more mechanistic study.

      Response P1.2: We thank the reviewer for this point. We have toned down some of our statements since some of the observations are indeed temporal correlations. We will explore some of these possible connections in future experiments. In addition, we have now incorporated additional experiments and possible explanations about the function of the midpiece contraction.

      Reviewer #2 (Public Review): 

      (1) The authors used various microscopy techniques, including super-resolution microscopy, to observe the changes that occur in the midpiece of mouse sperm flagella. Previously, it was shown that actin filaments form a double helix in the midpiece. This study reveals that the structure of these actin filaments changes after the acrosome reaction and before sperm-egg fusion, resulting in a thinner midpiece. Furthermore, by combining midpiece structure observation with calcium imaging, the authors show that changes in intracellular calcium concentrations precede structural changes in the midpiece. The cessation of sperm motility by these changes may be important for fusion with the egg. Elucidation of the structural changes in the midpiece could lead to a better understanding of fertilization and the etiology of male infertility. The conclusions of this manuscript are largely supported by the data, but there are several areas for improvement in data analysis and interpretation. Please see the major points below.

      Response P2.1: We thank the reviewer for the positive comments.

      (2) It is unclear whether an increased FM4-64 signal in the midpiece precedes the arrest of sperm motility. This needs to be clarified to argue that structural changes in the midpiece cause sperm motility arrest. The authors should analyze changes in both motility and FM4-64 signal over time for individual sperm.

      Response P2.2: We have conducted single-cell experiments tracking both FM4-64 and motility as the reviewer suggested (Supplementary Fig S1). We have observed that in all cases, cells gradually diminished their beating frequency and increased FM4-64 fluorescence in the midpiece until a complete motility arrest was observed. A representative example is shown in this figure, but we will reinforce this concept in the results section.

      (3) It is possible that sperm stop moving because they die. Figure 1G shows that the FM4-64 signal is increased in the midpiece of immotile sperm, but it is necessary to show that the FM4-64 signal is increased in sperm that are not dead and retain plasma membrane integrity by checking sperm viability with propidium iodide or other means.

      Response P2.3: This is a very good point. In our experiments, we always considered sperm that were motile to hypothesize about the relevance of this observation. We have two types of experiments: 

      (1) Sperm-egg fusion: In experiments where sperm and eggs were imaged to observe their fusion, sperm were initially moving, and after fusion the midpiece contraction (an increase in FM4-64 fluorescence) was observed, indicating that the change in the midpiece (observed consistently in all fusing cells analyzed) is part of the process.

      (2) Sperm that underwent acrosomal exocytosis (AE): we have observed two behaviours as shown in Figure 1: 

      a) Sperm that underwent AE and remained motile without midpiece contraction (they are alive for sure);

      b) Sperm that underwent AE and stopped moving with an increase in FM4-64 fluorescence. We propose that this contraction during AE is not desired because it will impede sperm from moving forward to the fertilization site when they are in the female reproductive tract. In this case, we acknowledge that the cessation of sperm motility may be attributed to cellular death, potentially correlating with the increased FM4-64 signal observed in the midpiece of immotile sperm that have undergone AE. To address this hypothesis, we conducted image-based flow cytometry experiments, which are well-suited for assessing cellular heterogeneity within large populations.

      Author response image 1 illustrates the relationship between cell death and spontaneous AE in non-capacitated mouse sperm, where intact acrosomes are marked by EGFP. Cell death was evaluated using Sytox Blue staining, a dye that is impermeable to live cells and shows affinity for DNA. AE was assessed by the absence of EGFP in the acrosome.

      Author response image 1a indicates a lack of correlation between Sytox and EGFP fluorescence. Two populations of sperm with EGFP signals were found (EGFP+ and EGFP-), each showing a broad distribution of Sytox signal, enabling the distinction between cells that retain plasma membrane integrity (live sperm: Sytox-) and those with compromised membranes (dead cells: Sytox+). The observed bimodal distribution of EGFP signal, regardless of live versus dead cell populations, indicates that the fenestration of the plasma membrane known to occur during AE is a regulated process that does not necessarily compromise the overall plasma membrane integrity. 

      These observations are reinforced by the single-cell examples in Author response image 1b, where we were able to identify sperm in four categories: live sperm with intact acrosome (EGFP+/Sytox-), live sperm with acrosomal exocytosis (EGFP-/Sytox-), dead sperm with intact acrosome (EGFP+/Sytox+), and dead sperm with AE (EGFP-/Sytox+). Note the case of AE (lacking EGFP signal) which bears an intact plasma membrane (lacking Sytox Blue signal). Author response image 2 shows single-cell examples of the four categories observed with confocal microscopy to reinforce the observations from Author response image 1a.
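      The four-way classification described above amounts to applying one gate per fluorescence channel and counting the cells in each quadrant. The sketch below illustrates that bookkeeping only; the function name, the gate values, and the toy intensities are hypothetical, and it does not reproduce the gating performed in the image-based flow cytometry software.

```python
import numpy as np

def quadrant_percentages(sytox, egfp, sytox_gate, egfp_gate):
    """Percentage of cells in each viability/acrosome quadrant.

    Cells above the Sytox Blue gate are scored as dead (compromised membrane);
    cells above the EGFP gate are scored as having an intact acrosome.
    """
    sytox = np.asarray(sytox, dtype=float)
    egfp = np.asarray(egfp, dtype=float)
    dead = sytox > sytox_gate
    intact = egfp > egfp_gate
    return {
        "live, intact acrosome (EGFP+/Sytox-)": 100.0 * np.mean(~dead & intact),
        "live, AE (EGFP-/Sytox-)": 100.0 * np.mean(~dead & ~intact),
        "dead, intact acrosome (EGFP+/Sytox+)": 100.0 * np.mean(dead & intact),
        "dead, AE (EGFP-/Sytox+)": 100.0 * np.mean(dead & ~intact),
    }

# Toy intensities in arbitrary units (hypothetical values, one cell per quadrant).
sytox = [1.0, 1.0, 10.0, 10.0]
egfp = [10.0, 1.0, 10.0, 1.0]
percentages = quadrant_percentages(sytox, egfp, sytox_gate=5.0, egfp_gate=5.0)
for label, value in percentages.items():
    print(f"{label}: {value:.1f}%")
```

      Because every cell falls into exactly one quadrant, the four percentages sum to 100% of the gated events; objects such as headless flagella would need to be excluded beforehand to avoid distorting these proportions.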

      Author response image 1.

      Image-based flow cytometry analysis (ImageStream Mark II) of non-capacitated mouse sperm, showing the distribution of EGFP signal (acrosome integrity) against Sytox Blue staining (cell viability). (A) The quadrants show: Sytox Blue + / EGFP low (17.6%), Sytox Blue + / EGFP high (40.1%), Sytox Blue - / EGFP high (20.2%), and Sytox Blue - / EGFP low (21.7%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented in a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A). The top row displays sperm with compromised plasma membrane integrity (Sytox Blue +), showing low (left) and high (right) EGFP signals. The bottom row shows sperm with intact plasma membrane (Sytox Blue -), displaying high (left) and low (right) EGFP signal. It is worth noting that when analyzing the percentages in (A), we observed that the data also encompass a population of headless flagella, which was present in all observed categories. Therefore, the percentages should be interpreted with caution.

      Author response image 2.

      Confocal Microscopy Examples of AE and cell viability. The top row features sperm with compromised plasma membrane integrity (Sytox Blue +) and high EGFP expression; the second row displays sperm with compromised membrane and low EGFP expression; the third row illustrates sperm with intact membrane (Sytox Blue -) and high EGFP expression; the bottom row shows sperm with intact membrane and low EGFP expression. 

      Author response images 3-5 provide insight into the relationship between FM4-64 and Sytox Blue fluorescence intensities in non-capacitated sperm (CTRL, Author response image 3), capacitated sperm and acrosome exocytosis events stimulated with 100 µM progesterone (PG, Author response image 4), and capacitated sperm stimulated with 20 µM ionomycin (IONO, Author response image 5). Two populations of sperm with Sytox Blue signals were clearly distinguished (Sytox+ and Sytox-), enabling the discernment between live and dead sperm. Interestingly, the upper right panels of Author response images 3A, 4A, and 5A (Sytox Blue+ / FM4-64 high) consistently show a positive correlation between FM4-64 and Sytox Blue. This observation aligns with the concern raised by Reviewer 2, suggesting that compromised membranes due to cell death provide more binding sites for FM4-64. 

      Nonetheless, the lower panels of Author response images 3A, 4A and 5A (Sytox Blue-) show no correlation with FM4-64 fluorescence, indicating that this population can exhibit either low or high FM4-64 fluorescence. As expected, in stark contrast with the CTRL case, the stimulation of AE with PG or IONO in capacitated sperm increased the population of live sperm with high FM4-64 fluorescence (Sytox Blue- / FM4-64 high: CTRL: 7.85%, PG: 8.73%, IONO: 13.5%). 

      Single-cell examples are shown in Author response images 3B, 4B, and 5B, where the four categories are represented: dead sperm with low FM4-64 fluorescence (Sytox Blue+ / FM4-64 low), dead sperm with high FM4-64 fluorescence (Sytox Blue+ / FM4-64 high), live sperm with low FM4-64 fluorescence (Sytox Blue- / FM4-64 low), and live sperm with high FM4-64 fluorescence (Sytox Blue- / FM4-64 high). 

      Author response image 3.

      Relationship between cell death and FM4-64 fluorescence in non-capacitated sperm without an inducer of AE. Image-based flow cytometry analysis of non-capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (13.3%), Sytox Blue+ / FM4-64 high (49.8%), Sytox Blue- / FM4-64 low (28.1%), and Sytox Blue- / FM4-64 high (7.85%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).

      Author response image 4.

      Relationship between cell death and FM4-64 fluorescence in capacitated sperm stimulated with progesterone. Image-based flow cytometry analysis of capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (9.04%), Sytox Blue+ / FM4-64 high (61.6%), Sytox Blue- / FM4-64 low (19.7%), and Sytox Blue- / FM4-64 high (8.73%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).

      Author response image 5.

      Relationship between cell death and FM4-64 fluorescence in capacitated sperm stimulated with ionomycin. Image-based flow cytometry analysis of capacitated mouse sperm loaded with FM4-64 and Sytox Blue dyes, with one and two minutes of incubation time, respectively. (A) The quadrants show: Sytox Blue+ / FM4-64 low (4.52%), Sytox Blue+ / FM4-64 high (60.6%), Sytox Blue- / FM4-64 low (20.5%), and Sytox Blue- / FM4-64 high (13.5%). Each quadrant indicates the percentage of the total sperm population exhibiting the corresponding staining pattern. Axes are presented on a log10 scale of arbitrary units of fluorescence. (B) Representative single-cell images corresponding to the four categorized sperm populations from the flow cytometry analysis in panel (A).
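
      As a rough cross-check of the quadrant percentages reported in the captions of Author response images 3 to 5, the share of FM4-64-high cells can be computed within the Sytox Blue- (live) population alone, rather than as a percentage of all events. The sketch below is illustrative only: the dictionary layout and the function name are hypothetical, the input values are copied from the captions, and the caveat about headless flagella noted for Author response image 1 may apply here as well.

```python
# Quadrant percentages (of the total population) copied from the captions of
# Author response images 3 (CTRL), 4 (PG) and 5 (IONO).
# "dead" = Sytox Blue+, "live" = Sytox Blue-; "low"/"high" refer to FM4-64.
quadrants = {
    "CTRL": {"dead_low": 13.3, "dead_high": 49.8, "live_low": 28.1, "live_high": 7.85},
    "PG":   {"dead_low": 9.04, "dead_high": 61.6, "live_low": 19.7, "live_high": 8.73},
    "IONO": {"dead_low": 4.52, "dead_high": 60.6, "live_low": 20.5, "live_high": 13.5},
}


def live_high_fraction(q):
    """Fraction of Sytox Blue- (live) sperm that show high FM4-64 fluorescence."""
    live_total = q["live_low"] + q["live_high"]
    return q["live_high"] / live_total


fractions = {name: live_high_fraction(q) for name, q in quadrants.items()}
for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}% of live sperm are FM4-64 high")
```

      Normalized this way, the FM4-64-high share of live sperm is roughly 22% (CTRL), 31% (PG) and 40% (IONO), which follows the same ordering as the raw quadrant percentages.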

      Based on the data presented in Author response images 1 to 5, we derive the following conclusions:

      (1) There is no direct relationship between cell death (Sytox Blue+) and AE (EGFP) (Author response images 1 and 2).

      (2) There is bistability in the FM4-64 fluorescence intensity. Before reaching a certain threshold, there is no correlation between FM4-64 and Sytox Blue signals, indicating no cell death. However, after crossing this threshold, the FM4-64 signal becomes correlated with Sytox Blue+ cells, indicating cell death (Author response images 3-5).

      (3) The Sytox Blue- population of capacitated sperm is sensitive to AE stimulation with progesterone, leading to the expected increase in FM4-64 fluorescence.

      Therefore, while the FM4-64 signal alone is not a definitive marker for either AE or cell death, it is crucial to use additional viability assessments, such as Sytox Blue, to accurately differentiate between live and dead sperm in studies of acrosome exocytosis and sperm motility. In the present work, we did not use a cell viability marker due to the complex multicolor, multidimensional fluorescence experiments. However, cell viability was always considered, as any imaged sperm was chosen based on motility, indicated by a beating flagellum. The determination of whether selected sperm die during or after AE remains to be elucidated. The results presented in Figure 2 and Supplementary S1 show examples of motile sperm that experience an increase in FM4-64 fluorescence.

      All this information is added to the manuscript (Supplementary Figure 1D).

      (4) It is unclear how the structural change in the midpiece causes the entire sperm flagellum, including the principal piece, to stop moving. It will be easier for readers to understand if the authors discuss possible mechanisms.

      Response P2.4: As requested, we have incorporated a possible explanation in the discussion section (see lines 644-656). We propose three hypotheses for the cessation of sperm motility, which may be attributed to the simultaneous occurrence of various events:

      (1) Rapid increase in [Ca2+]i levels: A rapid increase in [Ca2+]i levels may trigger the activation of Ca2+ pumps within the flagellum. This process consumes local ATP, disrupting glycolysis and thereby depleting the energy required for motility.

      (2) Reorganization of the actin cytoskeleton: Alterations in the actin cytoskeleton can lead to changes in the mechanical properties of the flagellum, impacting its ability to move effectively.

      (3) Midpiece contraction: Contraction in the midpiece region can potentially interfere with mitochondrial function, impeding the energy production necessary for sustained motility.

      (5) The mitochondrial sheath and cell membrane are very close together when observed by transmission electron microscopy. The image in Figure 9A with the large space between the plasma membrane and mitochondria is misleading and should be corrected. The authors state that the distance between the plasma membrane and mitochondria approaches about 100 nm after the acrosome reaction (Line 330 - Line 333), but this is a very long distance and large structural changes may occur in the midpiece. Was there any change in the mitochondria themselves when they were observed with the DsRed2 signal?

      Response P2.5: The authors appreciate the reviewer’s observation regarding the need to correct the image in Figure 9A, as the original depiction conveys a misleading representation of the spatial relationship between the mitochondrial sheath and the plasma membrane. This figure has been corrected to accurately reflect a more realistic proximity, while keeping in mind that it is a cartoonish representation.

      Regarding the comments about the distances mentioned between former lines 330 and 333, the measurement was not intended to describe the gap between the plasma membrane and the mitochondria but rather the distance between F-actin and the plasma membrane. 

      Author response image 6 shows high-resolution scanning electron microscopy (SEM) of two sperm fixed with a protocol tailored to preserve plasma membranes (ref), where the insets clearly show the flagellar architecture in the midpiece with an intact plasma membrane covering the mitochondrial network. A non-capacitated sperm with an intact acrosome is shown in panel A, and a capacitated sperm that has experienced AE is shown in panel B.

      Notably, the results depicted in Author response image 6 demonstrate that, irrespective of the AE status, the distance between the plasma membrane and mitochondria consistently remains less than 20 nm, thus confirming the close proximity of these structures in both physiological states. As Reviewer 2 pointed out, if there is no significant difference in the distance between the plasma membrane and mitochondria, then the observed structural changes in the actin network within the midpiece should somehow alter the arrangement of mitochondria within the midpiece. Figure 5D-F shows that midpiece contraction is associated with a decrease in the helical pitch of the actin network; the distance between turns of the actin helix decreases from l = 248 nm to l = 159 nm. This implies a net change in the number of turns the helix makes per 1 µm, from approximately 4 to approximately 6 turns per µm.
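
      The conversion from helical pitch to turns per micron is simple arithmetic and can be reproduced as follows. This is a minimal sketch; the function name is hypothetical and the only inputs are the two pitch values quoted above.

```python
def turns_per_micron(pitch_nm: float) -> float:
    """Number of helical turns per 1 µm (1000 nm) for a given pitch in nm."""
    return 1000.0 / pitch_nm


pitch_before_nm = 248.0  # actin helix pitch before midpiece contraction
pitch_after_nm = 159.0   # actin helix pitch after midpiece contraction

for label, pitch in (("before", pitch_before_nm), ("after", pitch_after_nm)):
    print(f"{label}: pitch {pitch:.0f} nm -> {turns_per_micron(pitch):.2f} turns per µm")
```

      The two values come out at about 4.0 and 6.3 turns per µm, consistent with the rounded figures of 4 and 6 given in the text.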

      Author response image 6.

      SEM image showing the proximity between plasma membrane and mitochondria. Scale bar 100 nm.

      Additionally, a structural contraction can be observed in Figure 5D-F, where the radius of the helix decreases by about 50 nm. To clarify this point, we sought to measure the arrangement of individual DsRed2 mitochondria in 2D using computational super-resolution microscopy (FF-SRM: SRRF and MSSR), Structured Illumination Microscopy (SIM), or a combination of both (SIM + MSSR). Author response image 7 shows that these three approaches allow the observation of individual DsRed2 mitochondria; however, the complexity of their 3D arrangement, combined with the limited space between mitochondria (as seen in Author response image 6), precludes a reliable estimation of mitochondrial organization within the midpiece. To overcome these challenges, we decided to study the midpiece architecture via SEM experiments on non-capacitated versus capacitated sperm stimulated with ionomycin to undergo the AE.

      Author response image 7.

      Organization of mitochondria observed via FF-SRM and SIM. Scale bar 2 µm. F.N: Fluorescence normalized. F: Frequency

      Author response image 8 presents a single-cell comparison of the midpiece architecture in non-capacitated (NC) and acrosome-intact (AI) versus acrosome-reacted (AR) sperm, along with measurements of the midpiece diameter throughout its length. Notably, the diameter of the midpiece increases from the base of the head to more distal regions, ranging from 0.45 µm to 1.10 µm (as shown in Author response images 7 and 8). A significant correlation between the diameter of the flagellum and its curvature was observed (Author response image 9), suggesting a reorganization of the midpiece due to shearing forces. This is further exemplified in Author response images 8 and 9, which provide individual examples of this phenomenon.

      Author response image 8.

      Comparison of the midpiece architecture in acrosome-intact and acrosome-reacted sperm using scanning electron microscopy (SEM).

      As expected, the overall diameter of the midpiece in AI sperm was larger than in AR sperm, with measurements of 0.731 ± 0.008 µm for AI and 0.694 ± 0.007 µm for AR (p = 0.013, Kruskal-Wallis test, n > 100, N = 2), as shown in Author response image 10. Additionally, Author response image 7 indicates that the reorganization of the midpiece architecture involves a change in the periodicity of the mitochondrial network, with frequencies shifting from fNC to fEA mitochondria per micron.

      Author response image 9.

      Comparison of the midpiece architecture in acrosome-intact (A) and acrosome-reacted (B) sperm using scanning electron microscopy (SEM).

      Collectively, the structural results presented in Figure 5 and Author response images 6 to 10 demonstrate that the AE involves a comprehensive reorganization of the midpiece, affecting its diameter, pitch, and the organization of both the actin and mitochondrial networks. All this information is now incorporated in the new version of the paper (Figure 2F).

      Author response image 10.

      Quantification of the midpiece diameter of the sperm flagellum in acrosome-intact and acrosome-reacted sperm analyzed by scanning electron microscopy (SEM). Data is presented as mean ± SEM. Kruskal-Wallis test was employed,  p = 0.013 (AI n=85 , AR n=72).

      (6) In the TG sperm used, the green fluorescence of the acrosome disappears when sperm die. Figure 1C should be analyzed only with live sperm by checking viability with propidium iodide or other means.

      Response P2.6: We concur with Reviewer 2 that ideally, any experiment conducted for this study should include an intrinsic cell viability test. However, the current research employs a wide array of multidimensional imaging techniques that are not always compatible with, or might be suboptimal for, simultaneous viability assessments. In agreement with the reviewer's concerns, we recognize that the data presented in Figure 1C may inherently be biased due to cell death. Nonetheless, Author response image 1 demonstrates that the relationship between AE and cell death is more complex than a straightforward all-or-nothing scenario. Specifically, Author response image 1C illustrates a case where the plasma membrane is compromised (Sytox Blue+) yet acrosomal integrity is maintained (EGFP+). This observation contradicts Reviewer 2's assertion that "the green fluorescence of the acrosome disappears when sperm die," as discussed more comprehensively in response P2.3.

      In light of these observations, we have meticulously revisited the entire manuscript to address and clarify potential biases in our results due to cell death. Consequently, Author response image 5 and its detailed description have been incorporated into the supplementary material of the manuscript to contribute to the transparency and reliability of our findings.

      Reviewer #3 (Public Review):

      (1) While progressive and also hyperactivated motility are required for sperm to reach the site of fertilization and to penetrate the oocyte's outer vestments, during fusion with the oocyte's plasma membrane it has been observed that sperm motility ceases. Identifying the underlying molecular mechanisms would provide novel insights into a crucial but mostly overlooked physiological change during the sperm's life cycle. In this publication, the authors aim to provide evidence that the helical actin structure surrounding the sperm mitochondria in the midpiece plays a role in regulating sperm motility, specifically the motility arrest during sperm fusion but also during earlier cessation of motility in a subpopulation of sperm post acrosomal exocytosis. The main observation the authors make is that in a subpopulation of sperm undergoing acrosomal exocytosis and sperm that fuse with the plasma membrane of the oocyte display a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix. The authors show the decrease in midpiece diameter via various microscopy techniques all based on membrane dyes, bright-field images and other orthogonal approaches like electron microscopy would confirm those observations if true but are missing. The lack of additional experimental evidence and the fact that the authors simultaneously observe an increase in membrane dye fluorescence suggests that the membrane dyes instead might be internalized and are now staining intracellular membranes, creating a false-positive result. The authors also propose that the midpiece diameter decrease is driven by changes in sperm intracellular Ca2+ and structural changes of the actin helix network. Important controls and additional experiments are needed to prove that the events observed by the authors are causally dependent and not simply a result of sperm cells dying.

      Response P3.1: We appreciate the reviewer's observations and critiques. In response, we have expanded our experimental approach to include alternative methodologies such as mathematical modeling and electron microscopy, alongside further fluorescence microscopy studies. This diversified approach aims to mitigate potential interpretation artifacts and substantiate the validity of our observations regarding the contraction of the sperm midpiece. Additionally, we have implemented further control experiments to fortify the credibility and robustness of our findings, ensuring a more comprehensive and reliable set of results.

      First, we acknowledge the concerns raised by Reviewer 2 regarding the interpretation of the magnitude of the observed contraction of the sperm flagellum's midpiece (see response P2.5). Specifically, we believe that the assertion that "... there is a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix" stated by Reviewer 3 needs careful examination. We recognize that the fluorescence microscopy data provided might not conclusively support such a substantial shift. Our live cell imaging and super-resolution microscopy experiments indicate that there is a significant decrease in the diameter of the sperm flagellum associated with AE. This is supported by colocalization experiments where FM4-64-stained structures (fluorescing upon binding to membranes) are observed moving closer to SiR-Actin-labeled structures (binding to F-actin). Quantitatively, Figure S5 describes the spatial shift between FM4-64 and SiR-Actin signals, narrowing from a range of 140-210 nm to 50-110 nm (considering the 2nd and 3rd quartiles of the distributions). The mean separation distance between both signals changes from 180 nm in AI cells to 70 nm in AR cells, a net shift of 110 nm. This observation suggests caution regarding the claim of a "200 nm shift of the plasma membrane towards the actin helix." 

      Moreover, the concerns raised by Reviewer #3 about the potential internalization of membrane dyes, which might create a false-positive result by staining intracellular membranes, offer an alternative mechanism to explain a shift of up to 100 nm. This perspective is also supported by the critique from Reviewer #2 regarding the substantial distance (about 100 nm) between the plasma membrane and mitochondria post-acrosome reaction:  “The authors state that the distance between the plasma membrane and mitochondria approaches about 100 nm after the acrosome reaction (…), but this is a very long distance and large structural changes may occur in the midpiece”. These insights have prompted us to refine our methodology and interpretation of the data to ensure a more accurate representation of the underlying biological processes.

      Author response image 11 shows a first-principles approach in two spatial dimensions to explore three scenarios where a membrane dye, such as FM4-64, stains structures at and within the midpiece of a sperm flagellum, yet does not result in a net change of diameter. Author response image 11A-C illustrates three theoretical arrangements of fluorescent dyes: Model 1 features two rigid, parallel structures that mimic the plasma membrane surrounding the midpiece of the flagellum. Model 2 builds on Model 1 by incorporating the possibility of dye internalization into structures located near the membrane, suggesting a slightly more complex interaction with nearby membranous intracellular structures. Model 3 represents an extreme scenario where the fluorescent dyes stain both the plasma membrane and internal structures, such as mitochondrial membranes, indicating extensive dye penetration and binding. Author response image 11D-F displays the convolution of the theoretical fluorescent signals from Models 1 to 3 with the theoretical point spread function (PSF) of a fluorescence microscope, represented by a Gaussian-like PSF with a sigma of 19 pixels (approximately 300 nm). This process simulates how each model's fluorescence would manifest under microscopic observation, showing subtle differences in the spatial distribution of fluorescence among the models. Author response image 11G-I reveals the super-resolution images obtained through Mean Shift Super Resolution (MSSR) processing of the models depicted in Author response image 11D-F.

      By analyzing the three scenarios, it becomes clear that the signals from Models 2 and 3 shift towards the center compared to Model 1, as depicted in Author response image 11J. This shift in fluorescence suggests that the internalization of the dye and its interaction with internal structures might significantly influence the perceived spatial distribution and intensity of fluorescence, thereby impacting the interpretation of structural changes within the midpiece. Consequently, the experimentally observed contraction of up to 100 nm could represent an actual contraction of the sperm flagellum's midpiece, a relocalization of the FM4-64 membrane dyes to internal structures, or a combination of both scenarios.
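
      The qualitative effect described for the three staining scenarios can be illustrated with a one-dimensional toy calculation. The sketch below is not the simulation behind Author response image 11 and does not implement MSSR: it assumes an 800 nm wide midpiece, point-like dye sources at hypothetical positions and weights, and a pixel size chosen so that a sigma of 19 pixels corresponds to about 300 nm. It uses the intensity-weighted RMS half-width of the blurred profile as a simple proxy for how far the signal sits from the flagellar axis.

```python
import numpy as np

SIGMA_PX = 19                  # PSF sigma quoted in the text
PIXEL_NM = 300.0 / SIGMA_PX    # assumed pixel size so that 19 px is about 300 nm

# 1D axis across the midpiece, centred on the flagellar axis (nm).
x_nm = np.arange(-160, 161) * PIXEL_NM


def stained_profile(positions_nm, weights):
    """Place point-like dye sources at the pixel nearest to each position."""
    profile = np.zeros_like(x_nm)
    for pos, w in zip(positions_nm, weights):
        profile[np.argmin(np.abs(x_nm - pos))] += w
    return profile


def gaussian_psf(sigma_px):
    """Normalised Gaussian kernel truncated at +/- 4 sigma."""
    k = np.arange(-4 * sigma_px, 4 * sigma_px + 1)
    g = np.exp(-0.5 * (k / sigma_px) ** 2)
    return g / g.sum()


def rms_half_width(signal):
    """Intensity-weighted RMS distance of the signal from its centroid (nm)."""
    centroid = np.sum(x_nm * signal) / np.sum(signal)
    return float(np.sqrt(np.sum(signal * (x_nm - centroid) ** 2) / np.sum(signal)))


# Hypothetical geometries: (source positions in nm, relative dye weights).
models = {
    "model1": ([-400, 400], [1, 1]),                       # plasma membrane only
    "model2": ([-400, -300, 300, 400], [1, 0.5, 0.5, 1]),  # plus near-membrane structures
    "model3": ([-400, -300, -160, 0, 160, 300, 400],       # plus internal membranes
               [1, 0.5, 0.5, 0.5, 0.5, 0.5, 1]),
}

psf = gaussian_psf(SIGMA_PX)
widths = {}
for name, (positions, weights) in models.items():
    blurred = np.convolve(stained_profile(positions, weights), psf, mode="same")
    widths[name] = rms_half_width(blurred)
    print(f"{name}: RMS half-width = {widths[name]:.0f} nm")
```

      Under these assumptions the apparent width decreases from Model 1 to Model 3 even though the outer membrane positions never move, which is the point of the modeling exercise: dye internalization alone can mimic a contraction.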

      To discern between these possibilities, we implemented a scanning electron microscopy (SEM) approach. The findings presented in Figure 5 and Author response images 7 to 9 conclusively demonstrate that the AE involves a comprehensive reorganization of the midpiece. This reorganization affects its diameter, which changes by approximately 50 nm, as well as the pitch and the organization of both the actin and mitochondrial networks. This data corroborates the structural alterations observed and supports the validity of our interpretations regarding midpiece dynamics during the AE.

      Author response image 11.

      Modeling three scenarios of midpiece staining with membrane fluorescent dyes.

      Secondly, we wish to clarify that in some of our experiments, we have utilized changes in the intensity of FM4-64 fluorescence as an indirect measure of midpiece contraction. This approach is supported by a linear inverse correlation between these variables, as illustrated in Figure S2D. It is important to note that this observation is correlative and indirect; therefore, our data does not directly substantiate the claim that "in a subpopulation of sperm undergoing AE and sperm that fuse with the plasma membrane of the oocyte, there is a decrease in midpiece parameter due to a 200 nm shift of the plasma membrane towards the actin helix". Specifically, we have not directly measured the distance between the plasma membrane and actin cortex in experiments involving gamete fusion.

      All the concerns highlighted in this Response P3.1 have been addressed and incorporated into the manuscript. This addition aims to provide comprehensive insight into the experimental observations and methodologies used, ensuring that the data is transparent and accessible for thorough review and replication.

      Editor Comment:

      As the authors can see from the reviews, the reviewers had quite different degrees of enthusiasm, thus discussed extensively. The major points in consensus are summarized below and it is highly recommended that the authors consider their revisions.

      (1) Causality of midpiece contraction with motility arrest is not conclusively supported by the current evidence. Time-resolved imaging of FM4-64 and motility is needed and the working model needs to be revised with two scenarios - whether the sperm contracting indicates a fertilizing sperm or sperm to be degenerated.

      (2) The rationale for using FM4-64 as a plasma membrane marker is not clear as it is typically used as an endo-membrane marker, which is also related to the discrepancy of Fluo-4 signal diameter vs. FM4-64 (Figure 4E). The viability of sperm with increased FM4-64 needs to be demonstrated.

      (3) The mechanism of midpiece contraction in motility cessation along the whole flagellum is not discussed.

      (4) The use of an independent method to support the changes in midpiece diameter/structural changes such as DsRed (transgenic) or TEM.

      (5) The claim of Ca2+ change needs to be toned down.

      Response Editor: We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. We have addressed all these points in the current version. Briefly,

      (1) Time-resolved images showing the correlation between the FM4-64 fluorescence increase and motility were incorporated.

      (2) The rationale for using FM4-64 was added.

      (3) The mechanism of midpiece contraction was discussed in the paper

      (4) An independent method was included to support our conclusions (SEM and other markers not based on membrane dyes)

      (5) The results related to the calcium increase were toned down.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To claim midpiece actin polymerization/re-organization is required for AE, demonstrating that AE does not occur in the presence of actin depolymerizing drugs (e.g., Latrunculin A, Cytochalasin D) would be necessary since the current data only shows the association/correlation. Was the block of AE by actin depolymerization observed?

      Response R1.1: We agree with the reviewer but unfortunately, since actin polymerization and/or depolymerization in the head are important for exocytosis, we cannot use this experimental approach to dissect both events. Addition of these inhibitors blocks the occurrence of AE (PMID: 12604633).

      (2) Please provide the rationale for using FM4-64 to visualize the plasma membrane since it has been reported to selectively stain membranes of vacuolar organelles. What is the principle of increase of FM4-64 dye intensity, other than the correlation with midpiece contraction? For example, in lines 400-402: the authors mentioned that 'some acrosome-reacted moving sperm within the perivitelline space had low FM4-64 fluorescence in the midpiece (Figure 6C). After 20 minutes, these sperm stopped moving and exhibited increased FM4-64 fluorescence, indicating midpiece contraction (Figure 6D).' While recognizing the increase of FM4-64 dye intensity can be an indicator of midpiece contraction, without knowing how and when the intensity of FM4-64 dye changes, it is hard to understand this observation. Please discuss.

      Response R1.2: FM4-64 is an amphiphilic styryl fluorescent dye that preferentially binds to the phospholipid components of cell membranes, embedding itself in the lipid bilayer where it interacts with phospholipid head groups. Due to their amphiphilic nature, FM dyes primarily anchor to the outer leaflet of the bilayer, which restricts their internalization. It has been demonstrated that FM4-64 enters cells through endocytic pathways, making these dyes valuable tools for studying endocytosis.

      Upon binding, FM4-64's fluorescence intensifies in a more hydrophobic environment that restricts molecular rotation, thus reducing non-radiative energy loss and enhancing fluorescence. These photophysical properties render FM dyes useful for observing membrane fusion events. When present in the extracellular medium, FM dyes rapidly reach a chemical equilibrium and label the plasma membrane in proportion to the availability of binding sites.

      In wound healing studies, for instance, the fluorescence of FM4-64 is known to increase at the wound site. This increase is attributed to the repair mechanisms that promote the fusion of intracellular membranes at the site of the wound, leading to a rise in FM4-64 fluorescence. Similarly, an increase in FM4-64 fluorescence has been reported in the heads of both human and mouse sperm, coinciding with AE. In this scenario, the fusion between the plasma membrane and the acrosomal vesicle provides additional binding sites for FM4-64, thus increasing the total fluorescence observed in the head. This dynamic response of FM4-64 makes it an excellent marker for studying these cellular processes in real-time.

      This study is the first to report an increase in FM4-64 fluorescence in the midpiece of the sperm flagellum. Figure 5 and Author response images 6 to 9 demonstrate that during the contraction of the sperm flagellum, structural rearrangements occur, including the compaction of the mitochondrial sheath and other membranous structures. Such contraction likely increases the local density of membrane lipids, thereby elevating the local concentration of FM4-64 and enhancing the probability of fluorescence emission. Additionally, changes in the microenvironment such as pH or ionic strength during contraction might further influence FM4-64’s fluorescence properties, as detailed by Smith et al. in the Journal of Membrane Biology (2010). The photophysical behavior of FM4-64, including changes in quantum yield due to tighter membrane packing or alterations in curvature or tension, may also contribute to the increased fluorescence observed. Notably, Figure S2 indicates that other fluorescent dyes like Memglow 700, Bodipy-GM, and FM1-43 also show a dramatic increase in their fluorescence during the midpiece contraction. Investigating whether the compaction of the plasma membrane or other mesoscale processes occur in the midpiece of the sperm flagellum could be a valuable area for future research. The use of fluorescent dyes such as LAURDAN or Nile Red might provide further insights into these membrane dynamics, offering a more comprehensive understanding of the biochemical and structural changes during sperm motility and gamete fusion events.

      (3) As the volume of the whole midpiece stays the same while the diameter decreases along the whole midpiece (midpiece contraction), the authors need to describe what changes in the midpiece length they observe during the contraction. Was the length of the midpiece during the contraction measured and compared before and after contraction?

      Response R1.3: As requested, we have measured the length of the midpiece in AI and AR sperm. As shown in Author response image 12 (For review purposes only), no statistically significant differences were observed. 

      Author response image 12.

      Midpiece length measured by the length of mitochondrial DsRed2 fluorescence in EGFP-DsRed2 sperm. Measurements were done before (acrosome-intact) and after (acrosome-reacted) acrosome exocytosis and midpiece contraction. Data is presented as the mean ± sem of 14 cells induced by 10 µM ionomycin. Paired t-test was performed, resulting in no statistical significance. 

      (4) Most of all, it is not clear what the midpiece, thus mitochondria, contraction means in terms of sperm bioenergetics and motility cessation. Would the contraction induce mitochondrial depolarization or hyperpolarization, increase or decrease of ATP production/consumption? It will be great if this point is discussed. For example, an increase in mitochondrial Ca2+ is a good indicator of mitochondrial activity (ATP production).

      Response R1.4: That is an excellent point. We have discussed this idea in the discussion (lines 620-624). We are currently exploring this idea using different approaches because we also think that these changes in the midpiece may have an impact on the function of the mitochondria and, perhaps, on their fate once they are incorporated into the egg after fertilization. 

      (5) The authors claimed that Ca2+ signal propagates from head to tail, which is the opposite of the previous study (PMID: 17554080). Please clarify if it is a speculation. Otherwise, please support this claim with direct experimental evidence (e.g., high-speed calcium imaging of single cells).

      Response R1.5: In that study, it was claimed that a [Ca2+]i  increase that propagates from the tail to the head occurs when CatSper is stimulated. They did not evaluate the occurrence of AE when monitoring calcium.

      Our data is in agreement with our previous results (PMID: 26819478), which consistently indicated that only the [Ca2+]i rise originating in the sperm head is able to promote AE. 

      (6) Figure 4E: Please explain how come Fluo4 signal diameter can be smaller than FM4-64 dye if it stains plasma membrane (at 4' and 7').

      Response R1.6: When colocalizing a diffraction-limited image (Fluo4) with a super-resolution image (FM4-64), discrepancies in signal sizes and locations can become apparent due to differences in resolution. The Fluo4 signal, being diffraction-limited, adheres to a resolution limit of approximately 200-300 nanometers under conventional light microscopy. This limitation causes the fluorescence signal to appear broader and less defined. Conversely, super-resolution microscopy techniques, such as SRRF (Super-Resolution Radial Fluctuations), achieve resolutions down to tens of nanometers, allowing FM4-64 to reveal finer details at the plasma membrane and display potentially smaller apparent sizes of stained structures. Although both dyes might localize to the same cellular regions, the higher resolution of the FM4-64 image allows it to show a more precise and smaller diameter of the midpiece of the flagellum compared to the broader, less defined signal of Fluo4. To address this, the legend of Figure 4E has been slightly modified to clarify that the FM4-64 image possesses greater resolution. 

      (7) Figure 5D-G: the midpiece diameter of AR intact cells was shown ~ 0.8 um or more in Figure 2, while now the radius in Figure 5 is only 300 nm. Since the diameter of the whole midpiece is nearly uniform when the acrosome is intact, clarify how and what brings this difference and where the diameter/radius measurement is done in each figure.

      Response R1.7: The difference lies in what is being measured. In Figure 2, the total diameter of the cell is measured from the maximum peaks of FM4-64 fluorescence, a plasma membrane probe. In Figure 5, the radius shown refers to the radius of the actin double helix within the midpiece. For that measurement, cells were fixed and stained with phalloidin, an F-actin probe.

      Minor points

      (8) Figure S1 title needs to be changed. The "Midpiece contraction" concept is not introduced when Figure S1 is referred to.

      Response R1.8: This was corrected in the new version.

      (9) Reference #19: the authors are duplicated.

      Response R1.9: This was corrected in the new version.

      (10) Line 315-318: sperm undergoing contraction -> sperm undergoing AR/AE?

      Response R1.10: This was corrected in the new version.

      (11) Line 3632 -> punctuation missing.

      Response R1.11: Modified as requested.

      (12) Movie S7: please add an arrow to indicate the spermatozoon of interest.

      Response R1.12:  The arrow was added as suggested.

      (13) Line 515: One result of this study was that the sperm flagellum folds back during fusion coincident with the decrease in the midpiece diameter. The authors did not provide an explanation for this observation. Please speculate the function of this folding for the fertilization process.

      Response R1.13: As requested, this is now incorporated in the discussion. We speculate that the folding of the flagellum during fusion further facilitates sperm immobilization because it makes it more difficult for the flagellum to beat. Such processes can enhance stability and increase the probability of fusion success. Mechanistically, the folding may occur as a consequence of the deformation-induced stress that develops during the decrease of midpiece diameter. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2C, D, E. Does "-1" on the X-axis mean one minute before induction? If so, the diameter is already smaller and FM4-64 fluorescence intensity is higher before the induction in the spontaneous group. Does the acrosome reaction already occur at "-1" in this group?

      Response R2.1: Yes, “-1” means that the diameter and FM4-64 fluorescence were measured one minute before the induction. It is correct that the diameter is smaller and the FM4-64 fluorescence higher in the spontaneous group, because these sperm underwent acrosome exocytosis before the induction, that is, spontaneously.

      (2) Figure 3D. Purple dots are not shown in the graph on the right side.

      Response R2.2: Modified as requested.

      (3) Lines 404-406. "These results suggest that midpiece contraction and motility cessation occur only after acrosome-reacted sperm penetrate the zona pellucida". Since midpiece contraction and motility cessation also occur before the passage through the zona pellucida (Figure 9B), "only" should be deleted.

      Response R2.3: Modified as requested.

      Reviewer #3 (Recommendations For The Authors):

      (1) Do the authors have a hypothesis as to why the observed decrease in midpiece parameter results in cessation of sperm motility? It would be beneficial for the manuscript to include a paragraph about potential mechanisms in the discussion.

      Response R3.1: As requested, a potential mechanism has been proposed in the discussion section (line 644-656).

      (2) Since the authors propose in Gervasi et al. 2018 that the actin helix might be responsible for the integrity of the mitochondrial sheath and the localization of the mitochondria, is it possible that the proposed change in plasma membrane diameter and actin helix remodeling for example alters the localization of the mitochondria? TEM should be able to reveal any associated structural changes. In its current state, the manuscript lacks experimental evidence supporting the author's claim that the "helical actin structure plays a role in the final stages of motility regulation". The authors should either include additional evidence supporting their hypothesis or tone down their conclusions in the introduction and discussion.

      Response  R3.3: We agree with the reviewer. This is an excellent point. As suggested by this reviewer as well as the other reviewers, we have performed SEM to examine the changes in the midpiece after its contraction, in part to confirm this observation using an approach that does not involve membrane dyes. As shown in Author response images 6-10, we have observed that, in addition to the change in midpiece diameter, there is a reorganization of the mitochondrial sheath that is also suggested by the SIM experiments. We are currently investigating this phenomenon and will perform further experiments to confirm the structural and functional changes that mitochondria undergo during the contraction. These results are now included in the new Figure 2F.

      (3) In line 134: The authors write: 'Some of the acrosome reacted sperm moved normally, whereas the majority remained immotile". Do the authors mean that a proportion of the sperm was motile prior to acrosomal exocytosis and became immotile after, or were the sperm immotile to begin with? Please clarify.

      Response R3.4: This statement is based on the quantification of the motile sperm after induction of AE within the AR population (Fig. 1C). 

      (4) The authors do not provide any experimental evidence supporting the first scenario. In video 1 a lot of sperm do not seem to be moving to begin with, only a few sperm show clear beating in and out of the focal plane. The highlighted sperm that acrosome-reacted upon exposure to progesterone don't seem to be moving prior to the addition of progesterone. In contrast, the sperm that spontaneously acrosome react move the whole time. In video 1 this reviewer was not able to identify one sperm that stopped moving upon acrosomal exocytosis. Similarly in video 3, although the resolution of the video makes it difficult to distinguish motile from non-motile sperm. In video 2 the authors only show sperm that are already acrosome reacted. Please explain and provide additional evidence and statistical analysis supporting that sperm stop moving upon acrosomal exocytosis.

      Response R3.5: In videos 1 and 3, the cells are attached to the glass with concanavalin-A. This lectin makes sperm immotile (if well attached) because both the head and tail stick to the glass. The motility observed in these videos is likely due to some sperm not being properly attached to the glass, which is completely normal. In contrast, in videos 2 and 4, sperm are attached to the glass with laminin. This glycoprotein binds the sperm to the glass only through the head, which is why they move freely.

      (5) Could the authors provide additional information about the FM4-64 fluorescent dye?

      What is the mechanism, and how does it visualize structural changes at the flagellum? Since the whole head lights up, does that mean that the dye is internalized and now stains additional membranes, similar to during wound healing assays (PMID 20442251, 33667528). Or is that an imaging artifact? How do the authors explain the correlation between FM4-64 fluorescence increase in the midpiece and the observed change in diameter? Does FM4-64 have solvatochromatic properties?

      Response R3.6: We appreciate the insightful queries posed by Reviewer 3, which echo the concerns initially brought forward by Reviewer 1. For a detailed explanation of the mechanism of the FM4-64 dye, how we interpret it, how it visualizes structural changes in the flagellum, and its behavior during cellular processes, please refer to Response R1.2. In brief, FM4-64 is a lipophilic styryl dye that preferentially binds to the outer leaflets of cellular membranes due to its amphiphilic nature. Upon binding, the dye becomes fluorescent, allowing for the visualization of membrane dynamics. The increase in fluorescence in the sperm head or midpiece likely results from the dye’s accumulation in areas where membrane restructuring occurs, such as during AE or in response to changes in the flagellum structure.

      Regarding the specific questions about internalization and whether FM4-64 stains additional membranes similarly to what is observed in wound healing assays, it's important to note that FM4-64 can indeed be internalized through endocytosis and subsequently label internal vesicular structures. Additionally, FM4-64 may experience changes in its fluorescence as a result of fusion events that increase the lipid content of the plasma membrane, as observed in studies cited (PMID 20442251, 33667528). This characteristic makes FM4-64 valuable not only for outlining cell membranes but also for tracking the dynamics of both internal and external membrane systems, particularly during cellular events that involve significant membrane remodeling, such as wound healing or AE.

      Concerning whether the increased fluorescence and observed changes in diameter are artifacts or reflect real biological processes, the correlation observed likely indicates actual changes in the midpiece architecture through molecular mechanisms that remain to be further elucidated. The data presented in Figures 5 and Author response images 6-10 support that this increase in fluorescence is not merely an artifact but a feature of how FM4-64 interacts with its environment. 

      Finally, regarding the solvatochromatic properties of FM4-64, while the dye does show changes in its fluorescence intensity in different environments, its solvatochromatic properties are generally less pronounced than those of dyes specifically designed to be solvatochromatic. The fluorescence changes of FM4-64 result more from membrane interaction dynamics and dye concentration than from solvatochromatic shifts.

      (6) For the experiment summarized in Figure S1, did the authors detect sperm that acrosome-reacted upon exposure to progesterone and kept moving? This reviewer is wondering how the authors reliably measure FM4-64 fluorescence if the flagellum moves in and out of the focal plane. If the authors observe sperm that keep moving, what was the percentage within a sperm population and how did FM4-64 fluorescence change?

      Response R3.6: We did identify sperm that underwent acrosome reaction upon exposure to progesterone and continued to exhibit movement. However, due to the issue raised by the reviewer regarding the flagellum going out of focus, we opted to quantify the percentage of sperm that were adhered to the slide (using laminin). This approach allows for the observation of flagellar position over time, facilitating an easy assessment of fluorescence changes. The percentage of sperm that maintained movement after AE is depicted in Figure 1C.

      (7) In Figure S1B it doesn't look like the same sperm is shown in all channels or time points, the hook shown in the EGFP channel is not always pointing in the same direction. If FM4-64 is staining the plasma membrane, how do the authors explain that the flagellum seems to be more narrow in the FM4-64 channel than in the brightfield and DsRed2 channel?

      Response 3.7: It is the same sperm, but due to technical limitations images were sequentially acquired. For example, for time 5 minutes after progesterone, all images in DIC were taken, then all images in the EGFP channel, then DsRed2* and finally FM4-64. The reason for this was to acquire images as fast as possible, particularly in DIC images which were then processed to get the beat frequency.

      Regarding the flagellum that seems to be more narrow in the FM4-64 channel compared to the BF or DsRed2 channel, the explanation is related to the fact that intensity of the DsRed2 signal is stronger than the other two. This higher signal may have increased the amount of photons captured by the detector.

      (8) Overall, it would be beneficial to include statistics on how many sperm within a population did change FM4-64 fluorescence during AE and how many did not, in addition to information about motility changes and viability. Did the authors exclude that the addition of FM4-64 causes cell death which could result in immotile sperm or that only dying sperm show an increase in FM4-64 fluorescence?

      Response 3.8: The relationship between cell death and the increase in FM4-64 fluorescence is widely discussed in Response P2.3. In our experiments, we always considered sperm that were motile to hypothesize about the relevance of this observation. We have two types of experiments: 

      (1) Sperm-egg fusion: In experiments where sperm and eggs were imaged to observe their fusion, sperm were initially moving and, after fusion, midpiece contraction (an increase in FM4-64 fluorescence) was observed. This indicates that the change in the midpiece, which was observed consistently in all fusing cells analyzed, is part of the process.

      (2) Sperm that underwent AE: we have observed two behaviours as shown in Figure 1: 

      a) Sperm that underwent AE and remained motile without midpiece contraction (these cells are certainly alive); 

      b) Sperm that underwent AE and stopped moving with an increase in FM4-64 fluorescence. We propose that this contraction during AE is not desired because it will impede sperm from moving forward to the fertilization site when they are in the female reproductive tract. In this case, we acknowledge that the cessation of sperm motility may be attributed to cellular death, potentially correlating with the increased FM4-64 signal observed in the midpiece of immotile sperm that have undergone AE. To address this hypothesis, we conducted image-based flow cytometry experiments, which are well-suited for assessing cellular heterogeneity within large populations.

      Regarding the relationship between the increase in FM4-64 and AE, we have always observed that AE is followed by an increase in FM4-64 in the head in mice (PMID: 26819478) as well as in human (PMID: 25100708) sperm. This was originally corroborated with the EGFP sperm. However, not all the cells that undergo AE increase the FM4-64 fluorescence in the midpiece.

      (9) The authors report that a fraction of sperm undergoes AE without a change in FM4-64 fluorescence (Figure 1F). How does the [Ca2+]i change in those cells? Again statistics on the distribution of a certain pattern within a population in addition to showing individual examples would be very helpful.

      Response 3.9: A recent work shows that an initial increase in [Ca2+]i  is required to induce changes in flagellar beating necessary for hyperactivation (Sánchez-Cárdenas et al., 2018). However, when [Ca2+]i  increases beyond a certain threshold, flagellar motility ceases. These conclusions are based on single-cell experiments in murine sperm with different concentrations of the Ca2+ ionophore, A23187. The authors reported that complete loss of motility was observed when using ionophore concentrations higher than 1 μM. In contrast, spermatozoa incubated with 0.5 μM A23187 remained motile throughout the experiment. Once the Ca2+ ionophore is removed, the sperm would reduce the concentration of this ion to levels compatible with motility and hyperactivation (Navarrete et al., 2016). However, some of the washed cells did not recover mobility in the recorded time window (Sánchez-Cárdenas et al., 2018). These results would indicate that due to the increase in [Ca2+]i  induced by the ionophore, irreversible changes occurred in the sperm flagellum that prevented recovery of mobility, even when the ionophore was not present in the recording medium. 

      Taking into account our results, one possible scenario to explain this irreversible change would be the contraction of the midpiece. Our results demonstrate that the increase in [Ca2+]i observed in the midpiece (whether by induction with progesterone, ionomycin or occurring spontaneously) causes the contraction of this section of the flagellum and its subsequent immobilization. 

      (10) While the authors results show that changes in [Ca2+]i correlate with the observed reduction of the midpiece diameter, they do not provide evidence that the structural changes are triggered by Ca2+i influx. It could just be a coincidence that both events spatially overlap and that they temporarily follow each other. The authors should either provide additional evidence or tone down their conclusion.

      Response 3.10: We agree with the reviewer. As suggested, we have toned down our conclusion.

      (11) Are the authors able to detect the changes in the midpiece diameter independent from FM4-64 or other plasma membrane dyes? An alternative explanation could be that the dyes are internalized due to cell death and instead of staining the plasma membrane they are now staining intracellular membranes, resulting in increased fluorescence and giving the illusion that the midpiece diameter decreased. How do the authors explain that the Bodipy-GM1 Signal directly overlaps with DsRed2 and SIR-actin, shouldn't there be some gap? Since the rest of the manuscript is based on that proposed decrease in midpiece diameter the authors should perform orthogonal experiments to confirm their observation.

      Response 3.11: As requested by the reviewer, we have now used two new methods to visualize the change in sperm diameter in the midpiece, neither of which relies on a membrane dye. First, we performed immunofluorescence to detect a membrane protein (GLUT3). Second, we used scanning electron microscopy. The results are now incorporated in the new Figure 2F-G. In both experiments, a change in the midpiece diameter was observed. Please also see Response P2.5 and Author response images 8 to 10.

      Regarding the overlap between the signal of Bodipy-GM1 (membrane) and the fluorescence of DsRed2 (mitochondria) and SiR-actin (F-actin), it is only observed in acrosome-reacted sperm, not in acrosome-intact sperm (Figure S4). In our view, these structures come close together after midpiece contraction, and the resolution of the images is insufficient to distinguish them clearly. This issue is also evident in Figure 5B. Therefore, we conducted additional experiments using more powerful super-resolution techniques such as STORM (Figures 5D-F).

      (12) Has the proposed gap of 200 nm between the actin helix and the plasma membrane been observed by TEM? Considering that the diameter of the mouse sperm midpiece is about 1 um, that is a lot of empty space, which leaves only about 600 nm for the rest of the flagellum. The axoneme is 300 nm and there needs to be room for the ODFs and the mitochondria. Please explain.

      Response 3.12: Unfortunately, the filament of polymerized actin cannot be observed by TEM. Furthermore, we were discouraged from trying other approaches, such as utilizing phalloidin gold, because for some reason, it does not work properly.

      In our view, the 200 nm gap between the actin cytoskeleton and the plasma membrane is occupied by the mitochondria (that is the size that it is frequently reported based on TEM; see https://doi.org/10.1172/jci.insight.166869).

      (13) The results provided by the authors do not convince this reviewer that the actin helix moves, either closer to the plasma membrane or toward the mitochondria, the observed differences are minor and not confirmed by statistical analysis.

      Response 3.13: As requested, the title of that section was changed. Moreover, our conclusion is exactly as the reviewer is suggesting: “Since the results of the analysis of SiR-actin slopes were not conclusive, we studied the actin cytoskeleton structure in more detail”. This conclusion is based on the statistical analysis shown in Figure S5D-E.

      (14) The fluorescence intensity of all plasma membrane dyes increases in all cells chosen by the authors for further analysis. Could the increase in SiR-Actin fluorescence be explained by a microscopy artifact instead of actin helix remodeling? Alternatively, can the authors exclude that the observed increase in SIR-Actin might be an artifact caused by the increase in FM4-64 fluorescence? Since the brightness in the head similarly increases to the fluorescence in the flagellum the staining pattern looks suspiciously similar. Did the authors perform single-stain controls?

      Response 3.14: We had similar concerns when we were doing the experiments using SiR-actin. We performed single-stain controls and, to make sure that actin helix remodelling occurs during midpiece contraction, we also performed experiments using a higher-resolution technique (STORM) with a different probe to stain actin (phalloidin).

      (15) Should actin cytoskeleton remodeling indeed result in a decrease of actin helix diameter, what do the authors propose is the underlying mechanism? Shouldn't that result in changes in mitochondrial structure or location and be visible by TEM? This reviewer is also wondering why the authors focus so much on the actin helix, while the plasma membrane based on the author's results is moving way more dramatically.

      Response 3.15: This raises an intriguing point. Currently, we lack an understanding of the underlying mechanism driving actin remodeling, and we are eager to conduct further experiments to explore this aspect. For instance, we are investigating the potential role of Cofilin in remodeling the F-actin network. Initial experiments utilizing STORM imaging have revealed the localization of Cofilin in the midpiece region, where the actin helix is situated.

      Regarding mitochondria, thus far, we have not uncovered any evidence suggesting that acrosome reaction or fusion with the egg induces a rearrangement of these organelles within the structure. The rationale for investigating polymerized actin in depth stems from the fact that, alongside the axoneme and other flagellar structures such as the outer dense fibers and fibrous sheath, it is one of the sole cytoskeletal components present in that particular tail region.

      (14) The fact that the authors observe that most sperm passing through the zona pellucida, which requires motility, display high FM4-64 fluorescence, doesn't that contradict the authors' hypothesis that midpiece contraction and motility cessation are connected? Videos confirming sperm motility and information about pattern distribution within the observed sperm population in the perivitelline space should be provided.

      Response 3.14: We believe it is a matter of timing. As depicted in Figure 1D, our model proposes that cells first lose the acrosome while remaining motile with low FM4-64 fluorescence in the midpiece (pattern II), and only afterwards lose motility and increase FM4-64 fluorescence in the midpiece (pattern III). We therefore think that sperm present pattern II when they pass the zona pellucida and evolve into pattern III after some time.

      (15) In the experiments summarized in Figure 8, did all sperm stop moving? Considering that 74 % of the observed sperm did not display midpiece contraction upon fusion, again doesn't that contradict the authors' hypothesis that the two events are interdependent? Similarly, in earlier experiments, not all acrosome-reacted sperm display a decrease in midpiece diameter or stop moving, questioning the significance of the event. If some sperm display a decrease in midpiece diameter and some don't, or undergo that change earlier or later, what is the underlying mechanism of regulation? The observed events could similarly be explained by sperm death: Sperm are dying → plasma membrane integrity changes and plasma membrane dyes get internalized → [Ca2+]i simultaneously increases due to cell death → sperm stop moving.

      Response 3.15: The percentage of sperm that did not exhibit midpiece contraction in Fig.8B is 26%, not 74%, indicating that it does not contradict our hypothesis. However, this still represents a significant portion of sperm that remain unchanged in the midpiece, leaving room for various explanations. For instance, it's possible that: i) the change in fluorescence was not detected due to the event occurring after the recording concluded, or ii) in some instances, this alteration simply does not occur. Nevertheless, we did not track subsequent events in the oocyte, such as egg activation, to definitively ascertain the success of fusion. Incorporation of the dye only manifests the initiation of the process.

      (16) The authors propose changes in Ca2+ as one potential mechanism to regulate midpiece contraction, however, the Ca2+ measurements during fusion are flawed, as the authors write in the discussion, by potential Ca2+ fluorophore dilution. Considering that the authors observe high Ca2+ in all sperm prior to fusion, could that be a measuring artifact? Were acrosome-intact sperm imaged with the same settings to confirm that sperm with low and high Ca2+ can be distinguished? Should [Ca2+]i changes indeed be involved in the regulation of motility cessation during fusion, could the authors speculate on how [Ca2+]i changes can simultaneously be involved in the regulation of sperm hyperactivation?

      Response 3.16: We agree with the reviewer that our experiments using calcium probes are not conclusive because of several technical problems. We have toned down our conclusions in the new version of the manuscript.

      (17) 74: AE takes place for most cells in the upper segment of the oviduct, not all of them.

      Please correct.

      Response 3.17: Corrected in the new version.

      (18) 88: Achieved through, or achieved by, please correct.

      Response 3.18: Corrected in the new version.

      (19) 243: Acrosomal exocytosis initiation by progesterone, please specify.

      Response 3.19: Modified in the new version.

      (20) 277: "The actin cytoskeleton approaches the plasma membrane during the contraction of the midpiece" is misleading. The author's results show the opposite.

      Response 3.20: As suggested, this statement was modified.

      (21) 298: Why do the authors find it surprising that the F-actin network was unchanged in acrosome-intact sperm that do not present a change in midpiece diameter?

      Response 3.21: The reviewer is right. The sentence was modified.

      (22) Figures 5D,F: The provided images do not support a shift in the actin helix diameter.

      Response 3.22: The shift in the actin helix diameter is provided in Figure 5E and 5G.

      (23) Figure S5C: The authors should show representative histograms of spontaneously-, progesterone induced-, and ionomycin-induced AE. Based on the quantification the SiRactin peaks don't seem to move when the AR is induced by progesterone.

      Response 3.23: As requested, an ionomycin induced sperm is incorporated.

      (24) 392: Which experimental evidence supports that statement?

      Response 3.24: A reference was incorporated. 

      Reference 13 is published, please update.

      Response 3.25: Updated as requested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Using the UK Biobank, this study assessed the value of nuclear magnetic resonance measured metabolites as predictors of progression to diabetes. The authors identified a panel of 9 circulating metabolites that improved the ability in risk prediction of progression from prediabetes to diabetes. In general, this is a well-performed study, and the findings may provide a new approach to identifying those at high risk of developing diabetes. I have some comments that may improve the importance of this study.

      We deeply appreciate the reviewer's invaluable time dedicated to the review of this manuscript and the insightful comments to enhance its overall quality.

      (1) It is unclear why the authors only considered the top 20 variables in the metabolite selection and why they did not set a wider threshold.

      Thank you for the comment. We set the top 20 variables in the metabolite selection balancing the performance of the final diabetes risk prediction model and the clinical applicability due to measurement costs. We have added this explanation in the “Methods” section.

      “We chose the intersection set of the top 20 most important variables selected by the three machine learning models, after balancing the performance of the final diabetes risk prediction model and the clinical applicability associated with measurement costs of metabolites.”
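
      The selection rule quoted above (keep only the variables that appear among the top-ranked variables of every model) can be sketched as follows. This is a minimal illustration, not the authors' code: the model names, feature names, and importance values are hypothetical, and k is set to 3 instead of 20 to keep the toy example small.

```python
def top_k(importance: dict[str, float], k: int) -> set[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def shared_top_features(importances: list[dict[str, float]], k: int) -> set[str]:
    """Intersect the top-k feature sets of all models."""
    sets = [top_k(imp, k) for imp in importances]
    return set.intersection(*sets) if sets else set()


# Hypothetical importance scores from three hypothetical models.
model_a = {"m1": 0.90, "m2": 0.80, "m3": 0.70, "m4": 0.10, "m5": 0.05}
model_b = {"m1": 0.50, "m2": 0.10, "m3": 0.60, "m4": 0.70, "m5": 0.05}
model_c = {"m1": 0.30, "m2": 0.20, "m3": 0.90, "m4": 0.10, "m5": 0.80}

selected = shared_top_features([model_a, model_b, model_c], k=3)

if __name__ == "__main__":
    print(sorted(selected))
```

      A larger k admits more candidates into the intersection, so the choice of k trades model performance against the number of metabolites that would need to be measured.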

      (2) The methods section would benefit from a more detailed exposition of how parameter tuning was conducted and the range of parameters explored during the training of the RSF model.

      According to the reviewer’s suggestion, we have added a more detailed description of parameter tuning and the range of parameters explored during the training of the RSF model in the “Method S3” section of the Supplementary material.

      “The RSF model was fitted using the “randomForestSRC” package and the grid search method was used for hyperparameter tuning. Specifically, the grid search method was used to tune hyperparameters among the RSF model, through minimizing out-of-sample or out-of-bag error1. Each tree in the RSF is constructed from a random sample of the data, typically a bootstrap sample or 63.2% of the sample size (as in the present study). Consequently, not all observations are used to construct each tree. The observations that are not used in the construction of a tree are referred to as out-of-bag observations. In an RSF model, each tree is built from a different sample of the original data, so each observation is “out-of-bag” for some of the trees. The prediction for an observation can then be obtained using only those trees for which the observation was not used for the construction. A classification for each observation is obtained in this way and the error rate can be estimated from these predictions. The resulting error rate is referred to as the out-of-bag error. Through calculating the out-of-bag error in each iteration, the best hyperparameters were finally determined.

      The hyperparameters to be tuned and the ranges of the grid search in the present study were as follows: number of trees (50-1000, by 50), number of variables to possibly split at each node (3-6, by 1), and minimum size of terminal node (1-20, by 1)2.”
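
      To make the size of this search concrete, the sketch below enumerates the grid described above and picks the combination with the lowest error. It is an illustration in Python, not the R code used with “randomForestSRC”. The function `toy_oob_error` is a hypothetical stand-in for fitting an RSF and reading its out-of-bag error. The helper `expected_unique_fraction` shows where the 63.2% figure comes from: it is the expected share of distinct observations in a bootstrap sample, which approaches 1 - 1/e.

```python
from itertools import product

# Ranges taken from the text: trees 50-1000 by 50, split variables 3-6, node size 1-20.
N_TREES = range(50, 1001, 50)
N_SPLIT_VARS = range(3, 7)
NODE_SIZES = range(1, 21)


def grid() -> list[tuple[int, int, int]]:
    """All (n_trees, n_split_vars, node_size) combinations in the search grid."""
    return list(product(N_TREES, N_SPLIT_VARS, NODE_SIZES))


def grid_search(error_fn) -> tuple[int, int, int]:
    """Return the combination that minimises the supplied error function."""
    return min(grid(), key=error_fn)


def expected_unique_fraction(n: int) -> float:
    """Expected fraction of distinct observations in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def toy_oob_error(params: tuple[int, int, int]) -> float:
    """Hypothetical error surface; a real run would fit a model and return its OOB error."""
    n_trees, n_split_vars, node_size = params
    return abs(n_trees - 500) / 1000 + abs(n_split_vars - 4) / 10 + abs(node_size - 5) / 100


if __name__ == "__main__":
    print(len(grid()), "combinations")
    print("best under the toy error:", grid_search(toy_oob_error))
    print("bootstrap unique fraction, n=1000:", round(expected_unique_fraction(1000), 3))
```

      Because each forest already yields an out-of-bag error, no separate validation split is needed to score each combination, which keeps an exhaustive search of this size practical.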

      (3) It is hard to understand the meaning of the decision curve analysis and the clinical implications behind the net benefit, which are required to clarify the application values of models.

      Thank you for the comment. We have added more description and discussion about the decision curve analysis in the “Methods” and “Discussion” sections.

      “Furthermore, we used decision curve analysis (DCA) to assess the clinical usefulness of prediction model-based guidance for prediabetes management, which calculates a clinical “net benefit” for one or more prediction models in comparison to default strategies of treating all or no patients3.”

      “Most importantly, a model with good discrimination does not necessarily have high clinical value. Hence, DCA was used to compare the clinical utility of the model before and after adding the metabolites, and this showed a higher net benefit for the latter than the basic model, suggesting the addition of the metabolites increased the clinical value of prediction, i.e., the potential benefit of guiding management in individuals with prediabetes3,4. These results provided novel evidence supporting the value of metabolic biomarkers in risk prediction and stratification for the progression from prediabetes to diabetes.”
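
      For readers unfamiliar with the term, net benefit at a risk threshold p_t is commonly computed as TP/n - (FP/n) x p_t/(1 - p_t), and a model is compared with the "treat all" and "treat none" strategies. The sketch below implements this formula for a binary outcome. It is a simplified illustration: the data are invented, and the time-to-event setting of the study would require an adjusted estimator that accounts for censoring.

```python
def net_benefit(y_true: list[int], risk: list[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: list[int], threshold: float) -> float:
    """Net benefit of the default strategy that treats every individual."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Invented toy data: 1 = progressed to diabetes, 0 = did not.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.2, 0.3, 0.05, 0.1]

if __name__ == "__main__":
    for pt in (0.2, 0.5):
        print(
            f"threshold {pt}: model {net_benefit(outcomes, predicted, pt):.3f}, "
            f"treat all {net_benefit_treat_all(outcomes, pt):.3f}, treat none 0.000"
        )
```

      A decision curve plots these values across a range of thresholds. A model adds clinical value at a given threshold when its net benefit exceeds that of both default strategies.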

      (4) Notably, the NMR platform utilized within the UK Biobank primarily focused on lipid species. This limitation should be discussed in the manuscript to provide context for interpreting the results and acknowledge the potential bias from the measuring platform.

      Thank you for the comment. We acknowledge that the NMR platform used within the UK Biobank focuses primarily on lipid species, which may introduce bias from the measuring platform, and we have added this limitation to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focused on lipids and lipoprotein sub-fractions, and thus the predictive value of other metabolites in the progression from prediabetes to diabetes warranted further research using an untargeted metabolomics approach.”

      (5) The manuscript should explain the potential influence of non-fasting status on the findings, particularly concerning lipoprotein particles and composition. There should be a detailed discussion of how non-fasting status may impact the measurement and the findings.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of variation in plasma metabolic biomarker concentrations [5]. Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (6) Cross-platform standardization is an issue in metabolism, and further descriptions of quality control are recommended.

      Thank you for the comment. We have added more description of quality control in the “Method S1” section in the Supplementary material.

      “Metabolic biomarker profiling by Nightingale Health’s NMR platform provides consistent results over time and across spectrometers. Furthermore, the sample preparation is minimal in the Nightingale Health’s metabolic biomarker platform, circumventing all extraction steps. These aspects result in highly repeatable biomarker measurements. Pre-specified quality metrics were agreed between UK Biobank and Nightingale Health to ensure consistent results across the samples, and pilot measurements were conducted. Nightingale Health performed real-time monitoring of the measurement consistency within and between spectrometers throughout the UK Biobank samples. Two control samples provided by Nightingale Health were included in each 96-well plate for tracking the consistency across multiple spectrometers. Furthermore, two blind duplicate samples provided by the UK Biobank were included in each well plate, with the position information unlocked only after results delivery. Coefficient of variation (CV) targets across the metabolic biomarker profile were pre-specified for both Nightingale Health’s internal control samples and UK Biobank’s blind duplicates. The targets were met for each consecutively measured batch of ~25,000 samples. For the majority of the metabolic biomarkers, the CVs were below 5% (https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=3000). Further, the distributions of measured biomarkers from 5 sample batches indicated absence of batch effects (https://biobank.ctsu.ox.ac.uk/ukb/ukb/docs/nmrm_app1).”

      Reviewer #2 (Public Review):

      Deciphering the metabolic alterations characterizing the prediabetes-diabetes spectrum could provide early time windows for targeted preventive measures to extend precision medicine while avoiding disproportionate healthcare costs. The authors identified a panel of 9 circulating metabolites combined with basic clinical variables that significantly improved the prediction from prediabetes to diabetes. These findings provided insights into the integration of these metabolites into clinical and public health practice. However, the interpretation of these findings should take account of the following limitations.

      We appreciate the reviewer’s positive comments and encouragement.

      (1) First, the causal relationship between identified metabolites and diabetes or prediabetes deserves to be further examined particularly when the prediabetic status was partially defined. Some metabolites might be the results of prediabetes rather than the causal factors for progression to diabetes.

      Thank you for your insightful comments. We agree with you that the panel of metabolites in this study might not be the causal factor for progression from prediabetes to diabetes, which needs further validation in experimental studies. We have added this limitation in the “Discussion” section.

      “Fifth, we could not draw any conclusion about the causality between the identified metabolites and the risk for progression to diabetes due to the observational nature, which remained to be validated in further experimental studies.”

      (2) The blood samples were taken at random (not all in a non-fasting state) and so the findings were subjected to greater variability. This should be discussed in the limitations.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of variation in plasma metabolic biomarker concentrations [5]. Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (3) The strength of NMR in metabolic profiling compared to other techniques (i.e., mass spectrometry [MS], another commonly used metabolic profiling method) could be added in the Discussion section.

      According to the reviewer’s suggestion, we have added the strength of NMR in metabolic profiling compared to other techniques in the “Discussion” section.

      “Circulating metabolites were quantified via NMR-based metabolome profiling within the UK Biobank, which offers metabolite quantification with relatively lower costs and better reproducibility [6].”

      (4) Fourth, the applied platform focuses mostly on lipid species which may be a limitation as well.

      Thank you for the comment. We acknowledge that the NMR platform used within the UK Biobank focuses primarily on lipid species, which may introduce bias from the measuring platform, and we have added this limitation to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focused on lipids and lipoprotein sub-fractions, and thus the predictive value of other metabolites in the progression from prediabetes to diabetes warranted further research using an untargeted metabolomics approach.”

      (5) It is a very large group with pre-diabetes, but the results only apply to prediabetes and not to the general population. This should be clear, although the authors have also validated the predictive value of these metabolites in the general population.

      Thank you for the comment. We agree with you that the results only apply to prediabetes and not to the general population, though they also showed potential predictive value among participants with normoglycemia. We have accordingly modified the relevant expressions in the “Conclusion” section to restrict these findings to participants with prediabetes.

      “In this large prospective study among individuals with prediabetes, we detected a panel of circulating metabolites that were associated with an increased risk of progressing to diabetes.”

      Recommendations for the Authors:

      Thank you for providing the valuable feedback and the time you have dedicated to our work.

      (1) In the first paragraph of the Discussion section, please include the specific names of the metabolites selected from machine learning methods.

      Thank you for your comment. We have added the names of the selected metabolites to the first paragraph of the “Discussion” section.

      “More importantly, our findings suggested that adding the selected metabolites (i.e., cholesteryl esters in large HDL, cholesteryl esters in medium VLDL, triglycerides in very large VLDL, average diameter for LDL particles, triglycerides in IDL, glycine, tyrosine, glucose, and docosahexaenoic acid) could significantly improve the risk prediction of progression from prediabetes to diabetes beyond the conventional clinical variables.”

      (2) To enhance the readability and simplicity of the paper, the description of covariate collection in the methods section should be streamlined, with detailed information provided in the supplementary materials.

      Thank you for your suggestion. We have moved the details of covariate collection to “Supplementary method S2” to enhance the readability and simplicity of the paper.

      “Information on covariates was collected through a self-completed touchscreen questionnaire or verbal interview at baseline, including age, sex, ethnicity, Townsend deprivation index, household income, education, employment status, smoking status, moderate alcohol, physical activity, healthy diet score, healthy sleep score, family history of diabetes, history of cardiovascular disease (CVD), history of hypertension, history of dyslipidemia, history of chronic lung diseases (CLD), and history of cancer.

      Physical measurements included systolic (SBP) and diastolic blood pressure (DBP), height, weight, waist circumference (WC), and hip circumference (HC). Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Missing covariates were imputed by the median value for continuous variables and a missing indicator for categorical variables. More details about covariates collection can be found in Method S2.”
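
      The covariate handling described above can be sketched in a few lines. This is an illustrative example only, not the authors' pipeline: missing values are represented here as `None`, and the function names and the "Missing" label are assumptions.

```python
from statistics import median


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def impute_median(values):
    """Replace missing (None) entries of a continuous variable with the observed median."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def add_missing_category(values, label="Missing"):
    """Give missing (None) entries of a categorical variable their own category."""
    return [label if v is None else v for v in values]
```

      For example, `impute_median([1.0, None, 3.0])` fills the gap with 2.0, while `add_missing_category(["never", None])` keeps the missing response as a separate level rather than guessing a value.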

      (3) Title for Table 2, using Cox proportional hazards prediction models is not common. You may consider the title "Performance of Cox proportional hazards regression models in prediction of progression of prediabetes to diabetes".

      Thank you for your suggestion. We have revised the title accordingly.

      (4) Figure 3, did the authors consider competing risk to compute cumulative incidence function?

      Thank you for your comment. We did not consider competing risk from death when plotting the cumulative hazard curves. However, following your suggestion, we have included an additional cumulative hazard plot after considering the competing

      References

      (1) Janitza S, Hornung R. On the overestimation of random forest's out-of-bag error. PLoS One. 2018;13(8):e0201904.

      (2) Tian D, Yan HJ, Huang H, et al. Machine Learning-Based Prognostic Model for Patients After Lung Transplantation. JAMA Netw Open. 2023;6(5):e2312022.

      (3) Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18.

      (4) Li J, Xi F, Yu W, Sun C, Wang X. Real-Time Prediction of Sepsis in Critical Trauma Patients: Machine Learning-Based Modeling Study. JMIR Form Res. 2023;7:e42452.

      (5) Li-Gao R, Hughes DA, le Cessie S, et al. Assessment of reproducibility and biological variability of fasting and postprandial plasma metabolite concentrations using 1H NMR spectroscopy. PLoS One. 2019;14(6):e0218549.

      (6) Geng T-T, Chen J-X, Lu Q, et al. Nuclear Magnetic Resonance–Based Metabolomics and Risk of CKD. American Journal of Kidney Diseases. 2023.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      Major:

      In Melnick (2013) IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III? I am wondering whether other subtests were conducted and, if so, please include the results as well to have comprehensive comparisons with Melnick (2013).

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence. For these reasons, we conducted only the visuo-spatial subtest.

      Minor:

      Comments:

      In the first revised version, we addressed the following recommendations in the 'Author response' file titled 'Recommendation for the authors.' It seems our response may not have reached you successfully. We would like to share and expand upon our response here:

      (1) Table 1 and Table supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table supplementary 2?

      (1.1) What are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that left hMT+ is involved in the efficient information integration network during the BDT task. In addition, surround suppression in left hMT+ is associated with several hMT+-frontal connections. Furthermore, the regions that overlap between the two tasks point to the underlying mechanism.

      (1.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we report all values there. In contrast, in Table 2 and Table Supplementary 4-6, we highlight the values that support our main conclusion.

      (1.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out; it was a mistake. We have revised the table and deleted the significance symbols.

      (2) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised “the canonical theory” to “the prevailing opinion” (line 27).

      (3) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have replaced "MT+" with "hMT+" throughout.

      (4) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the results section (line 110, line 128).

      (5) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have revised it to: “This finding is in line with prior results, which indicate that motion perception is associated with neural activity in the hMT+ area, but not in EVC (primarily in V1)” (lines 156-158).

      (6) There are no unit labels for all x- and y-axes in Figure 1. I only see the unit for Conc is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for x- and y-axes are not needed. We believe this comment may pertain to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit because it varies with the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (7) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised the figures to include the correlation line, 95% confidence interval, r values and p values.

      (8) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. Please note that in the revised version, we have added a figure showing the EVC (primarily in V1) MRS scanning ROI as Supplementary Figure 1. Therefore, the figures the reviewer is concerned about are Supplementary Figures 2-5. The correlation figures in Supplementary Figure 2 indicate that GABA in EVC (primarily in V1) does not show any correlation with BDT and SI, illustrating that inhibition in EVC (primarily in V1) is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 3 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or EVC (primarily in V1). Supplementary Figure 4 validates our MRS measurements. Supplementary Figure 5 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (9) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly (line 242).

      (10) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the results section (lines 116-120).

      (11) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (12) Figure 5 is too small. The items in plot a and b can be barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of the figure.

      Reviewer #3 (Public Review):

      (1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity? In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      (1.1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity?

      We thank the reviewer for pointing this out. We have added motivation and background literature to the introduction: “Frontal cortex is usually recognized as the cognitive core region (Duncan et al., 2000; Gray et al., 2003). Strong connectivity between the cognitive regions suggests a mechanism for large-scale information exchange and integration in the brain (Barbey, 2018; Cole et al., 2012). Therefore, the potential conjunctive coding may overlap with the inhibition and/or excitation mechanism of hMT+. Taken together, we hypothesized that 3D visuo-spatial intelligence (as measured by BDT) might be predicted by the inhibitory and/or excitation mechanisms in hMT+ and the integrative functions connecting hMT+ with frontal cortex (Figure 1a).” (lines 67-74). Additionally, we have included a whole-brain analysis for validation. Functional connectivity reveals the information exchange relationships across regions, enhancing our understanding of how hMT+ and the frontal cortex collaborate when solving visual-spatial intelligence tasks.

      (1.2) In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      We thank the reviewer for this question. The localizer task was used solely for defining the hMT+ MRS scanning area. Functional connectivity was measured using resting-state fMRI. Research has shown that resting-state functional connectivity between the frontal cortex and other ROIs can further reveal the neural mechanisms underlying intelligence tasks (Song et al., 2008).

      (2) There is an obvious mismatch between the in-text description and the content of the figure:

      "In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 1a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 1b)."

      We thank the reviewer for pointing this out. We have revised it. The revised version is: “In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 2a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 2b).” (lines 151-156)

      (3) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate. Can the authors justify why choosing the left hemisphere for visual intelligence task, which is typically believed to be right lateralized?

      (3.1) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate.

      We thank the reviewer for pointing this out. We have revised our description of the MRS scanning ROIs to Early Visual Cortex (EVC). Since the majority of our EVC ROIs are in V1 (around 70%) and almost no V2 was included, we decided to mark the EVC ROIs with the explanation "primarily in V1" for better clarification. This terminology has been widely used to better emphasize the V1-based experimental design.

      (3.2) Can the authors justify why choosing the left hemisphere for visual intelligence task, which is typically believed to be right lateralized?

      We thank the reviewer for pointing this out. The use of the left MT/V5 as a target was motivated by studies demonstrating that left MT+/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). Therefore, we chose to use the left hMT+ as our MRS ROI and maintain consistency across different models' ROIs. Additionally, our results support the notion that the visual intelligence task is right lateralized in the frontal cortex. At the resting-fMRI level, we found that significant ROIs, where functional connectivity is highly correlated with BDT scores, are in the right frontal cortex (Figure 5a, b).

      (4) "Small threshold" and "large threshold" are neither standard descriptions, and it is unclear what "small threshold" refers to in the following figure caption. Additionally, the unit (ms) is confusing. Does it refer to timing?

      "(f) Pearson's correlation showing significant negative correlations between BDT and small threshold."

      Thank you for pointing this out; we agree with your suggestion. We have revised the terms “small threshold” and “large threshold” to “duration threshold of small grating” and “duration threshold of large grating”, respectively. The unit (ms) refers to timing. The details are described in the methods section: “The duration was adaptively adjusted in each trial, and duration thresholds were estimated using a staircase procedure. Thresholds for large and small gratings were obtained from a 160-trial block that contained four interleaved 3-down/1-up staircases. For each participant, we computed the correct rate for different stimulus durations separately for each stimulus size. These values were then fitted to a cumulative Gaussian function, and the duration threshold corresponding to the 75% correct point on the psychometric function was estimated for each stimulus size”.
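
      For readers unfamiliar with the procedure, a 3-down/1-up staircase shortens the stimulus after three consecutive correct responses and lengthens it after any error. The sketch below shows one update step under that rule. It is a generic illustration, not the authors' implementation: the step size, the lower bound and the function name are assumptions.

```python
def staircase_step(duration_ms, correct_streak, was_correct,
                   step_ms=5.0, floor_ms=1.0):
    """One update of a 3-down/1-up staircase on stimulus duration.

    Returns (new_duration_ms, new_correct_streak).
    step_ms and floor_ms are illustrative values, not taken from the study.
    """
    if not was_correct:
        # One error: make the task easier (longer duration) and reset the streak.
        return duration_ms + step_ms, 0
    streak = correct_streak + 1
    if streak == 3:
        # Three correct in a row: make the task harder (shorter duration).
        return max(floor_ms, duration_ms - step_ms), 0
    # Fewer than three correct in a row: keep the duration unchanged.
    return duration_ms, streak
```

      Running several such staircases interleaved within a block, as described in the quoted methods, samples durations around the threshold region so that the psychometric function can then be fitted.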

      (5) In the response letter, the authors mentioned incorporating the neural efficiency hypothesis in the Introduction, but the revised Introduction does not contain such information.

      We thank the reviewer for pointing this out. In our revised version, the second paragraph of the introduction addresses the neural efficiency hypothesis: “The “neuro-efficiency” hypothesis is one explanation for individual differences in gF (Haier et al., 1988). This hypothesis puts forward that the human brain’s ability to suppress irrelevant information leads to more efficient cognitive processing. Correspondingly, using a well-known visual motion paradigm (center-surround antagonism) (Liu et al., 2016; Tadin et al., 2003), Melnick et al. found a strong link between suppression index (SI) of motion perception and the scores of the block design test (BDT, a subtest of the Wechsler Adult Intelligence Scale (WAIS), which measures the visuo-spatial component (3D domain) of gF) (Melnick et al., 2013). Motion surround suppression (SI), a specific function of human extrastriate cortical region, middle temporal complex (hMT+), aligns closely with this region's activities (Gautama & Van Hulle, 2001). Furthermore, hMT+ is a sensory cortex involved in visual perception processing (3D domain) (Cumming & DeAngelis, 2001). These findings suggest that hMT+ potentially plays a significant role in 3D visuo-spatial intelligence by facilitating the efficient processing of 3D visual information and suppressing irrelevant information. However, more evidence is needed to uncover how the hMT+ functions as a core region for 3D visuo-spatial intelligence.” (lines 51-66)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      In the Code availability, it states that "this paper does not report original code". It seems weird because at least the code to reproduce the figures from the data should be provided.

      Thank you for pointing this out. Almost all figures were created using software such as DPABI, BrainNet, and GraphPad Prism 9.5, which are manually operated and do not require code adjustments. However, for the MRS fitting curve, we can provide our MATLAB code for redrawing the MRS fitting. The code has been uploaded to GitHub.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The patch clamp experiments are comprehensive and overall solid but a direct demonstration of the role of these conductances in being necessary for surge generation (or at least having a direct physiological consequence on surge properties) is lacking, substantially reducing the impact of the findings.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. The construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      We thank the reviewer for recognizing our comprehensive examination of Kiss1-ARH neurons through electrophysiological, molecular and computational modeling of their activity during the preovulatory surge, which as the reviewer pointed out is “conceptually novel.” With the addition of new experiments, we have bolstered our argument that Kiss1-ARH neurons transition from synchronized firing to burst firing with the E2-mediated regulation of channel expression. We have addressed the recommendations as follows:

      Weaknesses:

      (1) The novelty of some of the experiments needs to be clarified. This reviewer's understanding is that prior experiments largely used a different OVX+E2 treatment paradigm mimicking periods of low estradiol levels, whereas the present work used a "high E2" treatment model. However, Figures 10C and D are repeated from a previous publication by the same group, according to the figure legend. Findings from "high" vs. "low" E2 treatment regimens should be labeled and clearly separated in the text. It would also help to have direct comparisons between results from low E2 and high E2 treatment conditions.

      We have revised Figures 10C and 10D to include only new findings on Tac2 and Vglut2 expression in OVX and E2-treated Kiss1ARH neurons. Most importantly, our E2 treatment regimen is clearly stated in the Methods and is exactly the same as that used previously (Qiu, eLife 2016 and Qiu, eLife 2018) for the induction of the LH surge in OVX mice (Bosch, Molecular and Cellular Endocrinology 2013).

      (2) In multiple places, links are made between the changes in conductances and the transition from peptidergic to glutamatergic neurotransmission. However, this relationship is never directly assessed. The data that come closest are the qPCR results showing reduced Tac2 and increased Vglut2 mRNA, but in the figure legend, it appears that these results are from a prior publication using a different E2 treatment regimen.

      In the revised Figure 1, we have now included a clear depiction of the transition from synchronized firing driven by NKB signaling in OVX females to burst firing driven by glutamate in E2-treated females. All of the qPCR results in the revised manuscript are new. We have used the same E2 treatment paradigm as previously published (Qiu, eLife 2018).

      (3) Similarly, no recordings of arcuate-AVPV glutamatergic transmission are made so the statements that Kiss1ARH neurons facilitate the GnRH surge via this connection are still only conjecture and not supported by the present experiments.

      Using a horizontal hypothalamic slice preparation, we have shown that Kiss1-ARH neurons excite GnRH neurons via Kiss1ARH glutamatergic input to Kiss1AVPV/PeN neurons (summarized in Fig. 12, Qiu, eLife 2016). We did not think that it was necessary to repeat these experiments for the current manuscript.

      (4) Figure 1 is not described in the Results section and is only tenuously connected to the statement in the introduction in which it is cited. The relevance of panels C and D is not clear. In this regard, much is made of the burst firing pattern that arises after E2 treatment in the model, but this burst firing pattern is not demonstrated directly in the slice electrophysiology examples.

      We have extensively revised Figure 1 to include new whole-cell, current clamp recordings that document burst firing in E2-treated, OVX females. The revised figure is now cited in the Results.

      (5) In Figure 3, it would be preferable to see the raw values for R1 and R2 in each cell, to confirm that all cells were starting from a similar baseline. In addition, it is unclear why the data for TTA-P2 is not shown, or how many cells were recorded to provide this finding.

      Before initiating photo-stimulation of each Kiss1-ARH neuron, we adjusted the resting membrane potential to -70 mV through current injection, as noted in each panel in Figure 3. We have now included new findings on the effects of the T-channel blocker TTA-P2 on the slow EPSP in the revised Figure 3. The number of cells tested with each calcium channel blocker is depicted in each of the bar graphs summarizing the effects of the blockers (Figure 3E).

      (6) In Figure 5, panel C lists 11 cells in the E2 condition but panel E lists data from 37 cells. The reason for this discrepancy is not clear.

      In Figure 5D, we measured the L-, N-, P/Q and R channel currents after pretreatment with TTA-P2 to block the T-type current, whereas in Figure 5C, we measured the total current without TTA-P2.

      (7) In all histogram figures, it would be preferable to have the data for individual cells superimposed on the mean and SEM.

      In the revised figures, we have included the data points for individual neurons and, for qPCR, individual animals.

      (8) The CRISPR experiments were only performed in OVX mice, substantially limiting interpretation with respect to potential roles for TRPC5 in shaping arcuate kisspeptin neuron function during the preovulatory surge.

      The TRPC5 channels are most important for generating slow EPSPs when expression of NKB is high in the OVX state. Conversely, the glutamatergic response becomes more significant when the expression of NKB and TRPC5 channels is muted in the E2-treated state. Therefore, the CRISPR experiments were specifically conducted in OVX mice to maximize the effects.

      (9) Furthermore, there are no demonstrations that the CRISPR manipulations impair or alter the LH surge.

      In this manuscript, our focus is on the cellular electrophysiological activity of the Kiss1ARH neurons in OVX and E2-treated OVX females. Exploration of CRISPR manipulations related to the LH surge is certainly slated for future experiments, but these in vivo experiments are beyond the scope of these comprehensive cellular electrophysiological and molecular studies.

      (10) The time of day of slice preparation and recording needs to be specified in the Methods.

      We have provided the times of slice preparation and recordings in the revised Methods and Materials.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin neurons coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of the neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels, and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense, glutamatergic burst-like firing pattern that could favor glutamate release from ARC kisspeptin neurons. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology, and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards. Robust statistical analyses are provided throughout, although some experiments (illustrated in Figures 7 and 8) do have rather low sample numbers.

      The impact of E2 on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      We thank the reviewer for recognizing that the “pharmacological and electrophysiological experiments appear of the highest standards” and that “the addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.” However, we agree with the reviewer that we needed to provide a direct demonstration of “burst-like” firing of Kiss1-ARH neurons, which we have provided in Figure 1. We have addressed the other recommendations as follows:

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One has to do with the fact that "burst-like" firing that the authors postulate ARC kisspeptin neurons transition to after E2 replacement is only seen in computer simulations, and not in slice patch-clamp recordings. A more direct demonstration of the existence of this firing pattern, and of its prominence over neuropeptide-dependent sustained firing under conditions of high E2 would make a more convincing case for the authors' hypothesis.

      We have provided a more direct demonstration of the existence of this firing pattern in the whole-cell current clamp experiments in the revised Figure 1.

      In addition, and quite importantly, the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions (the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle) under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place. This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of these ionic currents will vary during the estrous cycle.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle: the slow EPSP reaches a maximal amplitude during diestrus and is significantly reduced during proestrus, similar to the difference found between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, similar to that of the peptides, is downregulated by an E2 treatment (Figure 10, this manuscript) that mimics proestrus levels of the steroid (Bosch et al., Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      Lastly, the results of some of the pharmacological and genetic experiments may be difficult to interpret as presented. For example, in Figure 3, although it is possible that blockade of individual calcium channel subtypes suppresses the slow EPSP through decreased calcium entry at the somato-dendritic compartment to sustain TRPC5 activation and the slow depolarization (as the authors imply), a reasonable alternative interpretation would be that at least some of the effects on the amplitude of the slow EPSP result from suppression of presynaptic calcium influx and, thus, decreased neurotransmitter and neuropeptide secretion. Along the same lines, in Figure 12, one possible interpretation of the observed smaller slow EPSPs seen in mice with mutant TRPC5 could be that at least some of the effect is due to decreased neurotransmitter and neuropeptide release due to the decreased excitability associated with TRPC5 knockdown.

      The reviewer raises a good point, but our previous findings clearly demonstrated that chelating intracellular calcium with BAPTA in whole-cell current clamp recordings abolishes the slow EPSP and persistent firing (Qiu et al., J. Neurosci 2021), which we have noted is the rationale for dissecting out the contribution of T, R, N, L and P/Q calcium channels to the slow EPSP in our current studies. The revised Figure 3 also includes the effects of a T-channel blocker.

      However, to further bolster the argument for the postsynaptic contribution of the calcium channels to the slow EPSP, we have utilized an additional strategy that eliminates the potential presynaptic effects of the calcium channel blockers on the slow EPSP amplitude, which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter release. Specifically, we have measured the response to the externally administered TACR3 agonist senktide under conditions in which the extracellular calcium influx, as well as neurotransmitter and neuropeptide release, are blocked (revised Figure 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The use of optogenetics in Figure 3 to trigger the slow EPSP could be better clarified in the text.

      We have clarified in the Methods the optogenetic protocol for generating the slow EPSP, which we have published previously (Qiu et al., eLife 2016; eLife 2018, J. Neurosci 2021).

      (2) The citation for Figure 4C in the text does not match what is shown in the figure.

      Figure 4C has been removed in the revised manuscript.

      (3) Figure 5 - it would be clearer to have panel D labeled as "model results" or similar to distinguish it from the slice recording data.

      Panel D has been labeled as “Model results.”

      (4) The text in lines 191-197 in the Results may be better suited to the Discussion.

      We have modified the text in order to present the new findings without the discussion points.

      (5) It is somewhat confusing to have figure panels cited out of order in the main text (e.g., 7H before 7G and 8H before 8G).

      We have edited the text to report the findings in the proper order of the panels in Figures 7 and 8.

      Reviewer #2 (Recommendations For The Authors):

      - The observations that E2 treatment of OVX mice has an effect on the magnitude of a number of ionic currents does not necessarily mean that these changes will be seen during the estrous cycle, in response to fluctuations in circulating E2 concentrations. Experiments comparing either different estrous cycle stages or OVX mice treated with low or high E2 would be required to gain insight into this question. As such, the relevance of the authors' findings (however interesting these are as they stand) to any potential physiological endocrine/reproductive state transition is questionable, in the reviewer's opinion. The authors should acknowledge this important caveat and moderate the interpretations of their findings and the conclusions of their manuscript accordingly.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle: the slow EPSP is large during diestrus and significantly reduced during proestrus, similar to the difference found between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, similar to that of the peptides, is downregulated by an E2 treatment (Figure 10, this manuscript) that mimics proestrus levels of the steroid (Bosch et al., Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      - The bursting firing pattern that the authors refer to and postulate will favor glutamate release under high E2 conditions is only seen in the computer simulations, not in patch-clamp recordings in brain slices (see also comment below). This substantially weakens some of the conclusions of the manuscript. Unless the authors can convincingly demonstrate a change in ARC kisspeptin firing pattern in response to increasing E2 using electrophysiology, these conclusions should be moderated.

      We now include examples of burst firing activity under E2-treatment conditions in Figure 1 and have included a summary figure (pie chart) documenting that a significant percentage of cells exhibit this activity with E2 treatment.

      Other comments:

      - Title: "E2 elicits distinct firing patterns" is not shown in this work. As such, the title needs to be revised.

      We now show these distinct firing patterns in Figure 1, so we think the wording in the title is an accurate reflection of our findings. 

      - Abstract: some of the interpretations are overstated, in the reviewer's opinion.

      Line 23, "... elevating the whole-cell calcium current and contributing to high-frequency firing" should be moderated, as what is shown by the authors is that blockade of calcium channel subtypes suppresses the slow EPSP and associated firing, the frequency of which is not reported (see also a later comment).

      We now include examples of burst firing activity under E2-treatment conditions in Figure 1 and have modified the abstract to state “high frequency burst firing.”

      Lines 26-28, that "mathematical modeling confirmed the importance of TRPC5 channels for initiating and sustaining synchronous firing, while GIRK channels, activated by Dyn binding to kappa opioid receptors, were responsible for repolarization" is simply not what the simulations show, in the reviewer's opinion. Indeed, there is no consideration of synchronous activity in the model, which simulates the firing of a single ARC kisspeptin neuron. Further, the model shows that TRPC5 can contribute to overall excitability (firing in response to current injection, Figure 12G) and that increasing TRPC5 conductance increases firing in response to NKB while this is decreased by adding GIRK conductance to the model (Figure 13A). Therefore, considerations of the importance of TRPC5 channels in initiating synchronous firing and the role of Dyn A-induced GIRK activity should not be included in the interpretations of the mathematical simulations.

      The significance of synchronization lies in the fact that when neuronal networks synchronize, the behavior of each neuron within the network becomes identical. In such scenarios, the firing of a single neuron mirrors the activity of the entire neuronal network. Consequently, our model simulations, based on a single-cell neuronal model, can be utilized to make reliable inferences about synchronized neuronal activity.

      Lines 31-33 (also lines 92-95), that "the transition to burst firing with high, preovulatory levels of E2 facilitates the GnRH surge through its glutamatergic synaptic connection to preoptic Kiss1 neurons" is not supported by the experiments (physiologic or computational) described in the manuscript, and is, therefore, only speculative. These statements should be removed throughout the manuscript.

      Previously, we (Qiu et al., eLife 2016) documented a direct glutamatergic projection from Kiss1-ARH neurons to Kiss1-AVPV/PeN neurons. Moreover, Lin et al. (Frontiers in Endocrinology 2021) demonstrated that low frequency stimulation of Kiss1-ARH:ChR2 neurons, which is known to release only glutamate, boosts the LH surge, and in a follow-up paper the O’Byrne lab blocked this stimulation with ionotropic glutamate antagonists (Shen et al., Frontiers in Endocrinology 2022). We have included these references in the Introduction and Discussion, but we did not think that it was necessary to cite these papers in the Abstract. However, we have re-worded this final statement in the Abstract to: “the transition to burst firing with high, preovulatory levels of E2 would facilitate the GnRH surge….”

      - Introduction: the usefulness of Figure 1 is questionable. From reading the figure legend, it is the reviewer's understanding that panels A and B are published elsewhere (there is no description of methods or results in the manuscript). Further, panels C and D are meant to illustrate that ARC kisspeptin neurons display different types of firing in OVX vs E2-treated OVX mice. The legend to C indicates that the trace illustrates "synchronous firing" but shows one cell (how can this be claimed as synchronous?) - the legend to D indicates that the trace "demonstrates" burst firing in ARC kisspeptin neurons. This part of the figure is, in the reviewer's opinion, misleading because these are only two examples (no quantifications or replicates are provided) obtained by stimulating firing in two different endocrine conditions by two different agonists. The "demonstration" of differential firing patterns would require a thorough examination of firing patterns in response to current injections (as in Figure 12 E-F) or in response to the two agonists, under the different hormonal conditions.

      Figure 1 has now been completely revised to include new data documenting the different firing patterns. The methods detailing these experiments can be found in the Materials and Methods section.

      The introduction presents a rather incomplete picture of what is known regarding how ARC kisspeptin neurons might coordinate their activity to drive episodic GnRH secretion, and it omits published work showing that blockade of glutamate receptors (in particular AMPA receptors) decreases ARC kisspeptin neuron coordinated activity in the brain slices and in vivo and suppresses pulsatile GnRH/LH secretion in mice.

      If we are not mistaken, the reviewer is referring to fiber photometry recordings of GCaMP activity, which we cite in the Discussion. However, for the Introduction we tried to “set the stage” for our studies on measuring the individual channels underlying the different firing patterns and how they are regulated by E2.

      The introduction is also quite long with extensive descriptions of previous work by the authors and in other brain areas that would be better suited for the discussion.

      Again, we are trying to rationalize why we focused on particular ion channels based on the literature.

      - Results: lines 129-132 should be moderated, as whether calcium channels increase excitability or facilitate TRPC5 channel opening has not been directly assessed here.

      High frequency optogenetic stimulation of Kiss1-ARH neurons, and NKB acting through its cognate receptor (TACR3), activate TRPC5 channels (Qiu et al., eLife 2016; J. Neurosci 2021). BAPTA prevents the opening of TRPC5 channels and abrogates the slow EPSP following high frequency stimulation. Figure 3 documents that inhibition of voltage-activated calcium channels attenuates the slow EPSP, which results in a decrease in excitability.

      Lines 145-146, one limitation of this experiment is that blockade of calcium channel subtypes will not only affect calcium entry and subsequent actions of calcium on TRPC5 channels but also impair the release of neurotransmitters and neuropeptides from kisspeptin neurons. The interpretation that "calcium channels contribute to maintaining the sustained depolarization underlying the slow EPSP" needs, therefore, to be moderated as it is not possible to extract the direct contribution of calcium channels to the activation of TRPC5 channels from these experiments.

      We cited our previous findings documenting that chelating intracellular calcium with BAPTA abolishes the slow EPSP and persistent firing (Qiu et al., J Neurosci 2021). However, to eliminate the potential effects of calcium channel blockers on the slow EPSP amplitude, which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter and neuropeptide secretion, we adopted a different strategy by comparing responses to senktide alone with responses to Cd2+ plus senktide. Our findings revealed that the non-selective Ca2+ channel blocker Cd2+ significantly inhibited the senktide-induced inward current (Figures 3F-H).

      Panel C should be removed from Figure 4, as it is published elsewhere.

      Figure 4C has been removed.

      Lines 168-169, "...E2 treatment led to a significant increase in the peak calcium current density in Kiss1ARH neurons, which was recapitulated as predicted by our computational modeling..." How did the model "predict" this increase in calcium current density? As no information is provided in the methods or supplementary information as to how the effect of E2 was integrated into the model, the authors will need to provide additional narration in the text to explain this statement. The "T-channel inflection" referred to in the figure legend will also need to be explained. Lastly, in Figure 5C, the current density unit should be pA/pF. 

      We have added text in the supplementary information to explain how we used the qPCR and electrophysiological data to inform the model regarding the effect that E2 has on the various ionic currents and noted in the Figure 13 legend that the increase/decrease in the conductances is physiologically mediated by E2. We have eliminated the T-channel inflection point (Figure 5D) and corrected the current density label (Figure 5C).

      Lines 198-199, please clarify "E2 does not modulate calcium channel kinetics directly but rather alters the mRNA expression to increase the conductance".

      We have clarified that “long-term E2 treatment does not modulate calcium channel kinetics but rather alters the mRNA expression to increase the calcium channel conductance” by referring to the specific figures (i.e., Figures 4, 6) in a previous sentence.

      Figures 7 and 8 titles do not accurately reflect the contents: there is nothing about repolarization in the experiments illustrated in Figure 7 or Figure 8. The sample sizes (3 to 4 cells) are also quite small for these experiments.

      We have modified the Figure titles per the reviewer’s comments and increased the cell numbers.

      The title of Figure 9 also does not fully reflect the figure's contents. Although panel G does suggest that the M current contributes to regulating the membrane potential, the reviewer's reading of this figure panel is that the fractional contribution of the M current does not vary during a short burst of action potentials. The suggestion that "KCNQ channels play a key role in repolarizing Kiss1ARH neurons following burst firing" (line 272) and the statement that "our modeling predicted that M-current contributed to the repolarization following burst firing" (line 273) should be revised accordingly.

      The point is that the M-current contributes, albeit a small fraction, to the repolarization during burst firing.

      Line 288, please indicate what figure informs this statement.

      We have revised the statement since the modeling (Figure 13) comes later in the Results.

      Line 311-313, this sentence only superficially describes the simulation, in the reviewer's opinion. Does the model inform on how TRPC5 channels/currents do that? The supplementary information indicates that there is a tone of extracellular neurokinin B embedded in the model. This is important information that should be clearly stated in the manuscript. The authors should also consider discussing the influence of this neurokinin B tone on the contribution of TRPC5 to cell excitability. As a neurokinin B tone in the extracellular space will likely alter the firing of kisspeptin neurons in the model, readers will likely need more information about all this.

      In our current ramp simulations of the model (Figure 12G and H) there is no involvement of neurokinin B (i.e., the NKB parameter is set to zero), and the effect on the rheobase is solely due to the decrease of the TRPC5 conductance. In the model, TRPC5 channels are activated by intracellular calcium levels and therefore contribute to cell excitability even in the absence of extracellular NKB. The NKB tone is used for the simulations presented in Figure 13, where we vary the TRPC5 conductance under saturating levels of extracellular NKB.

      Lines 316-318 also read as quite superficial. More explanations of what is illustrated in Figure 13 are needed. In particular, it is unclear from the methods and supplementary information what the different ratios of conductances in OVX+E2 vs in OVX are and how they were varied in the model. Furthermore, it is unclear to the reviewer how the outcome of these simulations matches the authors' postulate that E2 enables a transition to a burst firing pattern that favors glutamate release. Looking at simulated firing in Figure 13B, E2 (by increasing calcium conductances) would tend to enable high-frequency firing within bursts (nearing 50 Hz by eye) and high burst rates (approximately 4 bursts per second), which the reviewer would argue might be expected to cause significant neuropeptide release in addition to that of glutamate.

      We have added to the text: “Furthermore, the burst firing of the OVX+E2 parameterized model was supported by elevated h- and Ca2+-currents (Figure 13B) as well as by the high conductance of Ca2+ channels relative to the conductance of TRPC5 channels (Figure 13C).” We have also provided in the Supplemental Information (Table of Model Parameters) the specific conductances in the OVX and OVX+E2 state and how they are varied to produce the model simulations.

      Granted, the high frequency firing during a burst could release peptide, but in the E2-treated, OVX females the expression of the peptides is at “rock bottom.” By contrast, the sustained high frequency firing during the slow EPSP in the OVX state would generate maximum peptide release.

      In Figure 13C, the reviewer is unclear on the ranges of TRPC5 conductances shown. The in vitro experiments suggest that E2 suppresses Trpc5 gene expression and might suppress TRPC5 currents. The ratio of gTRPC5(OVX+E2)/gTRPC5(OVX) should, thus, be <1.0. This is not represented in the parameter space provided, making the interpretation of this simulation difficult. Please clarify what the effect of decreasing gTRPC5 will be on firing patterns in the model.

      Thank you for pointing out this typographical error. The ratio should be gTRPC5(OVX)/gTRPC5(OVX+E2) for the X-axis.

      - Discussion: many statements and conclusions are overreaching and need to be revised; for example lines 320-322, 329-330, 335-338, 369, 371-373, 391-394, 463-464, and 489-494;

      We have tempered these statements, so they are not “overreaching.”

      Lines 489-494: the authors should integrate published observations that i) ablation of ARC kisspeptin neurons results in increased LH surges in mice and rats and that ii) optogenetic stimulation of ARC kisspeptin fibers in the POA is only effective at increasing LH secretion in a surge-like manner when done at high frequencies (20 Hz), in their discussion of the role of ARC kisspeptin neurons and their firing patterns in the preovulatory surge.

      We have included the paper from the O’Byrne lab (Shen et al. Frontiers in Endocrinology 2022) in the Discussion. However, the Mittelman-Smith paper (Endocrinology, 2016), which ablated KNDy neurons using NK3-saporin, targeted not only KNDy neurons but also other arcuate neurons that express NK3 receptors. Therefore, we have not cited it in the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      An online database called MRAD has been developed to identify the risk or protective factors for AD.

      Strengths:

      This study is a very intriguing study of great clinical and scientific significance that provided a thorough and comprehensive evaluation with regard to risk or protective factors for AD. It also provided physicians and scientists with a very convenient, free as well as user-friendly tool for further scientific investigation.

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses:

      (1) Comment: The paper mentions that the MRAD database currently contains data only from European populations, with no mention of data from other populations or ethnicities. Given potential differences in Alzheimer's Disease (AD) across different populations, the limitations of the data should be emphasized in the discussion, along with plans to expand the database to include data from more racial and geographic regions.

      Thank you for your valuable comment. Further information regarding the limitations of populations is provided in the Conclusions section (page 19).

      The newly added text describing the limitations of populations is as follows:

      “However, in this study, the GWAS datasets for both the exposure traits and the outcome trait (AD) were obtained from a public database (MRC IEU OpenGWAS), in which the GWAS datasets for AD cover only the European population. In addition, we used TwoSampleMR, which requires that the exposure and outcome traits come from the same population to satisfy the requirement for a control variable. This study therefore currently has limitations in terms of population. To address them, we have initiated a Mendelian randomization study on AD at clinical hospitals in China, which is currently in the sample collection stage. In the future, we will integrate data from more populations and keep up with new progress in AD research to explore potential differences across populations.”

      (2) Comment: Sufficient information should be provided to clarify the data sources, sample selection, and quality control methods used in the MRAD database. Readers may expect more detailed information about the data to ensure data reliability, representativeness, and research applicability.

      Thank you for your helpful suggestion. We appreciate the time and effort you have taken in reviewing our manuscript, and we thank you for your insightful comments. We agree that adding more details is essential to demonstrate the reliability, representativeness, and research applicability of the data.

      The newly added text describing more detailed information about the data is as follows:

      (1) Sufficient information about data sources and sample selection (in the Data sources section of Methods section, page 8):

      “Exposure traits

      Inclusion criteria: datasets of the European population.

      Exclusion criteria: (i) eQTL-related datasets; (ii) AD-related datasets.

      In this study, the GWAS datasets selected were derived from 42,335 GWAS datasets in the public database (MRC IEU OpenGWAS, https://gwas.mrcieu.ac.uk/). Based on the above inclusion and exclusion criteria, 19,942 eQTL-related datasets were excluded first, leaving 22,393 GWAS datasets. Next, the datasets with the European population were selected, and 18,117 GWAS datasets were obtained. Finally, 20 AD-related datasets were excluded; 18,097 GWAS datasets were obtained at the end as the exposure traits of this study (See Table S1 for basic information).

      Outcome traits

      Inclusion criteria: (i) datasets of patients with AD with complete information and clear data sources; (ii) datasets of the European population.

      Exclusion criteria: (i) Number of SNPs <1 million; (ii) datasets with unspecified sex; (iii) datasets with a family history of AD; (iv) datasets with dementia.

      Based on the above criteria, 16 GWAS datasets of outcome traits were selected from the MRC IEU OpenGWAS database, comprising datasets of AD from Alzheimer Disease Genetics Consortium (ADGC), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), The European Alzheimer’s Disease Initiative (EADI), and Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer’s Disease Consortium (GERAD/PERADES) 2019 (ieu-b-2); AD from Benjamin Woolf 2022 (ieu-b-5067); AD from International Genomics of Alzheimer's Project (IGAP) 2013 (ieu-a-297) as the datasets of main outcome traits for AD, as well as 13 datasets from FinnGen biobank 2021 corresponding to various AD subtypes, referred to as AD-finn subtypes. (as shown in Figure 2).”
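      The selection counts quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the authors' pipeline; the function name `exposure_selection_flow` is hypothetical, and the split of the 16 outcome datasets into 3 main outcomes plus 13 FinnGen subtypes is read directly from the quoted text.

```python
def exposure_selection_flow(total, eqtl_excluded, european_retained, ad_excluded):
    """Reproduce the dataset counts quoted in the Data sources text.

    All arguments are counts taken from the quoted Methods excerpt;
    the function itself is a hypothetical helper for illustration.
    """
    # Step 1: drop eQTL-related datasets from the full OpenGWAS list.
    after_eqtl = total - eqtl_excluded
    # Step 2: the European-population subset must fit inside what remains.
    if european_retained > after_eqtl:
        raise ValueError("European subset cannot exceed the remaining datasets")
    # Step 3: drop AD-related datasets to obtain the final exposure traits.
    final_exposures = european_retained - ad_excluded
    return after_eqtl, final_exposures


after_eqtl, n_exposures = exposure_selection_flow(42_335, 19_942, 18_117, 20)

# Outcome traits: 3 main AD datasets plus 13 FinnGen AD subtypes.
n_outcomes = 3 + 13

print(after_eqtl, n_exposures, n_outcomes)
```

      The counts are internally consistent: the intermediate and final exposure totals match the figures given in the text, and the outcome datasets sum to 16.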

      (2) Sufficient information about quality control methods (in the Statistical models for causal effect inference section of the Methods section, pages 9-10):

      “A random-effects IVW model was used in this study as the primary analysis method to uncover potential risk or protective factors for AD. The random-effects IVW model is regarded as the gold standard for MR studies. Its principle is to use the inverse of the variance of each IV as its weight, assuming all IVs are valid. The regression does not include an intercept term, and the final result is the weighted average of the effect estimates from all IVs [34]. This model indicates that the true effect values may vary across different studies due to both sampling error and the heterogeneity of the true effect. The weight of each study is jointly determined by its inverse variance and the estimated heterogeneity variance. Thus, as long as there is no pleiotropy, even when there is significant heterogeneity (p < 0.05), this method remains the best MR model.

      To assess the robustness of the IVW results, sensitivity analysis was performed using six additional models: (i) MR-Egger: MR-Egger’s biggest difference from IVW is that it considers the intercept term during regression to evaluate bias caused by horizontal pleiotropy. The intercept represents the magnitude of horizontal pleiotropy, with a value close to 0 indicating minimal pleiotropy. The primary purpose is to detect and correct for horizontal pleiotropy. Thus, when significant horizontal pleiotropy is observed (p < 0.05), this method is preferred [35,36]. (ii) Weighted median: The weighted median method is a technique for evaluating causal relationships using a majority of genetic variants (SNPs). If at least 50% of the SNPs are valid IVs, the median of the causal estimates will tend toward the true causal effect. This method provides an unbiased estimate (i.e., the “majority validity” assumption) [37]. (iii) Simple mode: Involves comparing the frequencies or proportions of genotypes or phenotypes between control and experimental groups. Moreover, it can illustrate whether the observed differences in genotypes or phenotypes between the two groups are statistically significant. (iv) Weighted mode: The weighted mode method is a technique for combining multiple Mendelian randomization estimates. This method assigns weights to the causal effect estimates of different genetic variants on the trait and then takes the weighted mode as the final estimate of the causal effect. In genetic variant estimates, the method can decrease bias caused by outliers. (v) Maximum likelihood: This method is used when it is known that a random sample follows a particular probability distribution; however, the specific parameters of that distribution remain unknown, and it involves conducting multiple experiments, observing the results, and using those results to infer the approximate values of the parameters [38]. 
(vi) Penalized weighted median: An enhanced version of the weighted median estimate that provides a consistent estimate of the causal effect. (vii) Heterogeneity and horizontal pleiotropy assessment use the heterogeneity tests [39] and Egger intercept tests [40], respectively.”
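      The IVW description above can be illustrated with a short calculation on per-SNP summary statistics. The following is a minimal sketch of one common formulation (multiplicative random effects with first-order weights), written for illustration only; it is not the TwoSampleMR implementation, and the function name and example numbers are assumptions.

```python
import math


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Multiplicative random-effects IVW estimate (illustrative sketch).

    beta_exp: SNP-exposure effects, beta_out: SNP-outcome effects,
    se_out: standard errors of the SNP-outcome effects.
    """
    # Per-SNP Wald ratio and its first-order inverse-variance weight.
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exp, se_out)]
    total_w = sum(weights)

    # Weighted average of the ratio estimates (regression without intercept).
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w

    # Cochran's Q measures heterogeneity among the per-SNP estimates.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1

    # Random effects: inflate the fixed-effect variance when Q exceeds
    # its degrees of freedom; never shrink it below the fixed-effect value.
    scale = max(1.0, q / dof) if dof > 0 else 1.0
    se = math.sqrt(scale / total_w)
    return estimate, se, q


# Hypothetical summary statistics for three SNPs.
est, se, q = ivw_random_effects(
    [0.1, 0.2, 0.4], [0.05, 0.1, 0.2], [0.01, 0.01, 0.01]
)
```

      In this formulation heterogeneity widens the standard error but leaves the point estimate unchanged, which is consistent with the statement that the model tolerates heterogeneity as long as pleiotropy is balanced.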

      (3) Comment: While the authors mention that the MRAD database offers interactive visualization interfaces, the paper lacks detailed information on how to interpret and understand these visual results. Guidelines on effectively using these visualization tools to help researchers better comprehend the data are essential.

      Thank you very much for your feedback, as we believe that our manuscript has been improved substantially as a result of your input.  Owing to space constraints, the MRAD database user guide is included in the Supplementary Material. Meanwhile, for better understanding, the subheading of the relevant content in the Supplementary Material has been revised to “MRAD User Guide” (see Supplementary Material for details, page 11). Furthermore, considering user-friendliness, the user guide has been integrated into the database and can be accessed directly from the homepage by clicking on the “User Guide” module.

      (4) Comment: In the conclusion section of the paper, it is advisable to explicitly emphasize the practical applications and potential clinical significance of the MRAD database. The paper should articulate how MRAD can contribute to the early identification, diagnosis, prevention, and treatment of AD and its potential societal and clinical value more clearly.

      Thank you for pointing this out. In the Discussion section of the revised manuscript, we have now described how MRAD can contribute to the early identification, diagnosis, prevention, and treatment of AD, as well as its potential societal and clinical value. We have also reorganized the structure of the Discussion section to make the text easier to follow, which helps to further clarify the significance of MRAD (page 15).

      The newly added text describing the practical applications and potential clinical significance of the MRAD database is as follows:

      “(i) The current methods for identifying AD mainly rely on assessment scales, cerebrospinal fluid (CSF) examinations, and brain PET/MRI. However, assessment scales can be biased by factors such as the anxiety and nervousness of the subjects. CSF examinations require an invasive lumbar puncture, leading to low patient acceptance. PET/MRI scans are expensive and have limited equipment accessibility. These limitations restrict early AD identification. Thus, there is a pressing clinical need for readily available, time- and cost-effective, and accurate detection methods. In this study, the Medical laboratory science and Molecular trait used could be less expensive, faster to detect, easier to operate, and more accessible for widespread adoption. They hold great value for early AD identification and have the potential to become crucial tools for identifying AD in the future. (ii) Imaging acts as a powerful assistive tool for diagnosing Alzheimer’s disease. Traditional imaging examinations mainly depict changes in the brain’s macroscopic structure, while research on microstructural changes in disease-related areas is relatively limited. Studies have demonstrated that microstructural neurodegenerative processes are extensive and pronounced during AD progression. Our study results cover traditional macroscopic neuroimaging results and reveal numerous potential causal relationships between brain microstructure and AD. The combination of macroscopic and microstructural insights will provide more valuable information for clinical diagnosis. (iii) Clarifying patient’s disease, past history, and family history can aid in preventing AD at an early stage, and prevention of AD could be attained through monitoring anthropometric indicators, improving gut microbiota, and adjusting lifestyle traits. (iv) Currently, the development of new drugs for AD is mainly underscored by Aβ, Tau, and other inhibitors. 
Since 2000, global pharmaceutical companies have invested hundreds of billions of dollars in the development of new drugs for AD, and these drugs have not yielded successful results. AD drug development has thus been perceived as having the highest failure rate of all drug research, reaching 99.6%. Hence, further research on molecular traits to find new targets and develop new drugs for these targets will provide new pathways for AD treatment.”

      (5) Comment: Grammar and Spelling Errors: There are several spelling and grammar errors in the paper. Referring to a scientific editing service is recommended.

      We appreciate your comments and suggestions for improving our manuscript. We have now used a professional editing service offered by Taylor and Francis to revise the grammar and language, and we have obtained a certificate as proof, which is attached. Thank you for recognizing our research; we have tried our best to improve the quality of this paper to ensure that it meets the high standards required for publication in eLife.

      Reviewer #2 (Public Review):

      Summary:

      This MR study by Zhao et al. provides a comprehensive hypothesis-free approach to identifying risk and protective factors causal to Alzheimer's Disease (AD).

      Strengths:

      The study employs a comprehensive, hypothesis-free approach, which is novel over traditional hypothesis-driven studies. Also, causal associations between risk/protective factors and AD were addressed using genetic instruments and analysis.

      We greatly appreciate the positive feedback regarding the overall quality of our work.

      Major comments:

      (1) Comment: The authors used the inverse-variance weighted (IVW) model as the primary method and other MR methods (MR-Egger, weighted mean, etc.) for sensitivity analysis. However, each method has its own assumption, and IVW is only robust when pleiotropy and heterogeneity are not severe. Rather than using IVW imprudently across all associations, it would be more appropriate to choose the best MR method for each association based on heterogeneity/Egger intercept tests. This customized approach, based on tests of MR assumption violations, yields more stable and reliable results. For reference, please follow up on work by Milad et al. (EHJ - "Plasma lipids and risk of aortic valve stenosis: a Mendelian randomization study"). This study selected the best MR model for each association based on pleiotropy and heterogeneity tests. Given the large number of tests in this work, I suggest initially screening significant signals using IVW, as done, and then validating the results using multiple MR methods for those signals. It is common for MR estimates from different methods to vary significantly (with some being statistically significant and others not), and in such cases, the MR estimates from the best-fitted model should be trusted and highlighted.

      Thank you for your professional comments. We agree that our description of the Statistical models for causal effect inference was not specific enough. Therefore, we have included a new text describing more details about each method’s assumption and supplied a predefined approach to select the best statistical estimation from these methods in the Statistical models for causal effect inference section of Methods section (page 9-10). However, we would like to clarify our analysis method. In this study, the main analysis method used is the IVW random effects model instead of the IVW fixed effects model. The IVW random effects model indicates that the true effect values of different studies may vary, including both sampling error and heterogeneity of the true effect. The weight of each study is jointly determined by its inverse variance and the estimated heterogeneity variance. Thus, as long as there is no pleiotropy, even when there is significant heterogeneity (p < 0.05), this method is still the best MR model. We would like to thank you again for your feedback, as we believe that our manuscript has been improved substantially as a result of your input.

      The newly added text describing more details about each method’s assumption and the customized best-fitted model is as follows:

      “Statistical models for causal effect inference

      A random-effects IVW model was used in this study as the primary analysis method to uncover potential risk or protective factors for AD. The random-effects IVW model is regarded as the gold standard for MR studies. Its principle is to use the inverse of the variance of each IV as its weight, assuming all IVs are valid. The regression does not include an intercept term, and the final result is the weighted average of the effect estimates from all IVs [34]. This model indicates that the true effect values may vary across different studies due to both sampling error and the heterogeneity of the true effect. The weight of each study is jointly determined by its inverse variance and the estimated heterogeneity variance. Thus, as long as there is no pleiotropy, even when there is significant heterogeneity (p < 0.05), this method remains the best MR model.

      To assess the robustness of the IVW results, sensitivity analysis was performed using six additional models: (i) MR-Egger: MR-Egger’s biggest difference from IVW is that it considers the intercept term during regression to evaluate bias caused by horizontal pleiotropy. The intercept represents the magnitude of horizontal pleiotropy, with a value close to 0 indicating minimal pleiotropy. The primary purpose is to detect and correct for horizontal pleiotropy. Thus, when significant horizontal pleiotropy is observed (p < 0.05), this method is preferred [35,36]. (ii) Weighted median: The weighted median method is a technique for evaluating causal relationships using a majority of genetic variants (SNPs). If at least 50% of the SNPs are valid IVs, the median of the causal estimates will tend toward the true causal effect. This method provides an unbiased estimate (i.e., the “majority validity” assumption) [37]. (iii) Simple mode: Involves comparing the frequencies or proportions of genotypes or phenotypes between control and experimental groups. Moreover, it can illustrate whether the observed differences in genotypes or phenotypes between the two groups are statistically significant. (iv) Weighted mode: The weighted mode method is a technique for combining multiple Mendelian randomization estimates. This method assigns weights to the causal effect estimates of different genetic variants on the trait and then takes the weighted mode as the final estimate of the causal effect. In genetic variant estimates, the method can decrease bias caused by outliers. (v) Maximum likelihood: This method is used when it is known that a random sample follows a particular probability distribution; however, the specific parameters of that distribution remain unknown, and it involves conducting multiple experiments, observing the results, and using those results to infer the approximate values of the parameters [38]. 
(vi) Penalized weighted median: An enhanced version of the weighted median estimate that provides a consistent estimate of the causal effect. (vii) Heterogeneity and horizontal pleiotropy assessment use the heterogeneity tests [39] and Egger intercept tests [40], respectively.”
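      The "majority validity" idea behind the weighted median in item (ii) above can be sketched as follows. This is a minimal illustration of a weighted median with linear interpolation between neighbouring estimates; the function name and the example numbers are assumptions, and it is not presented as the package's code.

```python
def weighted_median(ratios, weights):
    """Weighted median of per-SNP ratio estimates (illustrative sketch).

    ratios: per-SNP causal estimates; weights: positive weights,
    for example inverse variances of the ratio estimates.
    """
    if len(ratios) != len(weights) or not ratios:
        raise ValueError("ratios and weights must be non-empty and equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if len(ratios) == 1:
        return ratios[0]

    # Sort the estimates and carry their weights along.
    pairs = sorted(zip(ratios, weights))
    total = sum(w for _, w in pairs)

    # Position of each estimate on the cumulative weight scale (0 to 1),
    # measured at the midpoint of its own weight.
    positions = []
    cumulative = 0.0
    for _, w in pairs:
        cumulative += w
        positions.append((cumulative - 0.5 * w) / total)

    # Last estimate strictly below the 50% point, then interpolate
    # linearly towards the next estimate.
    below = max(i for i, p in enumerate(positions) if p < 0.5)
    lo_r, hi_r = pairs[below][0], pairs[below + 1][0]
    lo_p, hi_p = positions[below], positions[below + 1]
    return lo_r + (hi_r - lo_r) * (0.5 - lo_p) / (hi_p - lo_p)


# Three consistent SNPs and one hypothetical pleiotropic outlier.
robust_estimate = weighted_median([0.5, 0.5, 0.5, 5.0], [1.0, 1.0, 1.0, 1.0])
```

      With equal weights, the single outlying estimate does not move the result away from the value shared by the other three SNPs, whereas a weighted mean would be pulled towards it.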

      (2) Comment: Lines 157-160 mentioned "But to date, AD has been reported as hypothesis-driven MR study based on a single factor, ignoring the potential role of a huge number of other risk factors. Also, due to the high degree of heterogeneity present in AD subtypes, which have different biological and genetic characteristics. Thus, the previous studies cannot offer a systematic and complete viewpoint.". This statement overlooks a similar study published in Molecular Psychiatry ("A Phenome-wide Association and Mendelian Randomization Study for Alzheimer's Disease: A Prospective Cohort Study of 502,493"), which rigorously assessed the effects of 4171 factors spanning 10 different categories on AD using observational analysis and MR. The authors should revise their statement on the novelty of their study type throughout the manuscript and discuss how their work differs from and potentially strengthens previous studies.

      Thank you for directing us to this literature. We have read this article carefully. It shares some similarities with our study, but there are significant differences with regard to sample sources and research fields. The study mentioned by the reviewer used the UKB database as its sample source and analyzed the association between AD and 10 categories (comprising 4,171 factors): sociodemographic, physical measures, lifestyle and environment, health conditions, mental health, medications and operations, cognitive function, sex-specific factors, employment, and early-life factors. However, its authors noted that they were restricted by the variables available in the UKB database, so variables such as air pollution and blood glucose measures were not included. Conversely, our study used samples from the MRC IEU OpenGWAS database, the largest open GWAS database globally. Furthermore, our research focus differs, as we primarily investigate the causal relationship between AD and the following 10 categories (comprising 18,097 traits): Disease, Medical laboratory science, Imaging, Anthropometric, Treatment, Molecular trait, Gut microbiota, Past history, Family history, and Lifestyle trait. Most importantly, we have established a database encompassing all MR analysis results, allowing researchers and clinicians worldwide to conveniently and rapidly retrieve AD-associated risk factors via an online open integrated platform (MRAD, https://gwasmrad.com/mrad/). We have now added new text to the Background section (pages 6-7) describing the differences from, and potential strengths relative to, previous studies.

      The newly added text describing the differences and novelty towards previous studies is as follows:

      “Chen et al. [30] used MR analysis to reveal the causal relationship between AD and factors including sociodemographic and early-life status. However, the authors noted that their study was restricted by the variables available in the UKB database, so variables such as air pollution and blood glucose measures were not included. In addition, AD subtypes are highly heterogeneous and have different biological and genetic characteristics. Thus, previous studies cannot offer a systematic and complete viewpoint. Our study uses the MRC IEU OpenGWAS database as the sample source for MR analysis to address the aforementioned limitations. The MRC IEU OpenGWAS database, the largest open GWAS database globally, has compiled 42,335 GWAS summary datasets from sources such as the UK Biobank, FinnGen Biobank, and Biobank Japan. Analyzing large-scale datasets will break new ground for MR research on AD.

      MR requires a combination of background knowledge in biology, computer science, software studies, and statistics, which often leads to a dilemma where biologists are not well-versed in computer and statistical fields, while computer science experts struggle to adopt a medical biology mindset. Consequently, the vast majority of available GWAS data have not been effectively utilized through MR. Therefore, the construction of a multi-level data platform specifically for AD based on MR analysis of massive GWAS data is of great strategic significance, and it will facilitate researchers and clinicians worldwide to conveniently and rapidly obtain risk factors that are causally associated with AD.”

      Reference:

      [30] Chen SD, Zhang W, Li YZ, et al. (2023). A Phenome-wide Association and Mendelian Randomization Study for Alzheimer's Disease: A Prospective Cohort Study of 502,493 Participants From the UK Biobank. Biol Psychiatry. 93(9):790-801.

      (3) Comment: Given the large number of tests, the multiple testing issue is concerning. To mitigate potential false positives, I recommend employing the Bonferroni threshold or FDR. The authors should only interpret exposures that are significant at the Bonferroni threshold.

      We sincerely appreciate the reviewer's feedback. In response, we have added the results of the Bonferroni correction to the Statistical models for causal effect inference section of the Methods section (page 10).

      The newly added text describing Bonferroni threshold is as follows:

      “The above analyses were performed using the TwoSampleMR [41] package in R (version 4.1.2). The association of exposures with outcomes was assessed using the odds ratio (OR) and 95% confidence interval (95% CI), with OR > 1 indicating a positive association (risk factor) and 0 < OR < 1 indicating a negative association (protective factor). Differences with a two-sided p < .05 were considered statistically significant. Furthermore, owing to the relatively large number of exposure and outcome traits included in this study, a Bonferroni correction for multiple testing was applied to identify significant hits; the Bonferroni-corrected threshold was 0.05 divided by 289,552 tests (p < 1.727e-07).”
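      The threshold and the OR interpretation quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; reading the 289,552 tests as 18,097 exposure traits times 16 outcome datasets is an assumption (the product happens to match the quoted figure), and the function names are hypothetical.

```python
import math

N_EXPOSURES = 18_097  # exposure traits reported in the Methods text
N_OUTCOMES = 16       # AD outcome datasets reported in the Methods text


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and its SE into an OR with a ~95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def classify(odds_ratio):
    """Label an exposure using the OR convention quoted in the text."""
    if odds_ratio > 1:
        return "risk factor"
    if 0 < odds_ratio < 1:
        return "protective factor"
    return "no association"


# Assumption: every exposure is tested against every outcome.
n_tests = N_EXPOSURES * N_OUTCOMES
threshold = bonferroni_threshold(0.05, n_tests)

# Hypothetical log-odds estimate, used only to show the conversion.
or_value, ci_low, ci_high = odds_ratio_ci(beta=-0.2, se=0.05)
print(n_tests, f"{threshold:.3e}", classify(or_value))
```

      Under this reading, the number of tests and the corrected threshold agree with the values stated in the quoted Methods text.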

      (4) Comment: In the discussion, the authors should interpret or highlight exposures that remain significant after multiple testing corrections.

      Thank you for your valuable comment. In response to reviewer feedback, we have put extra emphasis on the exposures that remained significant after multiple testing corrections in the Discussion section (page 17). We thank you again for your feedback, as we believe that our manuscript has been improved substantially as a result of your input.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Comment: In this study, the authors used the inverse-variance weighted (IVW) model as the major analysis method to perform Mendelian randomization analysis to identify various classes of risk or protective factors for AD, early-onset AD, and late-onset AD. An online database called MRAD has been thereby developed with the assistance of Shiny package. This study is a very intriguing study of great clinical and scientific significance that provided a thorough and comprehensive evaluation with regard to risk or protective factors for AD. It also provided physicians and scientists with a very convenient, free as well as user-friendly tool for further scientific investigation.

      I believe this manuscript is great research that is worth publishing with all the comments from the Public Review resolved.

      We thank the reviewer for taking the time to read and provide valuable feedback on our manuscript, which allowed us to improve the overall quality of our research. All the comments from the Public Review have been rechecked, and appropriate changes have been made in accordance with the reviewers’ suggestions. Point-by-point responses to all the comments from the Public Review can be found above. If there are any further issues, please do not hesitate to let us know, so that we can ensure that our manuscript meets the high standards required for publication.

      Reviewer #2 (Recommendations For The Authors):

      (1) Comment: In the middle lower left section of the graphical abstract, the overlapping positive (N=63) and overlapping negative (N=16) do not sum to the overlapping number (N=80). Could you clarify if any have both positive and negative effects? Additionally, the font size inside the circular elements is too small to read.

      We thank you for raising this issue. We have clarified this in the MRAD utility data mining section of Results section (page 12): A total of 63 exposure traits (risk factors) were positively associated with all the three main outcome traits, while 16 exposure traits (protective factors) were negatively associated with the three main outcome traits, with Ulcerative colitis (ebi-a-GCST000964) being negatively associated with the AD outcome traits of ieu-b-2 and ieu-a-297, and positively associated with the AD outcome traits of ieu-b-5067. Additionally, we apologize for the small, unreadable fonts in the graphical abstract figure. In response to reviewer feedback, we have increased the font size within the figure and enhanced the resolution to improve image readability (page 3).

      (2) Comment: The x-axis label ("Alzheimer's disease outcome") should be more descriptive. If published GWAS results are used, indicate this as XXX et al. (2022). Also, specify the AD outcome for each category (e.g., AD, early-onset AD, late-onset AD). The y-axis labels should also be clarified; remove identification codes and retain only the exposure names. Apply the same improvements to Figures 2-8.

      We appreciate your comments and suggestions for improving our manuscript.

      (i) In response to reviewer feedback, information on the published GWAS, such as authors and year of publication, has now been added to the x-axis labels, as demonstrated in Figure 4 (page 31).

      (ii) The outcome IDs are unique. We used these IDs to represent the AD information on the x-axis to maintain a clean and clear figure. The corresponding details for each ID are explained in the Outcome traits section of the Methods section (page 8, as shown in Figure 2). AD_EO refers to early-onset AD, and AD_LO refers to late-onset AD, which are also specified in the Abbreviations (page 4).

      (iii) We sincerely appreciate the reviewers’ meticulous feedback. While exposure IDs in this study are unique, exposure names are not. A single exposure name may correspond to multiple IDs, each with a potentially different source of information (e.g., author, year, population sample). We believe obtaining consistent results across multiple IDs further strengthens the reliability of our conclusions. Hence, for better clarity of specific exposure information, the exposure IDs have been retained.

      (3) Comment: The results across Figures 1-8 are repetitive and not very informative. Consider other visualizations to condense the information into one or two figures. I would recommend using a Manhattan plot or PheWAS plot concept to effectively display many test results at once. Please display the Bonferroni threshold in the plot as a horizontal line to show which exposures are meaningful after adjusting multiple comparisons.

      We appreciate this helpful suggestion. We have now condensed Figures 1–8 into a single figure (as shown in Figure 4). Additionally, we have now displayed the Bonferroni correction results in the sensitivity analysis results figures (as shown in Figure 5, Figure S1-S7).

      (4) Comment: Consider placing Figure S1 as Figure 1, condensing Figures 1-8 into Figures 2 and 3, and placing the circular diagrams from Figure S6 as Figure 4.

      We appreciate this valuable suggestion. The sequence of the figures has been adjusted.

      (5) Comment: Create a main table summarizing robust and consistent exposures for AD that are significant at the Bonferroni threshold for readers. For each exposure, please include estimates from IVW, MR-Egger, weighted median, simple mode, weighted mode, maximum likelihood, and penalized weighted median, along with heterogeneity and horizontal pleiotropy tests. I would also highlight or bold estimates from the best-fit model/MR method to help readers identify the most reliable estimates when estimates from multiple methods are heterogeneous.

      We appreciate this helpful suggestion. Owing to the excessive amount of information in the table, we have uploaded the table covering the aforementioned information according to the reviewer’s suggestion as supplementary materials (See Table S2). (i) The corresponding id.exposure that pass the Bonferroni threshold are reflected in red font. (ii) Furthermore, according to the customized best-fitted model (as mentioned in the Statistical models for causal effect inference section of Methods section), when there is no pleiotropy or when pleiotropy is not applicable (less than 3 SNPs), random-effects IVW model is the best model. These corresponding id.exposure are shown in red font with a yellow highlight. (iii) Moreover, according to the customized best-fitted model, when there is pleiotropy, MR-Egger is the best model. These corresponding id.exposure are shown in red font with a green highlight.
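      The model-selection rule described in this response can be written as a small decision function. The following is a minimal sketch based only on the rule as stated here (IVW unless the Egger intercept test indicates pleiotropy, with IVW as the fallback when fewer than 3 SNPs are available); the function name, the use of `None` for "not applicable", and the 0.05 cutoff as a parameter are assumptions for illustration.

```python
def select_mr_model(n_snps, egger_intercept_p, alpha=0.05):
    """Pick the preferred MR model for one exposure-outcome pair.

    n_snps: number of instruments; egger_intercept_p: p-value of the
    Egger intercept test, or None when the test could not be run.
    """
    # With fewer than 3 SNPs the pleiotropy test is not applicable,
    # so the random-effects IVW model is kept.
    if n_snps < 3 or egger_intercept_p is None:
        return "random-effects IVW"
    # Significant intercept: horizontal pleiotropy, prefer MR-Egger.
    if egger_intercept_p < alpha:
        return "MR-Egger"
    # No evidence of pleiotropy: random-effects IVW remains preferred,
    # even when heterogeneity is present.
    return "random-effects IVW"


# Hypothetical examples of the three situations described in the text.
choices = [
    select_mr_model(n_snps=2, egger_intercept_p=None),
    select_mr_model(n_snps=25, egger_intercept_p=0.01),
    select_mr_model(n_snps=25, egger_intercept_p=0.40),
]
```

      The first and third cases would correspond to the entries highlighted in yellow in Table S2, and the second to those highlighted in green.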

      (6) Comment: Figures S4-S10: These figures are screenshots of web browsers and may not be worth showing. Consider using tools like Adobe AI or R ggplot to create more refined visualizations that are specific to the research question and improve the message of this work.

      Thank you very much for your valuable suggestion in reviewing our manuscript. In this study, Figures S4-S10 are screenshots related to the user guide. We sincerely appreciate the reviewer’s feedback and have revised the subheading of this section to MRAD User Guide to clarify its purpose. Demonstrating both text and figures in this section, we aim to help users understand ways to operate MRAD more intuitively and easily.

      (7) Comment: Additionally, please show upfront or highlight results from MR analyses based on R packages, as the author mentioned in the method section. Somehow it's difficult to find results from MR-Egger, weighted median, simple mode, weighted mode, maximum likelihood, and penalized weighted median, along with heterogeneity and horizontal pleiotropy tests in the supplementary materials. Apologies if I missed them. Please ensure these results are clearly presented.

      We appreciate your comments and suggestions for improving our manuscript. Thank you for pointing this out. We have added the results of the sensitivity analysis based on R packages (as shown in Figure 5, Figure S1-S7, and Table S2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      I am not convinced how this study relates to HIV individual HFpEF, and the study design does not seem to be well thought out. 

      This is an important point and we have modified the manuscript as mentioned below in our responses.

      The connectivity of the study experiments is loose, and data analysis and conclusions are broadly overstated and misinterpreted.

      We have thoroughly modified the manuscript so that the data are interpreted properly and the conclusions are stated logically. 

      For example the study lacks any measure of diastolic contractile function, and even if performed, the relevance of TNFa treatments to cells in vitro in these immature cell contexts would remain unclear. There is surprisingly no reported molecular analyses of potential mechanisms of the calcium transient changes. The study falls short in molecular detail and instead relies on drug treatments and responses that are hard to interpret with dosages that are not well justified and treatments that are numerous. Unclear what changes in calcium transients mean functionally without a comprehensive assessment of CM biomechanical contraction and relaxation measurements, and this would also require parallel molecular investigations of potential targets of any phenotypes observed.

      As mentioned above, we have modified the manuscript so that the data are interpreted properly and the conclusions are stated logically. In terms of mechanisms for the observed phenomenon, we agree that this was not the focus of our studies; however, we have provided a paragraph in the discussion that covers this topic. Although decay and downstroke time were utilized as surrogates of cardiomyocyte relaxation, direct biomechanical characterization of contraction was not conducted in this study. While cytosolic calcium concentration is a predominant factor regulating the cell’s relaxation (Reference 52 in the manuscript), several mechanisms can modify this relationship, including the transition of sarcomere protein isoforms to pathogenic ones (Reference 53 in the manuscript) and the stimulation of β-adrenergic receptors on cardiomyocytes (Reference 54 in the manuscript). Since the hiPSC-CMs utilized for each study are from iPS cells derived from a single donor, we believe that the patterns of sarcomere protein expression and the regulation of the β-adrenergic receptor pathway should be consistent among samples, which suggests that their effects should be minimal in our system. We also did not elucidate the molecular mechanisms underlying the prolonged decay time induced by TNF-α and IFN-γ in this study. Lee et al. reported that 25 ng/ml TNF-α treatment induced a longer decay portion of the calcium transient and decreased sarcoplasmic ATPase (SERCA) expression in rabbit cardiomyocytes from the pulmonary vein (Reference 55 in the manuscript), suggesting that our observation in iPS-CMs may also be mediated by decreased expression of SERCA, though further studies remain to be conducted.

      Calcium transient data need to be better illustrated such as with representative peak tracings. The data overall is with too few samples, particularly given the inherent heterogeneity of iPSCM studies. The iPS-CM system as a model for diastolic dysfunction remains unestablished.

      We have now prepared several representative curves of calcium transients and their derivatives in Figure 4 D and E, H and I, and in Figure 1-figure supplement 1B. Regarding the collection of Ca-transient data, each dot in the bar graphs represents the average of signals obtained from one well of a 96-well plate. About 75K cells were seeded per well, and we believe that the number of cells integrated in the analyses should be sufficient for the statistical analyses. We modified our manuscript to state that this system does not quantify diastolic function directly, but provides Ca measurements that indicate cardiomyocyte relaxation.

      There are unclear dose choices for the various ART drugs tested, as well as the other drugs tested such as SGLT2i. Besides the observation that SLC5A2 (SGLT2 target) is not established to be expressed in adult mammalian cardiomyocytes. 

      Thank you for the comment. The dose ranges of ART drugs were chosen to extend to 10-fold above the IC50 concentrations and reflect the upper range of circulating drug concentrations in patients receiving these medications (Reference 36-39 in the manuscript). For the SGLT2 inhibitor concentration, we referred to a paper utilizing 1-10 μM dapagliflozin (PMID: 35818731). We conducted a preliminary study to test the effect of 1 and 10 μM dapagliflozin on the Ca-transient of iPS-CMs, and we found that 1 μM of the drug did not cause changes in the Ca-transient. Marfella et al. reported that SLC5A2 (SGLT2) is expressed in cardiomyocytes under diabetic conditions (PMID 36096423). Since diabetes is associated with low-grade systemic inflammation, HIV patients might also express SGLT2 in cardiomyocytes. Taken together, we believe that the dosages of the drugs used in our studies are relevant to their clinical therapeutic use.

      HIV plasma samples were not tested for cytokine levels, but this could be done to assess the validity of the final experiments. It is unclear what is being tested with these experiments. 

      This is a good point and we agree with the reviewer. However, we had a limited amount of patient serum and could not perform a comprehensive analysis of these samples. Nevertheless, we have added a section to the Discussion providing some clinical relevance of our findings, based on papers that have assessed cytokine levels in the serum of HIV patients.

      The choice of serum controls from a second institution (UCSF) opens up concerns over batch effects unrelated to differences in diastolic dysfunction. However, there were no differences with the Northwestern samples. It is unclear why this data is included as it does not add to the impact of the study. 

      In our study, we utilized two sets of HIV patient serum samples from different institutions, supporting that our results can be reproduced. We believe that these results significantly augmented the rigor of our findings.

      There are concerns about the quality of the iPS-CMs since there is no cell imaging or molecular analyses. Figure 5 Supplement 1 images are of low quality and low resolution to assess cell quality. Overall the iPS-CM QC data is extremely sparse 

      We have now added the representative images of iPS-CMs to Figure 1- figure supplement 1A. Our group has used hiPS-CMs extensively in the past (PMID: 26439715). We also updated Fig 5 Supplement 1 with images with better resolution and added Fig 5 Supplement 2 with magnified images. 

      Reviewer #2 (Public Review):

      However, there are some topics that are not well-connected, and the rationale and hypothesis are not clearly defined beforehand, such as mitochondrial membrane potential, mitochondrial ROS, and angiogenic potential. 

      We modified the manuscript so that the rationale and hypothesis of the study are clearly stated. 

      As the hiPSC cardiomyocytes are treated with various reagents to measure diastolic dysfunction, it is important to confirm whether the treatment time and dose used were sufficient to exert a functional effect. Dose and time-dependent experiments are essential, or at least sufficient citations should be provided for selecting the dose for IFN and TNF. 

      We used previous publications for the dosages of the drugs used in our paper (1-4). 

      After IFN and TNF treatment, determining the expression levels of molecular markers of DD/HFpEF is crucial. Again, if sufficient evidence is available, it can be cited. 

      We have included a section in the discussion to address this issue. Briefly, Lee et al. reported that 25 ng/ml TNF-α induces a longer decay of the calcium transient and a decrease in sarcoplasmic ATPase (SERCA) expression in rabbit cardiomyocytes from the pulmonary vein (PMID 17383682). The prolonged Ca-decay time in hiPS-CMs with the drug administration may be due to a decrease in SERCA expression and impaired Ca-uptake into the sarcoplasmic reticulum.

      The Methods section describes TMRE colocalization and immunofluorescence, but no images are provided.

      We have performed immunofluorescence of hiPSC-CM with TMRE for the quantification of mitochondrial membrane potential (MMP). 

      The concentration of TNF and IFN in patients is critical, which was acknowledged and discussed as a limitation of the study by the authors. Authors should consider this aspect, and if not feasible, clinical reports should be cited to provide a rough estimation of their concentration. 

      Thank you for this comment. A new section detailing the points brought up by the Reviewer has now been added to the Discussion.

      Recommendation for the authors:

      Reviewer #1 (Recommendation for the authors):

      I suggest a more comprehensive analysis of diastolic function including biomechanical studies of contraction and diastolic function. I suggest increasing the sample #'s, getting a better characterization of the cardiomyocytes, their expression profiles, and maturation state. The team should dig more deeply into potential molecular mechanisms of the calcium transient changes. Are there changes in SERCA or other SR factors' phosphorylation state or other molecular explanations for the observed changes? I would remove the serum treatment experiments as they distract since they didn't show differences. These are a few of the suggestions I would have for the team.

      Our system for measuring Ca-transients unfortunately does not allow us to obtain data on cellular biomechanical properties. We modified the manuscript so that the results are not overstated and the interpretation is correct. Since each dot in the bar graphs for Ca-transient data represents the average of signals generated from 75K cells, we believe that the number of cells analyzed was sufficient for the analyses. Although it is not conclusive, previous reports suggested induction of SERCA2A expression by TNF-α treatment in isolated cardiomyocytes, suggesting that the mechanism underlying the prolonged calcium decay time in our model may be due to changes in SERCA levels. We included the data from human serum samples from HIV patients since they provide a platform to assess the effects of HIV patient serum on. We believe that these data represent significant progress in understanding the process of myocardial dysfunction in HIV patients.

      References

      Amirayan-Chevillard, N., Tissot-Dupont, H., Capo, C., Brunet, C., Dignat-George, F., Obadia, Y., Gallais, H., and Mege, J. L. (2000) Impact of highly active anti-retroviral therapy (HAART) on cytokine production and monocyte subsets in HIV-infected patients. Clinical and experimental immunology 120, 107-112

      Fraietta, J. A., Mueller, Y. M., Yang, G., Boesteanu, A. C., Gracias, D. T., Do, D. H., Hope, J. L., Kathuria, N., McGettigan, S. E., Lewis, M. G., Giavedoni, L. D., Jacobson, J. M., and Katsikis, P. D. (2013) Type I interferon upregulates Bak and contributes to T cell loss during human immunodeficiency virus (HIV) infection. PLoS Pathog 9, e1003658

      Lau, S. L., Yuen, M. L., Kou, C. Y., Au, K. W., Zhou, J., and Tsui, S. K. (2012) Interferons induce the expression of IFITM1 and IFITM3 and suppress the proliferation of rat neonatal cardiomyocytes. Journal of cellular biochemistry 113, 841-847

      Stone, S. F., Price, P., Keane, N. M., Murray, R. J., and French, M. A. (2002) Levels of IL-6 and soluble IL-6 receptor are increased in HIV patients with a history of immune restoration disease after HAART. HIV Med 3, 21-27

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments: 

      My main concern about the manuscript is the extent of both clinical and statistical heterogeneity, which complicates the interpretation of the results. I don't understand some of the antibiotic comparisons that are included in the systematic review. For instance the study by Paul et al (50), where vancomycin (as monotherapy) is compared to co-trimoxazole (as combination therapy). Emergence (or selection) of co-trimoxazole in S. aureus is in itself much more common than vancomycin resistance. It is logical and expected to have more resistance in the co-trimoxazole group compared to the vancomycin group, however, this difference is due to the drug itself and not due to co-trimoxazole being a combination therapy. It is therefore unfair to attribute the difference in resistance to combination therapy. Another example is the study by Walsh (71) where rifampin + novobiocin is compared to rifampin + co-trimoxazole. There is more emergence of resistance in the rifampin + co-trimoxazole group but this could be attributed to novobiocin being a different type of antibiotic than co-trimoxazole instead of the difference being attributed to combination therapy. To improve interpretation and reduce heterogeneity my suggestion would be to limit the primary analyses to regimens where the antibiotics compared are the same but in one group one or more antibiotic(s) are added (i.e. A versus A+B). The other analyses are problematic in their interpretation and should be clearly labeled as secondary and their interpretation discussed. 

      Thank you for raising these important points and highlighting the need for clarification. We understand that the reviewer has concerns regarding the following points:

      (1) The structure of presenting our analyses, i.e. main analyses and sub-group analyses and their corresponding discussion and interpretation

      Our primary interest was whether combining antibiotics has an overarching effect on resistance and to identify factors that explain potential differences in the effect of combining antibiotics across pathogens/drugs. Therefore, pooling all studies, and thereby all combinations of antibiotics, is one of our main analyses. The decision to pool all studies that compare a lower number of antibiotics to a higher number of antibiotics was hence predefined in our previously published study protocol (PROSPERO CRD42020187257).

      We indeed find that heterogeneity is high in our statistical analyses. As planned in our study protocol, we performed several prespecified sub-group analyses and added additional ones. We now emphasize that several sub-group analyses were performed to investigate heterogeneity (L 119ff): “The overall pooled estimates are based on studies that focus on various clinical conditions/pathogens and compare different antibiotics treatments. To explore the impact of these and other potential sources of heterogeneity on the resistance estimates we performed various sub-group analyses and meta-regression.” 

      The sub-group analyses focused on specific pathogens/clinical conditions (figure 3) or explored heterogeneity due to different antibiotics in comparator arms, as suggested by the reviewer (figure 3B, SI section 6). We find that the heterogeneity remains high even if only resistances to antibiotics common to both arms are considered (SI section 6.1.8). With this analysis we excluded comparisons of different antibiotics (e.g., A vs B+C), such as those between vancomycin and co-trimoxazole named by the reviewer. While we aimed to explore heterogeneity and investigate potential factors affecting the effect of combining antibiotics on resistance, limitations arose due to limited evidence and the nature of the data provided by the identified studies. Therefore, interpretability also remains limited for the subgroup analyses, which we highlight in the discussion. (L 186 ff: We accounted for many sources of heterogeneity using stratification and meta-regression, but analyses were limited by missing information and sparse data.) Further, specific subgroup analyses are discussed in more detail in the SI.

      (2) Difference in resistance development due to the type of the antibiotics or due to combination therapy?

      The reviewer raises an important point, which we also try to make: future studies should be systematically designed to compare antibiotic combination therapy, i.e. identical antibiotics in treatment arms should be used, except for additional antibiotics used in both treatment arms. We already mentioned this point in our discussion but highlight this now by emphasizing how many studies did not have identical antibiotics in their treatment arms. We write in L194ff: “19 (45%) of our included studies compared treatment arms with no antibiotics in common, and 22 studies (52%) had more than one antibiotic not identical in the treatment arms (table 1). To better evaluate the effect of combination therapy, especially more RCTs would be needed where the basic antibiotic treatment is consistent across both treatment arms, i.e. the antibiotics used in both treatment arms should be identical, except for the additional antibiotic added in the comparator arm (table 1).”

      Furthermore, we investigated the importance of the type of antibiotics with several subgroup analyses (e.g. SI sections 6.1.8 and 6.1.10). We now further highlight the concern about the type of antibiotics in the results section of the main manuscript, where we discuss the sub-group analysis with no common antibiotics in the treatment arms (L 131 ff): “Furthermore, a lower number of antibiotics performed better than a higher number if the compared treatment arms had no antibiotics in common (pooled OR 4.73, 95% CI 2.14 – 10.42; I2=37%, SI table S3), which could be due to different potencies or resistance prevalences of antibiotics as discussed in SI (SI section 6.1.10).” As mentioned above, we also perform sub-group analyses where only resistances to antibiotics common to both arms are considered (SI section 6.1.8). However, as discussed in the corresponding sections, the systematic assessment of antibiotic combination therapy remains challenging as not all resistances against antibiotics used in the arms were systematically measured and reported. Furthermore, the power of these sub-group analyses is naturally a concern, as they include fewer studies. 

      Another concern is about the definition of acquisition of resistance, which is unclear to me. If for example meropenem is administered and the follow-up cultures show Enterococcus species (which is intrinsically resistant to meropenem), does this constitute acquisition of resistance? If so, it would be misleading to determine this as an acquisition of resistance, as many people are colonized with Enterococci and selection of Enterococci under therapy is very common. If this is not considered as the acquisition of resistance please include how the acquisition of resistance is defined per included study. Table S1 is not sufficiently clear because it often only contains how susceptibility testing was done but not which antibiotics were tested and how a strain was classified as resistant or susceptible. 

      Thank you for pointing out this potential ambiguity. The definition of acquisition of resistance reads now (L 275 ff): “A patient was considered to have acquired resistance if, at the follow-up culture, a resistant bacterium (as defined by the study authors) was detected that was not present in the baseline culture.” We also changed the definition accordingly in the abstract (L 36 ff). We hope that the definition of acquisition is now clearer. Our definition of “acquisition of resistance” is agnostic to bacterial species and hence intrinsically resistant species, as the example raised by the reviewer, can be included if they were only detected during the follow-up culture by the studies. Generally, it was not always clear from the studies, which pathogens were screened for and whether the selection of intrinsically resistant bacteria was reported or not. Therefore, we rely on the studies' specifications of resistant and non-resistant without further distinction from our side, i.e. classifying data into intrinsic and non-intrinsic resistance. Overall, the outcome “acquisition of resistance” can be interpreted as a risk assessment for having any resistant bacterium during or after treatment. In contrast, the outcome “emergence of resistance” is more rigorous, demanding the same species to be detected as more resistant during or after treatment.
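
      The species-agnostic definition of "acquisition of resistance" given above can be written down as a small Python sketch. The function name and the representation of cultures as sets of resistant-bacterium labels (as classified by the study authors) are assumptions made for illustration; they are not taken from the manuscript.

```python
def acquired_resistance(baseline_resistant: set, followup_resistant: set) -> bool:
    """Return True if any resistant bacterium appears at follow-up
    that was not already present in the baseline culture.

    Both arguments are sets of labels for bacteria that the study
    authors classified as resistant. The check is agnostic to species,
    so an intrinsically resistant species first seen at follow-up counts.
    """
    newly_detected = set(followup_resistant) - set(baseline_resistant)
    return len(newly_detected) > 0


# Hypothetical patient: nothing resistant at baseline, an intrinsically
# resistant Enterococcus detected under therapy.
patient_acquired = acquired_resistance(set(), {"Enterococcus faecium"})
```

      Because the check only compares follow-up against baseline, it behaves like a risk assessment for carrying any resistant bacterium during or after treatment; the stricter "emergence of resistance" outcome would additionally require matching the species across cultures.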

      Information on which antibiotic susceptibility tests were performed in each individual study can be found in table 1 of the main text. However, we agree that this information should be better linked and highlighted again in table S1. We therefore now refer to table 1 in the table description of table S1. L134 ff.: “See table 1 in the main text for which antibiotics the antibiotics tested and reported extractable resistance data”. Furthermore, we added the breakpoints for resistant and susceptible classification if specifically stated in the main text of the study. However, we did not do further research into old guidelines, manufacturers' manuals or study protocols in cases where the breakpoints are not specifically stated in the main text, as the main goal of this table, in our opinion, is to justify why the studies could be considered for a resistance outcome. We therefore decided against further breakpoint investigations for studies where the breakpoint is not specifically stated in the main text. 

      Line 85: "Even though within-patient antibiotic resistance development is rare, it may contribute to the emergence and spread of resistance." 

      Depending on the bug-drug combination, there is great variation in the propensity to develop within-patient antibiotic resistance. For example: within-patient development of ciprofloxacin resistance in Pseudomonas is fairly common while within-patient development of methicillin resistance in S. aureus is rare. Based on these differences, large clinical heterogeneity is expected and it is questionable where these studies should be pooled. 

      We agree that our formulation neglects differences in prevalence of within-host resistance emergence depending on bug-drug combinations. We changed our statement in L 86 to: “Within-patient antibiotic resistance development, even if rare, may contribute to the emergence and spread of resistance.”

      Line 114: "The overall pooled OR for acquisition of resistance comparing a lower number of antibiotics versus a higher one was 1.23 (95% CI 0.68 - 2.25), with substantial heterogeneity between studies (I2=77.4%)" 

      What consequential measures did the authors take after determining this high heterogeneity? Did they explore the source of this large heterogeneity? Considering this large heterogeneity, do the authors consider it appropriate to pool these studies?

      Thank you for highlighting this lack of clarity. As mentioned above, we now highlight that we performed several subgroup analyses to investigate heterogeneity. (L 116ff): “The overall pooled estimates are based on studies that focus on various clinical conditions/pathogens and compare different antibiotics treatments. To explore the impact of these and other potential sources of heterogeneity on the resistance estimates we performed various subgroup analyses and meta-regression.” Nevertheless, these analyses faced limitations due to the scarcity of evidence and often still showed a high amount of heterogeneity. Given the lack of appropriate evidence, it is hard to identify the source of heterogeneity. The decision to pool all studies was pre-specified in our previously published study protocol (PROSPERO CRD42020187257) and was motivated by the question whether there is a general effect of combination therapy on resistance development or identify factors that explain potential differences of the effect of combination therapy across bug-drug combinations. Therefore, we think that the presentation of the overall pooled estimate is appropriate, as it was predefined, and potential heterogeneity is furthermore explored in the subgroup analyses. 
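
      For readers less familiar with the I2 statistic quoted above, the following Python sketch shows one common way to obtain a pooled odds ratio and I2 from study-level log odds ratios. It uses the DerSimonian-Laird random-effects estimator as an assumption; the response does not state which estimator the authors used, and the inputs in the example are made up.

```python
import math


def pool_random_effects(log_ors, ses, z_crit=1.959964):
    """DerSimonian-Laird pooling of log odds ratios with standard errors."""
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "odds_ratio": math.exp(pooled),
        "ci_low": math.exp(pooled - z_crit * se_pooled),
        "ci_high": math.exp(pooled + z_crit * se_pooled),
        "i2_percent": i2,
        "tau2": tau2,
    }


# Made-up example: two studies pointing in opposite directions.
result = pool_random_effects([-1.0, 1.0], [0.2, 0.2])
```

      In this toy input the two studies disagree strongly, so almost all of the observed variation is attributed to heterogeneity rather than chance, which is the situation an I2 of around 77% signals to a lesser degree.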

      Reviewer #1 (Recommendations For The Authors): 

      I want to congratulate the investigators for the rigorous approach followed and the - in my opinion - correct interpretation of the data and analysis. The disappointing outcome is independent of the quality of the approach used. Yet, the consequences of that outcome are rather limited, and will not be surprising for - at least - some in the field of antibiotic resistance. 

      Thank you for your positive and differentiated feedback.

      Reviewer #2 (Recommendations For The Authors): 

      Line 93: "The screening of the citations of the 41 studies identified one additional eligible study, for a total of 42 studies". 

      Why was this study missed in the search strategy? 

      What is the definition of "quasi-RCTs"? Why were these included in the analysis? 

      Thank you for pointing out this lack of clarity. The additional study, which was found through screening the references of included studies, was not identified with our search strategy as neither the abstract nor database specific identifiers provided any indications that resistance was measured in this study. We added an explanation in the supplementary materials L 792 ff. and refer to this explanation in the main manuscript (L 95). 

      Quasi-randomized trials are trials that use allocation methods that are not considered truly random. We added this specification in L 95. It now reads: “….two quasi-RCTs, where the allocation method used is not truly random” and in L 252 ff: “Studies were classified as quasi-RCTs if the allocation of participants to study arms was not truly random.” For instance, the study by Macnab et al. (1994) assigned patients alternately to the treatment arms. Quasi-randomized controlled trials can lead to biases, and older studies in particular are more likely to have used quasi-random allocation methods. This can also be seen in our study, where the two quasi-randomized controlled trials were published in 1994 and 1997. The bias is considered in the risk of bias assessment and in our sensitivity analysis regarding the impact of risk of bias on our estimates (supplementary information sections 3.0 and 4.2). Furthermore, one of the two previously conducted meta-analyses comparing beta-lactam monotherapy to beta-lactam and aminoglycoside, which assessed resistance development, also included quasi-randomized controlled trials (Paul et al. 2014). Overall, while designing the study, we decided to include quasi-randomized controlled trials to increase statistical power, as we expected that limited statistical power might be a concern, and decided to assess potential biases in the risk of bias assessment.  

      Line 100: "Consequently, most studies did not have the statistical power to detect a large effect on within-patient resistance development (figure 2 B, SI p 14).". 

      Small studies actually have more power to detect large effects while smaller power to detect small effects. Please rephrase. 

      Thank you for pointing out this lack of clarity. We rephrased the sentence in order to emphasize our point that the studies are underpowered even if we assume in our power analysis a large effect on resistance development between treatment arms. In this context “the small” studies include too few patients to detect a large difference in resistance development. As resistance development is a rare event, generally studies have to include a larger number of patients to estimate the effect of intervention. We rephrased the sentence in L 101ff to: “Consequently, most studies did not have the statistical power to detect differences in within-patient resistance development even if we assume that the effect on resistance development is large between treatment arms.”
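
      The point that rare outcomes leave small trials underpowered even for large effects can be illustrated with a short Python sketch. It uses the standard normal approximation for a two-sided comparison of two proportions; this is not the power analysis from the manuscript, and the event rates and sample sizes below are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions
    with equal arm sizes (normal approximation, far tail ignored)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    return nd.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Hypothetical scenario: resistance develops in 2% vs 6% of patients,
# i.e. a threefold difference between arms.
power_small_trial = power_two_proportions(0.02, 0.06, n_per_arm=50)
power_large_trial = power_two_proportions(0.02, 0.06, n_per_arm=500)
```

      With these assumed rates, the small trial has low power despite the threefold difference, while the larger trial detects the same difference far more reliably.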

      Line 108: "... and prophylaxis for blood cancer patients with four studies (10%) respectively.". 

      I would suggest using the medical term hematological malignancy patients. 

      Thank you for the suggestion, we changed it as suggested to hematological malignancy patients, also accordingly in the figures, and table 1.

      Line 117: "Since the results for the two resistance outcomes are comparable, our focus in the following is on the acquisition of resistance". 

      The first OR is 1.23 and the second is 0.74, why do you consider these outcomes as comparable? 

      Thank you for pointing out our imprecise formulation. Due to the lack of power, the exact estimates need to be interpreted with care. Here, we wanted to make the point that qualitatively the results of both outcomes do not differ, in the sense that our analysis shows no substantial difference between a higher and a lower number of antibiotics. We rephrased the sentence to be more precise (L 123ff): “The results for the two resistance outcomes are qualitatively comparable in the sense that individual estimates may differ, but show similar absence of evidence to support either the benefit, harm or equivalence of treating with a higher number of antibiotics. Therefore, our …”. A more detailed discussion of differences in estimates can be found in the SI, where the estimates of emergence of resistance are presented (e.g. SI section 2.1).

      Line 123: "Furthermore, a lower number of antibiotics performed better than a higher number if the compared treatment arms had no antibiotics in common (pooled OR 4.73, 95% CI 2.14 - 10.42; I 2 =37%, SI p 7).". 

      How do you explain this? What does this mean? 

      We now added a more detailed explanation in the supplement (L 376ff.): “The result that if the treatment arms had no antibiotics in common a lower number of antibiotics performed better than a higher number of antibiotics could be due to different potencies of antibiotics or resistance prevalences. Further, there could be a bias to combine less potent antibiotics or antibiotics with higher resistance prevalence to ensure treatment efficacy, which could lead to higher chances to detect resistances in the treatment arm with higher number of antibiotics, e.g. by selecting pre-existing resistance due to antibiotic treatment (see also section 6.1.9).” We furthermore already specifically mention this point in the main manuscript and refer then to the detailed explanation in the SI (L134 ff, “which could be due to different potencies or resistance prevalences of antibiotics as discussed in SI (SI section 6.1.10)”).

      Overall, we want to point out that these results need to be interpreted with caution as overall the statistical power is limited to confidently estimate the difference in effect of a higher and lower number of antibiotics.

      Line 125: ". In contrast, when restricting the analysis to studies with at least one common antibiotic in the treatment arms are pooled there was little evidence of a difference (pooled OR 0.55, 95% CI 0.28 - 1.07". 

      The difference was not statistically significant but there does seem to be an indication of a difference, please rephrase. 

      We rephrased the sentence to (L135 ff.): “In contrast, when restricting the analysis to studies with at least one common antibiotic in the treatment arms we found no evidence of a difference, only a weak indication that a higher number of antibiotics performs better (pooled OR 0.55, 95% CI 0.28 – 1.07; I2 = 74%, figure 3B).”
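
      The distinction drawn here between "no evidence of a difference" and "a weak indication" can be made concrete with a short calculation. The sketch below is not the authors' meta-analysis code. It assumes the reported intervals are Wald-type 95% intervals that are symmetric on the log-odds scale, which may not hold exactly for the random-effects model actually used. The function name `or_ci_to_z_and_p` is hypothetical.

```python
import math


def or_ci_to_z_and_p(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate a z statistic and two-sided p-value from an OR and its 95% CI.

    Assumption: the CI is a Wald interval, symmetric around log(OR).
    """
    log_or = math.log(odds_ratio)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    # Pooled estimates quoted in the response above.
    reported = {
        "no antibiotics in common": (4.73, 2.14, 10.42),
        "at least one common antibiotic": (0.55, 0.28, 1.07),
    }
    for label, (or_, low, high) in reported.items():
        z, p = or_ci_to_z_and_p(or_, low, high)
        print(f"{label}: z = {z:.2f}, two-sided p = {p:.4f}")
```

      Under these assumptions the second estimate gives a p-value of roughly 0.08, which fits the revised wording: the interval includes 1, but most of it lies below 1.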

      Line 190: "Similarly, today, relevant cohort studies could be analysed collaboratively using various modern statistical methods to address confounding by indication and other biases (66, 67)". 

      However, residual confounding by indication is likely. Please also mention the disadvantages of observational studies compared to RCTs. 

      We now highlight that causal inference with observational data comes with its own challenges and stress that randomized controlled trials are still considered the gold standard. L 204ff now reads: “However, even with appropriate causal inference methods, residual confounding cannot be excluded when using observational data (67). Therefore, randomized controlled trials will remain the gold standard to estimate causal relationships.”

      Line 230: "Gram-negative bacteria have an outer membrane, which is absent in gram-positive bacteria for instance, therefore intrinsic resistance against antibiotics can be observed in gram-negative bacteria (11)". 

      Intrinsic resistance is not unique for Gram-negative bacteria but also exists for Gram-positive bacteria. 

      We agree with the reviewer that intrinsic resistance is not unique to gram-negative bacteria and refined our writing. We additionally added that differences between gram-negative and gram-positive bacteria are not only to be expected due to differing intrinsic resistances but also due to potential differences in the mechanistic interactions of antibiotics, i.e., synergy or antagonism. The paragraph reads now (SI L289): “The gram status of a bacterium may potentially determine how effective an antibiotic, or an antibiotic combination is. Differences between gram-negative and gram-positive bacteria such as distinct bacterial surface organisation can lead to specific intrinsic resistances of gram-negative and gram-positive bacteria against antibiotics (55). These structural differences can lead to varying effects of antibiotic combinations between gram-negative and gram-positive bacteria (56).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 127. Provide a few more words describing the voltage protocol. To the uninitiated, panels A and B will be difficult to understand. "The large negative step is used to first close all channels, then probe the activation function with a series of depolarizing steps to re-open them and obtain the max conductance from the peak tail current at -36 mV. "

      We have revised the text as suggested (revision lines 127-131): “From a holding potential within the gK,L activation range (here –74 mV), the cell is hyperpolarized to –124 mV, negative to EK and the activation range, producing a large inward current through open gK,L channels that rapidly decays as the channels deactivate. We use the large transient inward current as a hallmark of gK,L. The hyperpolarization closes all channels, and then the activation function is probed with a series of depolarizing steps, obtaining the max conductance from the peak tail current at –44 mV (Fig. 1A).”
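
      For readers unfamiliar with activation curves, the "activation function" probed by this protocol is commonly summarized with a Boltzmann relation between conductance and voltage. The sketch below is only an illustration of that idea and not the authors' fitting procedure. The parameter values (midpoint of –80 mV, slope factor of 5 mV) are hypothetical placeholders, not values reported in this response.

```python
import math


def boltzmann_g(v_mv, g_max, v_half_mv, slope_mv):
    """Single Boltzmann activation curve: conductance as a function of voltage.

    g_max     : maximal conductance (any unit; output has the same unit)
    v_half_mv : voltage of half-maximal activation, in mV
    slope_mv  : slope factor in mV (smaller means steeper activation)
    """
    return g_max / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))


if __name__ == "__main__":
    # Hypothetical parameters chosen only to illustrate a negatively
    # activating conductance; they are not taken from the manuscript.
    v_half, slope = -80.0, 5.0
    for v in range(-124, -43, 10):
        g_norm = boltzmann_g(v, 1.0, v_half, slope)
        print(f"V = {v:5d} mV -> G/Gmax = {g_norm:.3f}")
```

      With parameters like these, almost all channels are closed at –124 mV and almost all are open at –44 mV, which is why the protocol first hyperpolarizes and then reads out peak tail currents at a fixed potential.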

      Incidentally, why does the peak tail current decay? 

      We added this text to the figure legend to explain this: “For steps positive to the midpoint voltage, tail currents are very large. As a result, K+ accumulation in the calyceal cleft reduces driving force on K+, causing currents to decay rapidly, as seen in A (Lim et al., 2011).”

      The decay of the peak tail current is a feature of gK,L (large K+ conductance) and the large enclosed synaptic cleft (which concentrates K+ that effluxes from the HC). See Govindaraju et al. (2023) and Lim et al. (2011) for modeling and experiments around this phenomenon.
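
      The link between K+ accumulation and reduced driving force follows from the Nernst equation. The sketch below is a rough illustration and not a model of the calyceal cleft. The concentrations (140 mM inside, 5 mM rising to 20 mM in the cleft) and the temperature of 37 °C are assumed values chosen for the example, not measurements from the manuscript.

```python
import math

R = 8.314462618    # gas constant, J/(mol*K)
F = 96485.33212    # Faraday constant, C/mol


def nernst_mv(conc_out_mm, conc_in_mm, temp_c=37.0, valence=1):
    """Nernst equilibrium potential in mV for an ion of the given valence."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (valence * F) * math.log(conc_out_mm / conc_in_mm)


def driving_force_mv(vm_mv, e_rev_mv):
    """Driving force on the ion: membrane potential minus reversal potential."""
    return vm_mv - e_rev_mv


if __name__ == "__main__":
    k_in = 140.0                  # assumed intracellular K+, mM
    vm = -44.0                    # tail-current potential quoted in the response, mV
    for k_out in (5.0, 20.0):     # assumed cleft K+ before and after accumulation, mM
        ek = nernst_mv(k_out, k_in)
        print(f"[K+]o = {k_out:4.1f} mM: EK = {ek:6.1f} mV, "
              f"driving force at {vm:.0f} mV = {driving_force_mv(vm, ek):5.1f} mV")
```

      As external K+ rises, EK moves toward the membrane potential and the outward driving force shrinks, so the current decays even if the channels stay open.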

      Line 217-218. For some reason, I stumbled over this wording. Perhaps rearrange as "In type II HCs absence of Kv1.8 significantly increased Rin and tauRC. There was no effect on Vrest because the conductances to which Kv1.8 contributes, gA and gDR, activate positive to the resting potential." (So which K conductances establish Vrest?)

      We kept our original wording because we wanted to discuss the baseline (Vrest) before describing responses to current injection.

      Vrest is presumably maintained by ATP-dependent Na/K exchangers (ATP1a1), HCN, Kir, and mechanotransduction currents. Repolarization is achieved by delayed rectifier and A-type K+ conductances in type II HCs.

      Figure 4, panel C - provides absolute membrane potential for voltage responses. Presumably, these were the most 'ringy' responses. Were they obtained at similar Vm in all cells (i.e., comparisons of Q values in lines 229-230). 

      We added the absolute membrane potential scale. Type II HC protocols all started with 0 pA current injection at baseline, so they were at their natural Vrest, which did not differ by genotype or zone. Consistent with Q depending on expression of conductances that activate positive to Vrest, Q did not co-vary with Vrest (Pearson’s correlation coefficient = 0.08, p = 0.47, n= 85).
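
      The reported correlation statistics can be checked approximately from r and n alone. The sketch below uses the Fisher z-transformation with a normal reference distribution, which is an approximation and not necessarily the test the authors ran. The function name `pearson_p_fisher` is hypothetical.

```python
import math


def pearson_p_fisher(r, n):
    """Approximate two-sided p-value for a Pearson correlation of r with n samples.

    Uses the Fisher z-transformation: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3) under the null hypothesis rho = 0.
    """
    if n <= 3:
        raise ValueError("Fisher z-transformation needs n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Values quoted in the response above: r = 0.08, n = 85.
    p = pearson_p_fisher(0.08, 85)
    print(f"approximate two-sided p = {p:.2f}")
```

      With r = 0.08 and n = 85 this approximation gives a p-value close to the reported 0.47, consistent with Q not co-varying with Vrest.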

      Lines 254. Staining is non-specific? Rather than non-selective? 

      Yes, thanks - Corrected (Line 264).

      Figure 6. Do you have a negative control image for Kv1.4 immuno? Is it surprising that this label is all over the cell, but Kv1.8 is restricted to the synaptic pole? 

      We don’t have a null-animal control because this immunoreactivity was done in rat. While the cuticular plate staining was most likely nonspecific because we see that with many different antibodies, it’s harder to judge the background staining in the hair cell body layer. After feedback from the reviewers, we decided to pull the KV1.4 immunostaining from the paper because of the lack of null control, high background, and inability to reproduce these results in mouse tissue. In our hands, in mouse tissue, both mouse and rabbit anti-KV1.4 antibodies failed to localize to the hair cell membrane. Further optimization or another method could improve that, but for now the single-cell expression data (McInturff et al., 2018) remain the strongest evidence for KV1.4 expression in murine type II hair cells.

      Lines 400-404. Whew, this is pretty cryptic. Expand a bit? 

      We simplified this paragraph (revision lines 411-413): “We speculate that gA and gDR(KV1.8) have different subunit composition: gA may include heteromers of KV1.8 with other subunits that confer rapid inactivation, while gDR(KV1.8) may comprise homomeric KV1.8 channels, given that they do not have N-type inactivation.”

      Line 428. 'importantly different ion channels'. I think I understand what is meant but perhaps say a bit more. 

      Revised (Line 438): “biophysically distinct and functionally different ion channels”.

      Random thought. In addition to impacting Rin and TauRC, do you think the more negative Vrest might also provide a selective advantage by increasing the driving force on K entry from endolymph? 

      When the calyx is perfectly intact, gK,L is predicted to make Vrest less negative than the values we report in our paper, where we have disturbed the calyx to access the hair cell (–80, Govindaraju et al., 2023, vs. –87 mV, here). By enhancing K+ accumulation in the calyceal cleft, the intact calyx shifts EK—and Vrest—positively (Lim et al., 2011), so the effect on driving force may not be as drastic as what you are thinking.

      Reviewer #2 (Recommendations For The Authors):

      (1) Introduction: wouldn't the small initial paragraph stating the main conclusion of the study fit better at the end of the background section, instead of at the beginning? 

      Thank you for this idea. We tried that, but settled on this direct approach to let people know in advance what the goals of the paper are.

      (2) Pg.4: The following sentence is rather confusing "Between P5 and P10, we detected no evidence of a non-gK,L KV1.8-dependent.....". Also, Suppl. Fig 1A seems to show that between P5 and P10 hair cells can display a potassium current having either a hyperpolarised or depolarised Vhalf. Thus, I am not sure I understand the above statement. 

      Thank you for pointing out unclear wording. We used the more common “delayed rectifier” term in our revision (Lines 144-147): “Between P5 and P10, some type I HCs have not yet acquired the physiologically defined conductance, gK,L. No effects of KV1.8 deletion were seen in the delayed rectifier currents of immature type I HCs (Suppl. Fig. 1B), showing that they are not immature forms of the Kv1.8-dependent gK,L channels.”

      (3) For the reduced Cm of hair cells from Kv1.8 knockout mice, could another reason be simply the immature state of the hair cells (i.e. lack of normal growth), rather than less channels in the membrane? 

      There were no other signs to suggest immaturity or abnormal growth in KV1.8–/– hair cells or mice. Importantly, type II HCs did not show the same Cm effect.

      We further discussed the capacitance effect in lines 160-167: “Cm scales with surface area, but soma sizes were unchanged by deletion of KV1.8 (Suppl. Table 2). Instead, Cm may be higher in KV1.8+/+ cells because of gK,L for two reasons. First, highly expressed trans-membrane proteins (see discussion of gK,L channel density in Chen and Eatock, 2000) can affect membrane thickness (Mitra et al., 2004), which is inversely proportional to specific Cm. Second, gK,L could contaminate estimations of capacitive current, which is calculated from the decay time constant of transient current evoked by small voltage steps outside the operating range of any ion channels. gK,L has such a negative operating range that, even for Vm negative to –90 mV, some gK,L channels are voltage-sensitive and could add to capacitive current.”

      (4) Methods: The electrophysiological part states that "For most recordings, we used .....". However, it is not clear what has been used for the other recordings.

      Thanks for catching this error, a holdover from an earlier ms. version.  We have deleted “For most recordings” (revision line 466).

      Also, please provide the sign for the calculated 4 mV liquid junction potential. 

      Done (revision line 476).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Some of the data in panels in Fig. 1 are hard to match up. The voltage protocols shown in A and B show steps from hyperpolarized values to -71mV (A) and -32 mV (B). However, the value from A doesn't seem to correspond with the activation curve in C.

      Thank you for catching this.  We accidentally showed the control I-X curve from a different cell than that in A. We now show the G-V relation for the cell in A.

      Also the Vhalf in D for -/- animals is ~-38 mV, which is similar to the most positive step shown in the protocol.

      The most positive step in Figure 1B is actually –25 mV. The uneven tick labels might have been confusing, so we re-labeled them to be more conventional.

      Were type I cells stepped to more positive potentials to test for the presence of voltage-activated currents at greater depolarizations? This is needed to support the statement on lines 147-148. 

      We added “no additional K+ conductance activated up to +40 mV” (revision line 149-150).  Our standard voltage-clamp protocol iterates up to ~+40 mV in KV1.8–/– hair cells, but in Figure 1 we only showed steps up to –25 mV because K+ accumulation in the synaptic cleft with the calyx distorts the current waveform even for the small residual conductances of the knockouts. KV1.8–/– hair cells have a main KV conductance with a Vhalf of ~–38 mV, as shown in Figure 1, and we did not see an additional KV conductance that activated with a more positive Vhalf up to +40 mV.

      (2) Line 151 states "While the cells of Kv1.8-/- appeared healthy..." how were epithelia assessed for health? Hair cells arise from support cells and it would be interesting to know if Kv1.8 absence influences supporting cells or neurons. 

      We added our criteria for cell health to lines 477-479: “KV1.8–/– hair cells appeared healthy in that cells had resting potentials negative to –50 mV, cells lasted a long time (20-30 minutes) in ruptured patch recordings, membranes were not fragile, and extensive blebbing was not seen.”

      Supporting cells were not routinely investigated. We characterized calyx electrical activity (passive membrane properties, voltage-gated currents, firing pattern) and didn’t detect differences between +/+, +/–, and –/– recordings (data not shown). KV1.8 was not detected in neural tissue (Lee et al., 2013). 

      (3) Several different K+ channel subtypes were found to contribute to inner hair cell K+ conductances (Dierich et al. 2020) but few additional K+ channel subtypes are considered here in vestibular hair cells. Further comments on calcium-activated conductances (lines 310-317) would be helpful since apamin-sensitive SK conductances are reported in type II hair cells (Poppi et al. 2018) and large iberiotoxin-sensitive BK conductances in type I hair cells (Contini et al. 2020). Were iberiotoxin effects studied at a range of voltages and might calcium-dependent conductances contribute to the enhanced resonance responses shown in Fig. 4? 

      We refer you to lines 310-317 in the original ms (lines 322-329 in the revised ms), where we explain possible reasons for not observing IK(Ca) in this study.

      (4) Similar to GK,L erg (Kv11) channels show significant Cs+-permeability. Were experiments using Cs+ and/or Kv11 antagonists performed to test for Kv11? 

      No. Hurley et al. (2006) used Kv11 antagonists to reveal Kv11 currents in rat utricular type I hair cells with perforated patch, which were also detected in rats with single-cell RT-PCR (Hurley et al. 2006) and in mice with single-cell RNAseq (McInturff et al., 2018).  They likely contribute to hair cell currents, alongside Kv7, Kv1.8, HCN1, and Kir. 

      (5) Mechanosensitive ("MET") channels in hair cells are mentioned on lines 234 and 472 (towards the end of the Discussion), but a sentence or two describing the sensory function of hair cells in terms of MET channels and K+ fluxes would help in the Introduction too. 

      Following this suggestion we have expanded the introduction with the following lines  (78-87): “Hair cells are known for their large outwardly rectifying K+ conductances, which repolarize membrane voltage following a mechanically evoked perturbation and in some cases contribute to sharp electrical tuning of the hair cell membrane.  Because gK,L is unusually large and unusually negatively activated, it strongly attenuates and speeds up the receptor potentials of type I HCs (Correia et al., 1996; Rüsch and Eatock, 1996b). In addition, gK,L augments a novel non-quantal transmission from type I hair cell to afferent calyx by providing open channels for K+ flow into the synaptic cleft (Contini et al., 2012, 2017, 2020; Govindaraju et al., 2023), increasing the speed and linearity of the transmitted signal (Songer and Eatock, 2013).”

      (6) Lines 258-260 state that GKL does not inactivate, but previous literature has documented a slow type of inactivation in mouse crista and utricle type I hair cells (Lim et al. 2011, Rusch and Eatock 1996) which should be considered. 

      Lim et al. (2011) concluded that K+ accumulation in the synaptic cleft can explain much of the apparent inactivation of gK,L. In our paper, we were referring to fast, N-type inactivation. We changed that line to be more specific; new revision lines 269-271: “KV1.8, like most KV1 subunits, does not show fast inactivation as a heterologously expressed homomer (Lang et al., 2000; Ranjan et al., 2019; Dierich et al., 2020), nor do the KV1.8-dependent channels in type I HCs, as we show, and in cochlear inner hair cells (Dierich et al., 2020).”

      (7) Lines 320-321 Zonal differences in inward rectifier conductances were reported previously in bird hair cells (Masetto and Correia 1997) and should be referenced here.

      Zonal differences were reported by Masetto and Correia for type II but not type I avian hair cells, which is why we emphasize that we found a zonal difference in I-H in type I hair cells. We added two citations to direct readers to type II hair cell results (lines 333-334): “The gK,L knockout allowed identification of zonal differences in IH and IKir in type I HCs, previously examined in type II HCs (Masetto and Correia, 1997; Levin and Holt, 2012).”

      Also, Horwitz et al. (2011) showed HCN channels in utricles are needed for normal balance function, so please include this reference (see line 171). 

      Done (line 184).

      (8) Fig 6A. Shows Kv1.4 staining in rat utricle but procedures for rat experiments are not described. These should be added. Also, indicate striola or extrastriola regions (if known). 

      We removed KV1.4 immunostaining from the paper, see above.

      (9) Table 6, ZD7288 is listed -was this reagent used in experiments to block Gh? If not please omit. 

      ZD7288 was used to block gH to produce a clean h-infinity curve in Figure 6, which is described in the legend.

      (10) In supplementary Fig. 5A make clear if the currents are from XE991 subtraction. Also, is the G-V data for single cell or multiple cells in B? It appears to be from 1 cell but ages P11-505 are given in legend. 

      The G-V curve in B is from XE991 subtraction, and average parameters in the figure caption are for all the KV1.8–/–  striolar type I hair cells where we observed this double Boltzmann tail G-V curve. I added detail to the figure caption to explain this better.

      (11) Supplementary Fig. 6A claims a fast activation of inward rectifier K+ channels in type II but not type I cells-not clear what exactly is measured here.

      We use “fast inward rectifier” to indicate the inward current that increases within the first 20 ms after hyperpolarization from rest (IKir, characterized in Levin & Holt, 2012) in contrast to HCN channels, which open over ~100 ms. We added panel C to show that the activation of IKir is visible in type II hair cells but not in the knockout type I hair cells that lack gK,L. IKir was a reliable cue to distinguish type I and type II hair cells in the knockout.

      For our actual measurements in Fig 6B, we quantified the current flowing after 250 ms at –124 mV because we did not pharmacologically separate IKir and IH.

      Could the XE991-sensitive current be activated and contributing?

      The XE991-sensitive current could decay (rapidly) at the onset of the hyperpolarizing step, but was not contributing to our measurement of IKir and IH, made after 250 ms at –124 mV, at which point any low-voltage-activated (LVA) outward rectifiers have deactivated. Additionally, the LVA XE991-sensitive currents were rare (only detected in some striolar type I hair cells) and when present did not compete with fast IKir, which is only found in type II hair cells.

      Also, did the inward rectifier conductances sustain any outward conductance at more depolarized voltage steps? 

      For the KV1.8-null mice specifically, we cannot answer the question because we did not use specific blocking agents for inward rectifiers.  However, we expect that there would only be sustained outward IR currents at voltages between EK and ~-60 mV: the foot of IKir’s I-V relation according to published data from mouse utricular hair cells – e.g., Holt and Eatock 1995, Rusch and Eatock 1996, Rusch et al. 1998, Horwitz et al., 2011, etc.  Thus, any such current would be unlikely to contaminate the residual outward rectifiers in Kv1.8-null animals, which activate positive to ~-60 mV. 

      (I-HCN is also not a problem, because it could only be outward positive to its reversal potential at ~-40 mV, which is significantly positive to its voltage activation range.)

    1. Author response:

      (1) Reviewer 1 suggested that we repeat the analyses in additional ROIs in the prefrontal cortex (PFC). We appreciate this suggestion and believe it will contribute to a comprehensive understanding of the current findings. These results will be included in the revision.

      (2) Reviewer 1 suggested that we also examine results in motor-related ROIs to rule out influences from response planning. We would like to note that our experimental design makes it unlikely that response planning would have influenced our results, as participants were unable to plan their motor responses in advance due to randomized response mapping on a trial-by-trial basis. Nevertheless, we agree with the reviewer that showing results from motor-related ROIs is important, and will include these results in the revision.

      (3) Reviewer 1 raised a question about the effect size of the results across different ROIs. In our manuscript, we tried to avoid direct comparisons of representational strength across ROIs, by focusing on the differences in representational strength between conditions within the same ROI. Nevertheless, we agree that clarifying this issue is important, which we will address in the revision.

      (4) Reviewer 2 raised a concern about the similarity between the RNN and fMRI results. We acknowledge that the complexity of our results makes it challenging to replicate all fMRI findings within a single RNN (e.g., simulating three brain regions in a single network with distinct result patterns). Nonetheless, the current RNNs effectively captured our key fMRI findings, including increased stimulus representation in frontal cortex as well as the tradeoff in category representation with varying levels of flexible control. Reviewer 2 also made several suggestions in tweaking the RNN structure and in choosing alternative analysis methods. We are happy to carry out these points as we think they could potentially increase the alignment between the two modalities.

    1. Author response:

      We are grateful to the reviewers and editors for their insightful comments. All recognized that, while mutation recurrences have been used for inferring cancer drivers, our approach has the rigor of quantitative analysis. We would like to add that, without rigorously ruling out mutational hotspots, most CDNs have not been accepted as driver mutations.

      This paper develops the theory stating that (i) recurrent point mutations are true Cancer Driving Nucleotides (CDNs); and (ii) non-recurrent mutations are unlikely to be CDNs. The reviewers note that, even with the theory, we still have not discovered new driver mutations. This is done in the companion paper. Table 3 shows that, averaged across cancer types, the conventional method would identify 45 CDGs while the CDN method tallies 258 CDGs. The power of the CDN method in identifying new driver genes is evident.

      The second question is "By this theory, will we be able to discover most CDNs when the sample size increases from ~ 1000 to 10,000?" This is a question of forecast and can be partially answered using GENIE data. Fig. 7 of this study shows that, when n increases from ~ 1000 to ~ 9,000, the numbers of discovered CDNs increase by 3 – 5 fold, most of which come from the two-hit class, as expected.

      Fig. 7 also addresses the query of whether we have used datasets other than TCGA. We indeed have used all public data, including GENIE, ICGC and other integrated resources such as COSMIC. For the main study, we rely on TCGA because it is unbiased for estimating the probability of CDN occurrences. In many datasets, the numerators are given but the denominators are not (the number of patients with the mutation / the total number of patients surveyed). 

      The third question is about mutation recurrences among cancer types. As stated by one reviewer, "different cancer types have unique mutational landscapes". While this is true when the analysis is done at the whole-gene level, one gets a different picture at the nucleotide level where the resolution is much higher. The pan-cancer trend of point mutations is evident in Fig. 4 of the companion paper.

      Again, we heartily appreciate the criticisms and suggestions of the reviewers and editors!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Overall the manuscript is well written, and the successful generation of the new endogenous Cac tags (Td-Tomato, Halo) and CaBeta, stj, and stolid genes with V5 tags will be powerful reagents for the field to enable new studies on calcium channels in synaptic structure, function, and plasticity. There are also some interesting, though not entirely unexpected, findings regarding how Brp and homeostatic plasticity modulate calcium channel abundance. However, a major concern is that the conclusions about how "molecular and organization diversity generate functional synaptic heterogeneity" are not really supported by the data presented in this study. In particular, the key fact that frames this study is that Cac levels are similar at Ib and Is active zones, but that Pr is higher at Is over Ib (which was previously known). While Pr can be influenced by myriad processes, the authors should have first assessed presynaptic calcium influx - if they had, they would have better framed the key questions in this study. As the authors reference from previous studies, calcium influx is at least two-fold higher per active zone at Is over Ib, and the authors likely know that this difference is more than sufficient to explain the difference in Pr at Is over Ib. Hence, there is no reason to invoke differences in "molecular and organization diversity" to explain the difference in Pr, and the authors offer no data to support that the differences in active zone structure at Is vs Ib are necessary for the differences in Pr. Indeed, the real question the authors should have investigated is why there are such differences in presynaptic calcium influx at Is over Ib despite having similar levels/abundance of Cac. This seems the real question, and is all that is needed to explain the Pr differences shown in Fig. 1. 
The other changes in active zone structure and organization at Is vs Ib may very well contribute to additional differences in Pr, but the authors have not shown this in the present study, and rely on other studies (such as calcium-SV coupling at Is vs Ib) to support an argument that is not necessitated by their data. At the end of this manuscript, the authors have found an interesting possibility that Stj levels are reduced at Is vs Ib, that might perhaps contribute to the difference in calcium influx. However, at present this remains speculative.

      Overall, the authors have generated powerful reagents for the field to study calcium channels and how they are regulated, but draw conclusions about active zone structure and organization contributing to functional heterogeneity that are not strongly supported by the data presented.

      Reviewer 1 raises an interesting question that we agree will form the basis of important studies. Here, we set out to address a different question, which we will work to better frame. While we and others had previously found a strong correlation between calcium channel abundance and synaptic release probability (Pr; Akbergenova et al., 2018; Gratz et al., 2019; Holderith et al., 2012; Nakamura et al., 2015; Sheng et al., 2012), more recent studies found that calcium channel abundance does not necessarily predict synaptic strength (Aldahabi et al., 2022; Rebola et al., 2019). Our study explores this paradox and presents findings that provide an explanation: calcium channel abundance predicts Pr among individual synapses of either low-Pr type-Ib or high-Pr type-Is inputs where modulating channel number tunes synaptic strength, but does not predict Pr between the two inputs, indicating an input-specific role for calcium channel abundance in promoting synaptic strength. Thus, we propose that calcium channel abundance predictably modulates synaptic strength among individual synapses of a single input or synapse subtype, which share similar molecular and spatial organization, but not between distinct inputs where the underlying organization of active zones differs. Consistently, in the mouse, calcium channel abundance correlates strongly with release probability specifically when assessed among homogeneous populations of connections (Aldahabi et al., 2022; Holderith et al., 2012; Nakamura et al., 2015; Rebola et al., 2019; Sheng et al., 2012).

      As Reviewer 1 notes, the two-fold difference in calcium influx at type-Is synapses is certainly an important difference underlying three-fold higher Pr. However, growing evidence indicates that calcium influx alone, like calcium channel abundance, does not reliably predict synaptic strength between inputs. For example, Rebola et al. (2019) compared cerebellar synapses formed by granule and stellate cells and found that lower Pr granule synapses exhibit both higher calcium channel abundance and calcium influx. In another example, Aldahabi et al. (2023) demonstrate that even when calcium influx is greater at high-Pr synapses, it does not necessarily explain differences in synaptic strength between inputs. Studying excitatory hippocampal CA1 synapses onto distinct interneuronal targets, they found that raising calcium entry at low-Pr inputs to high-Pr synapse levels is not sufficient to increase synaptic strength to high-Pr synapse levels. Similarly, at the Drosophila NMJ, the finding that type-Ib synapses exhibit loose calcium channel-synaptic vesicle coupling whereas type-Is synapses exhibit tight coupling suggests factors beyond calcium influx also contribute to differences in Pr between the two inputs (He et al., 2023). Consistently, a two-fold increase in external calcium does not induce a three-fold increase in release at low-Pr type-Ib synapses (He et al., 2023). Thus, upon finding that calcium channel abundance is similar at type-Ib and -Is synapses, we focused on identifying differences beyond calcium channel abundance and calcium influx that might contribute to their distinct synaptic strengths. We agree that these studies, ours included, cannot definitively determine the contribution of identified organizational differences to distinct release probabilities because it is not currently possible to specifically alter subsynaptic organization, and will ensure that our language is tempered accordingly. 
However, in addition to the studies cited above and our findings, recent work demonstrating that homeostatic potentiation of neurotransmitter release is accompanied by greater spatial compaction of multiple active zone proteins (Dannhauser et al., 2022; Mrestani et al., 2021) and decreased calcium channel mobility (Ghelani et al., 2023) provide support for the interpretation that subsynaptic organization is a key parameter for modulating Pr.

      Reviewer #2 (Public Review):

      The authors aim to investigate how voltage-gated calcium channel number, organization, and subunit composition lead to changes in synaptic activity at tonic and phasic motor neuron terminals, or type Is and Ib motor neurons in Drosophila. These neuron subtypes generate widely different physiological outputs, and many investigations have sought to understand the molecular underpinnings responsible for these differences. Additionally, these authors explore not only static differences that exist during the third-instar larval stage of development but also use a pharmacological approach to induce homeostatic plasticity to explore how these neuronal subtypes dynamically change the structural composition and organization of key synaptic proteins contributing to physiological plasticity. The Drosophila neuromuscular junction (NMJ) is glutamatergic, the main excitatory neurotransmitter in the human brain, so these findings not only expand our understanding of the molecular and physiological mechanisms responsible for differences in motor neuron subtype activity but also contribute to our understanding of how the human brain and nervous system functions.

      The authors employ state-of-the-art tools and techniques such as single-molecule localization microscopy 3D STORM and create several novel transgenic animals using CRISPR to expand the molecular tools available for exploration of synaptic biology that will be of wide interest to the field. Additionally, the authors use a robust set of experimental approaches from active zone level resolution functional imaging from live preparations to electrophysiology and immunohistochemical analyses to explore and test their hypotheses. All data appear to be robustly acquired and analyzed using appropriate methodology. The authors make important advancements to our understanding of how the different motor neuron subtypes, phasic and tonic-like, exhibit widely varying electrical output despite the neuromuscular junctions having similar ultrastructural composition in the proteins of interest, voltage gated calcium channel cacophony (cac) and the scaffold protein Bruchpilot (brp). The authors reveal the ratio of brp:cac appears to be a critical determinant of release probability (Pr), and in particular, the packing density of VGCCs and availability of brp. Importantly, the authors demonstrate a brp-dependent increase in VGCC density following acute philanthotoxin perfusion (glutamate receptor inhibitor). This VGCC increase appears to be largely responsible for the presynaptic homeostatic plasticity (PHP) observable at the Drosophila NMJ. Lastly, the authors created several novel CRISPR-tagged transgenic lines to visualize the spatial localization of VGCC subunits in Drosophila. Two of these lines, CaBV5-C and stjV5-N, express in motor neurons and in the nervous system, localize at the NMJ, and most strikingly, strongly correlate with Pr at tonic and phasic-like terminals.

      (1) The few limitations in this study could be addressed with some commentary, a few minor follow-up analyses, or experiments. The authors use a postsynaptically expressed calcium indicator (mhcGal4>UAS-GCaMP) to calculate Pr, yet do not explore the contribution that glutamate receptors, or other postsynaptic components (e.g. components of the postsynaptic density, PSD), may make. A previous publication exploring tonic vs phasic-like activity at the Drosophila NMJ revealed a dynamic role for GluRII (Aponte-Santiago et al, 2020). Could the speed of GluR accumulation account for differences between neuron subtypes?

      We did observe that GCaMP signals are higher at type Is synapses, where synapses tend to form later but GluRs accumulate more rapidly upon innervation (Aponte-Santiago et al., 2020). However, because we are using our GCaMP indicator as a plus/minus readout of synaptic vesicle release at mature synapses, we do not expect differences in GluR accumulation to have a significant effect on our measures. Consistently, the difference in Pr we observe between type-Ib and -Is inputs (Fig. 1C) is similar to that previously reported (He et al., 2023; Lu et al., 2016; Newman et al., 2022).

      (2) The observation that calcium channel density and brp:cac ratio are critical determinants of Pr is an important one. However, it is surprising that this was not observed in previous investigations of cac intensity (of which there are many). Is this purely a technical limitation of other investigations, or are other possibilities feasible? Additionally, regarding VGCC-SV coupling, the authors conclude that this packing density increases their proximity to SVs and contributes to the steeper relationship between VGCCs and Pr at phasic type Is. Is it possible that brp or other AZ components could account for these differences? The authors possess the tools to address this directly by labeling vesicles with Janelia Fluor 646; a stronger signal should be present at Is boutons. Additionally, many different studies have used transmission electron microscopy to explore SV location relative to AZs (t-bars) at the Drosophila NMJ.

      To date, the molecular underpinnings of heterogeneity in synaptic strength have primarily been investigated among individual type-Ib synapses. However, a recent study investigating differences between type-Ib and -Is synapses also found that the Cac:Brp ratio is higher at type-Is synapses (He et al., 2023).

      At this point, we do not know which active zone components are responsible for the organizational (Figs. 1, 2) and coupling (now demonstrated by He et al., 2023) differences between type-Ib and -Is synapses or what establishes the differences in active zone protein levels we observe (Figs. 3,6), although Brp likely plays a local role. We find that Brp is required for dynamically regulating calcium channel levels during homeostatic plasticity and plays distinct roles at type-Ib and -Is synapses (Figs. 3, 4). Brp regulates a number of proteins critical for the distribution of docked synaptic vesicles near T bars of type Ib active zones, including Unc13 (Bohme et al., 2016). Extending these studies to type-Is synapses will be of great interest.

      (3) In reference to the contradictory observations that VGCC intensity does not always correlate with, or determine Pr. Previous investigations have also observed other AZ proteins or interactors (e.g. synaptotagmin mutants) critically control release, even when the correlation between cac and release remains constant while Pr dramatically precipitates.

      This is an important point as a number of molecular and organizational differences between high- and low-Pr synapses certainly contribute to baseline functional differences. The other proteins we (Figs. 3,6) and others (Dannhauser et al., 2022; Ehmann et al., 2014; He et al., 2023; Jetti et al., 2023; Mrestani et al., 2021; Newman et al., 2022) have investigated are less abundant and/or more densely organized at type-Is synapses. Investigating additional active zone proteins, including synaptic proteins, and determining how these factors combine to yield increased synaptic strength are important next steps.

      (4) To confirm the observations that lower brp levels results in a significantly higher cac:brp ratio at phasic-like synapses by organizing VGCCs; this argument could be made stronger by analyzing their existing data. By selecting a population of AZs in Ib boutons that endogenously express normal cac and lower brp levels, the Pr from these should be higher than those from within that population, but comparable to Is Pr. I believe the authors should also be able to correlate the cac:brp ratio with Pr from their data set generally; to determine if a strong correlation exists beyond their observation for cac correlation.

      We do not have simultaneous measures of Pr and Cac and Brp abundance. However, our findings suggest that distinct Cac:Brp ratios at type Ib and Is inputs reflect underlying organizational differences that contribute to distinct release probabilities between the two synaptic subtypes. In contrast, within either synaptic subtype, release probability is positively correlated with both Cac and Brp levels. Thus, the mechanisms driving functional differences between synaptic subtypes are distinct from those driving functional heterogeneity within a subtype, so we do not expect Cac:Brp ratio to correlate with Pr among individual type-Ib synapses. We will work to clarify this point in the revised text.

      (5) For the philanthotoxin induced changes in cac and brp localization underlying PHP, why do the authors not show cac accumulation after PhTx on live dissected preparations (i.e. in real time)? This would also be an excellent opportunity to validate their brp:cac theory. Do the authors observe a dynamic change in brp:cac after 1, or 5 minutes; do Is boutons potentiate stronger due to proportional increases in cac and brp? Also regarding PhTx-induced PHP, their observations that stj and α2δ-3 are more abundant at Is synapses, suggests that they may also play a role in PhTx induced changes in cac. If either/both are overexpressed during PhTx, brp should increase while cac remains constant. These accessory proteins may determine cac incorporation at AZs.

      As we have previously followed Cac accumulation in live dissected preparations and found that levels increase proportionally across individual synapses (Gratz et al., 2019), we did not attempt to repeat these challenging experiments at smaller type-Is synapses. We will reanalyze our data to investigate Cac:Brp ratio at individual active zones post PhTx. However, as noted above, we do not expect changes in the Cac:Brp ratio to correlate with Pr among individual synapses of single inputs, as this measure reflects organizational differences between inputs and PhTx induces an increase in the abundance of both proteins at both inputs.

      Determining the effect of PhTx on Stj levels at type-Ib and -Is active zones is an excellent idea and might provide insight into how lower Stj levels correlate with higher Pr at type-Is synapses. While prior studies have demonstrated critical roles for Stj in regulating Cac accumulation during development and in promoting presynaptic homeostatic potentiation (Cunningham et al., 2022; Dickman et al., 2008; Kurshan et al., 2009; Ly et al., 2008; Wang et al., 2016), its regulation during PHP has not been investigated.

      Taken together, this study generates important data-driven, conceptual, and theoretical advancements in our understanding of the molecular underpinnings of different motor neurons, and our understanding of synaptic biology generally. The data are robust, thoroughly analyzed, and appropriately depicted. This study not only generates novel findings but also generates novel molecular tools that will aid future investigations and help investigators progress in this field.

      References

      Akbergenova, Y., K.L. Cunningham, Y.V. Zhang, S. Weiss, and J.T. Littleton. 2018. Characterization of developmental and molecular factors underlying release heterogeneity at Drosophila synapses. eLife. 7.

      Aldahabi, M., F. Balint, N. Holderith, A. Lorincz, M. Reva, and Z. Nusser. 2022. Different priming states of synaptic vesicles underlie distinct release probabilities at hippocampal excitatory synapses. Neuron. 110:4144-4161 e4147.

      Aponte-Santiago, N.A., K.G. Ormerod, Y. Akbergenova, and J.T. Littleton. 2020. Synaptic Plasticity Induced by Differential Manipulation of Tonic and Phasic Motoneurons in Drosophila. The Journal of neuroscience : the official journal of the Society for Neuroscience. 40:6270-6288.

      Bohme, M.A., C. Beis, S. Reddy-Alla, E. Reynolds, M.M. Mampell, A.T. Grasskamp, J. Lutzkendorf, D.D. Bergeron, J.H. Driller, H. Babikir, F. Gottfert, I.M. Robinson, C.J. O'Kane, S.W. Hell, M.C. Wahl, U. Stelzl, B. Loll, A.M. Walter, and S.J. Sigrist. 2016. Active zone scaffolds differentially accumulate Unc13 isoforms to tune Ca(2+) channel-vesicle coupling. Nature neuroscience. 19:1311-1320.

      Cunningham, K.L., C.W. Sauvola, S. Tavana, and J.T. Littleton. 2022. Regulation of presynaptic Ca(2+) channel abundance at active zones through a balance of delivery and turnover. eLife. 11.

      Dannhauser, S., A. Mrestani, F. Gundelach, M. Pauli, F. Komma, P. Kollmannsberger, M. Sauer, M. Heckmann, and M.M. Paul. 2022. Endogenous tagging of Unc-13 reveals nanoscale reorganization at active zones during presynaptic homeostatic potentiation. Front Cell Neurosci. 16:1074304.

      Dickman, D.K., P.T. Kurshan, and T.L. Schwarz. 2008. Mutations in a Drosophila alpha2delta voltage gated calcium channel subunit reveal a crucial synaptic function. The Journal of neuroscience : the official journal of the Society for Neuroscience. 28:31-38.

      Ehmann, N., S. Van De Linde, A. Alon, D. Ljaschenko, X.Z. Keung, T. Holm, A. Rings, A. Diantonio, S. Hallermann, U. Ashery, M. Heckmann, M. Sauer, and R.J. Kittel. 2014. Quantitative super-resolution imaging of Bruchpilot distinguishes active zone states. Nature Communications. 5.

      Ghelani, T., M. Escher, U. Thomas, K. Esch, J. Lützkendorf, H. Depner, M. Maglione, P. Parutto, S. Gratz, T. Matkovic-Rachid, S. Ryglewski, A.M. Walter, D. Holcman, K. O'Connor-Giles, M. Heine, and S.J. Sigrist. 2023. Interactive nanocluster compaction of the ELKS scaffold and Cacophony Ca2+ channels drives sustained active zone potentiation. Science Advances. 9:eade7804.

      Gratz, S.J., P. Goel, J.J. Bruckner, R.X. Hernandez, K. Khateeb, G.T. Macleod, D. Dickman, and K.M. O'Connor-Giles. 2019. Endogenous tagging reveals differential regulation of Ca2+ channels at single AZs during presynaptic homeostatic potentiation and depression. The Journal of Neuroscience:3068-3018.

      He, K., Y. Han, X. Li, R.X. Hernandez, D.V. Riboul, T. Feghhi, K.A. Justs, O. Mahneva, S. Perry, G.T. Macleod, and D. Dickman. 2023. Physiologic and Nanoscale Distinctions Define Glutamatergic Synapses in Tonic vs Phasic Neurons. The Journal of neuroscience : the official journal of the Society for Neuroscience. 43:4598-4611.

      Holderith, N., A. Lorincz, G. Katona, B. Rózsa, A. Kulik, M. Watanabe, and Z. Nusser. 2012. Release probability of hippocampal glutamatergic terminals scales with the size of the active zone. Nature neuroscience. 15:988-997.

      Jetti, S.K., A.B. Crane, Y. Akbergenova, N.A. Aponte-Santiago, K.L. Cunningham, C.A. Whittaker, and J.T. Littleton. 2023. Molecular Logic of Synaptic Diversity Between Drosophila Tonic and Phasic Motoneurons. bioRxiv:2023.2001.2017.524447.

      Kurshan, P.T., A. Oztan, and T.L. Schwarz. 2009. Presynaptic alpha2delta-3 is required for synaptic morphogenesis independent of its Ca2+-channel functions. Nature neuroscience. 12:1415-1423.

      Lu, Z., A.K. Chouhan, J.A. Borycz, Z. Lu, A.J. Rossano, K.L. Brain, Y. Zhou, I.A. Meinertzhagen, and G.T. Macleod. 2016. High-Probability Neurotransmitter Release Sites Represent an Energy-Efficient Design. Current biology : CB. 26:2562-2571.

      Ly, C.V., C.-K. Yao, P. Verstreken, T. Ohyama, and H.J. Bellen. 2008. straightjacket is required for the synaptic stabilization of cacophony, a voltage-gated calcium channel α1 subunit. Journal of Cell Biology. 181:157-170.

      Mrestani, A., M. Pauli, P. Kollmannsberger, F. Repp, R.J. Kittel, J. Eilers, S. Doose, M. Sauer, A.-L. Sirén, M. Heckmann, and M.M. Paul. 2021. Active zone compaction correlates with presynaptic homeostatic potentiation. Cell Reports. 37:109770.

      Nakamura, Y., H. Harada, N. Kamasawa, K. Matsui, Jason S. Rothman, R. Shigemoto, R.A. Silver, David A. DiGregorio, and T. Takahashi. 2015. Nanoscale Distribution of Presynaptic Ca2+ Channels and Its Impact on Vesicular Release during Development. Neuron. 85:145-158.

      Newman, Z.L., D. Bakshinskaya, R. Schultz, S.J. Kenny, S. Moon, K. Aghi, C. Stanley, N. Marnani, R. Li, J. Bleier, K. Xu, and E.Y. Isacoff. 2022. Determinants of synapse diversity revealed by superresolution quantal transmission and active zone imaging. Nature Communications. 13:229.

      Rebola, N., M. Reva, T. Kirizs, M. Szoboszlay, A. Lőrincz, G. Moneron, Z. Nusser, and D.A. Digregorio. 2019. Distinct Nanoscale Calcium Channel and Synaptic Vesicle Topographies Contribute to the Diversity of Synaptic Function. Neuron. 104:693-710.e699.

      Sheng, J., L. He, H. Zheng, L. Xue, F. Luo, W. Shin, T. Sun, T. Kuner, D.T. Yue, and L.-G. Wu. 2012. Calcium-channel number critically influences synaptic strength and plasticity at the active zone. Nature neuroscience. 15:998-1006.

      Wang, T., R.T. Jones, J.M. Whippen, and G.W. Davis. 2016. alpha2delta-3 Is Required for Rapid Transsynaptic Homeostatic Signaling. Cell Rep. 16:2875-2888.

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) A central question regarding VGCC differences at Is vs Ib active zones is why is calcium influx higher at Is active zones compared to Ib. Ideally, the authors would have started this study by showing correlations between Cac abundance, presynaptic calcium influx, and Pr at Is vs Ib active zones. If they had, they would likely find that Cac abundance scales with calcium influx and Pr within Is vs Ib, but that calcium influx is over two-fold enhanced at Is over Ib when normalized to the same Cac abundance. This is more than sufficient to explain the Pr differences, so the rest of the study should have focused on revealing why influx is different at Is over Ib despite an apparently similar level of Cac abundance. Then the examination of CaBeta, Stj, etc could have been used to help explain this conundrum. 

      A lesson might be gleaned in how to structure this narrative from the Rebola 2019 study, which the authors cite and discuss at length. Similar to the current study, that paper started with two synapses ("strong" vs "weak") and sought to explain why they were so different in synaptic strength. First, they examined presynaptic calcium influx, and surprisingly found that the strong synapse had reduced calcium influx compared to the weak. Then the rest of the paper sought to explain why synaptic strength (Pr) was higher at the strong synapse despite reduced calcium influx. The authors do not use this logical flow and narrative in the present study, despite the focus being on how Cav2 channels contribute to strong vs weak synapses - and the primary function of Cav2 channels is to pass calcium at active zones to drive vesicle fusion. 

      Although the authors did not show that presynaptic calcium influx is higher at Is vs Ib active zones in the current manuscript, other studies have previously established that calcium influx is two-fold higher at Is active zones vs Ib (as the authors cite). Rather than focusing so much on Pr at Is vs Ib active zones, which as the authors know can be influenced by myriad differences, it seems the more relevant parameter to study is simply to address presynaptic calcium influx at Is vs Ib, which is the primary function of Cac. Put more simply, if Cac levels are the same at Is vs Ib active zones, why is calcium influx at least two-fold higher at Is? 

      It would therefore seem crucial for the authors to determine presynaptic calcium influx levels (ideally at individual AZs) to really understand how Cac intensity levels correlate with calcium influx. The authors instead map Pr at individual AZs, but as the authors know there are many variables that influence whether a SV releases in addition to calcium influx. There are a number of options for this kind of imaging in Drosophila, including genetically encoded calcium indicators targeted to active zones. But since several studies have previously established that influx is higher at Is active zones over Ib, this may not be necessary. That being said, there is a lot of value in quantitatively analyzing Cac/Stj/CaBeta abundance, calcium influx, and Pr together at individual active zones.

      We appreciate the perspective that we could have focused on why Ca2+ influx is 2x greater at type Is active zones, which we agree is an important and interesting question. However, growing evidence indicates that Ca2+ influx alone, like Ca2+ channel abundance, does not reliably predict synaptic strength between inputs. So, here we focused instead on how other differences between synapses influence Pr and contribute to synaptic heterogeneity between and/or among synapses formed by strong and weak inputs. We have changed our title and framing to better reflect this focus. 

      As Reviewer 1 notes, Rebola et al. (2019) found that lower Pr granule synapses exhibit higher Ca2+ influx (and Ca2+ channel abundance). In another example, Aldahabi et al. (2022) demonstrated that even when Ca2+ influx is greater at high-Pr synapses, it does not necessarily explain differences in synaptic strength, as raising Ca2+ entry at low-Pr synapses to high-Pr synapse levels was not sufficient to increase synaptic strength to high-Pr input levels. Similar findings have been reported at tonic and phasic synapses of the crayfish NMJ (Msghina, 1999).

      Several lines of evidence argue that factors beyond Ca2+ influx also play important roles in establishing distinct release properties at the Drosophila NMJ. A recent study using a botulinum transgene to isolate type Ib and Is synapses for electrophysiological analysis found that increasing external [Ca2+] from physiological levels (1.8 mM) to 3 mM or even 6 mM does not result in a 3-fold increase in EPSCs or quantal content at type Ib synapses, despite the prediction that the increase would be even greater given the power dependence of release on Ca2+ concentration (He et al., 2023). The authors further found that type Ib synapses are more sensitive than type Is synapses to the slow Ca2+ chelator EGTA, indicating looser Ca2+ channel-SV coupling. 
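
      The prediction mentioned above can be illustrated with a short calculation. This is only a sketch: it assumes an idealized, non-saturating power law (release proportional to [Ca2+] raised to a cooperativity exponent), and the exponent values used are hypothetical placeholders rather than values measured in this study or in He et al. (2023). Under that assumption, an exponent of 3 already predicts a roughly 4.6-fold increase for the step from 1.8 mM to 3 mM.

```python
# Illustrative sketch only. Assumes release scales as a simple, unsaturated
# power law of external [Ca2+]; real synapses saturate, and the exponents
# below are hypothetical placeholders, not measured values.

def predicted_fold_change(ca_start_mM: float, ca_end_mM: float, cooperativity: float) -> float:
    """Fold change in release predicted by release ~ [Ca2+] ** cooperativity."""
    if ca_start_mM <= 0 or ca_end_mM <= 0:
        raise ValueError("Ca2+ concentrations must be positive")
    return (ca_end_mM / ca_start_mM) ** cooperativity


# Compare the two concentration steps described in the text for a few
# assumed exponents.
for exponent in (3, 4):
    for ca_end in (3.0, 6.0):
        fold = predicted_fold_change(1.8, ca_end, exponent)
        print(f"exponent={exponent}, 1.8 mM -> {ca_end} mM: {fold:.1f}-fold predicted")
```

      Any observed increase well below these idealized predictions is consistent with the argument that factors other than Ca2+ influx limit release at type Ib synapses.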

      Consistently, we find that although VGCC levels are similar at the two inputs, their density is greater at type Is active zones (Figs. 1 and 2). Our findings also reveal additional molecular differences that may contribute to the observed differences in neurotransmitter release properties between the two inputs, including lower levels of the active zone protein Brp (Fig 3) and the auxiliary subunit α2δ-3/Stj (Fig. 6) at high Pr type Is inputs. In contrast, levels of each of these proteins positively correlate with synaptic strength among active zones of a single input, whether low- or high-Pr (Figs. 1, 3, 6). Similarly, levels of each of these proteins increase during homeostatic potentiation of neurotransmitter release (Figs. 4 and 7). Thus, we propose that two broad mechanisms contribute to synaptic diversity in the nervous system: (1) spatial organization and relative molecular content establish distinct average basal release probabilities that differ between inputs and (2) among individual synapses of distinct inputs, coordinated modulation of Ca2+ channel and active zone protein abundance independently tunes Pr. These intersecting mechanisms provide a framework for understanding the extensive and dynamic synaptic diversity observed across nervous systems.

      (2) In addition to key points made above, it seems the authors should at least consider (if not experimentally test) what other differences might contribute to the higher calcium influx at Is over Ib:  

      - Distinct splice isoforms of Cac (and/or Stj/Cabeta): The recent RNAseq analysis of gene expression at Is vs Ib motor neurons from Troy Littleton's group may inform this consideration? 

      - Stj reduction at Is: Do channel studies in heterologous systems give any insight into VGCC channel function with and without a2d-3? Do Cav2 channels without a2d pass more calcium? This would then offer an obvious solution to the key conundrum underlying this study. 

      These are excellent questions that we are actively pursuing. While there is no evidence of differentially expressed splice isoforms of Stj or Ca-β in the recent RNA-seq data from Jetti et al., 2023, subtle changes in Cac isoform usage were observed that may contribute to differences in Ca2+ influx. In heterologous systems, α2δ expression generally increases Ca2+ channel membrane insertion and Ca2+ currents. However, in vivo, α2δs can also mediate extracellular interactions that may modulate channel function. We address these points in greater detail in the revised discussion.  

      (3) Assess Stj and CaBeta levels at AZs after PhTx: The successful generation of endogenously tagged Stj and CaBeta enables some relatively easy experiments that would be of interest, similar to what the authors present for Cac. Does Brp similarly control Stj and CaBeta at Is vs Ib compared to what they show for Cac? In addition, does homeostatic plasticity similarly change Stj and CaBeta at Is vs Ib compared to what the authors have shown for Cac? i.e., do they both similarly increase in intensity, by the same amount, as Cac? 

      We agree and have included an analysis of α2δ-3/Stj levels following PhTx exposure (Fig. 7A-C). We have also investigated the regulation of Stj during chronic presynaptic homeostatic potentiation (Fig. 7D-F). In both cases, StjV5-N levels significantly increase at type Ib and Is active zones, consistent with our finding that among AZs of either type Ib or Is inputs, Stj levels correlate with Cac abundance and, thus, Pr. Together with our and others’ findings, this suggests that coordinated increases in Ca2+ channel, auxiliary subunit, and active zone protein abundance positively tune synaptic strength at diverse synaptic subtypes.

      Minor points: 

      (1) Including line numbers would make reviewing/commenting easier. 

      We apologize for this oversight and have added line numbers to the revised manuscript.

      (2) Fig. 2I: It is not apparent what the mean cluster density is between Ib vs Is (as it is in Fig. 2F-H graphs). The mean and error bars should be included in 2I as it is in 2G. Same with Fig. 3C. 

      Thank you for pointing this out. We have added error bars to the paired analysis in 2I as well as in 3C and 1C.

      (3) Fig. 4 - it might make more sense to normalize Brp and Cac intensity as a percentage of baseline (PhTx at Is or Ib) rather than normalizing everything to control Ib. 

      We have revised the graphs as suggested in Figure 4 and throughout.

      (4) Page 5 bottom - REFS missing after Fig. 1E. 

      Thank you for catching this. We have fixed it.

      Reviewer #2 (Recommendations For The Authors): 

      This reader found differentiating between low Pr sites (deep purple) and cac measurements (black) difficult in Fig 1B. You may consider depicting this differently. 

      Thank you for this feedback. We have changed the color scheme to improve readability.

      I found it difficult to discern the difference between experiments Fig 1E and Fig 1J. Why are individual dots distributed differently? 

      The individual data points are the same as in 1E and 1F, but we have removed the individual NMJ dimensionality to combine all Is and Ib data points together along with best fit lines for comparison of their slopes. We have added text to the revised manuscript to clarify this.
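
      The pooling step described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the figure: it assumes an ordinary least-squares fit, and the per-NMJ values and all function names are hypothetical.

```python
# Minimal sketch: pool active-zone data points across NMJs for one input type,
# then fit a single least-squares line so slopes can be compared between inputs.
# All numbers are invented for illustration and are not data from the study.

def pool_points(per_nmj):
    """Flatten {nmj_id: [(cac_intensity, pr), ...]} into one list of points."""
    return [point for points in per_nmj.values() for point in points]


def best_fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line through points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (normalized Cac intensity, Pr) pairs grouped by NMJ.
type_ib = {"NMJ_1": [(0.5, 0.05), (1.0, 0.10), (1.5, 0.15)], "NMJ_2": [(0.8, 0.08), (1.2, 0.12)]}
type_is = {"NMJ_1": [(0.5, 0.15), (1.0, 0.30)], "NMJ_2": [(1.5, 0.45), (0.8, 0.24)]}

slope_ib, _ = best_fit_line(pool_points(type_ib))
slope_is, _ = best_fit_line(pool_points(type_is))
print(f"pooled slope, type Ib: {slope_ib:.2f}; type Is: {slope_is:.2f}")
```

      Pooling discards the grouping by NMJ, which is why the same points can look differently distributed when plotted together.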

      Results section, second paragraph, add references, remove 'REF': We next investigated the correlation between Pr and VGCC levels and found that at type Is inputs, single-AZ Cac intensity positively correlates with Pr (Fig. 1E; REFS). 

      Thank you. We have corrected this error.

    1. Author response:

      Reviewer #1 (Public Review):

      Greter et al. provide an interesting and creative use of lactulose as a "microbial metabolism" inducer, combined with tracking of H2 and other fermentation end products. The topic is timely and will likely be of broad interest to researchers studying nutrition, circadian rhythm, and gut microbiota. However, a couple of moderate to major concerns were noted that may impact the interpretation of the current data:

      (1)  Much of the data relies on housing gnotobiotic mice in metabolic cages, but I couldn't find any details of methods to assess contamination during multiple days of housing outside of gnotobiotic isolators/cages. Given the complexity of the metabolic cage system used, sterility would likely be incredibly challenging to achieve. More details needed to be included about how potential contamination of the mice was assessed, ideally with 16S rRNA gene sequencing data of the endpoint samples and/or qPCR for total colonization levels relative to the more targeted data shown.

      We thank the reviewer for pointing out that we have not made the experimental setup clear in the text. One of the unique features of our metabolic cage setup is that the mice do not need to be housed outside gnotobiotic isolators, but that the whole system is placed inside an isolator. We have developed and published this system recently (Hoces et al, PLOS Biol 2022), including extensive testing for sterility/gnotobiosis. We will improve clarity in a revised version.

      Given that 16S sequencing of germ-free mice will typically produce false positive reads, we used Blautia pseudococcoides as an indicator strain for contaminations. This strain is present in our SPF mouse colony, forms spores that are highly resilient to decontamination measures, and has been the most likely contaminant in our gnotobiotic system. We have checked for presence of this strain in the cecum content of all our animals at the end of each experiment, and only included experiments which had a B. pseudococcoides signal below threshold level.

      (2)  The language could be softened to provide a more nuanced discussion of the results. While lactulose does seem to induce microbial metabolism it also could have direct effects on the host due to its osmotic activity or other off-target effects. Thus, it seems more precise to just refer to lactulose specifically in the figure titles and relevant text. Additionally, the degree to which lactulose "disrupts the diurnal rhythm" isn't clear from the data shown, especially given that the markers of circadian rhythm rapidly recover from the perturbation. It is probably more precise to instead state that lactulose transiently induces fermentation during the light phase or something to that effect. The discussion could also be expanded to address what methods are available or could be developed to build upon the concepts here; for example, the use of genetic inducers of metabolism which may avoid the more complex responses to lactulose.

      The point about language is well taken. We tried to make the argument that what we call disruption of the diurnal rhythm is acute, meaning that it is not disrupting the rhythm "chronically" (i.e., for longer), but that it recovers rapidly from this transient disruption. Given the confusion this wording is causing we are rephrasing this in a new version of the manuscript.

      We also appreciate the mention of concepts from our study that can be built on in future studies, and we will add a paragraph on potential further research.

      Despite these concerns, this was still an intriguing and valuable addition to the growing literature on the interface of the microbiome and circadian fields.

      We thank the reviewer for all their encouraging and constructive remarks!

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to investigate how microbial metabolites, such as hydrogen and short-chain fatty acids (SCFAs), influence feeding behavior and circadian gene expression in mice.

      Specifically, they sought to understand these effects in different microbial environments, including a reduced community model (EAM), germ-free mice, and SPF mice. The study was designed to explore the broader relationship between the gut microbiome and host circadian rhythms, an area that is not well understood. Through their experiments, the authors hoped to elucidate how microbial metabolism could impact circadian clock genes and feeding patterns, potentially revealing new mechanisms of gut microbiome-host interactions.

      Strengths:

      The manuscript presents a well-executed investigation into the complex relationship between microbial metabolites and circadian rhythms, with a particular focus on feeding behavior and gene expression in different mouse models. One of the major strengths of the work lies in its innovative use of a reduced community model (EAM) to isolate and examine the effects of specific microbial metabolites, which provides valuable insights into how these metabolites might influence host behavior and circadian regulation. The study also contributes to the broader understanding of the gut microbiome's role in circadian biology, an area that remains poorly understood. The experiments are thoughtfully designed, with a clear rationale that ties together the gut microbiome, metabolic products, and host physiological responses. The authors successfully highlight an intriguing paradox: the significant influence of microbial metabolites in the EAM model versus the lack of effect in germ-free and SPF mice, which adds depth to the ongoing exploration of microbial-host interactions. Despite some methodological concerns, the manuscript offers compelling data and opens up new avenues for research in the field of microbiome and circadian biology.

      We thank the reviewer for their encouraging remarks, specifically on the surprising findings that microbial metabolism seems to affect circadian clock gene expression and behavior differently in EAM and SPF mice.

      Weaknesses:

      The manuscript, while providing valuable insights, has several methodological weaknesses that impact the overall strength of the findings. First, the process for stool collection lacks clarity, raising concerns about potential biases, such as the risk of coprophagia, which could affect the dry-to-wet weight ratio analysis and compromise the validity of these measurements.

      We thank the reviewer for pointing out that our description of the specific methods used for collecting feces was presented in a somewhat confusing manner. In short, dry and wet fecal weights were determined based on fecal pellets that were freshly produced and directly collected from restrained mice. To determine total fecal output over time, we collected all fecal pellets produced in a 5-hour window in a cage, determined their dry weight, and then used the water content determined for fresh feces to calculate wet weight. Using this method, we cannot account for potential differences in coprophagia between the groups. However, this is not likely to affect the dry-to-wet ratio of fecal output in our results.
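
      As an illustration of the back-calculation described above, the following is a minimal sketch and not the authors' actual analysis code. It assumes that water content is expressed as a mass fraction of the fresh pellet and that the same fraction applies to the pellets collected from the cage; the function names and example masses are hypothetical.

```python
def water_fraction(fresh_wet_mg: float, fresh_dry_mg: float) -> float:
    """Fraction of a freshly collected pellet's mass that is water (assumed definition)."""
    if fresh_wet_mg <= 0 or not (0 <= fresh_dry_mg <= fresh_wet_mg):
        raise ValueError("expected 0 <= dry mass <= wet mass and wet mass > 0")
    return (fresh_wet_mg - fresh_dry_mg) / fresh_wet_mg


def estimated_wet_weight(cage_dry_mg: float, water_frac: float) -> float:
    """Back-calculate wet weight of cage-collected pellets from their dry weight.

    Since dry = wet * (1 - water_frac), it follows that wet = dry / (1 - water_frac).
    """
    if not (0 <= water_frac < 1):
        raise ValueError("water fraction must be in [0, 1)")
    return cage_dry_mg / (1.0 - water_frac)


# Hypothetical numbers, for illustration only.
frac = water_fraction(fresh_wet_mg=100.0, fresh_dry_mg=40.0)
wet = estimated_wet_weight(cage_dry_mg=200.0, water_frac=frac)
```

      Under these assumptions, coprophagia would change the amount of dry matter recovered from the cage but not the water fraction measured on fresh pellets, which is consistent with the authors' point that the dry-to-wet ratio is unlikely to be affected.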

      Additionally, the use of the term "circadian" in some contexts appears inaccurate, as "diurnal" might be more appropriate, especially given the uncertainty regarding whether the observed microbiome fluctuations are truly circadian.

      Similarly to our answer to reviewer 1 above, we appreciate this remark about imprecise language and have addressed this issue in the text. Indeed, we do not think the microbiota fluctuations are truly circadian, but likely a result of the entrainment through the host's food intake.

      Another significant issue is the unexpected absence of an osmotic effect of lactulose in EAM mice, which contradicts the known properties of lactulose as an osmotic laxative. This finding requires further verification, including the use of a positive control, to ensure it is not artifactual.

      This is a good point. We have used this lactulose dosage specifically to induce microbial metabolism without causing osmotic diarrhea, and went to some lengths to demonstrate this. In response to this comment (and one by reviewer 3 below about transit time), we are planning an experiment that will use a higher lactulose dose as a positive control.

      The presentation of qRT-PCR data as log2-fold changes, with a mean denominator, could introduce bias by artificially reducing variability, potentially leading to spurious findings or increased risk of Type I error. This approach may explain the unexpected activation of both the positive and negative limbs of the circadian clock.

      While we agree that our description of the qPCR method used for measuring circadian clock gene expression was lacking detail, we do not see how log2-fold changes (as opposed to, e.g., fold change) would lead to an increased risk of Type I error. We did not use a mean denominator for analyzing the data but used the housekeeping gene data from the same sample as the denominator for the respective circadian clock genes. This will be described more clearly in a revised methods section.
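
      To make the distinction concrete, per-sample normalization can be sketched as follows. This is only an illustration under stated assumptions: that expression is quantified as Ct values and that amplification efficiency is close to 100% (one doubling per cycle), neither of which is specified in the response. The function names and Ct values are hypothetical.

```python
def log2_relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """Log2 expression of a target gene relative to the housekeeping gene
    measured in the SAME sample.

    Assuming one doubling per cycle, relative expression is 2 ** -(Ct_target - Ct_hk),
    so its log2 is simply -(Ct_target - Ct_hk).
    """
    return -(ct_target - ct_housekeeping)


def normalize_samples(samples: list[tuple[float, float]]) -> list[float]:
    """Normalize each (Ct_target, Ct_housekeeping) pair within its own sample.

    No group mean enters the denominator, so between-sample variability
    is carried through to the normalized values.
    """
    return [log2_relative_expression(t, hk) for t, hk in samples]


# Hypothetical Ct pairs (target, housekeeping), for illustration only.
values = normalize_samples([(24.0, 18.0), (25.5, 18.5), (23.0, 17.5)])
```

      Because each value is computed within a single sample, the spread of the resulting values reflects sample-to-sample variation rather than being compressed by a shared mean denominator.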

      Moreover, the lack of detailed information on the primers and housekeeping genes used in the experiments is concerning, particularly given the importance of using non-circadian housekeeping genes for accurate normalization.

      We apologize for this omission; it seems that the resource table was lost during submission, which led to the missing information. It will be included in the revised manuscript.

      The methods for measuring metabolic hormones, such as GLP-1 and GIP, are also not adequately described. If DPP-IV/protease inhibitor tubes were not used, the data could be unreliable due to the rapid degradation of these hormones by circulating proteases.

      We thank the reviewer for spotting this mistake. We will add details of how GLP-1 and GIP were measured to the methods section. While we did not use DPP-IV/protease inhibitor tubes, we added the inhibitors to the syringes when sampling blood, leading to the same effect.

      Finally, the manuscript does not address the collection of hormone levels during both fasting and fed phases, a critical aspect for interpreting the metabolic impact of microbial metabolites.

      We agree that it will be interesting to measure hormone levels also in the fed phase, and we will include this data in a revised version of the manuscript. Even with that data, a more thorough examination of hormone levels over the diurnal cycle, as suggested by reviewer 3, might be relevant for a full-scale follow-up. Given our data, we of course cannot exclude that there may be time-point-specific differences and therefore have softened the language around this conclusion to state that hormone levels are not acutely changed after a lactulose intervention “at the time-points examined”.

      These methodological concerns collectively weaken the robustness of the study's results and warrant careful reconsideration and clarification by the authors.

      Because of these weaknesses, the authors have partially achieved their aims by providing novel insights into the relationship between microbial metabolites and host circadian rhythms. The data do suggest that microbial metabolites can significantly influence feeding behavior and circadian gene expression in specific contexts. However, the unexpected absence of an osmotic effect of lactulose, the potential biases introduced by the log2-fold change normalization in qRT-PCR data, and the lack of clarity in critical methodological details weaken the overall conclusions. While the study provides valuable contributions to understanding the gut microbiome's role in circadian biology, the methodological weaknesses prevent a full endorsement of the authors' conclusions. Addressing these issues would be necessary to strengthen the support for their findings and fully achieve the study's aims.

      We thank the reviewer again for their careful and critical reading of our work, and for their constructive input. We hope that many of the concerns will be addressed by providing more methodological detail and additional experimental data in the revised version of our manuscript.

      Despite the methodological concerns raised, this work has the potential to make a significant impact on the field of circadian biology and microbiome research. The study's exploration of the interaction between microbial metabolites and host circadian rhythms in different microbial environments opens new avenues for understanding the complex interplay between the gut microbiome and host physiology. This research contributes to the growing body of evidence that microbial metabolites play a crucial role in regulating host behaviors and physiological processes, including feeding and circadian gene expression.

      We thank the reviewer for their encouraging remarks!

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript by Greter, et al., entitled "Acute targeted induction of gut-microbial metabolism affects host clock genes and nocturnal feeding" the authors are attempting to demonstrate that an acute exposure to a non-nutritive disaccharide (lactulose) promotes microbial metabolism that feeds back onto the host to impact circadian networks. The premise of the study is interesting and the authors have performed several thoughtful experiments to dissect these relationships, providing valuable insights for the field. However, the work presented does not necessarily support some of the conclusions that are drawn. For instance, lactulose is administered during the fasting period to mimic the impact of a feeding bout on the gut microbiota, but it would be important to perform this treatment during the fed state as well to show that the effects on food intake, etc. do not occur.

      This is a good point, and we will include an experiment addressing this in a revised version of the manuscript.

      To truly draw the conclusion that the current outcomes are directly connected to and mediated via an impact on the host circadian clock, it would be ideal to perform these studies in a circadian gene knock-out animal (i.e., Cry1 or Cry2 KO mice, or perhaps Bmal-VilCre tissue-specific KO mice). If the effects are lost in these animals, this would more concretely connect the current findings to the circadian clock gene network.

      We agree that these would be interesting experiments to follow up on the question of how the observed effects are actuated by host functions. However, because they would require a large amount of preparatory work (including rederiving the KO mice to get them germ-free in our gnotobiotic facility), we argue that they are beyond the scope of this study.

      Despite these reservations, the work is promising.

      We thank the reviewer for their encouraging assessment.

      Strengths:

      Attempting to disentangle nutrient acquisition from microbial fermentation and its impact on diurnal dynamics of gut microbes on host circadian rhythms is an important step for providing insights into these host-microbe interactions.

      The authors utilize a novel approach in leveraging lactulose coupled with germ-free animals and metabolic cages fitted with detectors that can measure microbial byproducts of fermentation, particularly hydrogen, in real-time.

      The authors consider several interesting aspects of lactulose delivery, including how it shifts osmotic balance as well as provides calculations that attempt to explain the caloric contribution of fermentation to the animal in the context of reduced food intake. This provides interesting fundamental insights into the role of microbial outputs on host metabolism.

      Thank you!

      Weaknesses:

      While the authors have done a large amount of work to examine the osmotic vs. metabolic influence of lactulose delivery, the authors have not accounted for the enlarged cecum and increased cecal surface area in germ-free mice. The authors could consider an additional control of cecectomy in germ-free mice.

      We thank the reviewer for pointing out the potential effect of the anatomical differences of germ-free and conventionally colonized mice. We agree that when comparing germ-free mice to SPF mice, the enlarged cecum area in germ-free animals could lead to differences in water release or uptake. However, this is not the case in the gnotobiotic mice colonized with our minimal microbiota, which have comparable cecum sizes to germ-free mice, and thus comparing water transport over the cecum wall between those groups can be done without correcting for cecal surface areas. We will add information on cecum sizes in the different experimental groups to a revised version of the manuscript.

      The authors have examined GI hormones as one possible mechanism for how food intake is altered by microbial fermentation of lactulose. However, the authors measure PYY and GLP-1 only at a single time point, stating that there are no differences between groups. Given the goal of the studies is to tie these findings back into circadian rhythms, it would be important to show if the diurnal patterns of these GI hormones are altered.

      We fully agree that a deeper investigation of the diurnal fluctuations of hormone levels would be an interesting next step in studying whether perturbations in food intake can disturb these rhythms. Doing this for the whole rhythm would really require a full second study. For a revised version of this manuscript, we will add a second time-point of hormone measurements (during the fed phase) to this study. In addition, we will soften the statements made around these data to point out only that changes in hormone levels could not be detected at the specific time points examined after lactulose treatment, and therefore do not seem to explain the immediate behavioral changes.

      Considerations of other factors, such as conjugated vs. deconjugated bile acids, microbial bile salt hydrolase activity, and bile acid resorption, might be an important consideration for how lactulose elicits more influence on ileal circadian clock genes relative to cecum and colon.

      We absolutely agree that investigation of microbial bile acid modification and their metabolism by the host would be an interesting topic for a follow-up study.

      Measurements of GI transit time (both whole gut and regional) would be an important consideration for how lactulose might be impacting the ileum vs. cecum vs. colon.

      This is also an interesting point, and we will add an assessment of transit time to a revised version of the manuscript.

    1. Author response:

      General comment:

      "This important study examined neuronal activity in the dentate nucleus of the cerebellum when monkeys performed a difficult perceptual decision-making task. The authors provide convincing evidence that the cerebellum represents sensory, motor, and behavioral outcome signals that are sent to the attentional system, but further analysis focusing on the disparity of performance between animals would improve the quality of the paper. This paper is of great general interest in that it shows the involvement of the cerebellum in cognitive processes at the neuronal level."

      We thank you for these general comments, and we agree with all of them. 

      Public Reviews (Reviewer #1):

      Summary:

      Recordings were made from the dentate nucleus of two monkeys during a decision-making task. Correlates of stimulus position and stimulus information were found to varying degrees in the neuronal activities. 

      We agree with this summary.

      Strengths:

      A difficult decision-making task was examined in two monkeys.

      We agree with this statement.

      Weaknesses:

      One of the monkeys did not fully learn the task. The manuscript lacked a coherent hypothesis to be tested, and no attempt was made to consider the possibility that this part of the brain may have little to do with the task that was being studied. 

      We understand these comments. It is correct that one of the monkeys did not fully learn the task, but it should be noted that both monkeys learned significantly above chance level, and we therefore find the recordings of both monkeys useful. We tested the hypothesis that neurons of the dentate nucleus can dynamically modulate their activity during a visual attention task, comprising not only sensorimotor but also cognitive attentional components. We agree that this hypothesis should be spelled out more explicitly in the introduction, which we will do in the revised version. We also appreciate the comment of this Reviewer that in our original submission we did not show our attempt to consider the possibility that this part of the brain may have little to do with the task that was being studied. We in fact did consider this possibility in that we applied muscimol to the dentate nucleus in one of the monkeys. The data of this one successful experiment show that the behaviour was reversibly affected in line with our hypothesis. Given that this only concerned one of the monkeys, we preferred not to present these data in the article. However, as the Reviewer correctly points out that this question remains hanging in the air, we will show them in our formal rebuttal letter. Please note that, at the end of our research project, we decided to focus on the tracing experiments, showing in both monkeys the connections of the dentate nucleus with the regions that are involved in attention. As a result, both monkeys have been sacrificed and we cannot expand upon our muscimol experiments anymore (which would have been useful indeed).

      Last but not least, given the comments of the Reviewers, we will also add a Supplementary figure to Figure 2, in which we will present the data for both monkeys separately and provide our interpretation. This may help to strengthen our conclusions. 

      Public Reviews (Reviewer #2):

      The authors trained monkeys to discriminate peripheral visual cues and associate them with planning future saccades of an indicated direction. At the same time, the authors recorded single-unit neural activity in the cerebellar dentate nucleus. They demonstrated that substantial fractions of DN cells exhibited sustained modulation of spike rates spanning task epochs and carrying information about stimulus, response, and trial outcome. Finally, tracer injections demonstrated this region of the DN projects to a large number of targets including several known to interconnect the visual attention network. The data compellingly demonstrate the authors' central claims, and the analyses are well-suited to support the conclusions. Importantly, the study demonstrates that DN cells convey many motor and nonmotor variables related to task execution, event sequencing, visual attention, and arguably decision-making/working memory.

      We thank the Reviewer for this positive and constructive feedback.

    1. Author response:

      We would like to thank the reviewers for their time and for their kind comments about our work. We expect that their comments will help us to improve the manuscript, and we plan the following experiments/revisions to address some of them:

      Reviewer 1 (Public Review):

      (1) The cutoffs the authors used to define "conditionally essential" mutants are not reported. The results also lack validation for lethality using a titratable system. It would be ideal to validate several genes in each dataset to determine cutoffs (i.e. 5-fold decrease in insertion mutants) for conditional lethality. It was not done (or described) here.

      We will report the cutoffs used when we generate the revised manuscript. Our experiments identified hundreds of lethal combinations across six datasets, so validating several genes from each would require generating at least 20 depletion strains and subsequently testing each of them. Validation using a depletion system would therefore be a significant undertaking and is typically not the standard when using these approaches. However, should time permit, we will attempt a subset of these experiments.

      (2) Also, two mutations that both make the cells sick could provide an additive effect (i.e. dapF and BamB), which doesn't necessarily mean the pathways are linked. The authors should revise their wording. They have not shown genetic linkage in some cases.

      We will revise the text to address this.

      (3) Mutations throughout the manuscript are not complemented. It would be ideal to add complementation data to show the gene-phenotype relationship is specific.

      We thank the reviewers for highlighting this and will complete the complementation experiments.

      (4) Also, I would argue the term "conditionally essential genes" should be replaced with "synthetically lethal". Strains were compared in the same conditions but with different genetic backgrounds.

      We take the reviewer's point and will revise the text accordingly.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) An important control in any genetic interaction study is to do complementation tests to demonstrate that the phenotype observed is indeed due to the missing gene under analysis. Although the Keio library was designed to avoid polar effects, it is impossible to predict other undesirable effects of the deletions (hitting of a non-annotated sRNA or RNA stability effects, for example). Thus, before one can safely conclude that a proposed genetic interaction is real, complementation tests should be carried out. This seems particularly important in the case of a new and surprising interaction, such as that between bamB and DNA replication and repair genes.

      We thank the reviewers for highlighting this and will complete the complementation experiments.

      (2) Why not include the suppressor interactions in the work? There are probably plenty, and in principle, they should be as informative as the conditional essential (or synthetic lethal) ones. The only one highlighted in the paper is that between bamB and diaA, since it nicely fits with the synthetic lethal effects with initiation inhibitors seqA and hda. Even if the authors cannot make sense of the suppressor interactions, their inclusion in the paper should make the dataset richer and more valuable to the community.

      These data are available in supplementary table 1. However, we appreciate this is not obvious and so will make a new supplementary table and include a brief description of the data for the revised paper.

      (3) The enrichment analysis in Figure 2B deserves some clarification. What is the meaning of gene ratio? How can single genes of a pathway yield an enrichment signal? Why weren't seqA and hda included in the DNA replication class in 2B?

      We apologise for the confusion caused and will include a description of the analysis in the methods section.

      (4) The writing puts too much emphasis on demonstrating that bam lipoproteins and chaperones are specialized instead of fully redundant. However, I have the impression this is a long-settled conclusion in the field, as the manuscript itself describes at several points when reviewing the literature.

      We will revise the text to reduce this emphasis.

      Reviewer #3 (Public Review):

      In this work, Bryant, et al. investigate genetic interactions between non-essential members of the outer membrane protein biogenesis pathway and other genes in the genome using a transposon-directed insertion sequencing (TraDIS) approach in E. coli K-12. The authors identify interactions with other components of the envelope including LPS, peptidoglycan, and enterobacterial common antigen biogenesis, and they tie these interactions to specific members of the outer membrane biogenesis pathway. Although many of these interactions are known and have been previously investigated in the field, the study provides several synthetic phenotypes that could be useful for further investigations.

      The strengths of the paper include their unbiased, TraDIS approach, and follow up on the interactions they observe. The interactions with genes of unknown function also are of interest as they may suggest experiments to find the functions of these genes. The largest weakness of this paper is the use of a gene deletion allele for bamB that is known to be polar leading to decreased expression of an essential gene. This largely invalidates all results related to DNA replication. In addition, it is a weakness that the paper does not adequately address its place in the field through discussion of existing results on the interactions they investigate.

      We appreciate the reviewers’ comments and concerns about the bamB allele, and we will address these concerns by completing complementation experiments for the CRISPRi depletion experiments and the run-out assays. However, despite the statement that it is known to be polar, several previous studies have also used the bamB Keio library strain. Many of these studies transfer the allele to a clean background and use the derivative in which the cassette has been removed, as we have done here (Cox et al., 2017, Gunasinghe et al., 2018, Psonis et al., 2019, Storek et al., 2019, Ranava et al., 2021, Steenhuis et al., 2021, Thewasano et al., 2023). Therefore, we feel somewhat justified in our choice of strain.

      We are unable to find a reference for the Keio bamB strain causing polar effects and would have appreciated the reviewers’ guidance here. However, we believe the concern about polar effects stems from the observations of Ruiz et al. (2005), in which it was observed that a yfgL::ISE1 allele causes polar effects. This was hypothesised to be due to the ORF contained within the IS being transcribed in the opposite orientation to yfgL and the downstream der gene. They subsequently observed that a strain carrying a Tn5KAN-I-SceI insertion in yfgL (yfgL::kan) did not cause polar effects and this was hypothesised to be due to the kan cassette being co-oriented with yfgL. In addition, Charlson et al. (2006) generated a yfgL deletion by replacing the majority of the gene with a kan cassette, in a manner similar to that of the Keio library, and the cassette was subsequently flipped out. This study also found no evidence of polar effects on der. In theory, the strain used here, and in previous studies by other groups, should provide minimal disruption to transcription through generation of a mini-gene from the original bamB sequence to maintain operon expression. This is in contrast to the disruption caused by the yfgL::ISE1 allele.

      While we do appreciate the concern, several pieces of evidence lend themselves to counter the statement that our strain choice largely invalidates the results. The der GTPase is essential, hence the concern about polar effects leading to the bamB phenotypes we see. However, depletion of der leads to cold sensitivity, whereas we find that the bamB strain used here actually performs better in colder temperatures. In addition, the der depletion is sensitive to doxycycline, whereas the bamB mutant has increased fitness in this condition (Fig 1) (Bharat and Brown, 2015, Hwang and Inouye, 2008). Hence, should the mutation lead to decreased expression of der then we would expect the bamB strain to phenocopy the der depletion, which it does not. Regardless of this information, we will still address these concerns by completing complementation experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weakness 1. Enhancing Reproducibility and Robustness: To enhance the reproducibility and robustness of the findings, it would be valuable for the authors to provide specific numbers of animals used in each experiment. Explicitly stating the penetrance of the rod-like neurocranial shape in dact1/2-/- animals would provide a clearer understanding of the consistency of this phenotype. 

      In Fig. 3 and Fig. 4 animal numbers were added to the figure and figure legend (line 1111). In Fig. 5 animal numbers were added to the figure. We now state that the rod-like neurocranial shape in dact1/2-/- animals is completely penetrant (Line 260). 

      Weakness 2. Strengthening Single-Cell Data Interpretation: To further validate the single-cell data and strengthen the interpretation of the gene expression patterns, I recommend the following: 

      -Provide a more thorough explanation of the rationale for comparing dact1/2 double mutants with gpc4 mutants.

      -Employ genotyping techniques after embryo collection to ensure the accuracy of animal selection based on phenotype and address the potential for contamination of wild-type "delayed" animals.

      -Supplement the single-cell data with secondary validation using RNA in situ or immunohistochemistry techniques. 

      An explanation of our rationale was added to the results section (Lines 391-403) and a summary schematic was added to Figure 6 (panel A).

      Genotyping of the embryos was not possible but quality control analysis by considering the top 2000 most variable genes across the dataset showed good clustering by genotype, indicating the reproducibility of individuals in each group (See Supplemental Fig. 4).

      The gene expression profiles obtained in our single-cell data analysis for gpc4, dact1, and dact2 correlate closely with our in situ hybridization analyses. Further, our data is consistent with published zebrafish single-cell data. We validated our finding of increased capn8 expression in dact1/2 mutants by in situ hybridization. Therefore, we are confident in the robustness of our single-cell data.  

      Weakness 3. Directly Investigating Non-Cell-Autonomous Effects: To directly assess the proposed non-cell-autonomous role of dact1/2, I suggest conducting transplantation experiments to examine the ability of ectodermal/neural crest cells from dact1/2 double mutants to form wild-type-like neurocranium.  

      The reviewer’s suggestion is an excellent experiment and something to consider for future work. Cell transplant experiments between animals of specific genotypes are challenging and require large numbers. It is not possible to determine the genotype of the donor and recipient embryos at the early timepoint of the 1,000-cell stage, when the transplants would have to be done in the zebrafish. Each transplant would therefore have to be carried out blind to genotype from a dact1+/-; dact2+/- or dact1-/-; dact2+/- intercross, after which both animals would have to be genotyped at a subsequent time point and the phenotype of the transplant recipient analyzed. While possible, this is a monumental undertaking and beyond the scope of the current study.

      Weakness 4. Further Elucidating Calpain 8's Role: To strengthen the evidence supporting the critical role of Calpain 8, I recommend conducting overexpression experiments using a sensitized background to enhance the statistical significance of the findings. 

      We thank the reviewer for their suggestion and have now performed capn8 overexpression experiments in embryos generated from dact1/2 double heterozygous breeding. We found a statistically significant effect of capn8 overexpression in the dact1+/-,dact2+/- fish (Lines 462-464 and Fig. 8C,D). 

      Minor Comments:  

      Comment: Creating the manuscript without numbered pages, lines, or figures makes orientation and referencing harder.  

      Revised

      Comment: Authors are inconsistent in the use of font and adverbs, which requires extra effort from the reader. ("wntIIf2 vs wnt11f2 vs wnt11f2l"; "dact1/2-/- vs dact1/dact2 -/-"; "whole-mount vs wholemount vs whole mount").  

      Revised throughout.

      Comment: Multiple sentences in the "Results" belong to the "Materials and Methods" or the "Discussion" section. 

      We have worked to ensure that sentences are within the appropriate sections of the manuscript.

      Comment: Abstract:

      "wnt11f2l" should be "wnt11f2"  

      Revised (Line 24).

      Comment: Main text:

      Page 5 - citation Waxman, Hocking et al. 2004 is used 3x without interruption by any other citation. 

      Revised (Line 112).

      Page 9 - "dsh" mutant is mentioned once in the whole manuscript - is this a mistake?

      Revised, Rewritten (Line 196).

      Page 10 - Fig 2B does not show ISH.

      Revised (Line 229).

      Page 11 - "kyn" mutant is mentioned here for the first time but defined on page 15.

      Revised (Line 245). Now first described on page 4.

      Page 14 - "cranial CNN" should be CNCC.

      Revised. (Line 334)

      Page 16 - dact1/dact2/gpc4: Fig. 5C is used but it should be Fig 5E.

      Revised. (Line 381)

      Page 18 - dact1/2-/- or dact1-/-, dact2-/-. 

      Revised. (Line 428)

      Comment: Methods:

      Page 24 - ZIRC () "dot" is missing. ChopChop ")" is missing. "located near the 5' end of the gene" - In the Supplementary Figure 1 looks like in the middle of the gene.

      Revised. (Lines 600, 609, 611, respectively).

      Page 25 - WISH -not used in the main text.

      Revised. (Line 346).

      Page 26 - 4% (v/v) formaldehyde; at 4C - 4°C; 50% (v/v) ethanol; 3% (w/v) methylcellulose.

      Revised. (Lines 659, 660, 662).

      Page 27 - 0.1% (w/v) BSA. 

      Revised. (Line 668).

      Comment: Discussion:

      The overall discussion requires more references and additional hypotheses. On page 20, when mentioning 'as single mutants develop normally,' does this refer to the entire animals or solely the craniofacial domain? Are these mutants viable? If they are, it's crucial to discuss this phenomenon in relation to prior morpholino studies and genetic compensation.

      Observing how the authors interpret previously documented changes in nodal and shh signaling would be beneficial. While Smad1 is discussed, what about other downstream genes? Is shh signaling altered in the dact1/2 double mutants? 

      We have revised the Discussion to include more references (Lines 473, 476, 483, 488, 491, 499, 501, 502, 510, 515, 529, 557, 558) and additional hypotheses (Lines 503-505, 511-519, 522-525). We have added more specific information regarding the single mutants (Lines 270-275, 480-493, Fig. S3). We have added discussion of other downstream genes, including smad1 (Lines 561-572) and shh (Lines 572-580).

      Comment: Figures:

      Appreciating differences between specimens when eyes were or were not removed is quite hard.

      Yes, this was an unfortunate oversight; however, the key phenotype is the EP shown in the dissections.

      Fig 1. - wntIIf2 vs wnt11f2? C - Thisse 2001 - correct is Thisse et al. 2001.

      Revised typo in Fig 1. (And Line 1083).

      Fig 1E: These plots are hard to understand without previous and detailed knowledge. Authors should include at least some demarcations for the cephalic mesoderm, neural ectoderm, mesenchyme, and muscle. Missing color code.

      We have moved this data to supplementary figure S1 and have added labels of the relevant cell types and have added the color code.

      Fig 2 - In the legend for C - "wildtype and dact2-/- mutant" and "dact1/2 mutant"; in the picture is dact1-/-, dact2-/-.

      Revised (Line 1105).

      Fig 2 - B - it is a mistake in 6th condition dact1: 2x +/+, heterozygote (+/-) is missing.

      Revised Figure 2B.

      Fig 4. - Typo in the legend: dact1/"t"2-/- .

      Revised. (Line 1127).

      Fig 8C - In my view, when the condition gfp mRNA says "0/197, " none of the animals show this phenotype. I assume the authors wanted to say that all the animals show this phenotype; therefore, "197/197" should be used.

      We have removed this data from the figure as there were concerns by the reviewers regarding reproducibility. 

      Fig S1 - Missing legend for the 28 + 250, 380 + 387 peaks? RT-qPCR - is not mentioned in the Materials and Methods. In D - ratio of 25% (legend), but 35% (graph).

      Revised. (Line 1203, Line 625, Line 1213, respectively).

      Fig S2 - The word "identified" - 2x in one sentence. 

      Revised. (Line 1230).

      Reviewer #2 (Public Review):

      Weakness (1) While the qualitative data show altered morphologies in each mutant, quantifications of these phenotypes are lacking in several instances, making it difficult to gauge reproducibility and penetrance, as well as to assess the novel ANC forms described in certain mutants.  

      In Fig. 3 and Fig. 4, animal numbers were added to the figure legend. In Fig. 5, animal numbers were added to the figure to demonstrate reproducibility. We now state that dact1/2-/- animals exhibit a rod-like neurocranial shape with complete penetrance (Line 260). As the altered morphologies that we report are qualitatively distinct from wildtype, we did not find it necessary to make quantitative measurements. For experiments in which it was necessary to in-cross triple heterozygotes (Fig. 3, Fig. 5), we dissected and visually analyzed the ANC of at least 3 compound mutant individuals. At least one individual was dissected for the previously published or described genotypes/phenotypes (i.e. wt, wnt11f2-/-, dact1/2-/-, gpc4-/-, wls-/-). We realize quantitative measurements may identify subtle differences between genotypes. However, the sheer number of embryos needed to generate these relatively rare combinatorial genotypes and the amount of genotyping required prevented quantitative analyses. 

      Weakness (2) Germline mutations limit the authors' ability to study a gene's spatiotemporal functional requirement. They therefore cannot concretely attribute nor separate early-stage phenotypes (during gastrulation) to/from late-stage phenotypes (ANC morphological changes). 

      We agree that we cannot concretely attribute nor separate early and late-stage phenotypes. Conditional mutants to provide temporal or cell-specific analysis are beyond the scope of this work. Here we speculate based on evidence obtained by comparing and contrasting embryos with grossly similar early phenotypes and divergent late-stage phenotypes. We believe our findings contribute to the existing body of literature on zebrafish mutants with both early convergent extension defects and craniofacial abnormalities.   

      Weakness (3) Given that dact1/2 can regulate both canonical and non-canonical wnt signaling, this study did not specifically test which of these pathways is altered in the dact1/2 mutants, and it is currently unclear whether disrupted canonical wnt signaling contributes to the craniofacial phenotypes, even though these phenotypes are typical non-canonical wnt phenotypes. 

      Previous literature has attributed canonical wnt, non-canonical wnt, and non-wnt functions to dact, and each of these likely contributes to the dact mutant phenotype (Lines 87-89). We performed cursory analyses of tcf/lef:gfp expression in the dact mutants and did not find evidence to support further analysis of canonical wnt signaling in these fish. Single-cell RNAseq did not identify differential expression of any canonical or non-canonical wnt genes in the dact1/2 mutants.

      Further research is needed to parse out the intracellular roles of dact1 and dact2 in response to wnt and tgf-beta signaling. Here we find that dact may also have a role in calcium signaling, and further experiments are needed to elaborate this role.      

      Weakness (4) The use of single-cell RNA sequencing unveiled genes and processes that are uniquely altered in the dact1/2 mutants, but not in the gpc4 mutants during gastrulation. However, how these changes lead to the manifested ANC phenotype later during craniofacial development remains unclear. The authors showed that calpain 8 is significantly upregulated in the mutant, but the fact that only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants indicates the complexity of the system. 

      To further test whether capn8 overexpression may contribute to the ANC phenotype, we performed overexpression experiments in embryos resulting from a dact1/dact2 double heterozygote incross. We found that the addition of capn8 caused a small but statistically significant occurrence of the mutant phenotype in dact1/2 double heterozygotes (Fig. 8D). We agree with the reviewer that our results indicate a complex system of dysregulation that leads to the mutant phenotype. We hypothesize that a combination of gene dysregulation may be required to recapitulate the mutant ANC phenotype. Further, as capn8 activity is regulated by calcium levels, overexpression of the mRNA alone likely has a small effect on the manifestation of the phenotype. 

      Weakness (5) Craniofacial phenotypes observed in this study are attributed to convergent extension defects but convergent extension cell movement itself was not directly examined, leaving open if changes in other cellular processes, such as cell differentiation, proliferation, or oriented division, could cause distinct phenotypes between different mutants. 

      Although convergent extension cell movements were not directly examined, our phenotypic analyses of the dact1/2 mutant are consistent with previous literature where axis extension anomalies were attributed to defects in convergent extension (Waxman 2004, Xing 2018, Topczewski 2001). We do not attribute the axis defect to differentiation differences, as in situ analyses of established cell type markers show the existence of these cells, only displaced relative to wildtype (Figure 1). We agree that we cannot rule out a role for differences in apoptosis or proliferation; however, we did not detect transcriptional differences in dact1/2 mutants that would indicate this in the single-cell RNAseq dataset. Defects in directed division are possible, but alone would not explain the dact1/2 mutant phenotype, particularly the widened dorsal axis (Figure 1).

      Major comments:  

      Comment (1) The author examined and showed convergent extension phenotype (CE) during body axis elongation in dact1/dact2-/- homozygous mutants. Given that dact2-/- single mutants also displayed shortened axis, the authors should either explain why they didn't analyze CE in dact2-/- (perhaps because that has been looked at in previously published dact2 morphants?) or additionally show whether CE phenotypes are present in dact1 and dact2 single mutants.  

      The authors should quantify the CE phenotype in both dact2-/- single mutants and dact1/dact2-/- double mutants, and examine whether the CE phenotypes are exacerbated in the double mutants, which may lend support to the authors' idea that dact1 can contribute to CE. The authors stated in the discussion that they "posit that dact1 expression in the mesoderm is required for dorsal CE during gastrulation through its role in noncanonical Wnt/PCP signaling". However, no evidence was presented in the paper to show that dact1 influences CE during body axis elongation.  

      Because any axis shortening in dact2-/- single mutants was overcome during the course of development and at 5 dpf there was no noticeable phenotype, we did not analyze the single mutants further.  

      We have added data showing the phenotype of each combinatorial genotype to provide a clearer and more detailed description of the single and compound mutants (Fig. S3). 

      Our hypothesis that dact1 may contribute to convergent extension is based on its apparent ability to compensate (either directly or indirectly) for dact2 loss in the dact2-/- single mutant. 

      Comment (2) Except in Fig. 2, I could not find n numbers given in other experiments. It is therefore unclear if these mutant phenotypes were fully or partially penetrant. In general, there is also a lack of quantifications to help support the qualitative results. For example, in Fig. 4, n numbers should be given and cell movements and/or contributions to the ANC should be quantified to statistically demonstrate that the second stream of CNCC failed to contribute to the ANC.  

      Similarly, while the fan-shaped and the rod-shaped ANCs are very distinct, the various rod-shaped ANCs need to be quantified (e.g. morphometry or measurements of morphological features) in order for the authors to claim that these are "novel ANC forms", such as in the dact1/2-/-, gpc4/dact1/2-/-, and wls/dact1/2-/- mutants (Fig. 5).  

      We have added n numbers for each experiment and stated that the rod-like phenotype of the dact1/2-/- mutant was fully penetrant. 

      Regarding CNCC experiments, we repeated the analysis on 3 individual controls and mutants and did not find evidence that CNCC migration was directly affected in the dact1/2 mutant. Rather, differences in ANC development are likely secondary to defects in floor plate and eye field morphometry. Therefore we did not do any further analyses of the CNCCs.

      Regarding Figure 5, we have added n numbers. We dissected and analyzed a minimum of three triple mutants (dact1/2-/-,gpc4-/- and dact1/2-/-,wls-/-) and numerous dact1/2 double mutants and found that the triple mutant ANC phenotype was consistent and recognizably different enough from the dact1/2-/-, or gpc4 or wls single mutant that morphometry measurements were not needed. Further, the triple mutant phenotype (narrow and shortened) appears to be a simple combination of dact1/2 (narrow) and gpc4/wls (shortened) phenotypes. As we did not find evidence of genetic epistasis, we did not analyze the novel ANC forms further.

      Comment (3): The authors have attributed the ANC phenotypes in dact1/2-/- to CE defects and altered noncanonical wnt signaling. However, no evidence was presented to support either. The authors can perhaps utilize diI labelling, photoconversion-mediated lineage tracing, or live imaging to study cell movement in the ANC and compare that with the cell movement change in the gpc4-/- and gpc4/dact1/2-/- mutants in order to first establish that dact1/2 affect CE and then examine how dact1/2 mutations can modulate the CE phenotypes in gpc4-/- mutants.  

      Concurrently, given that dact1 and dact2 can affect (perhaps differentially) both canonical and non-canonical wnt signaling, the authors are encouraged to also test whether canonical wnt signaling is affected in the ANC or surrounding tissues, or at minimum, discuss the potential role/contribution of canonical wnt signaling in this context.  

      Given the substantial body of research on the role of noncanonical wnt signaling and planar cell polarity pathway on convergent extension during axis formation (reviewed by Yang and Mlodzik 2015, Roszko et al., 2009) and the resulting phenotypes of various zebrafish mutants (i.e. Xing 2018, Topczewski 2001), including previous research on dact1 and 2 morphants (Waxman 2004), we did not find it necessary to analyze CE cell movements directly.  

      Our finding that CNCC migration was not defective in the dact1/2 mutants and the knowledge that various zebrafish mutants with anterior patterning defects (slb, smo, cyc) have a similar craniofacial abnormality led us to conclude that the rod-like ANC in the dact1/2 mutant was secondary to an early patterning defect (abnormal eye field morphology). Therefore, testing dact1/2 and convergent extension or wnt signaling in the ANC itself was not an aim of this paper.  

      Comment (4) The authors also have not ruled out other possibilities that could cause the dact1/2-/- ANC phenotype. For example, increased cell death or reduced proliferation in the ANC may result in the phenotype, and changes in cell fate specification or differentiation in the second CNCC stream may also result in their inability to contribute to the ANC. 

      We agree that we cannot rule out whether cell death or proliferation is different in the dact1/2 mutant ANC. However, because we do not find the second CNCC stream within the ANC, this is the most likely explanation for the abnormal ANC shape. Because the first stream of CNCC are able to populate the ANC and differentiate normally, it is most likely that the inability of the second stream to populate the ANC is due to steric hindrance imposed by the abnormal cranial/eye field morphology. These hypotheses would need to be tested, ideally with an inducible dact1/2 mutant; however, this is beyond the scope of this paper.     

      Comment (5) The last paragraph of the section "Genetic interaction of dact1/2 with Wnt regulators..." misuses terms and conflates phenotypes observed. For instance, the authors wrote "dact2 haploinsufficiency in the context of dact1-/-; gpc4-/- double mutant produced ANC in the opposite phenotypic spectrum of ANC morphology, appearing similar to the gpc4-/- mutant phenotype". However, if heterozygous dact2 is not modulating phenotypes in this genetic background, its function is not "haploinsufficient". The authors then said, "These results show that dact1 and dact2 do not have redundant function during craniofacial morphogenesis, and that dact2 function is more indispensable than dact1". However, this statement should be confined to the context of modulating gpc4 phenotypes, which is not clearly stated. 

      Revised (Lines 380, 382).   

      Comment (6) For the scRNA-seq analysis, the authors should show the population distribution in the UMAP for the 3 genotypes, even if there are no obvious changes. The authors are encouraged, although not required, to perform pseudotime or RNA velocity analysis to determine if differentiation trajectories are changed in the NC populations, in light of what they found in Fig. 4. The authors can also check the expression of reporter genes downstream of certain pathways, e.g. axin2 in canonical wnt signaling, to query if these signaling activities are changed (also related to point #3 above). 

      We have added population distribution data for the 3 genotypes to Supplemental Figure 4. Although RNA velocity analysis would be an interesting additional analysis, we would hypothesize that the NC population is not driving the differences in phenotype. Rather these are likely changes in the anterior neural plate and mesoderm. 

      Comment (7) While the phenotypic difference between gpc4-/- and dact1/2-/- are in the ANC at a later stage, scRNA-seq was performed using younger embryos. The authors should better explain the rationale and discuss how transcriptomic differences in these younger embryos can explain later phenotypes. Importantly, dact1, dact2, and capn8 expression were not shown in and around the ANC during its development and this information is crucial for interpreting some of the results shown in this paper. For example, if dact1 and dact2 are expressed during ANC development, they may have specific functions during that stage. Alternatively, if dact1 and dact2 are not expressed when the second stream CNCCs are found to be outside the ANC, then the ANC phenotype may be due to dact1/2's functions at an earlier time point. The authors' statement in the discussion that "embryonic fields determined during gastrulation effect the CNCC ability to contribute to the craniofacial skeleton" is currently speculative. 

      We have reworded our rationale and hypothesis to increase clarity (Lines 391-405). We believe that the ANC phenotype of the dact1/2 mutants is secondary to defective CE and anterior axis lengthening, as has been reported for the slb mutant (Heisenberg 1997, 2000). We utilized the gpc4 mutant as a foil to the dact1/2 mutant, as the gpc4 mutant has defective CE and axis extension without the same craniofacial phenotype.

      We have added dact1 and dact2 WISH at 24 and 48 hpf (Fig. 1D, E) to show expression during ANC development. 

      Comment (8) The functional testing of capn8 did not yield a result that would suggest a strong effect, as only 1 in 142 animals phenocopied dact1/2. Therefore, while the result is interesting, the authors should tone down its importance. Alternatively, the authors can try knocking down capn8 in the dact1/2 mutants to test how that affects the CE phenotype during axis elongation, as well as ANC morphogenesis. 

      As overexpression of capn8 in wildtype animals did not result in a significant phenotype, we tested capn8 overexpression in compound dact1/2 mutants, as these have a sensitized background. We found a small but statistically significant effect of exogenous capn8 in dact1+/-,dact2+/- animals. While the effect is not what one would expect when compared with Mendelian genetic ratios, the rod-like ANC phenotype is an extreme craniofacial dysmorphology not observed in wildtype or mRNA-injected embryos, and is hence significant. The experiment is limited by the available technology of over-expressing mRNA broadly without temporal or cell specificity control. It is possible that if capn8 over-expression were restricted to specific cells (floor plate, notochord or mesoderm) and to the optimal time period during gastrulation/segmentation, the aberrant ANC phenotype would be more robust. We agree with the reviewer that although the finding of a new role for capn8 during development is interesting, its importance in the context of dact should be toned down, and we have altered the manuscript accordingly (Lines 455-467).  

      Comment (9) A difference between the two images in Fig. 8B is hard to distinguish. Consider showing flat-mount images. 

      We have added flat-mount images to Fig. 8B.

      Minor comments:

      Comment (1) wnt11f2 is spelled incorrectly in a couple of places, e.g. "wnt11f2l" in the abstract and "wntllf2" in the discussion. 

      Revised throughout.

      Comment (2) For Fig. 1D, the white dact1 and yellow dact2 are hard to distinguish in the merged image. Consider changing one of their colors to a different one and only merge dact1 and dact2 without irf6 to better show their complementarity.  

      We agree with the reviewer that the expression patterns of dact1 and dact2 are difficult to distinguish in the merged image. We have added outlines of the cartilage elements to the images to facilitate comparisons of dact1 and dact2 expression (Fig 1F). 

      Comment (3) For Fig. 1E, please label the clusters mentioned in the text so readers can better compare expressions in these cell populations.  

      We have moved this data to supplementary figure S1 and have added labels.

      Comment (4) The citing and labelling of certain figures can be more specific. For example, Fig. S1A, B, and Fig. S1C should be used instead of just Fig. S1 (under the section titled "dact1 and dact2 contribute to axis extension..."). Similarly, Fig. 4 can be better labeled with alphabets and cited at the relevant places in the text.  

      We have modified the labeling of the figures according to the reviewer’s suggestion (Fig. S2 (previously S1), Fig. 4) and have added reference to these labels in the text (Lines 202, 204, 212, 328, 334, 336). 

      Comment (5) For Fig. 2B, the (+/+,-/-) on x-axis should be (+/-,-/-).  

      Revised in Figure 2B.

      Comment (6) Several figures are incorrectly cited. Fig. 2C is not cited, and the "Fig. 2C" and "Fig. 2D" cited in the text should be "Fig. 2D" and "Fig. 2E" respectively. Similarly, Fig. 5C and D are not cited in the text and the cited Fig. 5C should be 5E. The VC images in Fig. 5 are not talked about in the text. Finally, Fig. 7C was also not mentioned in the text.  

      We have corrected the labeling and have added descriptions of each panel in the Results (Fig.2 Line 231, 237, 242, Fig 5 Line 373, 381, Fig 7 line 431). 

      Comment (7) In the main text, it is indicated that zebrafish at 3ss were used for scRNA-seq, but in the figure legend, it says 4ss. 

      Revised (Line 682)

      Comment (8) No error bars in Fig. S1B and the difference between the black and grey shades in Fig. S1D is not explained.  

      Error bars are not included in the graphs of qPCR results (now Fig S2C) as these are results of a pool of 8 embryos performed one time. We have added a legend to explain the gray vs. black bars (now Fig S2E). 

      Reviewer #3 (Public Review):  

      Weaknesses: The hypotheses are very poorly defined and misinterpret key previous findings surrounding the roles of wnt11 and gpc4, which results in a very confusing manuscript. Many of the results are not novel and focus on secondary defects. The most novel result of overexpressing calpain8 in dact1/2 mutants is preliminary and not convincing.  

      We apologize for not presenting the question more clearly. The Introduction was revised with particular attention to distinguish this work using genetic germline mutants from prior morpholino studies. Please refer to pages 4-5, lines 106-121.

      Weakness (1) One major problem throughout the paper is that the authors misrepresent the fact that wnt11f2 and gpc4 act in different cell populations at different times. Gastrulation defects in these mutants are not similar: wnt11 is required for anterior mesoderm CE during gastrulation but not during subsequent craniofacial development while gpc4 is required for posterior mesoderm CE and later craniofacial cartilage morphogenesis (LeClair et al., 2009). Overall, the non-overlapping functions of wnt11 and gpc4, both temporally and spatially, suggest that they are not part of the same pathway.  

      We have reworded the text to add clarity. While the loss of wnt11 versus the loss of gpc4 may affect different cell populations, the overall effect is a shortened body axis. We stressed that it is this similar impaired axis elongation phenotype, combined with discrepant ANC morphology phenotypes at opposite ends of the ANC morphologic spectrum, that is very interesting and leads us to investigate dact1/2 in the genetic contexts of wnt11f2 and gpc4. Please refer to page 4, lines 73-84. Further, the reviewer’s comment that wnt11 and gpc4 are spatially and temporally distinct is untested. We think the reviewer’s claim of gpc4 acting in the posterior mesoderm refers to its requirement in the tailbud (Marlow 2004). However, this does not exclude gpc4 from acting elsewhere as well. Further experiments would be necessary. Both wnt11f2 and gpc4 regulate non-canonical wnt signaling and are coexpressed during some points of gastrulation and CF development (Gupta et al., 2013; Sisson 2015). These data support the possibility of overlapping roles. 

      Weakness (2) There are also serious problems surrounding attempts to relate single-cell data with the other data in the manuscript and many claims that lack validation. For example, in Fig 1 it is entirely unclear how the Daniocell scRNA-seq data have been used to compare dact1/2 with wnt11f2 or gpc4. With no labeling in panel 1E of this figure these comparisons are impossible to follow. Similarly, the comparisons between dact1/2 and gpc4 in scRNA-seq data in Fig. 6 as well as the choices of DEGs in dact1/2 or gpc4 mutants in Fig. 7 seem arbitrary and do not make a convincing case for any specific developmental hypothesis. Are dact1 and gpc4 or dact2 and wnt11 coexpressed in individual cells? Eyeballing similarity is not acceptable.  

      We have moved the previously published Daniocell data to Figure S1 and have added labeling. These data are meant to complement and support the WISH results and demonstrate the utility of using available public Daniocell data. Please recommend how we can do this better or how we can remediate this work, with specific comments. 

      Regarding our own scRNA-seq data, we have added rationale (Lines 391-403) and details of the results to increase clarity (Lines 419-436). We have added a panel to Figure 6 (panel A) to help illustrate our rationale for comparing dact1/2 to gpc4 mutants to wt. The DEGs displayed in Fig. 7A are the top 50 most differentially expressed genes between dact1/2 mutants and WT (Figure 7 legend, Lines 422-424).   

      We have looked at our scRNA-seq gene expression results for our clusters of interest (lateral plate mesoderm, paraxial mesoderm, and ectoderm). We find dact1, dact2, and gpc4 co-expression within these clusters. Knowing whether these genes are coexpressed within the same individual cell would require going back and analyzing the raw expression data. We do not find this to be necessary to support our conclusions. The expression pattern of wnt11f2 is irrelevant here.   

      Weakness (3) Many of the results in the paper are not novel and either confirm previous findings, particularly Waxman et al (2004), or even contradict them without good evidence. The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription or vice versa. Testing genetic interactions, including investigating the expression of wnt11f2 in dact1/2 mutants, dact1/2 expression in wnt11f2 mutants, or the ability of dact1/2 to rescue wnt11f2 loss of function would give this work a more novel, mechanistic angle.

      We clarified here that the prior work carried out by Waxman using morpholinos, while acceptable at the time in 2004, does not meet the rigor of developmental studies today, which is to generate germline mutants. The reviewer’s acceptance of the prior work at face value fails to take the limitations of prior work into account. Further, the prior paper from Waxman et al. did not analyze craniofacial morphology other than eyeballing the shape of the head and eyes. Please compare the Waxman paper and this work figure for figure, and the additional detail of this study should be clear. Again, this is by no means a criticism of prior work, as the prior study suffered from the technological limitations of 2004, just as this study is the best we can do using the tools we have today. Any discrepancies in results are likely due to differences in morpholino versus genetic disruption, and most reviewers would favor the phenotype analysis from the germline genetic context. We have addressed these concerns as objectively as we can in the text (Lines 482-493). The fact that dact1/2 double mutants display a craniofacial phenotype while the single mutants do not suggests compensation (Lines 503-505), but not necessarily at the mRNA expression level (Fig. S2C). 

      This paper tests genetic interaction through phenotyping the wnt11/dact1/dact2 mutant.

      Our results support the previous literature that dact1/2 act downstream of wnt11 signaling. There is no evidence of cross-regulation of gene expression. We do not expect that changes in wnt11 or dact would result in expression changes in the others.

      RNA-seq of the dact1/2 mutants did not show changes in wnt11 gene expression. Unless dact1 and/or dact2 mRNA are underexpressed in the wnt11 mutant, we would not expect a rescue experiment to be informative. And as wnt11 is not a focus of this paper, we have not performed the experiment.  

      Weakness (4) The identification of calpain 8 overexpression in Dact1/2 mutants is interesting, but getting 1/142 phenotypes from mRNA injections does not meet reproducibility standards.

      As the occurrence of the mutant phenotype in wildtype animals with exogenous capn8 expression was below what would meet reproducibility standards, we performed an additional experiment where capn8 was overexpressed in embryos resulting from a dact1/dact2 double heterozygote incross (Fig. 8). We reasoned that an effect of capn8 overexpression may be more robust on a sensitized background. We found a statistically significant effect of capn8 in dact1/2 double heterozygotes, though the occurrence was still relatively rare (6/80). These data suggest dysregulation of capn8 contributes to the mutant ANC phenotype, though there are likely other factors involved. 

      Comment: The manuscript title is not representative of the findings of this study.  

      We revised the title to state strictly that we generated and genetically analyzed loss-of-function compound mutants (Genetic requirement) and that we found capn8 to be an important modifier of this requirement.

      Introduction: p.4:

      Comment: Anterior neurocranium (ANC) - it has to be stated that this refers to the combined ethmoid plate and trabecular cartilages. 

      Thank you. We agree that the ANC and ethmoid plate terminology has been confusing in the literature and that we should endeavor to describe more clearly that the phenotypes in question are all in the ethmoid plate and that the trabeculae are not affected. ANC has been replaced with ethmoid plate (EP) throughout the manuscript and figures. We also describe that all the observed phenotypes affect the ethmoid plate and not the trabeculae (page 13, Lines 265-267).

      Comment: Transverse dimension is incorrect terminology - replace with medio-lateral.

      Revised (Lines 69, 74).

      Comment: Improper way of explaining the relationship between mutant and gene..."Another mutant knypek, later identified as gpc4..." a better way to explain this would be that the knypek mutation was found to be a non-sense mutation in the gpc4 gene.  

      Revised (Line 71)

      Comment: "...the gpc4 mutant formed an ANC that is wider in the transverse dimension than the wildtype, in the opposite end of the ANC phenotypic spectrum compared to wnt11f2...These observations beg the question how defects in early patterning and convergent extension of the embryo may be associated with later craniofacial morphogenesis."

      This statement is broadly representative of the general failure to distinguish primary from secondary defects in this manuscript. Focusing on secondary defects may be useful to understand the etiology of a human disease, but it is misleading to focus on secondary defects when studying gene function. The rod-like ethmoid of slb mutant results from a CE defect of anterior mesoderm during gastrulation (Heisenberg et al. 1997, 2000), while the wide ethmoid plate of kny mutants results from CE defects of cartilage precursors (Rochard et al., 2016). Based on this evidence, wnt11f2 and gpc4 act in different cell populations at different times.  

      It is true that the slb mutant craniofacial phenotype has been described as secondary to the CE defect during gastrulation and the kny phenotype as primary to chondrocyte CE defects in the ethmoid; however, direct experimental evidence to conclude only primary or only secondary effects does not yet exist. To our knowledge, no experiment has shown that wnt11f2 does not affect ethmoid chondrocytes directly. Likewise, no experiment has demonstrated that dysregulated CE in gpc4 mutants does not contribute to a secondary abnormality in the ethmoid.

      Here, we are analyzing the CE and craniofacial phenotypes of the dact1/2 mutants without any assumptions about primary or secondary effects and without drawing any conclusions about wnt11f2 or gpc4 cellular mechanisms.     

      Comment: "The observation that wnt11f2 and gpc4 mutants share similar gastrulation and axis extension phenotypes but contrasting ANC morphologies supports a hypothesis that convergent extension mechanisms regulated by these Wnt pathway genes are specific to the temporal and spatial context during embryogenesis."

      This sentence is quite vague and potentially misleading. The gastrulation defects of these 2 mutants are not similar - wnt11 is required for anterior mesoderm CE during gastrulation and has not been shown to be active during subsequent craniofacial development while gpc4 is required for posterior mesoderm CE and craniofacial cartilage morphogenesis (LeClair et al., 2009). Here again, the non-spatially overlapping functions of wnt11 and gpc4 suggest that they are not part of the same pathway.

      Though the cells displaying defective CE in wnt11f2 and gpc4 mutants are different, the effects on the body axis are similar. The dact1/2 mutants showed a grossly similar axis extension defect to these mutants. Our aim with the scRNA-seq experiment was to determine which cells and gene programs are disrupted in dact1/2 mutants. We found that some cell types and programs were disrupted similarly in dact1/2 mutants and gpc4 mutants, while other cells and programs were specific to dact1/2 versus gpc4 mutants. We can speculate that those specific to dact1/2 versus gpc4 may be attributed to CE in the anterior mesoderm, as is the case for wnt11.

      p.5

      Comment: "We examined the connection between convergent extension governing gastrulation, body axis segmentation, and craniofacial morphogenesis." A statement focused on the mechanistic findings of this paper would be welcome here, instead of a claim for a "connection" that is vague and hard to find in the manuscript.  

      We have rewritten this statement (Line 125).

      p.7 Results:

      Comment: It is unclear why Farrel et al., 2018 and Lange et al., 2023 are appropriate references for WISH. Please justify or edit.  

      This was a mistake and has been edited (Page 9).

      Comment: " Further, dact gene expression was distinct from wnt11f2." This statement is inaccurate in light of the data shown in Fig1A and the following statements - please edit to reflect the partially overlapping expression patterns.  

      We have edited to clarify (Lines 142-143).

      p.8

      Comment: "...we examined dact1 and 2 expression in the developing orofacial tissues. We found that at 72hpf..." - expression at 72hpf is not relevant to craniofacial morphogenesis, which takes place between 48h-60hpf (Kimmel et al., 1998; Rochard et al., 2016; Le Pabic et al., 2014).  

      We have included images and discussion of dact1 and dact2 expression at earlier time points that are important to craniofacial development (Lines 160-171)(Fig 1D,E). 

      Comment: "This is in line with our prior finding of decreased dact2 expression in irf6 null embryos". - This statement is too vague. How are the two observations "in line"?

      We have removed this statement from the manuscript.

      Comment: Incomplete sentence (no verb) - "The differences in expression pattern between dact1 and dact2...".  

      Revised (Line 172).

      Comment: "During embryogenesis..." - Please label the named structures in Fig.1E.

      Please be more precise with the described expression time. Also, it would be useful to integrate the scRNAseq data with the WISH data to create an overall picture instead of treating each dataset separately.  

      We have moved the previously published Daniocell data to supplementary figure S1 and have labeled the key cell types. 

      p.9

      Comment: "The specificity of the gene disruption was demonstrated by phenotypic rescue with the injection of dact1 or dact2 mRNA (Fig. S1)." - please describe what is considered a phenotypic rescue.

      -The body axis reduction of dact mutants needs to be documented in a figure. Head pictures are not sufficient. Is the head alone affected, or both the head and trunk/tail? Fig.2E suggests that both head and trunk/tail are affected - please include a live embryos picture at a later stage.  

      We have added a description of how phenotypic rescue was determined (Line 208). We have added a figure with representative images of the whole body of dact1/2 mutants. Measurements of body length showed a shortening in dact1/2 double mutants versus wildtype; however, the difference was not statistically significant by ANOVA (Fig. 3C, Fig. S3, Lines 270-275).

      p. 11

      Comment: "These dact1-/-;dact2-/- CE phenotypes were similar to findings in other Wnt mutants, such as slb and kny (Heisenberg, Tada et al., 2000; Topczewski, Sepich et al., 2001)." The similarity between slb and kny phenotypes should be mentioned with caution as CE defects affect different regions in these 2 mutants. It is misleading to combine them into one phenotype category as wnt11 and gpc4 are most likely not acting in the same pathway based on these spatially distinct phenotypes.  

      Here we are referring to the grossly similar axis extension defects in slb and kny mutants. We refer to these mutants to illustrate that dact1 and/or dact2 deficiency could affect axis extension through diverse mechanisms. We have added text for clarity (Lines 249-252).

      Comment: "No craniofacial phenotype was observed in dact1 or dact2 single mutants. However, in-crossing to generate [...] compound homozygotes resulted in dramatic craniofacial deformity."

      This result is intriguing in light of (1) the similar craniofacial phenotype previously reported by Waxman et al (2004) using morpholino-based knock-down of dact2, and (2) the phenomenon of genetic compensation demonstrated by Jakutis and Stainier 2021 (https://doi.org/10.1146/annurev-genet-071719-020342). The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription, as such compensation could lead to inaccurate conclusions if ignored.

      We agree with the reviewer that genetic compensation of dact2 by dact1 likely explains the different results found in the dact2 morphant versus the CRISPR mutant. We found increased dact1 mRNA expression in the dact2-/- mutant (Fig S2X); however, a more thorough examination is required to draw a conclusion. Interestingly, we found that in wildtype embryos dact1 and dact2 expression patterns are distinct, though with some overlap. It would be informative to investigate whether the dact1 expression pattern changes in dact2-/- mutants to account for dact2 loss.

      Comment: "Lineage tracing of NCC movements in dact1/2 mutants reveals ANC composition" - the title is misleading - ANC composition was previously investigated by lineage tracing (Eberhardt et al., 2006; Wada et al., 2005).  

      This has been reworded (Line 292)

      p.13

      Comment: There is no frontonasal prominence in zebrafish.  

      This is true; the text has been changed to frontal prominence (Lines 293, 299, 320).

      Comment: The rationale for investigating NC migration in mutants where there is a gastrula-stage failure of head mesoderm convergent extension is unclear. The whole head is deformed even before neural crest cells migrate as the eye field does not get split in two (Heisenberg et al., 1997; 2000), suggesting that the rod-like ethmoid plate is a secondary defect of this gastrula-stage defect. In addition, neural crest migration and cartilage morphogenesis are different processes, with clear temporal and spatial distinctions.  

      We carried out the lineage tracing experiment to determine which NC streams contributed to the aberrantly shaped EP: the anteriormost NC stream of the frontal prominence, the second NC stream of the maxillary prominence, or both. We found that the anteriormost NCC did contribute to the rod-like EP, which differs from what is seen when hedgehog signaling is disrupted. So while it is possible that the gastrula-stage head mesoderm CE defect caused a secondary effect on NC migration, it is interesting that the anterior and second NC streams are affected differently in dact1/2 mutants than when the shh pathway is disrupted. We added discussion of this observation to the manuscript (page 23, Lines 514-520).

      p. 14-16

      Comment: Based on the heavy suspicion that the rod-like ethmoid plate of the dact1/2 mutant results from a gastrulation defect, not a primary defect in later craniofacial morphogenesis, the prospect of crossing dact1/2 mutants with other wnt-pathway mutants for which craniofacial defects result from craniofacial morphogenetic defects is at the very least unlikely to generate any useful mechanistic information, and at most very likely to generate lots of confusion. Both predictions seem to take form here.  

      However, the ethmoid plate phenotype observed in the gpc4-/-; dact1+/-; dact2-/- mutants (Fig. 5E) does suggest that gpc4 may interact with dact1/2 during gastrulation, but that is the case only if dact1+/-; dact2-/- mutants do not have an ethmoid cartilage defect, which I could not find in the manuscript. Please clarify.  

      The perspective that the rod-like EP of the dact1/2 mutant is due to a gastrulation defect is being examined here. Why would other mutants with gastrulation CE defects, such as wnt11f2 and gpc4, have very different EP morphology, whether through a primary or a secondary NCC effect? Further, dact1 and dact2 were reported as modifiers of Wnt signaling, so it is logical to genetically test the relationship between dact1, dact2, wnt11f2, gpc4 and wls. The experiment had to be done to investigate how these genetic combinations impact EP morphology. The finding that combined loss of dact1, dact2 and wls or gpc4 yielded a new EP morphology, different from those previously observed in dact1/2, wls, gpc4, or any other mutant, is important. It suggests that each of these genes has a distinct role in facial morphology that is not explained by the CE defect alone.

      Comment: I encourage the authors to explore ways to test whether the rod-like ethmoid of dact1/2 mutants is more than a secondary effect of the CE failure of the head mesoderm during gastrulation. Without this evidence, the phenotypes of dact1/2 -gpc4 or - wls are not going to convince us that these factors actually interact.  

      In fact, we find that our results support the hypothesis that the ethmoid phenotype of the dact1/2 mutants is a secondary effect of defective gastrulation and anterior extension of the body axis. However, our findings suggest (by contrast with another mutant with impaired CE during gastrulation) that this CE defect alone cannot explain the dysmorphic ethmoid plate. Our single-cell RNA-seq results and the discovery of dysregulated capn8 expression and proteolytic processes present new wnt-regulated mechanisms for axis extension.

      p. 20 Discussion

      Comment: "Here we show that dact1 and dact2 are required for axis extension during gastrulation and show a new example of CE defects during gastrulation associated with craniofacial defects."

      Waxman et al. (2004) previously showed that dact2 is involved in CE during gastrulation.

      Heisenberg et al. (1997, 2000), previously showed with the slb mutant how a CE defect during gastrulation causes a craniofacial defect.  

      The Waxman paper, which used morpholinos to disrupt dact2, provided limited analysis of CE and no analysis of craniofacial morphogenesis. We generated genetic mutants here to validate the earlier morpholino results and to analyze the craniofacial phenotype in detail. We have removed the word “new” to make the statement clearer (Line 475).

      Comment: "Our data supports the hypothesis that CE gastrulation defects are not causal to the craniofacial defect of medially displaced eyes and midfacial hypoplasia and that an additional morphological process is disrupted."

      It is unclear to me how the authors reached this conclusion. I find the view that medially displaced eyes and midfacial hypoplasia are secondary to the CE gastrulation defects unchallenged by the data presented. 

      This statement was removed and the discussion was reworded.

      Comment: The discussion should include a detailed comparison of this study's findings with those of zebrafish morpholino studies.  

      We have added more discussion to compare ours to the previous morpholino findings (Lines 476-484).

      Comment: The discussion should try to reconcile the different expression patterns of dact1 and dact2, and the functional redundancy suggested by the absence of phenotype of single mutants. Genetic compensation should be considered (and perhaps tested).  

      The different expression patterns of dact1 and dact2, along with our finding that dact1 and dact2 genetic deficiency differently affect the gpc4 mutant phenotype, suggest that dact1 and dact2 are not functionally redundant during normal development. This is in line with the previously published data showing different phenotypes of dact1 or dact2 knockdown. However, our finding that genetic ablation of both dact1 and dact2 is required for a mutant phenotype suggests that each gene can compensate for loss of the other. This would in turn suggest that the expression pattern of dact1 is changed in the dact2 mutant and vice versa. This line of investigation would be interesting for future studies. We have addressed this in the Discussion (Lines 485-498).

      Comment: "Based on the data...Conversely, we propose...ascribed to wnt11f2 "

      Functional data always prevail over expression data for inferring functional requirements.

      This is true.

      p.21

      Comment: "Our results underscore the crucial roles of dact1 and dact2 in embryonic development, specifically in the connection between CE during gastrulation and ultimate craniofacial development."

      How is this novel in light of previous studies, especially by Waxman et al. (2004) and Heisenberg et al. (1997, 2000)? In this study, the authors fail to present compelling evidence that craniofacial defects are not secondary to the early gastrulation defects resulting from dact1/2 mutations.

      We have not claimed that the craniofacial defects are not secondary to the gastrulation defects. In fact, we state that there is a “connection”. Further, we do not claim that this is the first or only such finding. We believe our findings have validated the previous dact morpholino experiments and have contributed to the body of literature concerning wnt signaling during embryogenesis.

      p. 22

      Comment: The section on Smad1 discusses a result not reported in the results section. Any data discussed in the discussion section needs to be reported first in the results section.  

      We have added a comment on the differential expression of smad1 to the results section (Lines 446-448).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript entitled "Hexokinase regulates Mondo-mediated longevity via the PPP and organellar dynamics", Laboy and colleagues investigated upstream regulators of MML-1/Mondo, a key transcription factor that regulates aging and metabolism, using the nematode C. elegans and cultured mammalian cells. By performing a targeted RNAi screen for genes encoding enzymes in glucose metabolism, the authors found that two hexokinases, HXK-1 and HXK-2, regulate nuclear localization of MML-1 in C. elegans. The authors showed that knockdown of hxk-1 and hxk-2 suppressed longevity caused by germline-deficient glp-1 mutations. The authors demonstrated that genetic or pharmacological inhibition of hexokinases decreased nuclear localization of MML-1, via promoting mitochondrial β-oxidation of fatty acids. They found that genetic inhibition of hxk-2 changed the localization of MML-1 from the nucleus to mitochondria and lipid droplets by activating pentose phosphate pathway (PPP). The authors further showed that the inhibition of PPP increased the nuclear localization of mammalian MondoA in cultured human cells under starvation conditions, suggesting the underlying mechanism is evolutionarily conserved. This paper provides compelling evidence for the mechanisms by which novel upstream metabolic pathways regulate MML-1/Mondo, a key transcription factor for longevity and glucose homeostasis, through altering organelle communications, using two different experimental systems, C. elegans and mammalian cells. This paper will be of interest to a broad range of biologists who work on aging, metabolism, and transcriptional regulation. 

      Reviewer #2 (Public Review):

      Raymond Laboy et al. explored how the transcriptional Mondo/Max-like complex (MML-1/MXL-2) is regulated by glucose metabolic signals using a germ-line removal longevity model. They believed that MML-1/MXL-2 integrated multiple longevity pathways through nutrient sensing and therefore screened the glucose metabolic enzymes that regulated MML-1 nuclear localization. Hexokinase 1 and 2 were identified as the most vigorous regulators, which function through mitochondrial beta-oxidation and the pentose phosphate pathway (PPP), respectively. MML-1 localized to mitochondria associated with lipid droplets (LD), and MML-1 nuclear localization was correlated with LD size and metabolism. Their findings are interesting and may help us to further explore the mechanisms in multiple longevity models; however, the study is not complete and the working model remains obscure. For example, the exact metabolites that account for the direct regulation of MML-1 were not identified, and more detailed studies of the related cellular processes are needed.

      The identification of responsible metabolites is necessary since multiple pieces of evidence from the study suggest that lipid rather than glucose metabolites may be more likely to be the direct regulators of MML-1, and that HXK regulates MML-1 indirectly by affecting lipid metabolism: 1) inhibiting the PPP is sufficient to rescue MML-1 function independent of G6P levels; 2) HXK-1 regulates MML-1 by increasing fatty acid beta-oxidation; 3) LD size correlates with MML-1 nuclear localization and LD metabolism can directly regulate MML-1. The identification of metabolites will be helpful for understanding the mechanism.

      Beta-oxidation and the PPP are involved in the regulation of MML-1 by HXK-1 and HXK-2, respectively. But how these two pathways participate in the regulation is not clear. Is it the beta-oxidation rate or the intermediate metabolites that matters? As for the PPP, it provides substrates for nucleotide synthesis and also its product NADPH is essential for redox balance. Is one of the metabolites or the NADPH levels involved in MML-1 regulation? More studies are needed to provide answers to these concerns. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Following are my comments that the authors may want to address to further improve this excellent paper.

      Major comments 

      (1) Although the authors provided evidence that hexokinases in glucose metabolism are associated with germline-deficient glp-1(-) mutants, they did not mention why they focused on glp-1(-) mutants rather than other longevity mutants. In their previous study (Nakamura et al., 2016), they showed that MML-1 is required for multiple longevity pathways in C. elegans, including reduced mitochondrial respiration and insulin/IGF-1 signaling. Please discuss why the authors focused on glp-1(-) mutants in this paper. It will be even better if the authors test the roles of hexokinases in some other longevity regimens. 

      Many thanks for this astute comment. Previously we had shown that mml-1 is required for glp-1, daf-2, and isp-1 longevity, and Johnson et al. had shown a requirement for eat-2, hence the idea that MML-1 is a convergent transcription factor. We first focused on glp-1 because that was the starting point of our screen, and the result was clear and simple: hexokinases regulate MML‑1 nuclear localization and activity in glp-1 and are required for longevity. Naturally, the question arises: do hexokinases behave like MML-1 as convergent longevity regulators across pathways? To address this, we examined the interaction of hxk-1 and hxk-2 with isp-1, daf-2, and raga-1.  Specifically, we now show that:

      A. Like glp-1(e2141) mutants, isp-1(qm150) mutants stimulate MML-1 nuclear localization, and the hexokinases are required for isp-1 longevity (Figure 1G-H).

      B. daf-2(e1370) mutants do not further stimulate MML-1 nuclear localization beyond basal levels, yet MML-1 is strongly required for daf-2 longevity (Nakamura et al., 2016, Supplementary Figure 1L-M). However, the hexokinases are not required for daf-2 longevity (Supplementary Figure 1M), suggesting that the signaling pathway is wired differently in daf-2, and that other pathways regulate MML-1 activity.

      C. raga-1(ok701) mutants stimulate MML-1 nuclear localization and mml-1 is required for raga-1 longevity, suggesting that MML-1 acts downstream of TORC1 signaling (Supplementary Figure 1N-O). However, hexokinases are not required for raga-1 longevity, suggesting that raga-1 acts downstream or parallel to hexokinase signaling (Supplementary Figure 1P).

      D. We performed untargeted metabolomics in glp-1, daf-2, and mml-1 single and double mutants and observed that hexose phosphates, which have been shown to regulate MML-1 human homologs MondoA/ChREBP, were differentially regulated between mutants.

      Author response image 1.

      E. Altogether these experiments reveal that though MML-1 promotes longevity in most pathways, the hexokinases are only required in some (glp-1, isp-1), but not others (raga-1, daf-2). Furthermore, strong MML-1 nuclear localization is often but not always associated with longevity (e.g. daf-2), and the wiring of the signaling pathway is different for various longevity regimens. Consistently, mTOR and Insulin signaling are more functionally linked and therefore may show a more similar genetic profile. Differences in hexose phosphate between glp-1 and daf-2 could explain why MML-1 requires hexokinase function in glp-1 to promote longevity but not in daf-2. However, considerably more work is required to rigorously validate this hypothesis.

      (2) In figure 5, the authors investigated whether the association between PPP and MML‑1/MondoA, tested in C. elegans, is conserved in mammals under starvation conditions. The authors should clarify why they tested the MondoA localization upon starvation in cultured human cells. This comment is related to my comment #1 as the authors could determine the roles of hexokinases under dietary restriction (DR)-conditions or in DR-mimetic in eat-2(-) mutants. 

      In this case, translatability to a worm longevity pathway was not our goal. Rather, we examined MondoA in cell culture under conditions with contrasting MondoA subcellular localization: in high-glucose media MondoA showed cytosolic/nuclear localization, whereas under starvation it was cytosolic. We then showed that, similar to our data in worms, PPP inhibition with 6-AN induced MondoA nuclear localization and activity. We now mention this rationale in the results section, lines 352-356.

      (3) In figure 2, the authors showed that HXK-2 regulates mitochondrial localization of MML-1, and HXK-1 regulates nuclear localization of MML-1 through mitochondrial β-oxidation in glp‑1(-) mutants. Can the authors test whether mitochondrial β-oxidation affects the effects of hxk RNAi on longevity of glp-1(-) mutants? 

      Excellent suggestion. We tried to test this idea and found that acs-2 RNAi alone abolished glp-1 longevity, making epistasis experiments difficult to interpret. This is consistent with published data showing that glp-1 longevity requires NHR-49, a transcription factor that regulates mitochondrial β-oxidation and drives acs-2 expression (Ratnappan et al., 2014). It could well be that β-oxidation inhibition promotes MML-1 nuclear localization but abolishes lifespan extension because of epistatic effects on other transcription factors or processes. Elucidating the exact mechanism would require further investigation that goes beyond the scope of this paper.

      (4) The authors showed that 2-deoxy-glucose, which decreases the activity of HXK, decreased the nuclear localization of MML-1, and this is consistent with their genetic data. Based on these data, 2-deoxy-glucose is expected to decrease longevity. Interestingly, however, 2-deoxy-glucose has been reported to increase lifespan by restricting glucose, whereas extra glucose intake decreases lifespan in C. elegans, shown by multiple research groups, including M. Ristow, C. Kenyon, and S.J.V. Lee labs. This is seemingly paradoxical and worth discussing with key references, especially because MondoA and Chrebp are known as glucose-responsive transcription factors. 

      Thank you for this important comment. 2-DG has been shown to extend lifespan by suppressing glucose metabolism at concentrations ranging from 0.1 to 5 mM, whereas higher concentrations ranging from 20 to 50 mM had the opposite effect, decreasing lifespan (Schulz et al., 2007). We tested 50 mM 2-DG and observed decreased MML-1 nuclear localization, which is consistent with the previous data showing decreased longevity. We now raise this point in the discussion, suggesting that mild inhibition of glucose metabolism has beneficial effects on longevity, while strong suppression shortens lifespan (lines 411-414).

      Minor comments 

      (1) The current Introduction does not include an explicit statement that MML-1 and MondoA are homologs. Please clarify this as naive readers may be confused.

      Thank you for pointing this out. We now say in the intro that MondoA and MML-1 are homologs (lines 59-60).

      (2) In figure 1, the effects of hxk-3 on nuclear localization of MML-1 is small compared to those of hxk-1 and hxk-2. Please add speculation about why HXK-3 has different roles in nuclear localization of MML-1 compared to HXK-1 and HXK-2. 

      According to GExplore 1.4 (Hutter & Suh, 2016), hxk-3 expression declines during larval development and is low in the adult. Perhaps it has little effect in the young adult, and the other hexokinases suffice to support MML-1 nuclear localization. It also remains possible that hxk-3 is not required in glp-1 but is required in other longevity pathways.

      (3) The authors tested the effects of genetic inhibition of hxk-1 and hxk-2 on the regulation of MML-1 localization and lifespan of glp-1(-) mutants by using RNAi. I wonder whether the authors can perform the experiments with hxk-1 or hxk-2 loss (or reduction) of function mutants. If they cannot, please discuss the reason and the limitations of RNAi. 

      This is an important point raised by the reviewer. We found that RNAi was most effective for phenotypes related to MML-1 nuclear localization and longevity, likely because it results in acute knockdown. We also showed that pharmacological inhibition of hexokinase function with 3BrP and 2‑DG (Supplementary Figure 1B and 1C) and the PPP with 6-AN (Figure 3B) had consistent results with our observation with RNAi.

      We generated hexokinase KO mutants by deleting the coding sequence of each hexokinase by CRISPR/Cas9. First, we measured the expression of each hexokinase isozyme in each mutant. Notably, the hxk-1(syb1271) null mutant had higher expression of hxk-2 and hxk-3, hxk-2(syb1261) did not significantly affect the expression of hxk-1 and hxk-3, and hxk-3(syb1267) had a mild increase in hxk-2 expression. We followed up on hxk-1(syb1271) and hxk-2(syb1261) and crossed these mutants with our MML-1::GFP reporter. We observed a modest but significant reduction in MML-1 nuclear localization in both strains. The effect with RNAi is much stronger than in the null mutants, potentially due to a compensatory upregulation of the other hexokinases in the mutants that we do not observe with RNAi (Supplementary Figure 1D-E). An alternative explanation is that there is a threshold in the effects of hexokinase function on MML-1 nuclear localization. We tried to generate a hxk-1; hxk-2 double mutant, but it was lethal, and we therefore did not pursue this further.

      Author response image 2.

      (4) Please correct minor typos throughout the manuscript. Following are some examples.

      - On page 4, line 111, please correct "Supplementary Figure D-E" to "Supplementary Figure 1D-E".

      - On page 9, line 272, please correct "3A-B" to "4A-B". 

      - On page 9, line 275, please correct "S4" to "4". 

      - On page 10, line 309, please correct "4A" to "4B" 

      Corrected.

      (5) In Fig. 3E, please add the information about the scale bars in figure legends.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      Here are some detailed suggestions for the authors:

      (1) Since MML-1/MXL-2 complex functions in multiple longevity models, e.g. DR, ILS, what are the roles of HXK-1 and HXK-2 in these models? 

      We now show that although mml-1 is required in most longevity pathways, hxk-1 and hxk-2 are required in some pathways (glp-1, isp-1) but not others (daf-2, raga-1). See above for more details.

      (2) As for the metabolite screening, lipid metabolic genes could be included. Beyond the reasons above, a previous study found that mml-1 mRNA levels and MML-1 GFP nuclear localization were both increased in the glp-1 model, while mml-1 mRNA levels were unaffected by hxk knockdown, suggesting that more pathways are involved.

      We agree with the reviewer that understanding which metabolites regulate MML-1 nuclear localization and activity is an important, yet challenging question. Our studies demonstrate a role for glucose metabolism, in particular hexokinase, in this process, consistent with hexose-p being activators of MondoA. Our data also suggest that mechanisms beyond hexose-p regulate MML-1, since knockdown of the PPP components stimulates MML-1 even when hxk-2 is depleted and G6P is low, and inhibition of the PPP with 6-AN stimulates MondoA nuclear localization under starvation conditions in mammalian cell culture. We tested redox regulation, nucleoside, and lipid metabolism as candidate processes (see below). Notably, our data suggest this other mechanism is tied to lipid metabolism through droplet size, since various perturbations that impact LD size and number (atgl-1, dgat-2, tkt-1, Figure 4) affected MML-1 nuclear localization. It remains an open question whether MML-1 is regulated by other metabolites through a ligand-protein interaction. We cannot exclude that beyond lipid droplet regulation, specific lipids, other metabolites, or metabolic modules linked to the PPP might regulate MML-1 nuclear localization and activity.

      We employed genetic manipulation and pharmacological inhibition to understand the upstream signals that regulate MML-1. These approaches are not sufficient to determine whether other metabolite(s) are involved in MML-1/MondoA translocation to the nucleus through a direct interaction. Novel technologies that determine protein-metabolite interactions (e.g. MIDAS) will help us answer this question in future work, but this goes beyond the scope of this paper. As a compromise, we discuss possible metabolites that may orchestrate this, based on our observations of MML‑1 subcellular localization at LD/mitochondria (including PPP and TCA cycle intermediates).

      (3) Line 238, it should be "NADPH". 

      Corrected.

      (4) RNAi targeting enzymes of different branches of PPP can be performed

      In our initial screen, we examined the effect of various enzymes of the PPP on MML-1 nuclear localization (Figure 1A, Supplementary Table S1) and found that knockdown of enzymes in both the oxidative phase (PGDH/T25B9.9) and non-oxidative phase (transketolase/TKT-1) affects MML-1 nuclear localization. In line with this, 6-AN treatment, which affects the oxidative phase, also stimulated MML‑1 nuclear localization (Figure 3B). We also observed that knockdown of enzymes involved in ribose 5P conversion to ribose, ribose 1P, and phosphoribosyl pyrophosphate, an intermediate in nucleotide biosynthesis, decreased MML-1 nuclear localization (rpia-1, F07A11.5, Y43F4B.5, R151.2; Supplementary Table S1). Whether MML‑1/MondoA responds to the nucleotide pool remains elusive.

      (5) As for PPP, there are many possibilities that can be tested. For example, as PPP supplies NADPH for oxidative balance, does MML-1 respond to ROS? Also, it appears the genes in the non-oxidative arm of PPP regulate MML-1, so is nucleotide synthesis involved?

      Thank you for the suggestion. We tested other enzymes involved in NADPH production from the folate cycle and observed a mild but significant reduction of MML-1 nuclear localization upon dao-3i (Supplementary Table S1). Moreover, we tested whether MML-1 nuclear localization is responsive to ROS. While paraquat exposure induced oxidative stress, as measured by the transcriptional reporter gst‑4p::GFP (Supplementary Figure 3A), it did not significantly affect MML-1 nuclear localization (Supplementary Figure 3B). Therefore, we think it less likely that NADPH production acting through redox regulation is the main effect.

      We also tried supplementation with some of the metabolite outputs of PPP including ribose, ribulose, and xylulose, as well as nucleosides (see below), but saw no effect on MML-1 nuclear localization. We agree that further studies are required to pinpoint whether there is another metabolic moiety regulating MML-1 at the protein-ligand level, but this goes beyond the scope of the current investigation.

      Author response image 2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study reports the deep evolutionary conservation of a core genetic program regulating spermatogenesis in flies, mice, and humans. The data presented are supportive of the main conclusion and generally convincing. This work will be of interest to evolutionary and reproductive biologists.

      The Authors would like to thank the Senior Editor and the two Reviewers for their positive assessment of our work, as well as for the helpful suggestions. Collectively, these suggestions provided insight that was instrumental in shaping the final version of the manuscript (see below for our point-by-point comments). The Authors believe that the refinements introduced to the final document clearly translate into an improved version of our work. Hence, we would like to thank all those involved in the peer review process for their encouraging words and constructive criticism.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      By combining an analysis of the evolutionary age of the genes expressed in male germ cells, a study of genes associated with spermatocyte protein-protein interaction networks and functional experiments in Drosophila, Brattig-Correia and colleagues provide evidence for an ancient origin of the genetic program underlying metazoan spermatogenesis. This leads to identifying a relatively small core set of functional interactions between deeply conserved gene expression regulators, whose impairment is then shown to be associated with cases of human male infertility.

      Strengths: 

      In my opinion, the work is important for three different reasons. First, it shows that, even though reproductive genes can evolve rapidly and male germ cells display a significant level of transcriptional noise, it is still possible to obtain convincing evidence that a conserved core of functionally interacting genes lies at the basis of the male germ transcriptome. Second, it reports an experimental strategy that could also be applied to gene networks involved in different biological problems. Third, the authors make a compelling case that, due to its effects on human spermatogenesis, disruption of the male germ cell orthoBackbone can be exploited to identify new genetic causes of infertility.

      We thank the Reviewer for their positive assessment. Indeed, it was our main objective to convincingly demonstrate these three points.

      Weaknesses: 

      The main strength of the general approach followed by the authors is, inevitably, also a weakness. This is because a study rooted in comparative biology is unlikely to identify newly emerged genes that may adopt key roles in processes such as species-specific gamete recognition. Additionally, using a TPM >1 threshold for protein-coding transcripts may exclude genes, such as those encoding proteins required for gamete fusion, which are thought to be expressed at a very low level. Although these considerations raise the possibility that the chosen approach may miss information that, depending on the species, could be potentially highly functionally important, this by no means reduces its value in identifying genes belonging to the conserved genetic program of spermatogenesis.

      The Authors acknowledge the points raised by the Reviewer as inevitable trade-offs of the focus of our study (to uncover the deeply conserved genetic basis of spermatogenesis). Certainly, our pipeline could, in the future, be adapted to look for newly emerged genes or to employ different minimum expression cut-offs. To this end, we made all computational data and custom scripts easily available to the community. We would, nevertheless, kindly emphasize the challenge associated with the use of less restrictive TPM cut-offs, given the substantial level of transcriptional noise associated with this cell type. An abridged version of this discussion can be found in lines 512-515 of the manuscript.

      Reviewer #2 (Public Review):

      Summary: 

      This is a tour de force study that aims to understand the genetic basis of male germ cell development across three animal species (human, mouse, and flies) by performing a genetic program conservation analysis (using phylostratigraphy and network science) with a special emphasis on genes that peak or decline during mitosis-to-meiosis. This analysis, in agreement with previous findings, reveals that several genes active during and before meiosis are deeply conserved across species, suggesting ancient regulatory mechanisms. To identify critical genes in germ cell development, the investigators integrated clinical genetics data, performing gene knockdown and knockout experiments in both mice and flies. Specifically, over 900 conserved genes were investigated in flies, with three of these genes further studied in mice. Of the 900 genes in flies, ~250 RNAi knockdowns had fertility phenotypes. The fertility phenotypes for the fly data can be viewed using the following browser link: https://pages.igc.pt/meionav. The scope of target gene validation is impressive. Below are a few minor comments.

      We thank the Reviewer for their positive appraisal of our work.

      (1) In Supplemental Figure 2, it is notable that enterocyte transcriptomes are predominantly composed of younger genes, contrasting with the genetic age profile observed in brain and muscle cells. This difference is an intriguing observation, and it would be interesting to hear the authors' comments.

      Indeed, this is an intriguing observation for which we can only provide a speculative answer. Enterocytes are specialized to absorb nutrients, hence their genetic program is finely tuned to maximize uptake under specific dietary conditions. In this regard, we can posit that variations in nutrient preference/availability in the course of each species’ evolutionary history (associated with habitat, environmental and/or behavioral changes) may have exerted a selective pressure for the emergence of new genes that could provide enterocytes with more efficient uptake capabilities under new circumstances. The application of evolutionary thinking to the rapidly expanding field of nutrigenomics could shed light on this possibility.

      (2) Regarding the document, the figures provided only include supplemental data; none of the main text figures are in the full PDF. 

      We thank the Reviewer for this helpful comment. We will ensure that the three main figures are correctly formatted in the final version of the manuscript.

      (3) Lastly, it would be great to section and stain mouse testis to classify the different stages of arrest during meiosis for each of the mouse mutants in order to compare more precisely to flies.

      We agree with the Reviewer that adding more mouse data would further improve what can already be considered an extensive body of experimental work. Given the costs associated with the generation of such data (in terms of resources and otherwise), the Authors believe such a study would be best suited to a follow-up manuscript.

      This paper serves as a vital resource, emphasizing that only through the analysis of hundreds of genes can we prioritize essential genes for germ cell development. It is remarkable that about 60% of conserved genes have no apparent phenotype during germ cell development.

      Once again, we thank the Reviewer for their positive assessment of our work. Clarifying the degree of functional redundancy in an essential biological process such as male gametogenesis represents an exciting (and experimentally complex) future challenge.

      Strengths:

      The high-throughput screening was conducted on a conserved network of 920 genes expressed during the mitosis-to-meiosis transition. Approximately 250 of these genes were associated with fertility phenotypes. Notably, mutations in 5 of the 250 genes have been identified in human male infertility patients. Furthermore, 3 of these genes were modeled in mice, where they were also linked to infertility.

      This study establishes a crucial groundwork for future investigations into germ cell development genes, aiming to delineate their essential roles and functions.

      The Authors thank the Reviewer for emphasizing the potential usefulness of our results to the community, as that was one of the main motivations behind this project.

      Weaknesses: 

      The fertility phenotyping in this study is limited, yet dissecting the mechanistic roles of these proteins falls beyond its scope. Nevertheless, this work serves as an invaluable resource for further exploration of specific genes of interest.

      Please see the previous point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Although the manuscript already includes a significant amount of data, there are two aspects that the authors may consider exploring: 

      (1) I understand that the choice of species whose gene expression was analyzed in the study was largely influenced by the quality of the corresponding genome annotations. However, since in evolutionary terms humans and mice are much closer to each other than Drosophila (as also shown in Figure 1c and Supplementary Figure 1), I found the statement "three evolutionarily distant gonochoric species" partially questionable. Have the authors considered adding an additional established animal model, such as for example zebrafish, to provide further coverage of the evolutionary space? Or, alternatively, could a posteriori analysis of the transcriptome of such an additional species be used to cross-validate their findings? The authors touch upon this point in the Discussion, but I wonder if they actually tried something in this direction, or simply decided that the currently available expression data from other organisms was too poor to be used for this purpose.

      We thank the Reviewer for bringing up this point, as it echoes one of our main concerns in terms of our approach (as discussed in lines 487-492). Indeed, when we were designing our study, we extensively discussed whether zebrafish and C. elegans datasets should be included, as high-quality expression and phenotypical data were available for both species. We ended up not including them for one main reason: the sexual system of these species deviates from that of humans, mice and fruit flies (all gonochoric species). More specifically, C. elegans are hermaphrodites and although zebrafish is a gonochoric species at the adult stage, they start their lifecycle as juvenile hermaphrodites (they first develop juvenile ovaries that later degenerate into a testis in males). Since it is largely unknown to what extent the transcriptome of male germ cells from these species deviates from the gonochoric program (by retaining oogenesis-related characteristics, for example), we decided to avoid possible confounding effects by excluding the two species. Undoubtedly, as more transcriptomic data from non-model organisms become available, these (and other) questions can be extensively revisited as our pipeline was designed to easily accommodate new data.

      (2) Although the use of the STRING database is a sensible choice given the general purpose of this work, in my experience the reliability of its individual interactions can vary significantly. I wonder if the authors have considered exploiting AlphaFold-Multimer as a parallel approach to estimate what proportion of the 79 functional interactions that they identified may reflect direct protein-protein contacts.

      We thank the Reviewer for this question and suggestion, as we were also concerned about STRING's reliability for individual interactions. For that reason, we only utilized protein-protein interactions with a STRING combined confidence score ≥0.5 (corresponding to the estimated likelihood of a given association being true), as described in more detail in the "Protein-protein interaction (PPI) network construction" subsection. In addition, to make sure we were not biasing results towards conserved genes (which could arguably be overrepresented in STRING) we pursued a random rewiring test of degree centrality and page rank, as detailed in section "Deeply conserved genes are central components of the male germ cell transcriptome". We very much like the suggestion of using AlphaFold-Multimer to estimate the proportion of direct protein-protein contacts for the 79 core interactions, but given the already quite complex analytical pipeline of the present work, we will leave such analysis for a follow-up study. The final version of the manuscript now contains a reference to such an approach (lines 499-502).
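
      The confidence cut-off described above can be illustrated with a minimal sketch. It assumes interactions are available as (protein_a, protein_b, combined_score) tuples with scores already on a 0 to 1 scale; the function name, the tuple layout, and the toy protein names are hypothetical and are not taken from the manuscript's scripts. If an export stores scores on another scale, they would need to be rescaled first.

```python
def filter_by_confidence(edges, min_score=0.5):
    """Keep interactions whose combined confidence score is at least min_score.

    edges: iterable of (protein_a, protein_b, combined_score), scores in [0, 1].
    """
    return [(a, b, s) for a, b, s in edges if s >= min_score]


# Toy, made-up edges purely for illustration.
toy_edges = [("P1", "P2", 0.91), ("P1", "P3", 0.32), ("P2", "P3", 0.50)]
kept = filter_by_confidence(toy_edges)
```

      Treating the threshold as inclusive matches the "≥0.5" wording in the response.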

      Finally, probably because my primary focus is not on gene regulation, I must say that I found the manuscript somewhat heavy to read. The integration of various data types and analyses, while enriching, also complicates the ability to clearly recall the main conclusions of each result section by the time one reaches the summary at the beginning of the Discussion. Given the relative brevity of the latter, expanding it to both reiterate what these conclusions are and illustrate how all the components converge to support the central message of the study would, in my opinion, benefit a general readership.  

      We thank the Reviewer for their fresh perspective on our document and for this most welcome suggestion. The final version of the manuscript now includes a longer discussion, containing an initial paragraph (lines 467-479) that summarizes our main findings and how they converge into a coherent body of work.

      Additionally, on a minor note, I suggest that the concept of phylostratigraphy be briefly explained when first mentioned in the Introduction, rather than later in the manuscript. This early clarification would aid comprehension for readers unfamiliar with the term. 

      To safeguard the flow of the manuscript, we have slightly tweaked the introduction section to avoid the use of highly specific terminology (such as phylostratigraphy) this early in the text. We replaced it with “comparison of genome sequences” (line 85). Phylostratigraphy is later explained in full detail in the corresponding section of the manuscript. We thank the Reviewer for this helpful suggestion.

      Reviewer #2 (Recommendations For The Authors): 

      Major concern - the absence of main text figures.

      We thank the Reviewer for this helpful comment. We will ensure that the three main figures are correctly formatted in the final version of the manuscript.

      Typos throughout - this will need your attention. 

      The Authors thank the Reviewer for the thorough and attentive assessment of our work. We have carefully revised the text to ensure a pleasant reading experience free of typographical errors.

    1. Author response:

      We want to thank the reviewers for their constructive feedback.

      General

      The recall values of our method range between 78.6% for all urine cases to 83.3% for feces (and not between 70-80%, as stated by reviewer #2), with a mean precision of 85.6%. This is rather similar to other machine learning-based methods commonly used for the analysis of complicated behavioral readouts. For example, in the paper presenting DeepSqueak for analysis of mouse ultrasonic vocalizations (Coffey et al. DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations. Neuropsychopharmacol. 44, 859–868 (2019). https://doi.org/10.1038/s41386-018-0303-6), the recall values reported for DeepSqueak, Mupet and Ultravox (Fig. 2c, f) are very similar to those of our method.
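
      For readers less familiar with the two quantities quoted above, the following sketch shows the standard definitions of precision and recall from counts of true positives, false positives, and misses. The function name and the example counts are hypothetical and do not correspond to the counts in the manuscript.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts.

    tp: correctly detected events, fp: false alarms, fn: missed events.
    Returns NaN for a quantity whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Made-up counts purely for illustration.
p, r = precision_recall(tp=10, fp=2, fn=3)
```

      Recall penalizes missed depositions, while precision penalizes false alarms, which is why the two are reported separately.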

      We have analyzed and reported all the types of errors made by our method, which are mostly technical. For example, depositions that overlap the mouse blob for too long, until they cool down, will be associated with the mouse and therefore will not be detected (“miss” events). These technical errors are not expected to create a bias for a specific biological condition and, hence, shouldn’t interfere with the use of our method. A video showing all of the mistakes made by our algorithm on the test set was submitted (Figure 2-video 1).

      Below we address specific points and describe our plan to revise the manuscript accordingly.

      Detection accuracy

      a. It should be noted that when large urine spots are considered, our algorithm achieved 100% correct classification (Figure 2, supplement 1, panel b). However, small urine deposits are very similar to feces in their appearance in the thermal picture. In fact, if the feces are not shifted, discrimination can be quite challenging even for human annotators. To demonstrate the accuracy of the proposed method relative to human annotators, we plan to compare its results with the accuracy of a second human annotator.

      b. As part of the revision, we plan to test general machine learning-based object detectors such as faster-RCNN or YOLO (as suggested by Reviewer 2) and compare them with our method.

      c. To check if our method may introduce bias to the results, we plan to check if the errors are distributed evenly across time, space, and genders.

      Design choices

      (A) The preliminary detection algorithm has several significant parameters. These are:

      a. Minimal temperature rise for detection: 1.1°C rise during 5 sec.

      b. Size limits of the detection: 2 - 900 pixels.

      c. Minimal cooldown during 40 sec: 1.1°C and at least half the rise.

      d. Minimal time between detections in the same location: 30 sec.
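
      The four criteria listed above can be sketched as a single check on a candidate warm blob. This is an illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the thresholds are treated as inclusive, and the per-blob measurements (temperature rise, size, cooldown, time since the last detection at that location) are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrelimParams:
    """Thresholds quoted in the list above; field names are hypothetical."""
    min_rise_c: float = 1.1      # minimal temperature rise within 5 s
    min_size_px: int = 2         # smallest accepted blob, in pixels
    max_size_px: int = 900       # largest accepted blob, in pixels
    min_cooldown_c: float = 1.1  # minimal cooldown within 40 s
    min_gap_s: float = 30.0      # minimal time between detections at one location


def is_candidate(rise_c, size_px, cooldown_c, gap_s, p=PrelimParams()):
    """Return True if a warm blob passes all four preliminary criteria.

    gap_s is the time since the previous detection at the same location
    (use float("inf") if there was none).
    """
    rose_enough = rise_c >= p.min_rise_c
    size_ok = p.min_size_px <= size_px <= p.max_size_px
    # Cooldown must reach the absolute threshold and at least half the rise.
    cooled_enough = cooldown_c >= max(p.min_cooldown_c, 0.5 * rise_c)
    spaced_enough = gap_s >= p.min_gap_s
    return rose_enough and size_ok and cooled_enough and spaced_enough
```

      Grouping the thresholds in one parameter object makes it straightforward to recalibrate them for a different camera resolution or floor material.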

      We chose to use low thresholds for the preliminary detection to allow detection of very small urinations and to minimize the number of “miss” events, relying on the classifier to robustly reject false alarms. Indeed, we achieved a low rate of miss events: 5 miss events for the entire test set (1 miss event per ~90 minutes of video). We attribute these 5 “miss” events to partial occlusion of the detection by the mouse.

      To adjust the preliminary detection parameters to a new environment, one will need to calibrate these parameters in their own setup. Mainly, the size of the detection depends on the resolution of the video, and the cooldown rate might be affected by the material of the floor, as well as the room temperature.

      We plan to explore the robustness of these parameters in our setup and report the influence on the accuracy of the preliminary algorithm.

      (B) We chose to feed the classifier with 71 seconds of videos (11 seconds before the event and 60 seconds after it) as we wanted the classifier to be able to capture the moment of the deposition, the cooldown process, as well as urine smearing or feces shifting which might give an additional clue for the classification. In the revised paper we plan to report accuracy when using a shorter video for classification.
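
      As a rough illustration of the 71-second window described above, the sketch below converts an event time into a frame range for the classifier clip. The frame rate and the function name are assumptions introduced for the example; the response does not state the frame rate of the thermal video.

```python
def clip_frame_range(event_time_s, fps, n_frames, pre_s=11.0, post_s=60.0):
    """Return (start, stop) frame indices for the clip around one event.

    start is inclusive, stop is exclusive, and both are clamped to the video.
    fps is a hypothetical frame rate supplied by the caller.
    """
    start = max(0, round((event_time_s - pre_s) * fps))
    stop = min(n_frames, round((event_time_s + post_s) * fps))
    return start, stop


# Example with an assumed 10 frames per second.
start, stop = clip_frame_range(event_time_s=100.0, fps=10, n_frames=100000)
```

      Clamping matters for events close to the start or end of a recording, where fewer than 71 seconds of video are available.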

      Generalizability

      a. In the revised version, we plan to report the accuracy of the method used on a different strain of mice (C57), with a different arena color (white arena instead of black).

      Statistics

      a. In the revised paper, we will explain why we chose each time window for analysis. Also, we will report statistics for different time windows, as suggested by Reviewer 3.

      b. Unlike reviewer #2, we don’t think that the small difference in recall rate between urine and feces (78.6% vs. 83.3%, respectively) creates a bias between them. Moreover, we don’t compare the urine rate to the feces rate.

      c. In the revised manuscript we will explicitly report the precision scores, although they also appear in our manuscript in Fig. 2- Supplement 1b.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      • Although ROC AUC is a widely used metric, other metrics such as precision, recall, sensitivity, and specificity are not reported in this work. The last two metrics would help readers understand the model’s potential implications in the context of clinical research.

      In response to this comment and related ones by Reviewer 2, we have overhauled how we evaluate our models. In the revised version, we have removed Micro ROC-AUC, as this evaluation metric is hard to interpret in the recommender system setting. Instead, the updated version fully focuses on two metrics: ROC-AUC and Precision at 1 of the negative class, both computed per spectrum and then averaged (equivalent to the instance-wise metrics in the previous version of the manuscript). We believe these metrics best reflect the use-case of AMR recommenders. In addition, we have kept (drug-)macro ROC-AUC as a complementary evaluation metric. As the ROC-AUC can be decomposed into sensitivity and specificity (at different prediction probability thresholds), we have added a ROC curve where sensitivity and specificity are indicated in Figure 8 (Appendices).
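
      The two per-spectrum metrics named above can be sketched as follows. This is one possible reading, not the authors' code: "Precision at 1 of the negative class" is interpreted here as checking whether the drug with the lowest predicted resistance probability is truly susceptible, and spectra with only one label class are skipped for ROC-AUC. Both choices, and all function names, are assumptions.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def spectrum_macro_metrics(spectra):
    """Average ROC-AUC and precision@1 (negative class) over spectra.

    spectra: list of (labels, scores), one entry per tested drug in each list;
    labels use 1 = resistant, 0 = susceptible; scores are predicted
    resistance probabilities.
    """
    aucs, hits = [], []
    for labels, scores in spectra:
        auc = roc_auc(labels, scores)
        if auc is not None:
            aucs.append(auc)
        # Top recommendation: the drug with the lowest predicted resistance.
        best = min(range(len(scores)), key=scores.__getitem__)
        hits.append(labels[best] == 0)
    mean_auc = sum(aucs) / len(aucs) if aucs else None
    prec_at_1 = sum(hits) / len(hits) if hits else None
    return mean_auc, prec_at_1
```

      Computing the metrics per spectrum and then averaging gives every patient sample equal weight, regardless of how many drugs were tested on it.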

      • The authors did not hypothesize or describe in any way what an acceptable performance of their recommender system should be in order to be adopted by clinicians.

      In Section 4.3, we have extended our experiments to include a baseline that represents a “simulated expert”. In short, given a species, an expert can already make some best guesses as to what drugs will be effective or not. To simulate this, we count resistance frequencies per species and per drug in the training set, and use this as predictions of a “simulated expert”.
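
      A minimal sketch of such a frequency-based baseline is given below. It assumes training data are available as (species, drug, label) rows with label 1 for resistant and 0 for susceptible; the function names and the fallback value for unseen pairs are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict


def fit_simulated_expert(train_rows):
    """Count resistance frequency per (species, drug) pair.

    train_rows: iterable of (species, drug, label), label 1 = resistant.
    Returns a dict mapping (species, drug) to the fraction of resistant labels.
    """
    resistant = defaultdict(int)
    total = defaultdict(int)
    for species, drug, label in train_rows:
        total[(species, drug)] += 1
        resistant[(species, drug)] += label
    return {pair: resistant[pair] / total[pair] for pair in total}


def predict_simulated_expert(freqs, species, drug, default=0.5):
    """Predicted resistance probability; `default` for unseen pairs is an assumption."""
    return freqs.get((species, drug), default)


# Toy, made-up rows purely for illustration.
rows = [
    ("E. coli", "Ceftriaxone", 1),
    ("E. coli", "Ceftriaxone", 0),
    ("E. coli", "Ceftriaxone", 0),
    ("S. aureus", "Oxacillin", 1),
]
freqs = fit_simulated_expert(rows)
```

      Because this baseline never looks at the spectrum, any gain over it reflects information carried by the MALDI-TOF measurement itself.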

      We now mention in our manuscript that any performance above this level results in a real-world information gain for clinical diagnostic labs.

      • Related to the previous comment, this work would strongly benefit from the inclusion of 1-2 real-life applications of their method that could showcase the benefits of their strategy for designing antibiotic treatment in a clinical setting.

      While we think this would be valuable to try out, we are an in silico research lab, and the study we propose is an initial proof-of-concept focusing on the methodology. Because of this, we feel a real-life application of the model is out-of-scope for the present study.

      • The authors do not offer information about the model features associated with resistance. This information may offer insights about mechanisms of antimicrobial resistance and how conserved they are across species.

      In general, MALDI-TOF mass spectra are somewhat hard to interpret. Because of a limited body of work analyzing resistance mechanisms with MALDI-TOF MS, it is hard to link peaks back to specific pathways. For this reason, we have chosen to forego such an analysis. After all, as far as we know, typical MALDI-TOF MS manufacturers’ software for bacterial identification also does not provide interpretability results or insights into peaks, but merely gives an identification and confidence score.

      However, we do feel that the whole topic revolving around “the degree of biological insight a data modality might give versus actual performance and usability” merits further discussion. We have ultimately decided not to include a segment in our discussion section as it is hard to discuss this matter concisely.

      • Comparison of AUC values across models lacks information regarding statistical significance. Without this information it is hard for a reader to figure out which differences are marginal and which ones are meaningful (for example, it is unclear if a difference in average AUC of 0.02 is significant). This applied to Figure 2, Figure 3, and Table 2 (and the associated supplementary figures).

      To make trends a bit more clear and easier to discern, in our revised manuscript, all models are run for 5 replicates (as opposed to 3 in the previous version).

      There is an ongoing debate in the ML community whether statistical tests are useful for comparing machine learning models. A simple argument against them is that model runs are typically not independent from each other, as they are all trained on the same data. The assumptions of traditional statistical tests are therefore violated (t-test, Wilcoxon test, etc.). With such tests statistical significance of the smallest differences can simply be achieved by increasing the number of replicates (i.e. training the same models more times).
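
      The point about replicate count can be illustrated with the paired t statistic, t = mean_diff / (sd_diff / sqrt(n)). The sketch below holds the mean difference and its standard deviation fixed (both values are made up) and varies only the number of replicates; it illustrates the scaling only and says nothing about whether the independence assumption holds.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n):
    """t statistic of a paired t-test on n score differences."""
    return mean_diff / (sd_diff / math.sqrt(n))


# Same hypothetical AUC difference and spread, increasing replicate counts.
t_values = {n: paired_t_statistic(0.02, 0.02, n) for n in (3, 5, 50)}
```

      With the effect size unchanged, the statistic grows in proportion to the square root of the number of replicates, so a fixed small difference eventually crosses any significance threshold.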

      More complicated but more appropriate statistical tests also exist, such as the 5x2 cross-validated t-test of Dietterich (“Approximate statistical tests for comparing supervised classification learning algorithms”, Neural Computation, 1998). However, these tests are typically not considered in deep learning, because only 10% of the data can be used for training, which is practically not desirable. The Friedman test of Demšar (“On the appropriateness of statistical tests in machine learning”, Workshop on Evaluation Methods for Machine Learning in conjunction with ICML, 2008), in combination with post hoc pairwise tests, is still frequently used in machine learning, but that test is only applicable in studies where many datasets are tested.

      For those reasons, most deep learning papers that only analyse a few datasets typically do not consider any statistical tests. For the same reasons, we are also not convinced of the added value of statistical tests in our study.

      • One key claim of this work was that their single recommender system outperformed specialist (single species-antibiotic) models. However, in its current status, it is not possible to determine that in fact that is the case (see comment above). Moreover, comparisons to species-level models (that combine all data and antibiotic susceptibility profiles for a given species) would help to illustrate the putative advantages of the dual branch neural network model over species-based models. This analysis will also inform the species (and perhaps datasets) for which specialist models would be useful to consider.

      We thank the reviewer for this excellent suggestion. In our new manuscript, we have dedicated an entire section of experiments to testing such species-specific recommender models (Section 4.2). We find that species-specific recommender systems generally outperform the models trained globally across all species. As a result, our manuscript has been majorly reworked.

      • Taking into account that the clustering of spectra embeddings seemed to be species-driven (Figure 4), one may hypothesize that there is limited transfer of information between species, and therefore the neural network model may be working as an ensemble of species models. Thus, this work would deeply benefit from a comparison between the authors' general model and an ensemble model in which the species is first identified and then the relevant species recommender is applied. If authors had identified cases to illustrate how data from one species positively influence the results for another species, they should include some of those examples.

      See the answer to the remark above.

      • The authors should check that all abbreviations are properly introduced in the text so readers understand exactly what they mean. For example, the Prec@1 metric is a little confusing.

      See the answer to a remark above for how we have overhauled our evaluation metrics in the revised version. In addition, in the revised version, we have bundled our explanations on evaluation metrics together in Section 3.2. We feel that having these explanations in a separate section will improve overall comprehensibility of the manuscript.

      • The authors should include information about statistical significance in figures and tables that compare performance across models.

      See answer above.

      • An extra panel showing species labels would help readers understand Figure 11.

      We have tried to play around with including species labels in these plots, but could not make it work without overcrowding the figure. Instead, we have added a reminder in the caption that readers should refer back to an earlier figure for species labels.

      • The authors initially stated that molecular structure information is not informative. However, in a second analysis, the authors stated that molecular structures are useful for less common drugs. Please explain in more detail with specific examples what you mean.

      In the previous version of our manuscript, we found that one-hot embedding-based models were superior to structure-based drug embedders for general performance. The latter, however, delivered better transfer learning performance.

      In our new experiments, however, we perform early stopping on “spectrum-macro” ROC-AUC (as opposed to micro ROC-AUC in the previous version). As a consequence, our results are different. In the new version of our manuscript, Morgan Fingerprints-based drug embedders generally outperform others both “in general” and for transfer learning. Hence, our previously conflicting statements are not applicable to our new results.

      • The authors may want to consider adding a few sentences that summarize the 'Related work' section into the introduction, and converting the 'Related work' section into an appendix.

      While we acknowledge that such a section is uncommon in biology, in machine learning research, a “related work” section is very common. As this research lies on the intersection of the two, we have decided to keep the section as such.

      Reviewer 2:

      • Are the specialist models re-trained on the whole set of spectra? It was shown by Weis et al. that pooling spectra from different species hinders performance. It would then be better to compare directly to the models developed by Weis et al, using their splitting logic since it could be that the decay in performance from specialists comes from the pooling. See the section "Species-stratified learning yields superior predictions" in https://doi.org/10.1038/s41591-021-01619-9.

      We train our “specialist” models (now called “species-drug classifiers”) just as described in Weis et al.: all labels for a drug are taken and then subsetted for a single species. We have clarified this in our new manuscript. The text now reads:

      “Previous studies have studied AMR prediction in specific species-drug combinations. For this reason, it is useful to compare how the dual-branch setup weighs up against training separate models for separate species and drugs. In Weis et al. (2020b), for example, binary AMR classifiers are trained for the following three combinations: (1) E. coli with Ceftriaxone, (2) K. pneumoniae with Ceftriaxone, and (3) S. aureus with Oxacillin. Here, such "species-drug-specific classifiers" are trained for the 200 most-common combinations of species and drugs in the training dataset.”

      • Going back to Weis et al. a high variance in performance between species/drug pairs was observed. The metrics in Table 2 do not offer any measurement of variance or statistical testing. Indeed, some values are quite close e.g. Macro AUROC of Specialist MLP-XL vs One-hot M.

      See our answer to a remark of Reviewer 1 for our viewpoint on statistical significance testing in machine learning.

      • Since this is a recommendation task, why were no recommendation system metrics used, e.g. mAP@K, mRR, and so (apart from precision@1 for the negative class)? Additionally, since there is a high label imbalance in this task (~80% negatives) a simple model would achieve a very high precision@1.

      See the answer to a remark above for how we have overhauled our evaluation metrics in the revised version. In addition, in choosing our metrics, we wanted metrics that are both (1) appropriate (i.e. recommender system metrics), but also (2) easy to interpret for clinicians. For this reason, we have not included metrics such as mAP@K or mRR. We feel that “spectrum-macro” ROC-AUC and precision@1 cover a sufficiently broad evaluation set of metrics but are easy enough to interpret.
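As a rough illustration of precision@1 for the negative class, the sketch below assumes that the model outputs a resistance score per drug, that the top recommendation for a spectrum is the drug with the lowest score, and that a recommendation counts as a hit when the true label is susceptible (0). The data layout and function name are hypothetical.

```python
def precision_at_1_negative(predictions):
    """predictions: dict of spectrum id -> list of (drug, true_label, resistance_score)."""
    hits = 0
    for candidates in predictions.values():
        # Recommend the drug the model considers least likely to meet resistance.
        _drug, label, _score = min(candidates, key=lambda c: c[2])
        hits += label == 0
    return hits / len(predictions)


# Hypothetical toy predictions for two spectra.
predictions = {
    "s1": [("Ceftriaxone", 0, 0.1), ("Oxacillin", 1, 0.7)],  # top pick is susceptible: hit
    "s2": [("Ceftriaxone", 1, 0.2), ("Oxacillin", 0, 0.6)],  # top pick is resistant: miss
}

print(precision_at_1_negative(predictions))
```

With roughly 80% negative labels, as the reviewer notes, even an arbitrary recommendation would be expected to score close to 0.8 on this metric, so it is best read against that baseline.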

      • A highly similar approach was recently published (https://doi.org/10.1093/bioinformatics/btad717). Since it is quite close to the publication date of this paper, it could be discussed as concurrent work.

      We thank the reviewer for bringing our attention to this study. We have added a paragraph in our revised version discussing this paper as concurrent work.

      • It is difficult to observe a general trend from Figure 2. A statistical test would be advised here.

      See our answer to a remark of Reviewer 1 for our viewpoint on statistical significance testing in machine learning.

      • Figure 5. UMAPs generally don't lead to robust quantitative conclusions. However, the analysis of the embedding space is indeed interesting. Here I would recommend some quantitative measures directly using embedding distances to accompany the UMAP visualizations. E.g. clustering coefficients, distribution of pairwise distances, etc.

      In accordance with this recommendation, we have computed many statistics on the MALDI-TOF spectra embedding spaces. However, we could not find any statistic that was more illuminating than the visualization itself. For this reason, we have kept this section as is, and let the figure speak for itself.

      • Weis et al. also perform a transfer learning analysis. How does the transfer learning capacity of the proposed models differ from those in Weis et al?

      Weis et al. perform experiments towards “transferability”, not actual transfer learning. In essence, they use a model trained on data from one diagnostic lab towards prediction on data from another. However, they do not conduct experiments to learn how much data such a pre-trained classifier needs to fine-tune it for adequate performance on the new diagnostic lab, as we do. The end of Section 4.4 discusses how our proposed models specifically shine in transfer learning. The paragraph reads:

      “Lowering the amount of data required is paramount to expedite the uptake of AMR models in clinical diagnostics. The transfer learning qualities of dual-branch models may be ascribed to multiple properties. First of all, since different hospitals use much of the same drugs, transferred drug embedders allow for expressively representing drugs out of the box. Secondly, owing to multi-task learning, even with a limited number of spectra, a considerable fine-tuning dataset may be obtained, as all available data is "thrown on one pile".”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      In the revision the authors addressed all the points from this reviewer and most from other reviewers. The method is now described practically and in detail. The only thing this reviewer still misses is number of subtomograms for each structure. How many subtomograms did the authors extract by Dynamo from how many rootlets? How many out of them were valid in K-mean classification and used for sub-averages? Was the subaverage used for training by TomoSeg or each subtomograms belonging to the class? By clarifying it, this work will be referred by those who would take the same approach for other biological structures. 

      We have now added the particle numbers of all structures to the corresponding text, figure legends and methods, and elaborate on this below. We also clarify how we trained the TomoSeg network.

      Particle numbers:

      We extracted 591,453 subtomograms from 14 tomograms. This initial set was rigorously cleaned with Zcleaning, reducing it to 358,863 particles. Further cross-correlation and cluster cleaning yielded a final set of 180,252 particles. 
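For readers who want the cleaning steps as retention fractions, the short calculation below uses only the particle counts stated above; the variable names are illustrative.

```python
picked = 591_453       # subtomograms extracted from 14 tomograms
after_first = 358_863  # after the first cleaning step
final = 180_252        # after cross-correlation and cluster cleaning


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


retained_first = percent(after_first, picked)        # share kept by the first cleaning
retained_final = percent(final, picked)              # share of all picks in the final set
retained_second_step = percent(final, after_first)   # share kept by the later cleaning steps

print(retained_first, retained_final, retained_second_step)
```

By these counts, about 60.7% of the picked subtomograms survive the first cleaning step and about 30.5% remain in the final set.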

      This refined set was used for the structures presented in Figures 3E, F and S5A, B, as well as for the classification shown in Figure S5C. Of the classified particles, 34,490 particles contributed to class average 5 in Figure 3G and S5D, E. The detailed particle distribution of this classification has been added as a supplementary table.

      We further clarified the numbers in the results, method, and supplementary material section:

      Results:

      Page 7: “Figure 3. … (E) The initial average after alignment of 180,252 particles with a wide spherical alignment mask. (F) The initial average of particles aligned with a narrower cylindrical mask. (G) A class average of 34,490 particles, aligned and classified with a narrow mask.”

      Page 7/8: “We manually defined the D1-bands as surfaces in Dynamo (Castaño-Díez et al, 2017) and then approximated the number of filaments per surface area. We extracted 591,453 subtomograms from 14 tomograms, approximately four times as many subtomograms as the expected number of filaments. This initial set was rigorously cleaned to discard particles that did not have a filament in their center or had distorted striations, reducing it to 358,863 particles. Further cross-correlation and cluster cleaning yielded a final set of 180,252 particles.”

      Page 8: “We directly unbinned the data to a pixel size of 5.55 Å/pixel and used the rigorously cleaned set of 180,252 particles.”

      Page 8: “The resulting class averages contained a twist along the filament length in classes 2, 3 and 4 and most prominently in class 5. These four classes contain 72.29% of the particles, highlighting the prevalence of the twist-feature (Fig S5C, Table S2). Class 5 contained 19.27% of the data, i.e. 34,490 particles, and revealed the twist is formed by a filament of 2 nm thick by 5 nm wide with a helical groove along its length (Fig 3G).”

      Methods: 

      Page 13: “Surface triangulation was set to result in 591,453 extraction coordinates approximately 4 times the number of expected filaments.”

      Page 13: “Particles with no filament in their center, or particles that originated from regions in the rootlet with distorted striations (at the edge of a grid hole) were discarded, resulting in a particle set of 358,863 particles. Cluster- and careful per-tomogram cross-correlation cleaning were applied to remove particle duplicates, remaining particles with no filaments, and particles with disordered D-bands. This resulted in a final cleaned particle dataset of 180,252 particles.”

      Page 13: “For the final subtomogram class-average that contained the twist, the cleaned particle dataset motl with 180,252 particles was converted to a STAR file compatible with RELION 4.0 Alpha (Zivanov et al, 2022).”

      Supplementary material: 

      Page 17: “Table S1. Particle distribution of RELION 4.0 Alpha classification with alignment.”

      Page 22: “Figure S5: (C) Class averages of a classification with alignment of particles from Fig S5A. Their particle distribution is shown in Table S2.”

      For the initial classification, to identify a homogeneous subset, we used the original set of 591,453 picked particles (Fig S5A). The class distribution for this set is added as a supplementary table.

      We further clarified this in the results, methods and supplementary material:

      Results:

      Page 8: “To ask if there were any recurring arrangements of neighboring filaments in the data that could allow us to average a homogeneous subset, we resorted to classification of the original set of 591,453 particles (Fig S5A, Table S1).”

      Methods:

      Page 13: “Prior to classification in subTOM, alignments with limited X/Y/Z shifts and increasingly finer in-plane rotations were performed on the original dataset with 591,453 particles.”

      Supplementary material:

      Page 17: “Table S2. Particle distribution of subTOM classification for particle heterogeneity.”

      Page 22: “Figure S5: … The surfaces of a cross-section through the filament classes are shown in orange. The particle distribution is provided in Table S1. (B) …”

      TomoSeg network training

      The subtomograms and the class averages presented at the end of the manuscript were not used as input for training the TomoSeg network. TomoSeg training requires positive and negative sets of segmented 2D regions of interest within tomogram slices. These areas were selected and segmented within the Eman2 TomoSeg GUI, iteratively increasing the size of the training sets until satisfactory performance was achieved. 

      We have clarified the TomoSeg training process in the methods section to avoid confusion:

      Methods: 

      Page 13: “The tomograms were then preprocessed in EMAN2.2 for training of the TomoSeg CNN (Chen et al, 2017). Here, the features (filaments, D-bands, A-bands, gold fiducials, actin, membranes, membrane-associated densities and ice contaminations) were individually trained for each tomogram. This involved manually tracing a training set of 10-20 positive and 100-150 negative boxed areas per feature. We iteratively expanded and curated the training set until the segmentations were accurate, as recommended in the software manuals. Segmented maps were allowed to compete for the assignment of pixels in the tomograms, cleaned up in Amira (Thermo Fisher Scientific) and converted to object files.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) One issue that needs to be considered is the nomenclature of the enhancer. The authors have presented data to show this enhancer controls the expression of Ctnnb1 in the stomach, intestine, and colon tissues. However, the name proposed by the authors, ieCtnnb1 (intestinal enhancer of Ctnnb1), doesn't represent its functions. It might be more appropriate to call it a different name, such as gieCtnnb1 (gastrointestinal enhancer of Ctnnb1).

      We thank the reviewer for the insightful suggestion and agree that wholemount reporter assays indicated that ieCtnnb1 and ieCTNNB1 indeed display activity in the stomach. However, in the current study, we focused on the cellular distribution and the function in intestinal epithelia. After careful consideration, we reasoned that the current designation, ieCtnnb1, would more appropriately represent its expression pattern and functions based on the evidence provided. We hope the reviewer can understand our reasoning.

      (2) The writing of this manuscript can be improved in a few places. 

      a) The definitions or full names for the abbreviations of some terms, e.g., Ctnnb1, ieCtnnb1, in both abstract and main text, are needed when they first appear. Specifically, Line 108 should be moved to Lines 26 and 95. Lines 125-126 are redundant. ieCtnnb1 in Line 130 needs to be defined.

      We appreciate the suggestion. In the revision, we have included the definition of Ctnnb1 and the full name of ieCtnnb1 when they first appear in the abstract and the main text. Lines 125-126 were deleted in the revision.

      b) Line 192-194, the description of the result needs to be rewritten to reflect the higher expression of LacZ transcript in eGFP+ cells.

      We would like to emphasize that the key point of this part is that the enhancer activity of ieCtnnb1 is present in both Lgr5-eGFP+ and Lgr5-eGFP- cells. This was validated by single-cell sequencing, which revealed the presence of LacZ transcripts in the Paneth cells. Moreover, we could not confidently conclude that eGFP+ cells have higher expression levels of LacZ, as these measurements were obtained from separate, semi-quantitative RT-qPCR experiments.

      c)  More details are needed for how the data using human tumor samples were generated and how they were analyzed. 

      We thank the reviewer for the suggestion. In the revision, we have provided additional details regarding the data and subsequent analyses of human CRC samples as follows: “We previously conducted paired analyses of chromatin immunoprecipitation sequencing (ChIP-seq) for H3K27ac and H3K4me3, alongside RNA-seq on 68 CRC samples and their adjacent normal (native) tissue (Li et al., 2021). In the current study, we performed analyses for the enrichment of H3K27ac and H3K4me3 at ieCTNNB1 and CTNNB1 promoter regions, as well as the expression levels of CTNNB1, followed by combined analyses (Figure 5A, Figure 5 - figure supplement 1).”

      d) The genomic structures from multiple species are presented at the bottom of Figure 1a. However, the description and explanation are lacking in both the main text and the figure legend.

      We apologize for not presenting this clearly. We have added a related description in the legend of Figure 1A: “The sequence conservation of the indicated species is shown at the bottom as vertical lines”. We also added an explanation in lines 162-163 of the main text: “Notably, unlike neCtnnb1, the primary sequence of ieCtnnb1 is not conserved among vertebrates (Figure 1A, bottom)”.

      Reviewer #2:

      (1) One of the main issues emerging during reading concerns the interpretation of the consequence of deleting the ieCtnnb1 enhancer. The authors write on line 235 that the deletion of ieCtnnb1 "undermined" Wnt signaling in the intestinal epithelium. This feels too strong, as the status of the pathway is only mildly affected, testified by the observation that mice with homozygous deletion on ieCtnnb1 are alive and well. The enhancer likely "only" drives higher Ctnnb1 expression, and it does not affect Wnt signaling by other mechanisms. The reduction of Wnt target gene expression upon its deletion is easily interpreted as the consequence of reduced β-catenin. Also the title, in my opinion, allows this ambiguity to stick in readers' minds. In other words, the authors present no evidence that the ieCtnnb1 enhancer controls Wnt signaling dosage via any mechanism other than its upregulation of Ctnnb1 expression in the intestinal epithelium. Reduced Ctnnb1, in turn, could explain the observed reduction of Wnt signaling output and the interesting downstream physiological consequences. Unless the authors think otherwise, I suggest they clarify this throughout the text, including necessary modifications to the title.

      We greatly appreciate the reviewer’s important comments and suggestion. We agree that ieCtnnb1’s direct effect on canonical Wnt signaling is to regulate the transcription of Ctnnb1 in the intestinal epithelia. Therefore, knockout of ieCtnnb1 leads to compromised expression of Ctnnb1 and, consequently, reduced Wnt signaling. The term “undermined” is indeed too strong and has been revised to “compromised” in the revision (line 237). Similar revisions have been made throughout the manuscript. In particular, the title was changed to “A Ctnnb1 enhancer transcriptionally regulates Wnt signaling dosage to balance homeostasis and tumorigenesis of intestinal epithelia”. However, as we state in the following point, decreased levels of β-catenin on ieCtnnb1 loss could lead to indirect effects, including the reduced expression of Bambi, which might cause a more significant decrease of nuclear β-catenin.

      (2) It is unclear how the reduction of Ctnnb1 mRNA caused by deletion of ieCtnnb1 in mice could lead to a preferential decrease of nuclear more than membranous β-catenin (Fig. 1K and L). This might reflect a general cell autonomous reduction in Wnt signaling activation; yet, it is not clear how this could occur. Do the authors have any explanations for this?

      This is a very important question. We observed that in ieCtnnb1 knockout epithelia, the expression of Bambi (BMP and activin membrane-bound inhibitor) was significantly downregulated. Since BAMBI has been reported to stabilize β-catenin and facilitate its nuclear translocation, it is likely that the reduced level of BAMBI resulting from the loss of ieCtnnb1 further decreased nuclear β-catenin. In the revision, the expression change of Bambi has been added in Figure 1M. Moreover, the related content was extensively discussed with proper citations: “We noticed that after knocking out ieCtnnb1, the level of β-catenin in the nuclei of small intestinal crypt cells of Ctnnb1Δi.enh mice decreased more significantly compared to that in the cytoplasm (49.5% vs. 29.8%). Although the loss of ieCtnnb1 should not directly lead to reduced nuclear translocation of β-catenin, RNA-seq results showed that the loss of ieCtnnb1 causes a reduction in the expression of Bambi (BMP and activin membrane-bound inhibitor), a target gene in the canonical Wnt signaling pathway (Figure 1M). BAMBI promotes the binding of Frizzled to Dishevelled, thereby stabilizing β-catenin and facilitating its nuclear translocation (Lin et al., 2008; Liu et al., 2014; Mai et al., 2014; Zhang et al., 2015). Thus, it is likely that the decreased level of BAMBI resulting from the loss of ieCtnnb1 further reduced nuclear β-catenin”.

      (3) In Figure 1 K-L the authors show β-catenin protein level. Why not show its mRNA?

      The mRNA levels of Ctnnb1 in small and large intestinal crypts were shown in Figure 1I and 1J, demonstrating reduced expression of Ctnnb1 upon ieCtnnb1 knockout. We hope the reviewer understands that it is unnecessary to measure the nuclear and cytosolic levels of Ctnnb1 transcripts, as the total mRNA level generally reflects the protein level. 

      (4) Concerning the GSEA of Figure 1 that includes the Wnt pathway components: a) it would be interesting to see which components and to what extent is their expression affected; b) why should the expression of Wnt components that are not Wnt target genes be affected in the first place? It is odd to see this described uncritically and used to support the idea of downregulated Wnt signaling.

      We appreciate the suggestion and apologize for any lack of clarity. The affected components of the Wnt signaling pathway and the extent of their changes are summarized in Figure 1 – figure supplement 3. Additionally, we have provided explanations for their downregulation. For instance, the reduced expression of Wnt3 and Wnt2b ligands in ieCtnnb1-KO crypts may be attributed to the decreased numbers of Paneth cells.  

      (5) In lines 251-252 the authors refer to "certain technical issues" in the isolation of cell type from the intestinal epithelium. Why this part should be obscure in the characterization of a tissue for which there are several established protocols of isolation and analysis is not clear. I would rather describe what these issues have been and how they might have affected the data presented.

      We thank the reviewer for pointing this out. The single-cell preparation and sequencing of small intestinal cryptal epithelial cells were carried out largely according to reported protocols with slight modifications. The enrichment of live crypt epithelial cells (EpCAM+DAPI-) by flow cytometry and cell filtering after single-cell sequencing were appropriate (Figure 2 – figure supplement 1A-1C). We would like to emphasize a few points: 1) Unlike other protocols, we did not exclude immune cells, erythrocytes, or endothelial cells using negative sorting antibodies. 2) When defining cell populations, we focused exclusively on epithelial cell types and did not consider other cell types, such as immune cells. As a result, the so-called “undefined” cells include a mixture of non-epithelial cells. Indeed, markers for erythrocytes (AY036118/Erf1, PMID:12894589) and immune cells (Gm42418 and Lars2, PMID:30940803, PMID: 35659337) were the top three enriched genes in the “undefined” cluster (Figure 2 – figure supplement 1D). 3) Nonetheless, the overall findings remain robust, as key observations such as the loss of Paneth cells and reduced cell proliferation were validated through histological studies. This information has been incorporated into the revised manuscript with related references cited (lines 254-259).

      (6) It is interesting that human SNPs exist that seem to fall within the ieCTNNB1 enhancer and affect the gastrointestinal expression of CTNNB1. Could the author report or investigate whether this SNP is present in human populations that have been considered in large-scale studies for colorectal cancer susceptibility? It seems to me a rather obvious next step of extreme importance to be ignored.

      (7) From Figure 5A a reader could conclude that colorectal tumor cells have a higher expression of CTNNB1 mRNA than in normal epithelium. This is the first time I have seen this observation which somewhat undermines our general understanding of Wnt-induced carcinogenesis exclusively initiated by APC mutations whereby it is β-catenin's protein level, not expression of its mRNA, of crucial importance. I find this to be potentially the most interesting observation of the current study, which could be linked to the activity of the enhancer discovered, and I suggest the authors elaborate more on this and perhaps consider it for future experimental follow-ups.

      We appreciate the comments and suggestions.  We therefore added related content in the revision (lines 470-475): “Importantly, ieCTNNB1 displayed higher enhancer activity in most CRC samples collected in the study. Moreover, the SNP rs15981379 (C>T) within ieCTNNB1 is associated with the expression of CTNNB1 in the GI tract. Future population studies could investigate how the enhancer activity of ieCTNNB1 and this particular SNP are associated with CRC susceptibility and prognosis”.

      (8) I am surprised that the authors, who seem to have dedicated lots of resources to this study, are satisfied by analyzing their ChIP experiments with qPCR rather than sequencing (Figure 6). ChIP-seq would produce a more reliable profile of the HNF4a and CREB1 binding sites on these loci and in other control regions, lending credibility to the whole experiment and binding site identification. Sequencing would also take care of the two following conceptual problems in primer design. 

      First: while the strategy to divide enhancer and promoter in 6 regions to improve the resolution of their finding is commendable, I wonder how the difference in signal reflects primers' efficiency rather than HNF4/CREB1 exact positioning. The possibility of distinguishing between regions 2 and 3, for example, in a ChIP-qPCR experiment, also depends on the average DNA fragment length after sonication, a parameter that is not specified here. 

      Second: what are the primers designed to detect the ieCtnnb1 enhancer amplifying in the yellow-columns samples of Figure 6G? In this sample, the enhancer is deleted, and no amplification should be possible, yet it seems that a value is obtained and set to 1 as a reference value.

      This is indeed a crucial point, and we fully agree with the reviewer that “ChIP-seq would produce a more reliable profile of the HNF4a and CREB1 binding sites on these loci and in other control regions”. However, we believe that our current ChIP-qPCR experiments have adequately addressed the potential concerns raised by the reviewer. (1) We have ensured that the DNA fragment length after sonication falls within the range of 200 bp to 500 bp, with an average length of approximately 300 bp (Author response image 1A). We have stated this point in the revised methods section (line 633). (2) We have randomly inspected 14 out of 26 primer sets used in Figure 6 and its supplemental figure (Author response image 1B-E), confirming that all inspected primer sets demonstrate comparable amplification efficiency (ranging from 90% to 110%). This information has also been included in the revised methods section (line 650). (3) Figures 6G and 6H show reduced enrichment of HNF4α (6G) and p-S133-CREB1 (6H) at the Ctnnb1 promoter in ieCtnnb1 knockout ApcMin/+ tumor tissues. The ChIP-qPCR primers used were positioned at the Ctnnb1 promoter, not at ieCtnnb1, with IgG control enrichment serving as the reference values on the Y-axes.

      Author response image 1.

      (A) Agarose gel electrophoresis of sonicated DNA. (B-E) Tests of amplification efficiency for primer sets used in ChIP-qPCR.
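As background for the efficiency range quoted in the response, the sketch below uses the standard-curve relation E = 10^(-1/slope) - 1, where the slope comes from regressing Cq on log10 of template input, and expresses ChIP signal as fold over the IgG control under the assumption of roughly 100% efficiency. The dilution series and Cq values are hypothetical; the response does not specify the authors' exact calculation.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(log10_input, cq_values):
    """Percent efficiency from a standard curve: (10 ** (-1 / slope) - 1) * 100."""
    slope = fit_slope(log10_input, cq_values)
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def fold_over_igg(cq_ip, cq_igg):
    """Enrichment relative to IgG (IgG = 1), assuming template doubles every cycle."""
    return 2.0 ** (cq_igg - cq_ip)


# Hypothetical tenfold dilution series (log10 of relative input) and measured Cq values.
log10_input = [0, -1, -2, -3]
cq_values = [20.0, 23.32, 26.64, 29.96]

efficiency = amplification_efficiency(log10_input, cq_values)
print(round(efficiency, 1), fold_over_igg(cq_ip=25.0, cq_igg=28.0))
```

A slope near -3.32 cycles per tenfold dilution corresponds to an efficiency near 100%, which is why primer sets falling between 90% and 110% are generally treated as comparable.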

      (9) The ChIP-qPCR showing preferential binding of pS133-CREB1 in small intestinal crypts and CHT15 cells (line 393) should be shown. 

      The ChIP-qPCR results demonstrating preferential binding of p-S133-CREB1 over CREB1 have been added in revised Figure 6C, 6D and Figure 6 – Supplement 1C.

      (10) It is not entirely clear what the blue tracks represent at the bottom of Figures 6C-D and Figure 6 - Figure Supplement 1C-D. The ChIP-seq profiles of both CREB1 and HNF4a shown in Figures 6A and Figure 6 - Figure Supplement 1A do not seem to match. Taking HNF4a, for example from Figure 6 - Figure Supplement 1A it seems to bind on the Ctnnb1 promoter, while in Figure 6 - Figure Supplement 1D the peaks are within the first intron. I realize this might all be a problem with a different scale across figure panels, but I suggest producing a cleared figure.

      We apologize for the confusion. We have revised Figure 6C-6D, Figure 6 - figure supplement 1C-D, and the corresponding legends to enhance clarity. (1) The top panels of Figures 6C and 6D respectively highlight shaded regions of ieCTNNB1 (pink) and the CTNNB1 promoter (grey) in Figure 6A, emphasizing the enrichment of p-S133-CREB1. (2) The top panels of Figure 6 – figure supplement 1C and 1D respectively highlight shaded regions of ieCtnnb1 (pink) and the Ctnnb1 promoter (grey) in Figure 6 – figure supplement 1A, emphasizing the enrichment of HNF4α. (3) Because Figures 6C-6D and Figure 6 - figure supplement 1C-1D respectively correspond to human and mouse genomes, the positions of peaks and scales differ.

      (11) In the intro the authors refer to "TCF-4". I suggest they use the more recent unambiguous nomenclature for this family of transcription factors and call it TCF7L2.

      TCF-4 has been changed into TCF7L2 in the revision (line 81)

      (12) In lines 121-122, the authors write "Although numerous putative enhancers...only a fraction of them were functionally annotated". To what study/studies are the authors referring? Please provide references.

      References were added in the revision (line 124)

      (13) In some parts the authors use strong words that should in my opinion be attenuated. Examples are: (i) at line 224, "maintains" would be better substituted with "contribute", as in the absence of ieCtnnb1, Ctnnb1 is still abundantly expressed; (ii) at line 266 "compromised" when the proliferative capacity of CFCs and TACs seems to be only mildly reduced; (iii) at line 286 "disrupts", the genes are simply downregulated.

      We thank the reviewer for these great suggestions. 1) On lines 224-225, the sentence was revised to: “These data suggest that ieCtnnb1 plays a specific role in regulating the transcription of Ctnnb1 in intestinal epithelia”. 2) On line 271, “compromised” was replaced with “mildly reduced”. 3) In ieCtnnb1 knockout epithelial cells of the small intestine, genes related to secretory functions were decreased, while genes related to absorptive functions were increased. Therefore, the term 'disrupts' is more appropriate than 'downregulates'.

      Reviewer #3:

      Line 81, c-Myc should be human MYC (italics) to agree with the other human gene names in this sentence. 

      c-Myc has been changed into MYC in the revision (line 82)

      Line 215, wildtype should be wild-type. 

      “wildtype” has been changed into “wild-type” in the revision (line 215)

      Line 224, Elimination of the enhancer did not abolish expression of Ctnnb1; therefore, it would be better to say that it "helps to maintain Ctnnb1 transcription" 

      The sentence was changed into “These data suggest that ieCtnnb1 plays a specific role in regulating the transcription of Ctnnb1 in intestinal epithelia” in revision (lines 224-225)

      Line 228, perhaps "to activate transcription" is meant. 

      “active” has been changed into “activate” in the revision (line 228)

      Line 235, consider "reduced" instead of "undermined". 

      “undermined” has been replaced with “compromised” in the revision (line 237)

      Line 262, "em" dashes should be at both ends of this insertion. 

      Line 298, "dysfunctional" would be better.

      Line 356, "samples were". 

      Line 481, 12-hr (add hyphen). 

      All above points have been optimized according to the reviewer’s suggestion.

      Line 712, Is "poly-N" meant? 

      “Poly-N” indicates undetected bases during sequencing. This explanation was added in the revision (lines 759-760).

      Figure 1K, the GAPDH signal is not visible and that panel is unnecessary as there is an H3 control.   

      Figure 1K and 1L respectively show levels of nuclear and cytoplasmic β-catenin. GAPDH and H3 were used as internal references for the cytoplasmic and nuclear fractions, respectively, confirming both robust fractionation and equal loading.

    1. Author response:

      We are grateful to the reviewers and editors for their insightful comments. All recognized that, while mutation recurrences have been used for inferring cancer drivers, our approach has the rigor of quantitative analysis. We would like to add that, without rigorously ruling out mutational hotspots, most CDNs have not been accepted as driver mutations.

      This paper develops the theory stating that i) recurrent point mutations are true Cancer Driving Nucleotides (CDNs); and ii) non-recurrent mutations are unlikely to be CDNs. The reviewers note that, even with this theory, we have not yet discovered new driving mutations. That discovery is made in the companion paper. Table 3 shows that, averaged across cancer types, the conventional method would identify 45 CDGs while the CDN method tallies 258 CDGs. The power of the CDN method in identifying new driver genes is evident.

      The second question is "By this theory, will we be able to discover most CDNs when the sample size increases from ~ 1000 to 10,000?" This is a question of forecast and can be partially answered using GENIE data. Fig. 7 of this study shows that, when n increases from ~ 1000 to ~ 9,000, the numbers of discovered CDNs increase by 3- to 5-fold, most of which come from the two-hit class, as expected.

      Fig. 7 also addresses the query of whether we have used datasets other than TCGA. We have indeed used all public data, including GENIE, ICGC and other integrated resources such as COSMIC. For the main study, we rely on TCGA because it is unbiased for estimating the probability of CDN occurrences. In many datasets, the numerators are given but the denominators are not (the number of patients with the mutation / the total number of patients surveyed).

      The third question is about mutation recurrences among cancer types. As stated by one reviewer, "different cancer types have unique mutational landscapes". While this is true when the analysis is done at the whole-gene level, one gets a different picture at the nucleotide level where the resolution is much higher. The pan-cancer trend of point mutations is evident in Fig. 4 of the companion paper.

      Again, we heartily appreciate the criticisms and suggestions of the reviewers and editors!

    1. Author response:

      We are grateful for the reviewers' acknowledgment of the originality of our manuscript and its potential importance in cancer treatment. We appreciate the reviewers' critiques on certain conclusions and thank them for their thorough feedback on the manuscript. In the revised version, we will provide a more detailed clarification of the previous data and methods, bolster the existing data, and present additional evidence in support of our hypothesis. Please find below our replies to particular concerns.

      In brief, to address the comments from Reviewer 1, we will make the following revisions in the manuscript:

      (1) To address the issues regarding the specificity of ATP5α CAT-tailing, we will provide new patient-derived cell lines and tumor samples and investigate the CAT-tail modifications of nuclear genome-encoded mitochondrial proteins and changes in RQC proteins within them. We will endeavor to explore the nature of NEMF modifications in GSC cells (Fig. S1A).

      (2) To enhance the quality of image data, we will substitute some images (such as Fig. 1E and 3A) with higher quality images.

      (3) To further understand the influence of NEMF on cancer, we will evaluate the effects of NEMF overexpression in GSC cells (e.g., Fig. 3D).

      (4) To further explore changes in apoptosis, we will employ additional methods to detect apoptosis, including Annexin-PI FACS assays, caspase cleavage analysis, assessing BAX-BCL2 ratios, and monitoring cytochrome c release.

      (5) To further confirm the effectiveness of the CAT-tailing-mitochondria mechanism in in vivo tumor models, we will utilize a Drosophila model to study the impact of the RQC pathway and CAT-tailing mechanism on tumor proliferation in vivo. The overactivation of the Notch signaling pathway in Drosophila can stimulate malignant proliferation of neural stem cells (NSCs) through both canonical (c-Myc mediated pathway) and non-canonical (PINK1-mitochondrial-mTORC2 pathway) pathways, leading to the development of a tumor-like phenotype in the larval brain. A recent publication in PNAS Nexus (Khaket et al., PNAS Nexus, 2024) discusses the impact of the RQC pathway on c-Myc. It is possible for us to analyze the alterations in CAT-tailing on mitochondrial proteins and mitochondrial membrane potential in this Notch model and study how the RQC pathway regulates them. Moreover, tumor implantation experiments will be carried out using immunodeficient mice. Our goal is to conduct a comparative analysis of the growth of control and NEMF KD glioblastoma cell lines in animal models, alongside performing essential biochemical analyses.

      Reference:

      Khaket, T. P., et al. (2024). Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus, 3(8), pgae321.

      To address the comments from Reviewer 2, we will make the following revisions in the manuscript:

      (1) The concerns raised by the reviewer regarding the authenticity of the ATP5α CAT-tail modification are duly noted. Critical control experiments will be incorporated into our study, including NEMF knockout (or NFACT domain mutants) and cycloheximide treatment, alongside other methodologies. The results of these experiments will be included in figures such as Fig. 1B, 1C, S3A, and S3B to improve comprehension of the CAT-tail modification on ATP5α.

      (2) We thank the reviewer for reminding us to consider the differences between the artificial tail and the endogenous CAT-tail. A recently published study (Khan et al., 2024) provides a thorough analysis of the components of the CAT-tail. Our approach to addressing this issue involves emphasizing the use of the artificial CAT-tail sequence and adopting a more measured tone in the revised version. Additionally, we will induce the endogenous ATP5α-CAT-tail by expressing ATP5α-K20-non-stop in cells to validate its function in glioblastoma cells.

      (3) Moreover, we aim to examine the impact of different amino acid compositions in the ATP5α C-terminal extension, such as the poly(Gly-Ser) repeats noted by the reviewer, on both mitochondrial function and glioblastoma biology in our revision. By comparing the results obtained from ATP5α-CAT-tails with different compositions, we anticipate that more definitive conclusions can be drawn.

      (4) Additional minor revisions will be implemented to the text in accordance with the feedback given by the reviewer.

      Reference:

      Khan, D., Vinayak, A. A., Sitron, C. S., & Brandman, O. (2024). Mechanochemical forces regulate the composition and function of CAT tails. bioRxiv, 2024-08.

  2. Aug 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The current manuscript uses electron spin resonance spectroscopy to understand how the dynamic behavior and conformational heterogeneity of the LPS transport system change during substrate transport and in response to the membrane, bound nucleotide (or transition state analog), and accessory subunits. The study builds on prior structural studies to expand our molecular understanding of this highly significant bacterial transport system. 

      Strengths 

      This series of well-designed and well-executed experiments provides new mechanistic insights into the dynamic behavior of the LPS transport system. Notable new insights provided by this study include its indication of the spatial organization of the LptC domain, which was poorly resolved in structures, and how the LptC domain modulates the dynamic behavior of the gate through which lipids access the binding site. In addition, a mass spectrometry approach designed to examine LPS binding at different stages in the nucleotide-dependent conformational cycle provides insight into the order of operations of LPS binding and transport. 

      We thank the reviewer for the very positive comments and highlighting the important findings from our study.

      Reviewer #2 (Public Review):

      Lipopolysaccharide (LPS) is a major component of the outer membrane of Gram-negative bacteria and plays a critical role in bacterial virulence. The LPS export mechanism is a potential target for new antibiotics. Inhibiting this process can render bacteria more susceptible to the host immune system or other antibacterial agents. Given the rise of antibiotic-resistant bacteria, novel targets are urgently needed. The seven LPS transport (Lpt) proteins, A-G, move LPS from the inner to the outer membrane. This study investigated the conformational changes in the LptB2FG-LptC complex using site-directed spin labeling (SDSL) electron paramagnetic resonance (EPR) spectroscopy, revealing how ATP binding and hydrolysis affect the LptF β-jellyroll domain and lateral gates. The findings highlight the role of LptC in regulating LPS entry, ensuring efficient and unidirectional transport across the periplasm.

      The β-jellyrolls are not fully resolved in the vanadate-trapped structure of LptB2FG and LptB2FGC. Therefore, the current study provides valuable information on the functional dynamics of these periplasmic domains, their interactions, and their roles in the unidirectional transport of LPS. Additionally, the dynamic perspective of the lateral gates in LptFG in the presence and absence of LptC is another strength of this study. Moreover, at least in detergent samples, more comprehensive intermediates of the ATP turnover cycle are studied than in the available structures, providing crucial missing mechanistic details. 

      We thank the reviewer for highlighting our major findings!

      Other major strengths of the study include high-quality DEER distance measurements in both detergent and proteoliposomes, the latter providing valuable dynamics information in the lipid environment. However, lipid composition is not mentioned. The proteoliposome study is crucial since the previous structural study (Li, Orlando & Liao 2019) was done in rather small-diameter nanodiscs, which might affect the overall dynamics of the complex. It would have been beneficial if the investigators had reconstituted the complex in lipid nanodiscs with the same composition as proteoliposomes. The mixed lipid/detergent micelles provide an alternative. It seems the ATPase activity of the protein complex is much lower in detergent compared with lipid nanodiscs (Li, Orlando & Liao 2019). In the current study, ATPase activity in proteoliposomes is not provided. Also, the reviewer assumes cysteine-less (CL) constructs of the complex components were utilized. The ATPase assay on CL complex is not presented. Additionally, from previous structural studies and the mass spectrometry data presented here, LPS co-purifies and is already bound to the complex, thus the Apo state may represent the LPS-bound state without nucleotides. 

      The liposomes are made from E. coli polar lipid extract, which we have now added to the Materials and Methods section. We could not yet perform the investigations in nanodiscs, which is one of our aims for the future. The ATPase activity is lower in micelles, and the reviewer is correct that we did not measure or compare ATPase activity in proteoliposomes. The data denoted as wild-type (WT, Figure S4) correspond to the cysteine-less (CL) variant, which is now corrected in the supporting information. As the reviewer commented, the mass spectrometry data reveal bound LPS in the apo-state. However, as seen from our results, the ADP-Mg2+ state is similar to the apo state; thus, in the cellular environment, LPS may bind to this state as well.

      The selection of sites to probe lateral gate 2, which forms the main LPS entry site, may pose an issue. Although the authors provide justification based on the available structures, one site (position 325 in LptF) is located on a flexible loop, and position 52 in LptG is on the neighboring transmembrane helix, separated by a potentially flexible loop from the gating TM1. These labeling sites could exhibit significant local dynamics, resulting in a broader distribution of distances and potentially masking the gating-related conformational changes. 

      Position 52 in LptG is located at the beginning of the neighboring transmembrane helix. As we have discussed in the manuscript, position 325 in LptF is located on a short loop connected to TM5. In the structures, this loop shows a very similar orientation (Figure S6). Further, the observed heterogeneity for the lateral gate-2 is considerably modulated into distinct conformation(s) upon LptC binding (Figure 6D-E). This would not be the case if this loop possessed any independent flexibility. Confirming these observations, the room temperature continuous wave ESR spectra revealed the least flexibility for this spin pair (Figure S5, S7). In view of the reasons and observations detailed above, we conclude that the local flexibility at the labelled sites might not make any significant contribution to the broad distribution observed at this gate in LptB2FG (Figure 4).

      Reviewer #3 (Public Review):

      Summary: 

      The manuscript by Dajka and co-workers reports the application of a biophysical approach to analyse the dynamics of the LptB2FG-C ABC transporter, involved in LPS transport across the cell envelope in Escherichia coli. LptB2FG-C belongs to a new class of ABC transporters (type VI) and is essential and conserved in several Gram-negative pathogens. Since LPS is the major component of the outer membrane of the Gram-negative cell and is responsible for the low permeability of this membrane to several antibiotics, a deep understanding of the mechanism and function of the LptB2FG-C transporter is crucial for the development of new drugs targeting Gram-negative pathogens. 

      Several structural studies have been published so far on the LptB2FG-C transporter, disclosing important aspects of the transport mechanism; nevertheless, lack of resolution of some regions of the individual proteins as well as the dynamic nature of the transport mechanism per se (e.g. the insertion and removal of the TM helix of LptC from the TMDs of the transporter during the LPS transport cycle) has greatly limited the understanding of the mechanism that couples ATP binding and hydrolysis with LPS transport. This knowledge gap could be filled by applying an approach that allows the analysis of dynamic processes. The DEER/PELDOR technique applied in this work fits well with this requirement. 

      Strengths: 

      In this study, the authors provide some new pieces of information on the LptB2FG-C function and the role of LptC in the transporter. Notably, they show that: 

      - There is high heterogeneity in the conformational states of the entry gate of LPS in the transporter (gate-2) that are reduced by the insertion of LptC, and the heterogeneity observed is not altered by ATP binding or hydrolysis (as expected since LPS entry is ATP-independent). 

      - ATP binding induces an allosteric opening of LptF β-jellyroll domain that allows for LPS passage to the β-jellyroll of LptC, which is stably associated with the β-jellyroll of LptF throughout the cycle. 

      - The β-jellyroll of LptG is highly flexible, indicating an involvement in the LPS transport cycle. 

      The manuscript is timely and overall clear. 

      We thank the reviewer for the positive comments and for highlighting our findings and the strength of DEER/PELDOR spectroscopy for characterizing the dynamic aspects of the LPS transport system.

      Weaknesses:

      I list my concerns below and provide suggestions that, in my opinion, should be addressed to reinforce the findings of this study. 

      (1) Protein complex controls: the authors assess the ATPase activity of the spin-labelled variants of their protein complexes to rule out the possibility that engineering the proteins to enable spin labelling could affect their functionality (Figure S4). It has been reported that the association of LptC to LptB2FG complex inhibits its ATPase activity. However, in the ATPase assay data shown in Figure S4, the inhibitory effect of the LptC TM is not visible (please compare LptB2FG F-A45C G-I335C and F-L325C G-A52C with and without LptC). This could lead one to suspect that the regulatory function of LptC is missing in the LptC-containing complexes used in this work. I suggest the authors include wt LptB2FGC in the assay to compare the ATPase activity of this complex with wt LptB2FG. The published inhibitory effect of TM LptC has been observed in proteoliposomes. Since it is not clear from the paper if the ATPase assay in Figure 4 has been conducted in DDM or proteoliposomes, the lack of inhibitory effect could be due to the assay conditions. A comparative test could answer this question.

      We could not observe the inhibitory effect of LptC on the ATPase activity of LptB2FG. As the reviewer pointed out, the primary reason is that we performed the assays in detergent micelles and not in proteoliposomes. For this reason, a comparison of the activity between (cysteine-less) LptB2FG and LptB2FG-C as the reviewer suggested would not be informative. As this information is not directly relevant for our current interpretations, we plan to perform those experiments in liposomes in the near future.

      (2) Figure 2: NBD closure upon ATP binding to LptB2FG is convincingly demonstrated both in DDM micelles and proteoliposomes, validating the experimental system. However, since under physiological conditions, ATP binding should take place before the displacement of the TM of LptC (Wilson and Ruiz, Mol microbiol 2022), I suggest the authors carry out the experiments with LptC-containing complexes to investigate conformational changes (if any) that are triggered when ATP binding occurs before the TM displacement.  

      We thank the reviewer for the suggestion. These experiments are on our to-do list and will be performed in the near future.

      (3) Proteoliposomes: in the experiments shown in Figures 3 and 4, unlike those in Figure 2, measurements in proteoliposomes give different results from the experiments in DDM, showing higher heterogeneity. Could this be related to the presence (or absence) of LPS in liposomes? It is not mentioned in the materials and methods section whether LPS is present. Could the authors please discuss this? 

      We thank the reviewer for raising this interesting point. The liposomes are made from E. coli polar lipid extract. In the polar lipid extract, phosphatidylethanolamine (PE) is the predominant lipid component, with minor amounts of phosphatidylglycerol (PG) and cardiolipin. Thus, the differences in the heterogeneity we observed in proteoliposomes might not be due to the presence of LPS. We added a short description of this aspect in the ‘Discussion’ part.

      (4) The authors show large conformational heterogeneity in gate-2 (using the spin-labelled pair F-L325R1-G-A52R1) and suggest that deviation from the corresponding simulations could be due to the need for enhanced dynamics to allow for gate interaction with LPS or LptC. The effect of LptC is probed in the experiments shown in Figure 6, but I suggest the authors add LPS to the complexes to evaluate the possible stabilizing effect of LPS on the conformations shown in Figure 4. 

      This indeed is an important experiment, which we plan to do in the near future.

      (5) Figure 6: the measurement of lateral gate 1 and 2 dynamics in the LptC-containing complexes clearly supports the hypothesis, proposed based on the available structures, that TM LptC dissociates from LptB2FG upon ATP binding. However, direct evidence of this movement is still missing. Would it be possible to monitor the dynamics of the TM LptC by directly labelling this protein domain? This would give a conclusive demonstration of the displacement during the ATPase cycle. 

      Yes, it should be possible to label LptC and monitor its position with respect to LptF or LptG. These experiments are in progress in our laboratory. 

      (6) LPS release assay: Figure 6 panels H-I-J show the MS spectra relative to LPS-bound and free proteins obtained from wt LptB2FG upon ATP binding and ATP hydrolysis conditions. From these spectra the authors conclude that LPS is completely released only upon ATP hydrolysis. However, the current model predicts that LPS release into the Lpt bridge made by LptC-A-D is triggered by ATP binding. For this reason, I suggest the authors assess LPS release also from the LptB2FGC complex where, in the absence of LptA, LPS would be expected to be mostly retained by the complex under the same conditions. 

      These indeed are exciting experiments. LPS binding and release by LptB2FGC is in progress in our laboratories.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Page 2 typo: apo-sate should be apo-state 

      Thank you! We corrected the typo.

      Can the authors clarify whether LPS is co-purified with the protein? Does it remain bound throughout the liposome reconstitution process? 

      Our mass spectrometry data show that LPS is co-purified with LptB2FG in micelles. However, we cannot yet verify the presence of bound LPS after reconstitution into proteoliposomes. We added a sentence in the last paragraph before Discussion as ‘Thus, LPS is co-purified with LptB2FG in micelles.’

      Reviewer #2 (Recommendations for The Authors): 

      Several points require clarification: 

      (1) The reviewer would have benefited from access to the raw DEER traces. For instance, in Figure 4, the change in the raw data appears subtle. The differences between the Apo and vanadate-trapped states in β-DDM might be related to a lower signal-to-noise ratio in the Apo state.

      We would be happy to share the raw DEER data upon request. The analysis is performed with the primary data and takes the noise level into account when calculating the confidence interval. Therefore, the distances with the 95% confidence interval are reliable to the extent that they are presented.

      (2) The panel labels in Figures 2-4 do not match the legends. 

      Thank you! We corrected them.

      (3) In Figure 2G, the authors state, "Overall, the ATP-induced closure as observed in micelles (and the structures) is maintained in the native-like lipid bilayers for the NBDs." This statement is technically incorrect since the vanadate-trapped state is not equivalent to the ATP+EDTA "ATP binding" state, which was not tested in proteoliposomes (PLS). The authors should have tested this condition for a few mutants in proteoliposomes. They should revise the manuscript to reflect this or provide evidence that the ATP+EDTA state is similar to the vanadate-trapped state in PLS. 

      We corrected the sentence as ‘Overall, the nucleotide-induced closure as observed in micelles (and the structures) is maintained in the native-like lipid bilayers for the NBDs.’

      (4) The mutant F-L325R1_G-A52R1 is not optimal for probing gate 2. Specifically, position 325 in LptF is highly flexible, as indicated by the very broad distance distributions in Figure 4, and may hinder probing the associated conformational changes in this gate. Comparing the cryo-EM structures of this loop under different conditions (Figure S6) does not provide solid evidence for the lack of flexibility. 

      Position 52 in LptG is located at the beginning of the neighboring transmembrane helix. As we have discussed in the manuscript, position 325 in LptF is located on a short loop connected to TM5. In the structures, this loop shows a very similar orientation (Figure S6). Further, the observed heterogeneity for the lateral gate-2 is considerably modulated into distinct conformation(s) upon LptC binding (Figure 6D-E). This would not be the case if this loop possessed any independent flexibility. Confirming these observations, the room temperature continuous wave ESR spectra revealed the least flexibility for this spin pair (Figure S5, S7). In view of the reasons and observations detailed above, we conclude that the local flexibility of the labelled sites might not make any significant contribution to the broad distribution observed at this gate in LptB2FG (Figure 4).

      (5) Regarding Figure 4B, the authors state, "In the vanadate-trapped and ATP samples, the major population is centered at 2 nm (which corresponds to the simulation on the vanadate trapped structure)". While the shift to shorter distances aligns with the structures, the average distance from the simulation is around 3 nm and does not correspond closely to the DEER distances of 2 nm. 

      Thank you for noting this point. We corrected the sentence as ‘In the vanadate-trapped and ATP samples, the major population is centred at 2 nm (which is closer to the simulation on the vanadate-trapped structure).’

      (6) Regarding Figure 4D, the authors state, "Unlike the lateral gate-1 (and the NBDs), ADP-Mg2+ also induced a similar shift in the distance distribution." The reviewer believes that even without interaction with LptC, an equilibrium exists between two states in gate-2, and ATP binding or vanadate-trapping shifts the equilibrium to a shorter-distance population. Additionally, if the signal-to-noise ratio of the Apo state were similar to that of the ADP-Mg2+ state, similar distance distributions would have been observed for the Apo state. 

      We thank the reviewer for raising this excellent point. We thoroughly revised the corresponding section as ‘ADP-Mg2+ also gave a broad distribution comparable to the apo-state. Thus, in the apo-state this gate appears to exist in an equilibrium between the two conformations observed from the corresponding structures. ATP binding or vanadate-trapping shifts the equilibrium towards the collapsed conformation.’

      (7) Defining the conformational dynamics of the β-jellyroll domains is one of the major strengths of this study. The LptF and LptG β-jellyroll domains exhibit high flexibility in detergent micelles. Unfortunately, none of the experiments were repeated in proteoliposomes to determine if this flexibility persists in a lipid environment.

      Understandably, repeating all the measurements in liposomes is beyond the scope of the current study. We are currently extending these investigations to liposomes and will be able to provide more insights in the near future.

      (8) Regarding Figure 6G, the authors claim, "Distances corresponding to the apo state are present possibly due to an incomplete vanadate trapping for this sample." It is unlikely that vanadate trapping would be incomplete for just one sample. A repeat experiment is recommended. 

      We will provide an update on this point in due time.

      (9) Regarding the structural dynamics of the lateral gates, detergent micelles, and liposomes are vastly different environments. It is challenging to reach a consensus model based on data mostly derived from detergent micelles and only a few from proteoliposomes. 

      The observations in PLS are qualitatively similar to those in the micellar sample for the investigated positions (please see the first paragraph in “Discussion”). Further, our observations are in agreement with previous structural and biochemical data and further extend the mechanism in a coherent manner.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments 

      (1) Figure legends: There are several mismatches between panel nomenclature and the corresponding descriptions in the legends. Please check the correspondence between panel identification and descriptions throughout the manuscript (for example, F-G and H-J in Figure 2; and I and H in Figure 3). 

      Thank you! We corrected them.

      - Figure 6 legend: asterisk is in panel D and not C. 

      Corrected

      - Panels E and F are not mentioned. Moreover, the spectra for vanadate trapped conformation of LptF219-LptC104 have not been given a letter. 

      Corrected

      - A description of the different colors in the "Distance r" axis should be added to figure 2, 3, and 4 legends. 

      Corrected

      - Please indicate the meaning of the black arrows in figure legends. 

      Corrected

      (2) To improve data comprehension by the readers, the authors should indicate the relative spin-labelled pairs on the top of Figure 2, 3, and 4, as done for Figures 5 and 6.

      Done

      (3) Reference 56 is cited incorrectly in the reference list and refers to a study employing reconstituted LptB2FG complexes rather than isolated β-jellyroll domains. 

      Corrected

      (4) Figure 3: How do the authors explain the evidence that ATP binding influences gate 1 conformational flexibility only in DDM micelles and not in PLS? Is this something related to the release of LPS from the complex in different environments?

      We do not know whether this difference is related to LPS release. Therefore, we interpreted it more generally as an effect of the membrane environment.

      (5) The initial sentence of the discussion looks somewhat incomplete, please correct it. 

      Done

      (6) To improve the readability of the paper, it could be useful to better focus the topic of the headings of the result paragraphs concerning the analysis of the individual lateral gates (for example, by indicating the name of the gate in the headings).

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors used a stopped-flow method to investigate the kinetics of substrate translocation through the channel in hexameric ClpB, an ATP-dependent bacterial protein disaggregase. They engineered a series of polypeptides with the N-terminal RepA ClpB-targeting sequence followed by a variable number of folded titin domains. The authors detected translocation of the substrate polypeptides by observing the enhancement of fluorescence from a probe located at the substrate's C-terminus. The total time of the substrates' translocation correlated with their lengths, which allowed the authors to determine the number of residues translocated by ClpB per unit time.
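      The relationship described above, in which total translocation time scales with substrate length, can be illustrated with a minimal sketch. It assumes a simple linear dependence of time on length and uses made-up numbers; it is not the kinetic analysis used in the manuscript, and the function name `fit_line` and all data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical substrate lengths (residues) and total translocation times (s).
# These values are illustrative only and are not taken from the study.
lengths = [100.0, 200.0, 300.0]
times = [30.0, 55.0, 80.0]

slope, intercept = fit_line(lengths, times)

# The slope has units of seconds per residue, so its inverse is an
# overall rate in residues per second. The intercept would absorb any
# length-independent lag (for example, binding or initiation).
rate = 1.0 / slope
print(f"rate = {rate:.2f} residues/s, lag = {intercept:.2f} s")
```

      Under these assumptions, the inverse of the fitted slope gives an overall rate in residues per unit time, while the intercept collects any length-independent delay.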

      Strengths:

      This study confirms a previously proposed model of processive translocation of polypeptides through the channel in ClpB. The novelty of this work is in the clever design of a series of kinetic experiments with an engineered substrate that includes stably folded domains. This approach produced a quantitative description of the reaction rates and kinetic step sizes. Another valuable aspect is that the method can be used for other translocases from the AAA+ family to characterize their mechanism of substrate processing.

      Weaknesses:

      The main limitation of the study is in using a single non-physiological substrate of ClpB, which does not replicate the physical properties of the aggregated cellular proteins and includes a non-physiological ClpB-targeting sequence. Another limitation is in the use of ATPgammaS to stimulate the substrate processing. It is not clear how relevant the results are to the ClpB function in living cells with ATP as the source of energy, a multitude of various aggregated substrates without targeting sequences that need ClpB's assistance, and in the presence of the co-chaperones.

      Indeed, we agree that our RepA-Titinx substrates are not aggregates but model, soluble substrates used to reveal information about enzyme-catalyzed protein unfolding and translocation. Our substrates are similar to RepA-GFP and GFP-SsrA used by multiple labs, including Wickner, Horwich, Sauer, Baker, Shorter, and Bukau, to name only a few. The fact that “this is what everyone does” does not make the substrates physiological or ideal. However, this is the technology we currently have until we and others develop something better. In the meantime, we contend that the results presented here do advance our knowledge of enzyme-catalyzed protein unfolding.

      Part of what this manuscript seeks to accomplish is presenting the development of a single-turnover experiment that reports on processive protein unfolding by AAA+ molecular motors, in this case, ClpB. Importantly, we are treating translocation on an unfolded polypeptide chain and protein unfolding of stably folded proteins as two distinct reactions catalyzed by ClpB. Whether these functions are used to disrupt protein aggregates in vivo remains to be seen.

      We contend that processive ClpB catalyzed protein unfolding has not been rigorously demonstrated prior to our results presented here.  Avellaneda et al mechanically unfolded their substrate before loading ClpB (Avellaneda, Franke, Sunderlikova et al. 2020).  Thus, their experiment represents valuable observations reflecting polypeptide translocation on a pre-unfolded protein.  Our previous work using single-turnover stopped-flow experiments employed unstructured synthetic polypeptides and therefore reflects polypeptide translocation and not protein unfolding (Li, Weaver, Lin et al. 2015).  Weibezahn et al used unstructured substrates in their study with ClpB (BAP/ClpP), and thus their results represent translocation of a pre-unfolded polypeptide and not enzyme catalyzed protein unfolding (Weibezahn, Tessarz, Schlieker et al. 2004). 

      Many studies have reported the use of GFP with tags or RepA-GFP and used the loss of GFP fluorescence to conclude protein unfolding. However, such results do not reveal whether ClpB processively and fully translocates the substrate through its axial channel. One cannot rule out, even when trapping with a “GroEL trap”, the possibility that ClpB only needs to disrupt some of the fold in GFP before cooperative unfolding occurs, leading to loss of fluorescence. Once the cooperative collapse of the structure occurs and fluorescence is lost, it has not been shown whether ClpB continues to translocate on the newly unfolded chain or dissociates. In fact, the Bukau group showed that folded YFP remained intact after luciferase was unfolded (Haslberger, Zdanowicz, Brand et al. 2008). Our approach, reported here, yields signal upon arrival of the motor at the C-terminus or within the PIFE distance; thus we can be certain that the motor does arrive at the C-terminus after unfolding up to three tandem repeats of the Titin I27 domain.

      ATPγS is a non-physiological nucleotide analog. However, ClpB has been shown to exhibit curious behavior in its presence that we and others, as the reviewer acknowledges, do not fully understand (Doyle, Shorter, Zolkiewski et al. 2007). Some of the experiments reported here seek to better understand that behavior. Here we have shown that ATPγS alone will support processive protein unfolding. With this assay in hand, we are now seeking to go forward and address many of the points raised by this reviewer.

      The authors do not attempt to correlate the kinetic step sizes detected during substrate translocation and unfolding with the substrate's structure, which should be possible, given how extensively the stability and unfolding of the titin I27 domain were studied before. Also, since the substrate contains up to three I27 domains separated with unstructured linkers, it is not clear why all the translocation steps are assumed to occur with the same rate constant.

      We assume that all protein unfolding steps occur with the same rate constant, ku.  We conclude that we are not detecting the translocation rate constant, kt, as our results support a model where kt is much faster than ku.  We do think it makes sense that the same slow step occurs between each cycle of protein unfolding.

      We have added a discussion relating our observations to mechanical unfolding of tandem repeats of Titin I27 from AFM experiments (Oberhauser, Hansma, Carrion-Vazquez and Fernandez 2001). Most interestingly, they report unfolding of Titin I27 in 22 nm steps. Using 0.34 nm per amino acid, this yields ~65 amino acids per unfolding step, which is comparable to our kinetic step-size of 57 – 58 amino acids per step.

      Some conclusions presented in the manuscript are speculative:

      The notion that the emission from Alexa Fluor 555 is enhanced when ClpB approaches the substrate's C-terminus needs to be supported experimentally. Also, evidence that ATPgammaS without ATP can provide sufficient energy for substrate translocation and unfolding is missing in the paper.

      In our previous work we used fluorescently labeled 50 amino acid peptides as substrates to examine ClpB binding (Li, Lin and Lucius 2015, Li, Weaver, Lin et al. 2015). In that work we used fluorescein, which exhibits quenching upon ClpB binding. We have added a control experiment in which we attached Alexa Fluor 555 to the 50 amino acid substrate so that we can be assured that ClpB binds close to the fluorophore. As seen in supplemental Fig. 1 A, upon titration with ClpB in the presence of ATPγS, we observe an increase in fluorescence from AF555, consistent with PIFE. Supplemental Fig. 1 B shows that the relative fluorescence enhancement at the peak maximum increases up to ~0.2, or a 20 % increase in fluorescence due to PIFE, upon ClpB binding.

      Further, peak time is our hypothesized measure of ClpB’s arrival at the dye. Our results indicate that the peak time linearly increases as a function of an increase in the number of folded TitinI27 repeats in the substrates which also supports the PIFE hypothesis. Finally, others have shown that AF555 exhibits PIFE and we have added those references.

      The evidence that ATPγS alone can support translocation is shown in Fig. 2 and supplemental Figure 1. Fig. 2 and supplemental Figure 1 are two different mixing strategies where we use only ATPγS and no ATP at all. In both cases the time courses are consistent with processive protein unfolding by ClpB with only ATPγS.

      Reviewer #2 (Public Review):

      Summary:

      The current work by Banwait et al. reports a fluorescence-based single turnover method based on protein-induced fluorescence enhancement (PIFE) to show that ClpB is a processive motor. The paper is a crucial finding as there has been ambiguity on whether ClpB is a processive or non-processive motor. Optical tweezers-based single-molecule studies have shown that ClpB is a processive motor, whereas previous studies from the same group hypothesized it to be a non-processive motor. As co-chaperones are needed for the motor activity of the ClpB, to isolate the activity of ClpB, they have used a 1:1 ratio ATP and ATPgS, where the enzyme is active even in the absence of its co-chaperones, as previously observed. A sequential mixing stop-flow protocol was developed, and the unfolding and translocation of RepA-TitinX, X = 1,2,3 repeats was monitored by measuring the fluorescence intensity with the time of Alexa F555 which was labelled at the C-terminal Cysteine. The observations were a lag time, followed by a gradual increase in fluorescence due to PIFE, and then a decrease in fluorescence plausibly due to the dissociation from the substrate allowing it to refold. The authors observed that the peak time depends on the substrate length, indicating the processive nature of ClpB. In addition, the lag and peak times depend on the pre-incubation time with ATPgS, indicating that the enzyme translocates on the substrates even with just ATPgS without the addition of ATP, which is plausibly due to the slow hydrolysis of ATPgS. From the plot of substrate length vs peak time, the authors calculated the rate of unfolding and translocation to be ~0.1 aa s⁻¹ in the presence of ~1 mM ATPgS and increases to 1 aa s⁻¹ in the presence of 1:1 ATP and ATPgS. The authors have further performed experiments at 3:1 ATP and ATPgS concentrations and observed ~5 times increase in the translocation rates as expected due to faster hydrolysis of ATP by ClpB and reconfirming that processivity is majorly ATP driven. Further, the authors model their results to multiple sequential unfolding steps, determining the rate of unfolding and the number of amino acids unfolded during each step. Overall, the study uses a novel method to reconfirm the processive nature of ClpB.

      Strengths:

      (1) Previous studies on understanding the processivity of ClpB have primarily focused on unfolded or disordered proteins; this study paves new insights into our understanding of the processing of folded proteins by ClpB. They have cleverly used RepA as a recognition sequence to understand the unfolding of titin-I27 folded domains.

      (2) The method developed can be applied to many disaggregating enzymes and has broader significance.

      (3) The data from various experiments are consistent with each other, indicating the reproducibility of the data. For example, the rate of translocation in the presence of ATPgS, ~0.1 aa s⁻¹ from the single mixing experiment and double mixing experiment are very similar.

      (4) The study convincingly shows that ClpB is a processive motor, which has long been debated, describing its activity in the presence of only ATPgS and a mixture of ATP and ATPgS.

      (5) The discussion part has been written in a way that describes many previous experiments from various groups supporting the processive nature of the enzyme and supports their current study.

      Weaknesses:

      (1) The authors model that the enzyme unfolds the protein sequentially around 60 aa each time through multiple steps and translocates rapidly. This contradicts our knowledge of protein unfolding, which is generally cooperative, particularly for titinI27, which is reported to unfold cooperatively or at most through one intermediate during enzymatic unfolding by ClpX and ClpA.

      We do not think this represents a contradiction.  In fact, our observations are in good agreement with mechanical unfolding of tandem repeats of Titin I27 using AFM experiments (Oberhauser, Hansma, Carrion-Vazquez and Fernandez 2001).  They showed that tandem repeats of TitinI27 unfolded in steps of ~22 nm.  Dividing 22 nm by 0.34 nm/Amino Acid gives ~65 amino acids per unfolding event.  This implies that, under force, ~65 amino acids of folded structure unfolds in a single step.  This number is in excellent agreement with our kinetic step-size of 65 AA/step. 
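
      The length-to-residue conversion used in this response can be sketched as follows. This is only an illustration of the arithmetic, not part of the authors' analysis; the function name and constant are hypothetical, and the only inputs are the 22 nm AFM step and the 0.34 nm per amino acid rise quoted above.

```python
# Rise per residue quoted in the response (nm per amino acid).
RISE_PER_RESIDUE_NM = 0.34


def residues_per_step(step_nm: float, rise_nm: float = RISE_PER_RESIDUE_NM) -> float:
    """Convert an extension step (nm) into an equivalent number of amino acids."""
    if rise_nm <= 0:
        raise ValueError("rise per residue must be positive")
    return step_nm / rise_nm


# 22 nm is the AFM unfolding step size cited from Oberhauser et al. (2001).
afm_step_aa = residues_per_step(22.0)
print(f"{afm_step_aa:.1f} amino acids per 22 nm step")
```

      Rounded to the nearest residue this gives the ~65 amino acids per unfolding event quoted in the response.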

      Importantly, the experiments cited by the reviewer on ClpA and ClpX are actually with ClpAP and ClpXP. We assert that this is an important distinction, as we have shown that ClpA employs a different mechanism than ClpAP (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013, Miller and Lucius 2014). Thus, ClpA and ClpAP should be treated as different enzymes and, without question, ClpB and ClpA are different enzymes.

      (2) It is also important to note that the unfolding of titinI27 from the N-terminus (as done in this study) has been reported to be very fast and cannot be the rate-limiting step as reported earlier (Olivares et al, PNAS, 2017). This contradicts the current model where unfolding is the rate-limiting step, and the translocation is assumed to be many orders faster than unfolding.

      Most importantly, the Olivares paper examines ClpXP- and ClpAP-catalyzed protein unfolding and translocation, not ClpB. These are different enzymes. Additionally, we have shown that ClpAP and ClpA translocate unfolded polypeptides with different rates, rate constants, and kinetic step-sizes, indicating that ClpP allosterically impacts the mechanism employed by ClpA to the extent that even ClpA and ClpAP should be considered different enzymes (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013). We would further assert that there is no reason to assume ClpAP and ClpXP would catalyze protein unfolding using the same mechanism as ClpB, just as we do not think it should be assumed that ClpA and ClpX use the same mechanism as ClpAP and ClpXP, respectively.

      The Olivares et al paper reports a dwell time preceding protein unfolding of ~0.9 and ~0.8 s for ClpXP and ClpAP, respectively.   The inverse of this can be taken as the rate constant for protein unfolding and would yield a rate constant of ~1.2 s-1, which is in good agreement with our observed rate constant of 0.9 – 4.3 s-1 depending on the ATP:ATPγS mixing ratio.  For ClpB, we propose that the slow unfolding is then followed by rapid translocation on the unfolded chain where translocation by ClpB must be much faster than for ClpAP and ClpXP.  We think this is a reasonable interpretation of our results and not a contradiction of the results in Olivares et al. Moreover, this is completely consistent with the mechanistic differences that we have reported, using the same single-turnover stopped flow approach on the same unfolded polypeptide chains with ClpB, ClpA, and ClpAP (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013, Miller and Lucius 2014, Li, Weaver, Lin et al. 2015).
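
      The dwell-time comparison above can be sketched as a short calculation. This is an illustration only: it assumes, as the response does, that the inverse of the mean dwell time preceding unfolding can be read as a first-order rate constant, and the dictionary and function names are hypothetical.

```python
# Dwell times preceding unfolding quoted from Olivares et al. (seconds).
dwell_times_s = {"ClpXP": 0.9, "ClpAP": 0.8}


def rate_from_dwell(dwell_s: float) -> float:
    """Treat the inverse of a mean dwell time as a first-order rate constant (s^-1)."""
    if dwell_s <= 0:
        raise ValueError("dwell time must be positive")
    return 1.0 / dwell_s


rates = {enzyme: rate_from_dwell(d) for enzyme, d in dwell_times_s.items()}
mean_rate = sum(rates.values()) / len(rates)

for enzyme, k in rates.items():
    print(f"{enzyme}: {k:.2f} s^-1")
print(f"mean: {mean_rate:.2f} s^-1")
```

      Both values fall near the ~1.2 s-1 figure quoted in the response and within the 0.9 – 4.3 s-1 range reported there for ClpB.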

      (3) The model assumes the same time constant for all the unfolding steps irrespective of the secondary structural interactions.

      Yes, we contend that this is a good assumption because it represents repetition of protein unfolding catalyzed by ClpB upon encountering the same repeating structural elements, i.e., β-sheets.

      (4) Unlike other single-molecule optical tweezer-based assays, the study cannot distinguish the unfolding and translocation events and assumes that unfolding is the rate-limiting step.

      Although we cannot directly distinguish between protein unfolding and translocation, we have logically concluded that protein unfolding is likely rate limiting. This is because the large kinetic step-size represents the collapse of ~60 amino acids of structure between two rate-limiting steps, which we interpret to represent cooperative protein unfolding induced by ClpB. It is not an assumption; it is our current best interpretation of the observations, which we are now seeking to test further.

      Reviewer #3 (Public Review):

      Summary:

      The authors have devised an elegant stopped-flow fluorescence approach to probe the mechanism of action of the Hsp100 protein unfoldase ClpB on an unfolded substrate (RepA) coupled to 1-3 repeats of a folded titin domain. They provide useful new insight into the kinetics of ClpB action. The results support their conclusions for the model setup used.

      Strengths:

      The stopped-flow fluorescence method with a variable delay after mixing the reactants is informative, as is the use of variable numbers of folded domains to probe the unfolding steps.

      Weaknesses:

      The setup does not reflect the physiological setting for ClpB action. A mixture of ATP and ATPgammaS is used to activate ClpB without the need for its co-chaperones, Hsp70, Hsp40 and an Hsp70 nucleotide exchange factor. This nucleotide strategy was discovered by Doyle et al (2007) but the mechanism of action is not fully understood. Other authors have used different approaches. As mentioned by the authors, Weibezahn et al used a construct coupled to the ClpA protease to demonstrate translocation. Avellaneda et al used a mutant (Y503D) in the coiled-coil regulatory domain to bypass the Hsp70 system. These differences complicate comparisons of rates and step sizes with previous work. It is unclear which results, if any, reflect the in vivo action of ClpB on the disassembly of aggregates.

      We agree with the reviewer that several strategies have been employed to bypass the need for Hsp70/40 or KJE to simplify in vitro experiments. Here we have developed a first-of-its-kind transient state kinetics approach that can be used to examine processive protein unfolding. We now seek to go forward with examining the mechanisms of hyperactive mutants, like Y503D, and to add the co-chaperones so that we can address the limitations articulated by the reviewer. In fact, we have already begun adding DnaK to the reaction and found that DnaK induced ClpB to release the polypeptide chain (Durie, Duran and Lucius 2018). However, the sequential mixing strategy developed here was needed to go forward with examining the impact of co-chaperones.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 1: I recommend changing the title of the paper to remove the terms that are not clearly defined in the text: "robust" and "processive". What are the Authors' criteria for describing a molecular machine as "robust" vs. "not robust"? A definition of processivity is given in equation 2, but its value for ClpB is not reported in the text, and the criteria for classifying a machine as "processive" vs. "non-processive" are not included. Besides, the Authors have previously reported that ClpB is non-processive (Biochem. J., 2015), so it is now clear that a more nuanced terminology should be applied to this protein. Also, Escherichia coli should be fully spelled out in the title.

      The title has been changed. We have removed “robust” because we agree with the reviewer that there is no way to quantify “robust”. However, we have kept “processive” and have added to the discussion a calculation of processivity, since we can quantify processivity. Importantly, the unstructured substrates used in our previous studies represent translocation and not protein unfolding. Here, on folded substrates, we detect rate-limiting protein unfolding followed by rapid translocation. Thus, we report a lower bound on protein unfolding processivity of 362 amino acids.

      Line 20: The comment about mitochondrial SKD3 should be removed. SKD3, like ClpB, belongs to the AAA+ family, and it is simply a coincidence that the original study that discovered SKD3 termed it an Hsp100 homolog. The similarity between SKD3 and ClpB is limited to the AAA+ module, so there are many other metazoan ATPases, besides SKD3, that could be called homologs of ClpB, including mitochondrial ClpX, ER-localized torsins, p97, etc.

      Removed.

      Lines 133-139. Contrary to what the authors state, it is not clear that the "lag-phase" becomes significantly shorter for subsequent mixing experiments (Figure 1E) perhaps except for the last one (2070 s). It is clear, however, that the emission enhancement becomes stronger for later mixes. This effect should be discussed and explained, as it suggests that the pre-equilibrations shorter than ~2000 sec do not produce saturation of ClpB binding to the substrate.

      We have added supplemental figure 2, which zooms in on the lag region. This better illustrates what we were seeing but did not clearly show to the reader. In addition, we address all three changes in the time courses, i.e., the extent of the lag, the change in peak position, and the change in peak height.

      Line 175. The hydrolysis rate of ATPgammaS in the presence of ClpB should be measured and compared to the hydrolysis rate with ATP/ATPgammaS to check if the ratio of those rates agrees with the ratio of the translocation rates. These experiments should be performed with and without the RepA-titin substrate, which could reveal an important linkage between the ATPase engine and substrate translocation. These experiments are essential to support the claim of substrate translocation and unfolding with ATPgammaS as the sole energy source.

      The time courses shown in figure 2 and supplemental Figure 1 were collected with only ATPγS and no ATP. The time courses show a clear increase in lag and the appearance of a peak with an increasing number of tandem repeats of titin domains. We do not see an alternative explanation for this observation other than that ATPγS supports ClpB-catalyzed protein unfolding and translocation. What is the reviewer’s alternative explanation for these observations?

      We agree with the reviewer that the linkage of ATP hydrolysis to protein unfolding and translocation is essential and we are seeking to acquire this knowledge.  However, a simple comparison of the ratio of rates is not adequate. We contend that a complete mechanistic study of ATP turnover by ClpB is required to properly address this linkage and such a study is too substantial to be included here but is currently underway. 

      All that said, the statement on line 175 was removed since we do not report any ATPase measurements in this paper.

      Line 199: It is an over-simplification to state that "1:1 mix of ATP to ATPgammaS replaces the need for co-chaperones". This sentence should be corrected or removed. The ClpB co-chaperones (DnaK, DnaJ, GrpE) play a major role in targeting ClpB to its aggregated substrates in cells and in regulating the ClpB activity through interactions with its middle domain. ATPgammaS does not replace the co-chaperones; it is a chemical probe that modifies the mechanism of ClpB in a way that is not entirely understood.

      We agree with the reviewer.  The sentence has been modified to point out that the mix of ATP and ATPγS activates ClpB.

      Figure 3B, Supplementary Figure 5A. The solid lines from the model fit cannot be distinguished from the data points. Please modify the figures' format to clearly show the fits and the data points.

      Done.

      Lines 326, 329. It is not clear why the authors mention a lack of covalent modification of substrates by ClpB. AAA+ ATPases do not produce covalent modifications of their substrates.

      The issue of covalent modification was presented in the introduction lines 55 – 60 pointing out that much of what we have learned about protein unfolding and translocation catalyzed by ClpA and ClpX is from the observations of proteolytic degradation catalyzed by the associated protease ClpP.  However, this approach is not possible for ClpB/Hsp104 as these motors do not associate with a protease unless they have been artificially engineered to do so. 

      Lines 396-399. I am puzzled why the authors try to correlate the size of the detected kinetic step with the length of the ClpB channel instead of the size characteristics of the substrate.

      We are attempting to discuss/rationalize the observed large kinetic step-size which, in part, is defined by the structural properties of the enzyme as well as the size characteristics of the substrate.  We have attempted to clarify this and better discuss the properties of the substrate as well as ClpB.

      As I mentioned in the Public Review, it is essential to demonstrate that the emission increase used as the only readout of the ClpB position along the substrate is indeed caused by the proximity of ClpB to the fluorophore. One way to accomplish that would be to place the fluorophore upstream from the first I27 domain and determine if the "lag phase" in the emission enhancement disappears.

      Alexa Fluor 555 is well established to exhibit PIFE.  However, as in the response to the public review, we have included an appropriate control showing this in supplemental Fig. 1.

      Finally, the authors repetitively place their results in opposition to the study of Weibezahn et al. published in 2004 which first demonstrated substrate translocation by engineering a peptidase-associated variant of ClpB. It should be noted that the field of protein disaggregases has moved since the time of that publication from the initial "from-start-to-end" translocation model to a more nuanced picture of partial translocation of polypeptide loops with possible substrate slipping through the ClpB channel and a dynamic assembly of ClpB hexamers with possible subunit exchange, all of which may affect the kinetics in a complex way. However, the present study confirmed the "start-to-end" translocation model, albeit for a non-physiological ClpB substrate, and that is the take-home message, which should be included in the text.

      It is not clear to us that the field has “moved on” since Weibezahn et al 2004.  Their engineered construct that they term “BAP” with ClpP is still used in the field despite us reporting that proteolytic degradation is observed in the absence of ATP with that system  (Li, Weaver, Lin et al. 2015) and should, therefore, not be used to conclude processive energy driven translocation. The “partial translocation” by ClpB is also grounded in observations of partial degradation catalyzed by ClpP with BAP from the same group (Haslberger, Zdanowicz, Brand et al. 2008). It is not clear to us that the idea of subunit exchange leading to the possibility of assembly around internal sequences is being considered.  We do agree that this is an important mechanistic possibility that needs further interrogation. We agree with the reviewer, all these factors are confounding and lead to a more nuanced view of the mechanism.

      All that said, we have removed some of the opposition in the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is assumed that the lag phase will be much longer than the phase in which we see a gradual increase in fluorescence, as the effect of PIFE is significant only when the enzyme is very close to the fluorophore. Particularly for RepA-titin3, the enzyme has to translocate many tens of nm before it is closer to the C-terminus fluorophore. However, in all cases, the lag time is lower or similar to the gradual increase phase (for example, Figure 3B). Could the authors explain this?

      The extent of the lag, or the time from zero until the signal starts to increase, is interpreted to indicate the time the motor takes to move from its initial binding site until it gets close enough to the fluorophore that PIFE starts to occur. In our analysis we apply signal change to the last intermediate and to dissociation or release of unfolded RepA-TitinX. The increase in PIFE is not “all or nothing”. Rather, it starts to increase gradually. Further, because these are ensemble measurements and each molecule will exhibit variability in rate, there is increased breadth of the peak due to ensemble averaging.
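
      The lag followed by a gradual rise can be illustrated with a generic scheme of n identical, irreversible steps with a shared rate constant, where signal is assigned only to the last intermediate. This is a simplified sketch and not the authors' fitted model: it omits binding, dissociation, and release, and the step counts and the value k_u = 1 s-1 used below are hypothetical choices for illustration. Under these assumptions the population of the last intermediate is a Poisson-type term that peaks at (n - 1)/k_u.

```python
import math


def last_intermediate(t: float, n_steps: int, k_u: float) -> float:
    """Fraction of complexes in the final intermediate of n identical irreversible steps.

    For I_0 -> I_1 -> ... -> I_(n-1) -> product, all with rate constant k_u and
    everything starting in I_0, the population of I_i is (k_u*t)^i * exp(-k_u*t) / i!.
    """
    i = n_steps - 1
    return (k_u * t) ** i * math.exp(-k_u * t) / math.factorial(i)


def peak_time(n_steps: int, k_u: float) -> float:
    """Time at which the last-intermediate population is maximal."""
    return (n_steps - 1) / k_u


K_U = 1.0  # hypothetical rate constant, s^-1 (not a fitted value)
for n_steps in (2, 4, 6):  # hypothetical step counts
    t_peak = peak_time(n_steps, K_U)
    early = last_intermediate(0.1 * t_peak, n_steps, K_U)
    at_peak = last_intermediate(t_peak, n_steps, K_U)
    print(f"n={n_steps}: peak at {t_peak:.1f} s, "
          f"population {early:.4f} at 10% of peak time vs {at_peak:.4f} at peak")
```

      In this toy scheme the peak time grows linearly with the number of steps, and the early-time population is small but nonzero, so the signal rises gradually rather than switching on abruptly.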

      (2) Although the reason for differences in the peak position (for example, Figure 1E, 2B) is apparent, the reason for variations in the relative intensities has to be given or speculated.

      We have addressed the reason for the different peak heights in the revised manuscript. It is a consequence of the fact that each substrate has a slightly different fluorescent labeling efficiency. Thus, for each sample there is a mix of labeled and unlabeled substrates, both of which will bind to ClpB, but the unlabeled ClpB-bound substrates do not contribute to the fluorescence signal and instead act as a binding competitor. Thus, for low labeling efficiency there is a lower concentration of ClpB bound to fluorescent RepA-Titinx, and for higher labeling efficiency there is a higher concentration of ClpB bound to fluorescent RepA-Titinx, leading to an increased peak height. RepA-Titin2 has the highest labeling efficiency and thus the largest peak height.
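
      The labeling-efficiency argument can be sketched numerically. This is a toy calculation under two stated assumptions that are not taken from the manuscript: labeled and unlabeled substrates bind ClpB with equal affinity, and the fluorescence signal is proportional to the concentration of ClpB bound to labeled substrate. The concentration and efficiencies below are hypothetical.

```python
def fluorescent_complex(total_complex_nM: float, labeling_efficiency: float) -> float:
    """Concentration of ClpB bound to labeled substrate, assuming equal-affinity binding."""
    if not 0.0 <= labeling_efficiency <= 1.0:
        raise ValueError("labeling efficiency must be between 0 and 1")
    return total_complex_nM * labeling_efficiency


TOTAL_COMPLEX_NM = 100.0  # hypothetical total ClpB-substrate complex, nM
for efficiency in (0.4, 0.8):  # hypothetical labeling efficiencies
    signal_nM = fluorescent_complex(TOTAL_COMPLEX_NM, efficiency)
    print(f"labeling efficiency {efficiency:.0%}: {signal_nM:.0f} nM fluorescent complex")
```

      With the same total amount of bound ClpB, doubling the labeling efficiency doubles the signal-producing population, which is the direction of the peak-height effect described in the response.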

      Reviewer #3 (Recommendations For The Authors):

      The authors should make it clear that they and previous authors have used different constructs or conditions to bypass the physiological regulation of ClpB action by Hsp70 and its co-factors as mentioned above. In particular, the construct used by Avellaneda et al should be explained when they challenge the findings of those authors.

      Minor points:

      The lines fitting the experimental points are difficult or impossible to see in Figures 2B, 3B, and s5B.

      Fixed

      Typo bottom of p6 - "averge"

      Fixed

      Avellaneda, M. J., K. B. Franke, V. Sunderlikova, B. Bukau, A. Mogk and S. J. Tans (2020). "Processive extrusion of polypeptide loops by a Hsp100 disaggregase." Nature.

      Doyle, S. M., J. Shorter, M. Zolkiewski, J. R. Hoskins, S. Lindquist and S. Wickner (2007). "Asymmetric deceleration of ClpB or Hsp104 ATPase activity unleashes protein-remodeling activity." Nat Struct Mol Biol 14(2): 114-122.

      Durie, C. L., E. C. Duran and A. L. Lucius (2018). "Escherichia coli DnaK Allosterically Modulates ClpB between High- and Low-Peptide Affinity States." Biochemistry 57(26): 3665-3675.

      Haslberger, T., A. Zdanowicz, I. Brand, J. Kirstein, K. Turgay, A. Mogk and B. Bukau (2008). "Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments." Nat Struct Mol Biol 15(6): 641-650.

      Li, T., J. Lin and A. L. Lucius (2015). "Examination of polypeptide substrate specificity for Escherichia coli ClpB." Proteins 83(1): 117-134.

      Li, T., C. L. Weaver, J. Lin, E. C. Duran, J. M. Miller and A. L. Lucius (2015). "Escherichia coli ClpB is a non-processive polypeptide translocase." Biochem J 470(1): 39-52.

      Miller, J. M., J. Lin, T. Li and A. L. Lucius (2013). "E. coli ClpA Catalyzed Polypeptide Translocation is Allosterically Controlled by the Protease ClpP." J Mol Biol 425(15): 2795-2812.

      Miller, J. M. and A. L. Lucius (2014). "ATP-gamma-S Competes with ATP for Binding at Domain 1 but not Domain 2 during ClpA Catalyzed Polypeptide Translocation." Biophys Chem 185: 58-69.

      Oberhauser, A. F., P. K. Hansma, M. Carrion-Vazquez and J. M. Fernandez (2001). "Stepwise unfolding of titin under force-clamp atomic force microscopy." Proc Natl Acad Sci U S A 98(2): 468-472.

      Rajendar, B. and A. L. Lucius (2010). "Molecular mechanism of polypeptide translocation catalyzed by the Escherichia coli ClpA protein translocase." J Mol Biol 399(5): 665-679.

      Weibezahn, J., P. Tessarz, C. Schlieker, R. Zahn, Z. Maglica, S. Lee, H. Zentgraf, E. U. Weber-Ban, D. A. Dougan, F. T. Tsai, A. Mogk and B. Bukau (2004). "Thermotolerance requires refolding of aggregated proteins by substrate translocation through the central pore of ClpB." Cell 119(5): 653-665.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers and editor for their helpful comments. We have addressed their concerns as detailed below.

      It would have been nice to have included a bona-fide SIRT2 target as a control throughout the study.

      We agree that including a bona-fide SIRT2 target as a control is important for validating our results. Previous data from our work has shown that SIRT2 demyristoylates ARF6. Thus, we have included a blot in Figure S15 demonstrating that SIRT2 knockdown results in increased myristoylation of ARF6. This serves as a control to confirm the activity and role of SIRT2 in our study.

      Did the authors also consider investigating SIRT1 in their assays? SIRT1 activates ACSS2 while SIRT2 leads to degradation of ACSS2. They should at least discuss these seemingly opposing roles of SIRT1 and SIRT2 in the regulation of ACSS2 and acetate metabolism in more depth particularly as it concerns situations (i.e., diseases, pathologies) where either SIRT1, SIRT2, or both sirtuins, are active. This would enhance the significance of the findings to the broader research community.

      The study by Hallows et al. showed that SIRT1 deacetylates K661 of ACSS2 and increases its catalytic activity. Subsequently, a follow-up investigation unveiled the role of the circadian clock in modulating intracellular acetyl-CoA levels through SIRT1-catalyzed K661 deacetylation of ACSS2. Conversely, our research elucidates a contrasting mechanism wherein SIRT2 inhibits ACSS2 by deacetylating K271 under conditions of nutrient stress. The dual regulation of ACSS2 by SIRT1 through the circadian clock and SIRT2 under nutrient stress underscores the intricate and multifaceted nature of regulatory mechanisms involved in lipid metabolism. These findings highlight the versatility of lysine acetylation in modulating cellular metabolic pathways.

      Collectively, these studies contribute to a better understanding of how SIRT1 and SIRT2 regulate ACSS2 activity in various metabolic contexts, thereby enhancing our knowledge of acetate metabolism and its implications in health and disease.

      We have included this discussion in the manuscript.

      In Figure 3, the authors should consider immunoblotting for endogenous ACSS2 throughout the differentiation and lipogenesis study since the total ACSS2 levels is the crucial aspect to affecting acetate-dependent promotion of lipogenesis in adipocytes, and to confirm TM-dependent stabilization of ACSS2 in that assay.

      We have updated Figure 3 to include immunoblotting for endogenous ACSS2 levels. Additionally, we have confirmed the TM-dependent stabilization of ACSS2, which is now shown in Figure S12.

      Do the authors have any data proving the K271 mutants of ACSS2 are still functional? Or that K271 ACSS2 protein is folded correctly?

      To assess the functionality of the mutants, we isolated Flag-tagged wildtype, K271R, and K271Q ACSS2 proteins from SIRT2 knockdown HEK293T cells. Subsequently, we examined acetyl-CoA formation from acetate and CoA using high-performance liquid chromatography (HPLC). Our findings indicate that while the wildtype ACSS2 exhibits slightly higher activity compared to the K271R and K271Q mutants, all variants remain functional (Figure S13).

      Nearly all experiments are performed in a single cell line. Authors should test whether SIRT2 regulates ACSS2 acetylation in at least 1 or 2 more cell lines. Does SIRT2 regulate ACSS2 acetylation in 3T3-L1 preadipocytes?

      Experiments showing that endogenous ACSS2 levels change in EBSS and nutrient-deprived media were repeated in A549 cells (Figure S5). However, due to the poor transfection efficiency of A549 cells, we were unable to obtain acetylation data. Similarly, conducting acetylation experiments in 3T3-L1 preadipocytes is challenging due to poor transfection efficiency.

      The article does not explicitly address whether the absence of amino acids impacts the acetylation and subsequent degradation of ACSS2 by activating SIRT2. If so, one would expect the level of ACSS2 acetylation or ACSS2 expression under amino acid deprivation to be lower than that under normal conditions, as depicted in Fig. 1C and Fig. S3.

      The experiments shown in Fig. 1C and Fig. S3 used overexpressed Flag-tagged ACSS2, and we adjusted the amount of DNA used in order to obtain similar Flag-ACSS2 levels.

      To address the comment raised by the reviewer, we added Figure S14, which shows that endogenous ACSS2 acetylation is decreased under amino acid deprivation in SIRT2 control KD cells, indicating that the absence of amino acids impacts ACSS2 acetylation. The decreased expression of ACSS2 under amino acid deprivation is also addressed in Figure S6.

      Several reviewers noted discrepancies between what is occurring to basal levels of ACSS2 vs in SIRT2 KD conditions. Fig. 2H shows a higher basal level of acetylated ACSS2 in the K271R mutant compared to wildtype (input may be an issue). If Fig. 2H is a critical piece of data, authors are recommended to show this using Flag-IP and then Ac-K.

      The increased stability of the K271R mutant compared to the wildtype (WT) leads to higher protein levels and hence to the different input levels. However, this does not affect the conclusion that K271 is the acetylation site, as quantification shows that the K271R mutant has a lower acetylation level and is not regulated by SIRT2 (Figure S16).

      The basal levels of ACSS2 appear similar in control and SIRT2 KD conditions because the experiments in question used overexpressed Flag-tagged ACSS2, and we adjusted the amount of DNA used in order to obtain similar Flag-ACSS2 levels. To address the concern, we monitored endogenous ACSS2 protein and acetylation levels, and the results are shown in Figure S14.

      Also, in Fig 2I there is no difference in basal ubiquitination between WT and K271R mutant. Related, based on model you would expect that overexpression of ACSS2-K271R mutant compared to wildtype would be at higher levels. In many figures authors do not see this (Fig. 2I, 3A, 3B). This needs to be explained.

      This is related to some previous comments. In these experiments, we adjusted the amount of DNA used in the transfection to obtain equal protein levels so that we could quantify acetylation or ubiquitination levels. As stated in the manuscript regarding Figures 3A and 3B, "To ensure comparable expression levels at the beginning, we adjusted the amount of transfected DNA for both wild-type and the K271R mutant ACSS2." This approach allowed us to accurately compare the ubiquitination status of wildtype and K271R mutant ACSS2.

      Data showing role of ACSS2-K271 mutant in lipid accumulation requires clarification. Based on model overexpression of ACSS2-K271 mutant should by itself cause increased lipid accumulation compared to wildtype.

      This is indeed the case and we have added this in the revised manuscript “Consistent with our above observation that ACSS2 K271R mutant is more stable than the WT, expressing the K271R mutant lead to more lipid droplets than expressing the WT ACSS2 (Figure S12).”

      Loading controls are notably absent at certain instances, such as IPs in Fig. 1A, 1C, and the IP in Fig. 2H. Such controls are required to interpret potential changes in acetylation.

      For this experiment, we employed an approach where we overexpressed Flag-tagged wild-type (WT) and mutant forms of ACSS2. We conducted an immunoprecipitation (IP) targeting acetyl-lysine residues to enrich lysine-acetylated proteins, followed by immunoblotting for the Flag tag to specifically detect ACSS2 acetylation levels. To ensure the reliability of our results, we included a Flag blot to confirm equal expression levels of ectopically expressed ACSS2 across our samples before IP. Given the nature of our experimental design and the specific aim of investigating ACSS2 acetylation, we believe that additional loading controls beyond the input Flag blot are not required for the interpretation of our results. The inclusion of the input Flag blot serves as a control for protein expression levels, which is crucial for accurate assessment of ACSS2 acetylation status.

      While CHX treatment is known to inhibit protein synthesis, it appears contradictory that CHX treatment in Fig. 2C seemingly leads to ACSS2 accumulation in SIRT2 knockdown HEK293T cells. This discrepancy requires clarification.

      We conducted quantitative analysis of the immunoblot with replicates to ensure the reliability of our findings. Our analysis indicates that the protein level of ACSS2 remains relatively stable over the time course of CHX treatment. The slight increase observed at the 8-hour time point can be attributed to inherent experimental variability, as evidenced by the large error bars in the graph. We have included a graph in Figure S7 to show that there is no significant change in the level of ACSS2 in the SIRT2 knockdown HEK293T cells.

      In Fig. 2F-H, the authors argue that SIRT2 deacetylates ACSS2 to facilitate its ubiquitination and subsequent proteasomal degradation. However, these results are depicted under normal conditions, whereas findings in Fig. 1 suggest that SIRT2 deacetylates ACSS2 exclusively under nutrient stress. An explanation for this inconsistency is warranted.

      These experiments were done in amino acid-deprived (EBSS) media. We have corrected this in the manuscript.

      Line 160 authors conclude "amino acid limitation..deacetylates K271"..but this was not directly demonstrated. Authors should add this data or change conclusion.

      Addressed in response to some of the comments above.

      Figures 1A and 1B, acetylation quantification, not clear if it is relative to the Flag tag or actin.

      Acetylation quantification is relative to Flag tag. This is clarified in the figure legend.

      Methods section lacking details & not well referenced (how did authors express wildtype & mutant in 3T3-L1 cells?) 

      ACSS2 wildtype and K271R mutant Flag-tagged expression plasmids were transfected into ACSS2 knockdown 3T3-L1 cells using PEI transfection reagent following the manufacturer’s protocol. The pCMV-Tag4a empty vector was used as the negative control. Differentiation of the 3T3-L1 cells was performed according to the manufacturer’s protocol (DIF001-1KT, Sigma Aldrich) 24 hours after transfection. This has been included in the methods.

      In Figure 3A, is the actin blot from the same immunoblots above it? Reviewers recommend the authors upload original immunoblot.

      This experiment was repeated, and the blot has been replaced.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for your time and consideration on our submission. We also thank the reviewers for their consideration and helpful comments. We have revised the introduction, results, and discussion sections of the manuscript in accordance with the reviewers’ suggestions, which have enhanced the clarity of our work. Specifically, we have clarified that the aim of the study is to report newly discovered sperm behaviours inside the uterus via high-resolution deep-tissue live imaging, and to stimulate further studies and discussion in the field of postcopulatory sexual selection in mice based on our observations. To the best of our knowledge, many of the specific sperm behaviours described in our manuscript are reported here for the first time, demonstrated through direct observation inside the living reproductive tract.

      We have also restructured our manuscript and moved our hypothetical interpretations of the experimental observations to the discussion section. We hope that these revisions have clarified our claims and that our revised manuscript effectively communicates the importance of our findings and their value in prompting new questions and insights that encourage further studies. We believe that our work clearly demonstrates the importance of sperm/reproductive tract interaction, which cannot be adequately studied in artificial environments, and that it may become an important guideline for designing future experiments and studies.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. The authors are trying to distinguish between two hypotheses put forward by others on the role of the sperm hook: (1) the sperm cooperation hypothesis (the sperm hook helps to form sperm trains) vs (2) the migration hypothesis (that the sperm hook is needed for sperm movement through the uterus). They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall. 

      We thank the reviewer for summarizing our work and the critical review of our paper. As summarized, the sperm hook has been primarily associated with the sperm cooperation (sperm hook) hypothesis and the migration hypothesis. However, we would like to emphasize that the aim of our work is not to cross check between the two hypotheses. Our aim was not to disprove either hypothesis, but rather to develop an experimental platform that enables detailed observation of sperm migration dynamics within the live reproductive tract. 

      Through live imaging, we observed both the formation of sperm trains and interaction between the sperm and the female reproductive tract epithelium. However, in our observations, the rarely observed sperm trains showed no advantage in terms of faster movement. Because these events were infrequent in our experiments, we are not asserting that the sperm train hypothesis is invalid; we are simply reporting our observations as they are. 

      The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these behaviours include tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. We have extensively revised the manuscript structure to clarify our findings.

      Strengths: 

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm. 

      Weaknesses: 

      The paper is descriptive and the data are correlations. 

      The data are not properly described in the figure legends. 

      When statistical analyses are performed, the authors do not comment on the trend that sperm from the three males behave differently from each other. This weakens confidence in the results. For example, in Figure 1 the sperm from male 3613 (blue squares) look different from male 838 (red circles), but all of these data are considered together. The authors should comment on why sperm across males are considered together when the individual data points appear to be different across males. 

      Thank you for your comments and suggestions. We have revisited all figure legends and made the necessary amendments (shown in the red-lined manuscript). Please note that, for a better flow of the paper, the previous Figure 1 has been changed to Figure 2 in the revised manuscript.

      Regarding the analysis using different males, we would like to explain the statistics used. We used generalized linear mixed models to test the effect of the Angle and Distance to the wall on the migration kinetic parameters. The advantage of the generalized linear mixed models is that they consider individual variations in the data as an error term, thereby controlling such individual variations. 

      There are two main factors contributing to individual variations. One is, as you pointed out, the difference in sperm from different males. We used genetically similar mice, so genetic variation should be minimal. Nonetheless, individual differences in age, stress level, and body condition must still cause variation. As these factors cannot be controlled, we used the mixed model approach, in which such variation is grouped within the individual. This approach enabled us to test the effect of each explanatory variable (Angle and Distance) within an individual. 

      The second factor that could cause variations is the female oestrous status. To avoid artifacts that could influence sperm behaviour, we did not use any invasive methods, such as hormone injections, to control or induce female oestrus. We controlled for this possible effect by including the mating date as a random effect. Since each female was used only once, the mating date reflects the variation caused by each female.
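
      The idea of testing the distance effect within individuals can be illustrated with a minimal sketch. This is not the GLMM fitted in the manuscript: it is a simpler within-group centring estimator, written in plain Python, that removes each male's mean before pooling the slope. The function name `within_group_slope`, the male labels, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def within_group_slope(rows):
    """Pooled within-group slope of y on x.

    rows: iterable of (group, x, y) tuples, e.g. (male ID, distance, speed).
    Each group's mean is subtracted first, so differences in baseline
    between groups do not contribute to the estimated slope.
    """
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in by_group.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    return sxy / sxx


# Hypothetical data: (male, distance to wall in um, speed in um/s).
# Both males share the same slope (-2) but differ in baseline speed.
rows = [
    ("male_A", 0.0, 100.0), ("male_A", 10.0, 80.0), ("male_A", 20.0, 60.0),
    ("male_B", 20.0, 120.0), ("male_B", 30.0, 100.0), ("male_B", 40.0, 80.0),
]

within = within_group_slope(rows)
# Ignoring male identity: treat all observations as one group.
naive = within_group_slope([("all", x, y) for _, x, y in rows])
```

      In this made-up example the two males differ in baseline speed, so the regression that ignores male identity gives a much shallower slope than the within-male estimate. A mixed model addresses the same problem by treating the individual as a random effect.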

      To further verify that the variation between individual males does not affect our results, we conducted the analysis per individual male and per mating date (that is, per female). Sperm data points from individual males or females also show clear and consistent correlations with the distance from the uterus wall. As pointed out, the mean sperm speed may differ between individuals, but this is not the topic we are interested in here. Our interest is the effect of the distance between sperm and the uterine wall. Additionally, the variation between males is not always larger than the effect of the day (female), which together suggests that integrating male variation is not essential. We have added this information to a Supplementary Figure (Fig. S3) in the revised supplementary materials.

      We can apply the same analysis to the effects of the distance from the wall on sperm SWR and LIN (linearity of forward progression), for which no statistical significance was found. As seen in the following figures, the regression lines drawn for each male and each mating date show no statistically significant effect of the distance to the wall on SWR and LIN.

      In summary, the statistical approach we used here has successfully reflected variations in sperm kinetics from different males as well as the variance from different females. We hope that our explanations and additional analysis answer your concerns. 

      Movies S8-S10 are single data points and no statistical analyses are performed. Therefore, it is unclear how penetrant the sperm movements are. 

      With respect to Movie S8, Figure 4A and B (Figure 5A and B in the current revised manuscript) depict the trajectories of accumulated spermatozoa (sperm trains) in the female uterus, as shown in Movie S8. We have added this information to the revised figure legend (L 293) for clarity. During over 100 hours of observation and collection of over 10 TB of images, we did not observe sperm trains that moved faster than single spermatozoa. The three sperm trains presented in Fig. 5B were the trains that moved in the head-forward direction. Most other identifiable trains, or clusters, did not move or could not move forward because their heads were entangled randomly. Although we agree that a statistical test for Movie S8 (also Fig. 5B) would be desirable, the small number of sperm trains we found did not allow meaningful statistical tests. Instead, we provided all data in the box plots in Fig. 5C so that readers can evaluate and understand our points. We believe that this is a more neutral way of presenting our data than reporting statistical significance.

      Regarding Movies S9 and S10, we are not entirely sure that we understood your comments correctly. It would be very helpful if you could point to the relevant parts of the manuscript with line numbers, as we would like to address your concerns and suggestions, and we believe that your input will improve our manuscript. We did not describe the penetration of sperm in these movies. Movies S9 and S10 show newly found sperm behaviours inside the UTJ and isthmus. We observed that sperm beating is influenced by the width of the luminal space as well as by internal flow, as seen in Movies S9 and S10. As our animal model only expresses red fluorescence in the midpiece, accurate measurement of beating frequency cannot be performed. However, we can clearly observe that beating is not continuous and almost comes to a halt depending on variations in the reproductive tract. We have revised our description of the changes in beating speed in the revised manuscript (LL 305-335).  

      Movies S1B - did the authors also track the movement of sperm located in the middle of the uterus (not close to the wall)? Without this measurement, they can't be certain that sperm close to the uterus wall travels faster. 

      We revised the new Movie S1B to include videos that were used for the sperm migration kinetics analysis in Figure 2 (previously Figure 1). As you can see in the movies, the graph, and the statistical analysis, there is a clear trend showing that spermatozoa migrate more slowly with increasing distance from the uterus wall. Regarding your comment on the middle of the uterus (not close to the wall), we have added another movie (Movie S1C) that was acquired at different depths from the wall (going towards the centre of the uterus). As clearly seen in Movie S1C, when imaging deeper into the uterus, there is an increasing number of inactive or slow-moving spermatozoa. Since the diameter of the uterus is easily over 2 mm, we currently do not have optical access to the exact centre of the uterus, but at all observable depths, spermatozoa near the wall were clearly faster.

      Movie S5A - is of lower magnification (200 µm scale bar) while the others have 50 and 20 µm scale bars. Individual sperm movement can be observed at the 20 µm scale (Movie S5C). If the authors want to prove that there is no upsucking movement of sperm by the uterine contractions, they need to provide a high magnification image. 

      The main focus of Movie S5A is the intramural UTJ, where spermatozoa are located in rows within the narrow luminal space (see Author response image 1). If there were up-suck-like passive sperm carriage, sperm would have to move from the uterus into the intramural UTJ, as in Author response image 1, left. However, no such sperm movement could be seen in our observations, as shown in Movie S5A. Importantly, as indicated by an arrow from 5 sec to 6 sec in Movie S5A, some spermatozoa are moving downward (see also Author response image 1, right). This is the opposite direction to that expected for up-suck-like sperm carriage. 

      Genetic evidence also supports the view that up-suck-like passive sperm carriage does not account for sperm migration from the uterus to the UTJ. If up-suck-like passive transfer played an important role, it is unlikely that genetically modified spermatozoa would fail to pass the entrance of the intramural UTJ (Nakanishi et al., 2004, Biol. Reprod.; Li et al., 2013, J. Mol. Cell Biol.; Larasati et al., 2020, Biol. Reprod.; Qu et al., 2021, Protein Cell). 

      Author response image 1.

      The left image represents what is expected when up-suck like passive sperm carriage occurs. The right image represents what is actually experimentally observed in the intramural UTJ (see Movie S5A). The direction of the arrowheads indicates the direction of sperm movement.

      Movie S8 - if the authors want to make the case that clustered sperm do not move faster than unclustered sperm, then they need to show Movie S8 at higher magnification. They also need to quantify these data. 

      We understand your concern. As shown in Figure 5B, we plotted the kinetic data for each sperm train and for each unlinked spermatozoon around the trains as individual dots. The only analysis we did not conduct was a statistical test, as it could be erroneous given the large difference in sample size (3 trains vs 181 unlinked spermatozoa). As the medians of the four sperm kinetic parameters are similar, except for SWR, we concluded that sperm trains are not necessarily faster than unlinked single spermatozoa. There is no known advantage in sperm competition for spermatozoa (including sperm trains) with intermediate moving speeds; in IVF, for example, the fertilization success rate is high when faster and active spermatozoa with normal shape are selected (Vaughan & Sakkas, 2019, Biol. Reprod.). It is therefore questionable whether the formation of sperm trains can be advantageous when, in our data, their speed is not faster than that of unlinked spermatozoa.

      However, we do not agree with your comment regarding the need for higher magnification. Measuring sperm migration speeds (kinetic parameters) does not require measuring exact tail movements in this study. Only sperm heads were tracked to measure their trajectories, and such tracking was better done at low magnification. By analogy, measuring the speed of a car does not require a magnification high enough to visualize the rotation of the wheels. Additionally, including observation magnification as an effect in all 4 GLMM models for Figure 2 (Table S3) does not change the result, which shows that magnification is not a factor that influences our analysis. 
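
      To make the head-tracking argument concrete, the following is a minimal sketch of how kinetic parameters can be derived from head positions alone. It assumes the conventional definitions of VCL (path length over time), VSL (net displacement over time), and LIN (VSL divided by VCL); the function name `track_kinematics`, the frame interval, and the coordinates are hypothetical and are not taken from the manuscript or its analysis pipeline.

```python
import math


def track_kinematics(points, dt):
    """Return (VCL, VSL, LIN) for one tracked sperm head.

    points: list of (x, y) head positions in micrometres, one per frame.
    dt: time between consecutive frames in seconds (assumed constant).
    """
    if len(points) < 2:
        raise ValueError("need at least two positions")
    duration = dt * (len(points) - 1)

    # Curvilinear path: sum of frame-to-frame head displacements.
    path = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Straight-line distance between the first and last head position.
    net = math.dist(points[0], points[-1])

    vcl = path / duration
    vsl = net / duration
    lin = vsl / vcl if vcl > 0 else 0.0
    return vcl, vsl, lin


# Hypothetical zig-zag track sampled every 0.5 s (positions in um).
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (9.0, 4.0), (12.0, 0.0)]
vcl, vsl, lin = track_kinematics(track, dt=0.5)
```

      Nothing in this calculation depends on resolving the flagellum, which is why head positions recorded at low magnification are sufficient for these parameters.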

      Movie S9C - what is the evidence that these sperm are dead or damaged? 

      Thank you for your valid comment. We tracked sperm movements for at least 10 minutes, and such entangled spermatozoa in the UTJ never became active again. As you can see in the new Movie S9b, entangled spermatozoa were also acrosome-reacted (the green acrosome signal in the head is gone), while active spermatozoa in the same video respond to peristaltic movement. However, as you pointed out, we did not measure their viability with appropriate dyes. Although we considered extracting these spermatozoa and performing viability tests, we could not find a way to specifically extract the exact spermatozoa that were imaged. In light of your comments, we changed the term damaged or dead to inactive in the revised manuscript (LL 313-316, Legend Figure 6D. LL 380-384).

      Movie S10 - both slow- and fast-moving sperm are seen throughout the course of the movie, which does not support the authors' conclusion that sperm tails beat faster over time. 

      There must have been a misunderstanding. We did not indicate that sperm beating got faster over time anywhere in the main manuscript, including the figure legend and related movie captions. As correctly pointed out, the sperm beating speed changes over time (not getting faster over time) and shows a correlation with internal fluid flow and width of luminal space (LL 320-332). Please let us know if you meant something else. 

      Reviewer #2 (Public Review): 

      Summary: 

      The specific objective of this study was to determine the role of the large apical hook on the head of mouse sperm (Mus musculus) in sperm migration through the female reproductive tract. The authors used a custom-built two-photon microscope system to obtain digital videos of sperm moving within the female reproductive tract. They used sperm from genetically modified male mice that produce fluorescence in the sperm head and flagellar midpiece to enable visualization of sperm moving within the tract. Based on various observations, the authors concluded that the hook serves to facilitate sperm migration by hooking sperm onto the lining of the female reproductive tract, rather than by hooking sperm together to form a sperm train that would move them more quickly through the tract. The images and videos are excellent and inspirational to researchers in the field of mammalian sperm migration, but interpretations of the behaviors are highly speculative and not supported by controlled experimentation. 

      Thank you for your critical review and valuable comments on our manuscript. As pointed out, some of our findings and suggestions were largely observation based. However, to the best of our knowledge, many of our observations are novel, particularly in the context of live imaging inside the female uterus and reproductive tract. We believe these observations open doors to many questions and follow up studies that can be envisioned based on our findings, which is what drives science forward. 

      That being said, we entirely agree that many follow up experiments need to be designed and performed, especially to validate the exact molecular mechanisms of the observed dynamics. We acknowledge that it is unfortunate we currently lack the proper molecular experimental toolsets to perform further tests. We have removed much of the hypothetical discussions from the results section and moved them to the discussion section. We hope that our revision more clearly defines the observed experimental data and our interpretations.

      Strengths: 

      The microscope system developed by the authors could be of interest to others investigating sperm migration. 

      The new behaviors shown in the images and videos could be of interest to others in the field, in terms of stimulating the development of new hypotheses to investigate. 

      Weaknesses: 

      The authors stated several hypotheses about the functions of the sperm behaviors they saw, but the hypotheses were not clearly stated or tested experimentally. 

      The hypothesis statements were weakened by the use of hedge words, such as "may". 

      We appreciate your helpful comments and have revised our hypotheses and suggestions accordingly. We have removed instances of “may” or revised it to be more direct. We have also moved most of our interpretations and hypotheses from the results to the discussion section. 

      It is important to note that experimental approaches to test what we suggested from our findings in the current ex-vivo observation platform are not trivial and require extensive investigation of several unknown factors of the female reproductive tract. For instance, obtaining detailed information on the chemical characteristics and fluid dynamics in the female reproductive tract is essential to build a microfluidic channel that accurately resembles the uterus and oviduct, replicating what we found in an extracted living entire organ. This poses a significant challenge and requires collaborative expertise from many labs, which we hope to build in the near future. 

      Furthermore, our biggest concern is that, even if we were to construct the appropriate microfluidic channel to test sperm migration, it is very likely that the sperm behaviours that we observed under natural conditions may not be replicated in artificial environments. This raises questions about whether in-silico or in-vitro findings can truly resemble what we reported here using the ex-vivo observation inside a living organ.

      To share our experience of this difficulty: at the initial stage of our study, we attempted sperm injection combined with fluorescent beads to visualize the fluid flow, as well as dyeing the female reproductive tract and spermatozoa after mating. However, none of these attempts produced meaningful results. Another potential approach to testing our claims is to use genetic engineering to indirectly confirm the influence of sperm hook morphology on sperm behaviour. However, such an approach would not demonstrate mechanically how the sperm hook interacts with the female reproductive tract. 

      It is unfortunate that the sperm behaviours that we found and reported here are considered as highly speculative. The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these behaviours include tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. 

      We have extensively revised the manuscript structure to clarify our findings and integrated our points in the introduction. Although we understand our following hypotheses may be considered speculative and the causative relationship between the sperm hook and its role in sperm migration requires further experimental approaches, we believe that the image-based observation of dynamic behaviours of spermatozoa are solid. We believe our findings will facilitate further studies and discussion in the field of studies on postcopulatory sexual selection in rodents.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript is written for an expert in a fairly small field. I recommend that the authors rewrite the manuscript to make it more accessible to people outside of the field. These suggestions include 

      (1) Provide a diagram of the female reproductive tract in Figure 1. 

      a. Indicate where sperm enter the tract and the location of the oocyte they are trying to reach. 

      b. Label all areas of the uterus that are mentioned in this study and be consistent about the label. 

      (2) All movies should have a diagram of the location of the uterus that is being imaged. 

      Thank you for the great suggestion. We have added a diagram of the female reproductive tract in the revised Figure 1A. In response to your comments 1a and b, we have indicated such information by including eggs in the ampulla and arrows that indicate sperm migration direction. We have also labelled the name of the specific areas that were studied in the manuscript.

      We are unsure how to integrate the diagram in all movies without reframing the videos, which could cause serious corruption of the files. More importantly, we think that adding the same diagram to all movies may complicate the visuals and distract from the annotations and subjects in each movie. Instead, we have referred to the common diagram (Figure 1A) in each movie caption, specifying where the video was taken. Thank you for the suggestion. With this information, we hope readers can now more easily understand where we made the observations. 

      (3) The major questions in the field need to be better described in the introduction. 

      Thank you for your valuable suggestions and specific comments which have greatly helped improve our manuscript. We have revised our introduction and discussion sections by adding more literature reviews and integrating studies across a wider range of the postcopulatory sexual selection, as per your suggestion (LL 34-57, LL 385-398).

      (4) The major question that the authors are trying to address should be described in the introduction. 

      Thank you for the helpful suggestion. We have clarified in the introduction that our aim was to contribute to the field of postcopulatory sexual selection in rodents by advancing methodological progress and to stimulate discussion and future research on the function of the sperm hook in murine rodents (LL 76-94) based on our observations.

      (5) A discussion of the sperm hook should be provided. How many species have this structure (or similar structure)? 

      We have integrated your point into the revised discussion section. Essentially, most murine rodent species have sperm hooks (while their exact shapes differ). However, as there are over 500 species and not all of them have been tested, we do not know exactly how many of them have this structure. Therefore, we cited papers that examined species variation in sperm hook characteristics and its possible correlation with sperm competition in the discussion (LL 385-417). We also cited papers by Breed (2004) and by Roldan et al. (1992), which investigated murine rodents with a sperm hook, in the introduction section (LL 58-61).

      (6) The figure legends must describe everything in the figure or movie. 

      Thank you for the helpful suggestion. We previously thought that our figure legends may be too long. We have included further information in the figure legends and movie captions. We have also revised the movies by adding some clips following our revision (Movie S1).

      Reviewer #2 (Recommendations For The Authors): 

      Here are some specific concerns I had about the clarity of approach to experiments and interpretations of results. 

      In the Introduction, the authors stated that the study was intended to determine the function of the hooks on the mouse sperm heads. However, in the Results section, the authors did not explain the rationale for the first set of experiments with respect to the overall objective of the study. In this experiment, the authors measured the velocities of sperm swimming in the uterus and found that the sperm moved faster when closer to the uterine wall (VCL, VSL). They concluded that migration along the uterine wall "may" be an efficient strategy for reaching the entrance to the uterotubal junction (UTJ) and did not explain how this related to the function of the hooks. 

      Thank you for your critical comment and guidance. We have changed the order of Figure 1 and Figure 2 and revised the results section to integrate your points. At the initial stage of the study, we expected to find evidence of the function of sperm trains in aiding sperm migration in the female uterus (which has not been observed in the live uterus; previous work was done in vitro with sperm extracted from the epididymis or from the uterus after mating). However, what we found was unexpected: dynamic sperm hook-related movements that facilitate sperm migration inside the female uterus by playing a mechanical role in sperm interaction with the uterine wall. These results, which were presented in the previous Figure 2, have been reorganized as the new Figure 1.

      Based on this observation, our research later moved to clarify whether such sperm-epithelium interaction indeed helps sperm migration. This led us to measure sperm kinetics in relation to their distance and angle to the uterine wall. We have revised our introduction and result parts by integrating these points. We hope that our revision will answer your questions. We have also reduced the use of ‘may’ or ‘can’ in the results section. In the revised manuscript, we have moved such hypotheses to the discussion section and focused on what we observed in the results section.
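
      For readers unfamiliar with the kinetic parameters mentioned above, the following is a minimal sketch assuming the standard definitions: VCL (curvilinear velocity) is the total path length divided by the track duration, and VSL (straight-line velocity) is the net displacement divided by the track duration. The function name, the example track, and the frame interval are hypothetical and are not taken from the manuscript; this is an illustration, not the authors' analysis pipeline.

```python
import math


def track_kinetics(points, dt):
    """Return (VCL, VSL, LIN) for a single tracked cell.

    points: list of (x, y) positions in micrometres, one per frame.
    dt:     time between consecutive frames in seconds.
    """
    if len(points) < 2:
        raise ValueError("need at least two positions")
    duration = dt * (len(points) - 1)
    # Curvilinear path: sum of frame-to-frame distances.
    path_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Straight-line path: distance between first and last positions.
    net_displacement = math.dist(points[0], points[-1])
    vcl = path_length / duration
    vsl = net_displacement / duration
    # Linearity (VSL / VCL) is 1 for a perfectly straight track.
    lin = vsl / vcl if vcl > 0 else 0.0
    return vcl, vsl, lin


# Hypothetical zigzag track sampled every 0.1 s (illustrative values only).
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (9.0, 4.0), (12.0, 0.0)]
vcl, vsl, lin = track_kinetics(track, dt=0.1)
print(f"VCL={vcl:.1f} um/s, VSL={vsl:.1f} um/s, LIN={lin:.2f}")
```

      Comparing these values between tracks grouped by distance or angle to the uterine wall would be one way to relate kinetics to wall proximity; the grouping criteria themselves would have to come from the manuscript.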

      The authors proposed that the sperm hook "may" play a crucial role in determining the direction of migration. When sperm encountered a uterine wall, significantly more changed migration direction toward the pro-hook direction than toward the anti-hook direction. In Figure 2B, sperm behavior is not visually understandable nor clearly explained. 

      Thank you for the helpful comments. We have removed “may” and “might” to make our claim clearer and more concise. We have also revised the previous Figure 2B by combining it with the previous Figure 2C (they have been combined into Figure 1C now). We have also revised Figure 1B by increasing the line thickness of the sperm trajectory of the pro-wall-hook direction and added the anti-wall-hook trajectory. We hope that these revisions make the figure easier to understand.

      In Figure 2E, are the authors showing that the tip of the hook is caught between two epithelial cells? Please clarify the meaning of this figure. 

      Please clarify the difference between "tapping" and "anchoring". 

      Thank you for the detailed comments. As you pointed out, we currently have no evidence of whether sperm can be caught in epithelial intercellular gaps. We have addressed this source of confusion by removing the gap in the revised figure (Figure 1E). We have also included the definition of anchoring (LL 142-143) and tapping (LL 128-130). Anchoring facilitates the attachment of sperm to the uterine epithelia. Such anchoring also involves the catching of the sperm head in the inter-mucosal fold or gap, particularly at the entrance of the intramural UTJ at the end of the uterus. Tapping is the interaction between the head hook and epithelia in which the sperm hook taps (or pats) the surface. Sperm tapping can be a byproduct of flagellar beating when spermatozoa migrate toward the pro-wall-hook direction along the uterine wall (epithelia), or it can play some role in sperm migration. As we currently cannot draw a conclusion, we did not integrate the possible function of tapping into the manuscript.

      The authors proposed that opposite sliding of neighboring mucosal folds lining the UTJ would cause small openings to form, through which only perhaps one sperm at a time could enter and pass through the UTJ into the uterus. This hypothesis was not actually tested. 

      Imaging inside deep tissue is challenging due to light scattering as it penetrates through biological tissue. While this is also true for the uterus, the intramural UTJ is especially difficult to image because the UTJ consists of several thick muscle and cell layers (see Movie S5A). Another challenge is that the peristaltic movement of the UTJ results in constant movement, making continuous tracking of single spermatozoa passing through the entirety of the UTJ impossible in our current experiments. We have moved this hypothesis to the discussion section and restated that this is a purely hypothetical model (LL 399-406). We hope that our model encourages the community to design or establish an improved ex-vivo observation system that may be able to test this hypothetical model in the near future.

      Next, the authors hypothesized that sperm that encounter the small openings in the UTJ may then be guided onward and the hooks could prevent backward slipping. This was also not tested. 

      As you’ve noted, the function of the sperm hook that aids in sliding and preventing backward slipping could not be tested directly in our ex-vivo observation platform that relies on natural movement of the living organ. However, we believe that these limitations also highlight the importance of continued research and the development of more advanced methodologies in this field.

      We would also like to note that we provide direct observations of spermatozoa resisting internal flow due to reproductive tract contractions in Movie S3A, B as well as Movie S5B. We referred to these movies and pointed out the role of anchoring (sperm attachment) in preventing sperm from being squeezed out (LL 140-149, LL 224-241). Unfortunately, we cannot conceive of how this behaviour could be tested further in any uterus-resembling microfluidic device or ex-vivo system. In line with your suggestion, we have rewritten the related results section and moved the related discussion from the results to the discussion section (LL 224-241, LL 399-417).

      The authors observed that large numbers of uterine sperm are attached to the entrance of the UTJ. Some sperm clustered and synchronized their flagellar beating. The authors speculated that this behavior served to push sperm in clusters onward through the UTJ. 

      We would like to note that we did not speculate that sperm clustering and synchronization could serve to push spermatozoa in a cluster onward through the UTJ. We only pointed out our observation in recorded videos that the flow generated by the clustered spermatozoa pushed away other spermatozoa, as seen in Movie S7 (LL 261-264). Although such sperm cooperation is possible (blocking passage of later sperm), we cannot draw that conclusion from our observation. The possibility you pointed out (pushing sperm onward through the UTJ) was suggested by Qu et al. in 2021 [Cooperation-based sperm clusters mediate sperm oviduct entry and fertilization, Protein & Cell] based on their observations on cleared dead reproductive tracts.

      The authors found only a few sperm trains in the uterus, UTJ, and oviduct, so they could not measure sufficient numbers of samples to test whether sperm trains swim faster than single sperm. Without sufficient data, they concluded that the "sperm trains did not move faster than unlinked single spermatozoa." 

      We would like to take this opportunity to clarify our claims. We do not claim that our current experiments can give the final verdict on whether the sperm train hypothesis for faster swimming is correct or not. The phrase “sperm trains did not move faster” was not intended to mean that the sperm train hypothesis is invalid.  We did not draw a conclusion but dryly described the experimental data that we observed (LL 279-286).  We would once again like to emphasize that the main claim of our manuscript is not to rule out the sperm train hypothesis, but to present the various dynamic interactions of the sperm head with the female reproductive tract. To make the statement more balanced, we revised the sentence as “observed sperm trains did not move faster or slower than unlinked single spermatozoa” (LL 281-282).

      The authors hypothesized that the dense sperm clusters at the entrance into the UTJ could prevent the rival's sperm from entering the UTJ (due to plugging entrance and/or creating an outward flow to sweep back the rival's sperm), but they did not test it. 

      We agree that we were not able to test such a possible function of the sperm cluster at the UTJ entrance. Following your concerns, we revised the results section (LL 256-264) by removing most of our discussion related to the observed phenomena. We also moved some interpretation to the discussion section (LL 421-437) and suggested that future work using appropriate microfluidic channel designs or sequential double mating experiments may be performed for additional tests (LL 443-447). However, we would like to point out that Movie S7C clearly shows surrounding spermatozoa being swept away from the sperm clusters. Since the sperm density is high, this is almost equivalent to a particle image velocimetry experiment, and we can clearly see the effect of the outward flow generated by the sperm clusters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      This valuable study combines multidisciplinary approaches to examine the role of insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a potential novel host dependency factor for Zika virus. The main claims are partially supported by the data, but remain incomplete. The evidence would be strengthened by improving the immunofluorescence analyses, addressing the role of IGF2BP2 in "milder" infections, and elucidating the role of IGF2BP2 in the biogenesis of the viral replication organelle. With the experimental evidence strengthened, this work will be of interest to virologists working on flaviviruses.

      We thank the reviewers for their feedback and constructive suggestions. In this revised version of the manuscript, we have addressed the reviewers’ comments to the best of our ability as detailed below. We believe that the newly incorporated data strengthen our study and conclusions. We hope that this revised manuscript will satisfy the reviewers and will be of high interest to flavivirologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the co-option of IGF2BP2, an RNA-binding protein, by ZIKV proteins. Designed experiments evaluated if IGF2BP2 co-localized to sites of viral RNA replication, interacted with ZIKV proteins, and how ZIKV infection changed the IGF2BP2 interactome.

      Strengths:

      The authors have used multiple interdisciplinary techniques to address several questions regarding the interaction of ZIKV proteins and IGF2BP2.

      The findings could be exciting, specifically regarding how ZIKV infection alters the interactome of IGF2BP2.

      We thank the reviewer for acknowledging the multidisciplinary approach of our study and its exciting potential.

      Weaknesses:

      Significant concerns regarding the current state of the figures, descriptions in the figure legends, and the quality of the immunofluorescence and electron microscopy exist.

      In this new version of the manuscript, we have improved the quality of the microscopy data and included the requested information in the figure legends as described below in the Recommendations section.

      Reviewer #2 (Public Review):

      Clément Mazeaud et al. identified the insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a proviral cellular protein that regulates Zika virus RNA replication by modulating the biogenesis of virus-induced replication organelles.

      The absence of IGF2BP2 specifically dampens ZIKV replication without having a major impact on DENV replication. The authors show that ZIKV infection changes IGF2BP2 cellular distribution, which relocates to the perinuclear viral replication compartment. These assays were conducted by infecting cells with an MOI of 10 for 48 hours. Considering the ZIKV life cycle, it is noteworthy that at this time there may be a cytopathic effect. One point of concern arises regarding how the authors can ascertain that the observed change in localization is a consequence of the infection rather than of the cytopathic effect. To address this concern, shorter infection periods (e.g., 24 hours post-infection) or additional controls, such as assessing cellular proteins that do not change their localization or infecting with another flavivirus lacking the IGF2BP2 effect, could be incorporated into their experiments.

      We thank the reviewer for these relevant comments regarding the specificity of IGF2BP2 relocalization to the ZIKV replication compartment.

      It is noteworthy that we chose the 2-day post-infection time point for our analyses because it corresponds to the peak of replication, with much higher titers produced than at 24 hours post-infection (generally ~10^6 PFU/mL vs. ~10^4 PFU/mL). Consistently, the abundance of viral replication factories is more obvious at this time point. An MOI of 5-10 was chosen to maximize the percentage of infected cells. That said, as suggested by the reviewer, we have analyzed the distribution of IGF2BP2 in ZIKV-infected cells at one day post-infection, and we provide evidence in Figure S1 that IGF2BP2 relocalizes to the dsRNA-containing compartment at this time point.

      Importantly, we now show in Figure S5 that, in contrast to IGF2BP2, other host RNA-binding proteins such as LARP1 and DDX5 do not accumulate in the ZIKV replication compartment at 2 days post-infection. LARP1 actually seems to be excluded from it while DDX5 remains nuclear. Of note, consistent with the ZIKV-induced decrease in expression observed in western blots (Fig 4A), the intensity of the DDX5 signal decreases in infected cells. Altogether, this demonstrates that the IGF2BP2 relocalization phenotype is specific and is not due to ZIKV-induced cell death.

      By performing co-immunoprecipitation assays on mock and infected cells that express HA-tagged IGF2BP2, the authors propose that the observed change in IGF2BP2 localization results from its recruitment to the replication compartment by the viral NS5 polymerase and associated with the viral RNA. Given that both IGF2BP2 and NS5 are RNA-binding proteins, it is plausible that their interaction is mediated indirectly through the RNA molecule. Notably, the authors do not address the treatment of lysates with RNase before the IP assay, leaving open the possibility of this indirect interaction between IGF2BP2 and NS5.

      We agree with the hypothesis of the reviewer. As suggested, we have performed co-immunoprecipitation assays following RNase A treatment of the cell lysates. As shown in new Fig S6, the abundance of ZIKV NS5 co-immunoprecipitating with IGF2BP2-HA is drastically decreased upon RNase A treatment compared to the untreated condition. This demonstrates that the IGF2BP2/NS5 interaction is mostly RNA-dependent, which is not surprising as RNA is often a structural component of ribonucleoprotein complexes. Of note, the same is observed with ATL2. This new set of data allows us to refine our model of Figure 11 and the discussion as they strongly suggest that the direct binding of IGF2BP2 to viral RNA (evidenced in vitro; Fig 5D) is required for subsequent association with NS5 and ER-shaping protein ATL2. This is in line with the fact that viral RNA is a co-factor in the biogenesis of ER-derived ZIKV vesicle packets (PMID: 32640225). However, we cannot exclude a contribution of cellular RNA in these processes as discussed.

      In in vitro binding assays, the authors demonstrate that the RNA-recognition motifs of the IGF2BP2 protein specifically bind to the 3' nontranslated region (NTR) of the ZIKV genome, excluding binding to the 5' NTR. However, they cannot rule out the possibility of this host protein associating with other regions of the viral genome. Using a reporter ZIKV subgenomic replicon system in IGF2BP2 knock-down cells, they additionally demonstrate that IGF2BP2 enhances viral genome replication. Despite its proviral function, the authors note that the "overexpression of IGF2BP2 had no impact on total vRNA levels." However, the authors do not delve into a discussion of this latter statement.

      We agree with the reviewer’s comments. We now mention in the discussion that we cannot exclude the possibility that IGF2BP2 associates with RNA motifs within the coding region of the viral genomic RNA, especially considering that it contains N6A-methylated sequences (PMID: 27773535; 27773536; 29373715). Moreover, we discuss the observation that IGF2BP2 overexpression has no impact on vRNA levels (as well as titers). We believe that this is because endogenous IGF2BP2 is highly expressed in cancer cells such as the Huh7.5 and JEG-3 cells used here and is presumably not limiting for viral replication in our system (PMID: 38320625; 35111811; 34309973; 35023719; 37088822; 33224879; 35915142).

      In this study, the authors extend their findings by illustrating that ZIKV infection triggers a remodeling of IGF2BP2 ribonucleoprotein complex. They initially evaluate the impact of ZIKV infection on IGF2BP2's interaction with its endogenous mRNA ligands. Their results reveal that viral infection alters the binding of specific mRNA ligands, yet the physiological consequences of this loss of binding in the cell remain unexplored. 

      We acknowledge that it would be of interest to further study the physiological relevance of the modulation of the IGF2BP2 ribo-interactome. Since we have focused here on the role of IGF2BP2 in viral replication, we feel that this will be the focus of future studies, notably involving a larger omic-centered approach to identify the most impacted IGF2BP2 mRNA ligands. Of note, Gokhale and colleagues have already reported that the CIRBP, TNRC6A and PUM2 proteins regulate the replication of Flaviviridae (PMID: 31810760).

      Additionally, the authors demonstrate that ZIKV infection modifies the IGF2BP2 interactome. Through proteomic assays, they identified 62 altered partners of IGF2BP2 following ZIKV infection, with proteins associated with mRNA splicing and ribosome biogenesis being the most represented. In particular, the authors focused their research on the heightened interaction between IGF2BP2 and Atlastin 2, an ER-shaping protein reported to be involved in flavivirus vesicle packet formation. The validation of this interaction by Western blot assays prompted an analysis of the effect of ZIKV on organelle biogenesis using a newly described replication-independent vesicle packet induction system. Consequently, the authors demonstrate that IGF2BP2 plays a regulatory role in the biogenesis of ZIKV replication organelles.

      Based on these findings and previously published data, the authors propose a model outlining the role of IGF2BP2 in ZIKV infectious cycle, detailing the changes in IGF2BP2 interactions with both cellular and viral proteins and RNAs that occur during viral infection.

      The conclusions drawn in this paper are generally well substantiated by the data.

      We thank the reviewers for these encouraging general comments on our study.

      However, it is worth noting that the majority of infections were conducted at a high MOI for 48 hours, spanning more than one infectious cycle. To enhance the robustness of their findings and mitigate potential cell stress, it would be valuable to observe these effects at shorter time intervals, such as 24 hours post-infection.

      As explained above, IGF2BP2 relocalization to the (dsRNA-enriched) replication compartment was also observed in ZIKV infected cells at one day post-infection.

      Furthermore, the assertion regarding the association of IGF2BP2 with NS5 could be strengthened through additional immunoprecipitation (IP) assays. These assays, performed in the presence of RNAse treatment, would help exclude the possibility of an indirect interaction between IGF2BP2 and NS5 (both RNA-binding proteins) through viral RNA, thus providing more confidence in the observed association.

      See above for our answer and the description of the new data of Fig. S7.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Mazeaud and colleagues pursued a small-scale screen of a targeted RNAi library to identify novel players involved in Zika (ZIKV) and dengue (DENV) virus replication. Loss-of-function of IGF2BP2 resulted in reduced titers for ZIKV of the Asian and African lineages in hepatic Huh7.5 cells, but not for either of the four DENV serotypes nor West Nile virus (WNV). The phenotype was further confirmed in two additional cell lines and using a ZIKV reporter virus. In addition, using immunoprecipitation assays the interaction between IGF2BP2 and ZIKV NS5 protein and RNA genome was detected. The work addressed the role of IGF2BP2 in the infected cell combining confocal microscopy imaging, and proteomic analysis. The approach indicated an altered distribution of IGF2BP2 in infected cells and changes in the protein interactome including disrupted association with partner mRNAs and modulation of the abundance of a specific set of protein partners in IGF2BP2 immunoprecipitated ribonucleoprotein (RNP) complexes. Finally, based on the changes in IGF2BP2 interactome and specifically the increment in the abundance of Atlastin 2, the biogenesis of ZIKV replication organelles (vRO) is investigated using a genetic system that allows virus replication-independent assembly of vRO. Electron microscopy showed that knockdown of IGF2BP2 expression reduced the number of cells with vRO.

      Strengths:

      The role of IGF2BP2 as a proviral factor for ZIKV replication is novel. The study follows a logical flow of experiments that altogether support the assembly of a specialized RNP complex containing IGF2BP2 and ZIKV NS5 and RNA genome.

      We thank the reviewer for their positive feedback on our study and its novelty.

      Weaknesses:

      The statistical analysis should clearly indicate the number of biological replicates of experiments to support statistical significance.

      This information has been included in all figure legends.

      The claim that IGF2BP2 knockdown impairs de novo viral organelle biogenesis and viral RNA synthesis is built upon data that show a reduction in RNA synthesis <0.5-fold as assessed using a reporter replicon, thus suggesting a limited impact of the knockdown on RNA replication.

      We agree that a 50% decrease in the replication of our reporter replicon might be considered mild. However, we want to point out that in an infectious set-up, the phenotypes were stronger, as demonstrated by an 80% decrease in viral particle production even though IGF2BP2 levels were never depleted by more than 80% compared to endogenous levels. Moreover, our findings were validated through the analysis of de novo vRO biogenesis by electron microscopy in a replication-independent set-up. Together, these experiments provide compelling evidence for a role for IGF2BP2 in the early stages of viral genome replication.

      Validation of IGF2BP2 partners that are modulated upon ZIKV infection (i.e. virus yield in knocked down cells) can be relevant especially for partners such as Atlastin 2, as the hypothesis of a role for IGF2BP2 RNP in vRO biogenesis is based on the observed increase in the abundance of Atlastin 2 in the RNP complex precipitated from infected cells.

      First, we would like to emphasize that the proviral role of ATL2 in flavivirus replication, including links to vRO biogenesis, was already reported in two independent studies notably by one of the co-authors (PMID: 31636417; 31534046). Therefore, we have chosen to discuss these previous studies in the manuscript rather than repeating published experiments.  Second, we agree that it would be interesting to further interrogate the role of modulated IGF2BP2 protein partners in ZIKV replication. However, these experiments would constitute a new project per se involving fastidious RNAi-based phenotypic screening and subsequent functional characterization of the identified hits. Therefore, this will be the focus of follow-up studies.  

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      All IFAs claimed that showing co-localization is minimal, this needs to be addressed.

      We have performed colocalization analyses for relevant images in the revised manuscript (see below and Figs. 4B, 5A, S4A-C and S5A-D). Although this quantification increases confidence in our analysis, we were still cautious in our conclusions, stating that colocalization was partial and that IGF2BP2 accumulates in the replication compartment.

      Western blots and IPs need to be quantified.

      As requested, we have included WB quantification in Figs. 2A, 4A, 4D, 8B-D, S6C and S7D.

      Figure 1: What is the strain background for the ZIKV reporter virus?

      As indicated in the legend of Figure 1E of the primary submission, the Rluc-expressing ZIKV reporter virus (ZIKV-R2A) was based on the FSS13025 isolate (Asian lineage)(PMID: 27198478). To clarify this, we have also indicated the strain background in the main text of the Results and Material & Methods sections.

      Figure 2A: If shIGF2BP2 reduces viral titer, the NS3 should show a reduction in 2A, but it doesn't.

      We agree with the reviewer. Although NS3 seems not to be decreased upon IGF2BP2 knockdown in the experiment initially shown in Figure 2A, it should be noted that our homemade rat anti-NS3 antibody is highly sensitive, leading to signal saturation that makes it challenging to distinguish changes in NS3 expression without substantially diluting the lysate sample before SDS-PAGE. The initial reason for including Fig 2A was not to make a statement about viral protein expression but to validate IGF2BP2 knock-down efficiency. Conclusions about NS3 levels in the initial figure are further complicated by the high MOI of ZIKV used in Huh7.5 cells, conditions that are not quantitative for viral replication measurements. To address this issue, we assessed the impact of IGF2BP2 knockdown on viral protein abundance (as a read-out of overall viral replication) with a lower MOI of ZIKV. The results of the repeat experiment (seen in the new Fig. 2A) show that IGF2BP2 knockdown leads to a decrease in the abundance of NS4A, NS5 and NS3, which is consistent with the titer decrease phenotypes.

      Figure S3: The re-localization claimed is minimal and does not show overlap with NS3. The dsRNA is difficult to see here. Suggest improving the immunofluorescence images and reducing the claim for "strong" co-option of RNP complexes.

      In addition to replication complexes, NS3 labels convoluted membranes which are devoid of dsRNA and IGF2BP2 and surround the cage-like replication compartment as large puncta (PMID: 27545046; 33432690; 28249158). The signal overlap is more obvious between IGF2BP2 and NS3/dsRNA-containing areas, which is reflected by the Manders’ coefficients that have been included in the revised version (Fig. S5C-D). We have also adjusted the text to conclude that the colocalization was partial and that IGF2BP2 accumulated in the replication compartment. We acknowledge that the dsRNA signal is weak, and we have updated the images (and others, when relevant) to better visualize this viral component. Moreover, we have rephrased the sentence to remove the word “strongly”.

      Figure 4A: Western blot needs quantification.

      This is now included in the figure.

      Figure 4B: As in many of the IFAs, the co-localization is only partial. Additionally, the dsRNA is not visible. So the images need to be improved. The colocalization should be quantified across the cell diameter.

      We changed the color and intensity of the dsRNA staining to make it more visible. Manders’ colocalization coefficients have been determined and included in Figures 4B and S5C-D.
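
      As an illustration of what such coefficients measure, here is a minimal sketch assuming the classic definition: M1 is the fraction of total channel-1 intensity found in pixels where channel 2 is above a threshold, and M2 is the converse. The function name, the thresholds, and the pixel values are hypothetical; the sketch does not reproduce the software or settings used for the figures.

```python
def manders_coefficients(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2) for two equally sized lists of pixel intensities.

    M1: share of channel-1 intensity located where channel 2 exceeds thr2.
    M2: share of channel-2 intensity located where channel 1 exceeds thr1.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 == 0 or total2 == 0:
        raise ValueError("each channel needs non-zero total intensity")
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2


# Hypothetical flattened 2x2 images (illustrative values only).
channel_a = [10, 20, 0, 30]
channel_b = [0, 5, 5, 15]
m1, m2 = manders_coefficients(channel_a, channel_b)
print(f"M1={m1:.3f}, M2={m2:.3f}")
```

      Because M1 and M2 are computed separately, partial colocalization can appear as a high value for one channel and a lower value for the other, which is why both are usually reported.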

      Figure 4C: It is difficult to understand what the +/- is on the blots for the cell extracts and the anti-HA IP samples. It is not described in the figure legend or the text.

      As already indicated on the right of the panel, the +/- indicates whether or not IGF2BP2-HA was overexpressed in the cells. In the revised version, this is clarified in the figure legend.

      Figure 5A: Once again similar to other IFAs, the co-localization is only minimal and thus difficult to claim as "co-localization" is actually happening. It would be good to either improve the images or discuss this observation in the text and reduce the claim of colocalization. Specifically, since the two proteins might be co-localizing in specific regions which would make it a very interesting observation. Also, quantification of co-localizing regions would be beneficial.

      We have included the requested colocalization analysis. We have been cautious to indicate that colocalization was only partial. It is noteworthy that, despite many efforts to optimize the cell permeabilization procedure, we noticed that the FISH probes were not very efficient in accessing the perinuclear area of the infected cells, where replication complexes accumulate. In that respect, it is likely that this imaging approach “misses” some of the IGF2BP2/vRNA complexes and that the determined colocalization factor is underestimated. This explains why the confirmation of the vRNA/IGF2BP2 complex with a biochemical approach (Fig. 5B) was very relevant.

      Figure 5D: It is unclear what the blue squares represent. Clearer figure legends and text would be beneficial.

      As stated in the initial figure, the blue squares indicate values obtained with the ZIKV 5’ UTR probe while the green circles involve a 3’ UTR probe. We have further emphasized this information in the figure legend to make it clearer.

      Figure 6B. The graph is missing the data and X-axis label for shIGF2BP2.

      We had initially omitted the values for the condition combining shIGF2BP2 and the replication-dead GAA replicon, since this viral system does not allow accumulation of viral genomes or proteins and was not relevant at the 48h time point. We thought that the shNT/GAA condition was a sufficient internal negative control of viral replication, since values for shIGF2BP2/GAA did not exceed background. Nevertheless, we have now included this condition in the revised figure.

      Figure 7D: It is unclear what the -/+ signs are in the cell extracts and the IP blots. Specifically, since there is an NS5 signal in the (-) lanes.

      As explained above, the +/- indicates whether IGF2BP2-HA was overexpressed. The meaning of these symbols is now further clarified in the figure legend.

      Figure 8C: The circles with the different colors are not clearly described. What does it mean?

      As indicated in the figure (left part), the red and green circles identify the partners of the STRING network whose association with IGF2BP2 is decreased and increased during infection, respectively. We have included this information in the figure legend.

      Figure 9: The electron microscopy to quantify vesicles should be carried out using whole-cell tomography in order to get the most accurate quantification of the vesicles following different treatments. This is because if you only look at one cell profile (slice), the number of vesicles might be less in that profile and more in another below or above it. It is unclear how many cell profiles were used for the quantification and how the calculations were carried out.

      We agree with the reviewer that ideally, one should perform 3D electron tomography to precisely assess the morphology of VPs. Regardless of the fact that we do not possess the imaging infrastructure to perform that type of analysis, such an approach would represent a tremendous amount of work if one would like to process at least 200-400 vesicles from > 50 cells and their whole cytoplasm (as we did). Despite not having 3D images, this number of data points is sufficient to see general changes in viral replication vesicle morphology, especially considering that Huh7-Lunet cells are relatively flat (PMID: 32640225; 36700643; 34696522; 31636417). Furthermore, since IGF2BP2 knockdown decreases the abundance of VPs and does not impact their diameter, we believe that the addition of sophisticated 3D analysis would not bring any new and relevant information and that the TEM data stand by themselves for the conclusion we made. A more refined morphological analysis to determine how IGF2BP2 is structurally involved in virus-mediated membrane reorganization could be the focus of a future study.

      We feel that we have already provided sufficient information about the quantification in the Material & Methods section of the first version of the manuscript: “Quantification was performed by systematically surveying cells and evaluating the presence of VPs. Only cells with >2 VPs were considered as positive. For each condition, >50 cells were surveyed over 4 biological replicas. All observed VPs were imaged, and VP diameters were determined using ImageJ by measuring the distance across two axes and averaging”.
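
      The quantification rule quoted above can be written out as a minimal sketch. The function names and example values are hypothetical, and the measurements themselves were made in ImageJ, not with this code.

```python
from statistics import mean


def vp_diameter(axis_a_nm, axis_b_nm):
    """Diameter of one VP as the average of the distances across two axes."""
    return (axis_a_nm + axis_b_nm) / 2


def fraction_positive_cells(vp_counts_per_cell):
    """Fraction of surveyed cells counted as positive (more than 2 VPs)."""
    positive = sum(1 for n in vp_counts_per_cell if n > 2)
    return positive / len(vp_counts_per_cell)


# Hypothetical measurements (illustrative numbers, not data from the study).
diameters = [vp_diameter(a, b) for a, b in [(90, 110), (80, 84)]]
mean_diameter = mean(diameters)
positive_fraction = fraction_positive_cells([0, 3, 5, 2])
```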

      Reviewer #2 (Recommendations For The Authors):

      The inclusion of a control in the knock-down and infection assays with the reporter virus could enhance the validity of the findings. Introducing STAT2 knockdown, a recognized antiviral protein for ZIKV, as a control would provide a valuable benchmark to evaluate the extent of viral enhancement in the experiments. This additional control not only supports the proposed function of LARP1 in virus assembly/release but also strengthens the overall interpretation of the results.

      We agree that adding a positive control could have been relevant for assessing the extent of replication modulation, especially for increases such as that observed with shLARP1. However, finding such control proteins in our system was a challenge. Indeed, STAT2 would not have been a good control for these experiments since we used Huh7.5 cells for the RNAi mini-screening, which do not express a functional RIG-I protein and generally do not produce type I and III interferons. Thus, STAT2 knockdown is not expected to result in an increase in replication. That said, we feel that it was unnecessary to include a control for replication inhibition here given that only a few statistically reliable candidates were obtained. Instead, we have opted for an extensive secondary validation approach by assessing the proviral role of IGF2BP2 for multiple viruses (DENV1-2-3-4, WNV and SARS-CoV-2, and 3 ZIKV strains) in three relevant cell types.

      Additionally, in Figure S4, the authors employ an antibody against NS5 that specifically recognizes ZIKV NS5 but not DENV NS5. Given the objective of highlighting distinctions between these two viruses, it is advisable to use an antibody that detects DENV NS5 as well. This approach would contribute to a more comprehensive comparison, ensuring a balanced representation of both viruses in the experimental analysis.

      We thank the reviewer for this relevant suggestion. We have repeated the co-immunoprecipitation assays using antibodies specific to DENV NS5 (Author response image 1). While we specifically pulled down ZIKV NS5 with IGF2BP2-HA as expected, this was not the case for DENV NS5 when using extracts from DENV-infected cells, despite multiple attempts. Indeed, the amount of DENV NS5 pulled down with IGF2BP2-HA was always comparable to that in the negative control (“empty” pWPI lentivirus-transduced cells, “-” condition), which corresponds to non-specific binding to the HA-resin. Thus, while the antibody was very efficient at detecting DENV NS5 in the cell extracts, no specific binding between DENV NS5 and IGF2BP2-HA could be evidenced. Consistent with the different replication phenotypes observed for DENV and ZIKV, this strongly supports that the NS5/IGF2BP2 interaction is specific to ZIKV. The specificity of the IGF2BP2 interaction with ZIKV NS5 compared to DENV NS5 is discussed in the updated manuscript.

      Author response image 1.

      DENV NS5 is not specifically co-immunoprecipitated with IGF2BP2-HA in contrast to ZIKV NS5. Huh7.5 cells stably expressing IGF2BP2-HA (+) and control cells (-) were infected with ZIKV H/PF/2013 at a MOI of 10 or left uninfected. Two days later, cell extracts were prepared and subjected to RNase A treatment (+) or not (-) before anti-HA immunoprecipitations. The resulting complexes were analyzed by western blotting for their abundance in the indicated proteins.

      Reviewer #3 (Recommendations For The Authors):

      (1) Statistical analysis. Please clearly indicate what columns and error bars represent for bar graphs such as those presented in Figures 1A-D and F, Figures 2B-C, and bottom panels in DE, Figure 3, Figure 5B, Figure 6B-C, and Figures 9B-D and F. For instance, the mean of n independent experiments and standard deviation.

      Information about the number of replicates, error bars, and statistical tests has been added for all figures in the legends. 

      (2) What is the scale in the Y-axis of Figure 2C? As shown, it is difficult to know what is the virus titer in knocked-down cells. Please use a linear scale or a log scale.

      This is a linear scale of viral titers, which we have modified to make it clearer for the reader.

      (3) Throughout the manuscript (e.g. Figures 1, 2, and 3) the fold reduction in titer is presented instead of the actual virus titers. I suggest showing the titer as it may be much more informative for the reader.

      We prefer showing the data as fold reduction as they better reflect the IGF2BP2 knockdown-induced phenotypes across the independent biological replicates. Indeed, from one experiment to another, the reference titers in the control condition sometimes vary, for instance because of the cell passage or the lentiviral transduction efficiency, especially when low multiplicities of infection are used. However, the fold-change reduction observed upon IGF2BP2 knockdown was always consistent regardless of the titer value. Of note, all considered experiments had reference titers above 10⁵ PFU/mL.
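
      The reasoning behind per-replicate normalization can be illustrated with a short sketch. The function name and the titers are hypothetical and chosen only to show that the ratio can stay constant while absolute titers vary between experiments.

```python
def fold_changes(control_titers, knockdown_titers):
    """Knockdown titer divided by the matched control titer, per replicate."""
    if len(control_titers) != len(knockdown_titers):
        raise ValueError("each replicate needs one control and one knockdown titer")
    return [kd / ctrl for ctrl, kd in zip(control_titers, knockdown_titers)]


# Hypothetical titers in PFU/mL (illustrative numbers, not data from the study).
# The control titer differs 4-fold between replicates, yet the ratio is stable.
control = [2.0e5, 8.0e5, 4.0e5]
knockdown = [1.0e5, 4.0e5, 2.0e5]
ratios = fold_changes(control, knockdown)
```

      In this made-up example every replicate gives the same ratio, whereas averaging the raw titers would mix the knockdown effect with the experiment-to-experiment variation of the control.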

      (4) Is it possible to perform a colocalization analysis of confocal images showing overlapping signals?

      This has been done and the results of these analyses are included in the updated figures 4B, 5A, S4 and S5.

      (5)  Assessing the effect of Atlastin2 knockdown in virus yield and showing coimmunoprecipitation of Atlastin 2 with NS5 can add relevant information.

      As mentioned in the discussion and above, ATL2 was already reported to be required for DENV and ZIKV replication in two independent studies, including one by one of the co-authors (PMID: 31636417; 31534046). We have not tested whether ATL2 associates with NS5. However, new Fig. S7 of the revised manuscript shows that the IGF2BP2/ATL2 association is RNA-dependent. This suggests that, as initially depicted in our model, IGF2BP2 associates with the ER (and thus, ATL2) after its binding to the viral RNA. Further interrogation into the role of atlastins in the flavivirus replication cycle is the focus of another ongoing IGF2BP2-unrelated study from one of the co-authors, which will be reported elsewhere.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript reports useful findings by resolving the crystal structure of Sedoheptulose-1,7-Bisphosphatase (SBPase) from the green algae Chlamydomonas reinhardtii, which is involved in the Calvin cycle. The data presented are solid based on validated methodologies, which help in understanding the structure and function of this enzyme.

      We thank the editors for this positive assessment.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Le Moigne and coworkers shed light on the structural details of the Sedoheptulose-1,7-Bisphosphatase (SBPase) from the green algae Chlamydomonas reinhardtii. The SBPase is part of the Calvin cycle and catalyzes the dephosphorylation of sedoheptulose-1,7-bisphosphate (SBP), which is a crucial step in the regeneration of ribulose-1,5-bisphosphate (RuBP), the substrate for Rubisco. The authors determine the crystal structure of the CrSBPase in an oxidized state. Based on this structure, potential active site residues and sites of post-translational modifications are identified. Furthermore, the authors determine the CrSBPase structure in a reduced state revealing the disruption of a disulfide bond in close proximity to the dimer interface. The authors then use molecular dynamics (MD) to gain insights into the redox-controlled dynamics of the CrSBPase and investigate the oligomerization of the protein using small-angle X-ray scattering (SAXS) and size-exclusion chromatography. Despite the difference in oligomerization, disruption of this disulfide bond did not impact the activity of CrSBPase, suggesting additional thiol-dependent regulatory mechanisms modulating the activity of the CrSBPase.

      We thank reviewer 1 for his/her careful reading of our manuscript.

      The authors provide interesting new findings on a redox-mechanism that modulates the oligomeric behavior of the SBPase, however without investigating this potential mechanism in more detail. The conclusions of this manuscript are mostly supported by the data, but they should be more carefully evaluated in respect to what is known from other systems as e.g. the moss Physcomitrella patens. This is especially of interest, as SBPase was previously reported to be dimeric, whereas for FBPase a dimer/tetramer equilibrium has been observed.

      We thank reviewer 1 for his/her comments on the novel or confirmatory character of our structure-function analysis on CrSBPase. We address the questions of oligomeric states later in this response.

      (1) Given that PpSBPase has been already characterized in detail, the authors should provide a more rigorous comparison to the existing data on SBPases. This includes a more conclusive structural comparison but also the enzymatic assays should be compared to the findings from P. patens. Do the authors observe differences between the moss and the chlorophyte systems, maybe even in regard to the oligomerization of the SBPase?

      Indeed, a previous study conducted by one of the authors of the current manuscript (Stéphane D. Lemaire) and collaborators determined the structure and regulatory properties of SBPase from the moss Physcomitrella patens (Gütle et al. 2018 https://doi.org/10.1073/pnas.1606241113). We added a clearer reference to this earlier work. The differences that we observed regarding the oligomeric states of SBPase from Chlamydomonas reinhardtii principally stem from our analytical method, in vitro size-exclusion chromatography, in comparison with crystal packing analysis in the reference study. We detailed the PpSBPase/CrSBPase oligomeric state comparison in the paragraph 'Oligomeric states of CrSBPase'. Besides, the asymmetric unit of our CrSBPase crystal structure is also a homodimer, similarly to PpSBPase, and we suggest that PpSBPase is also likely to adopt several oligomeric states in vitro. If this were confirmed by experiments, SBPase in several organisms would behave analogously to FBPase regarding the dimer/tetramer equilibrium.

      In paragraph 'Crystal structure of CrSBPase' we added a comparison by alignment of our CrSBPase crystal structure to the previously reported PpSBPase crystal structure, stating that with RMSD = 0.478 Å the proteins are essentially identical.
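
      As a reminder of what the RMSD values quoted in this response measure, here is a minimal sketch computing the root-mean-square deviation between paired atoms. It assumes the two structures have already been superposed and their atoms matched one-to-one. The function name and coordinates are hypothetical, and this is not the alignment procedure used for the manuscript.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates in Å.

    Assumes both structures are already superposed and listed in the same
    atom order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(model, reference)
```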

      In paragraph 'CrSBPase enzymatic activity' we compared the value we obtained for enzyme specific activity to those previously published for other SBPases from Chlamydomonas or the land plant Spinacia oleracea, highlighting the similarity of results across three different systems and teams (Seuter et al. 2002 https://doi.org/10.1023/A:1019297521424 and Tamoi et al. 2005 DOI: 10.1271/bbb.69.848).

      (2) The authors should include the control experiments (untreated SBPase) and the assays performed with mutant versions of the SBPase, which are currently only mentioned in the text or not shown at all.

      We added supplementary figure 14 to illustrate that, since the SBPase C115S and C120S mutants are still activated by reducing agent, the disulfide bridge between cysteines 115 and 120 is not the sole control over SBPase activity but rather controls the oligomeric exchange of the enzyme, indirectly contributing to redox activation of the active site.

      (3) The representation of the structure in figures (especially Figures 1 and 3) should be adjusted to match the author's statements. In Figure 1, the angle from which the structure is displayed changes over the entire figure making it difficult to follow especially as a non-structural biologist. Furthermore, important aspects of the structure mentioned in the text are not labeled and should be highlighted, by e.g. a close-up. Same holds true for Figure 3 that currently mostly shows redundant information.

      We thank reviewer 1 for his/her advice on how to improve Figure 1. We drew new images for the complete figure, hopefully providing more consistent and clearer visual support to our text. For simplicity, the protein is now always represented centered around its active site in the same orientation. We represent co-crystallized water in all projections as a guide to the eye.

      Figure 3 and supplementary figure 3 were switched in order to better represent the experimental evidence provided by the resolution of SBPase structure under reducing conditions, i.e., the increase in local disorder around C115-C120 pair of cysteines in the 113-130 stretch forming a redox-conditionally dynamic loop and β-hairpin motif.

      (4) The authors state that mutation of C115 and C120 to serine destabilize the dimer formation, while more tetramer and monomer is formed. As the tetramer is essentially a dimer of dimers, the authors should elaborate how this might work mechanistically. In my opinion, dimer formation is a prerequisite for tetramer formation and the two mutations rather stabilize the tetramer instead of destabilizing the dimer.

      Time-dependent dynamic character of SBPase oligomer exchange is not resolved by the current study because we essentially combined size-exclusion chromatography (SEC) and X-ray crystallography to define quaternary structures at equilibrium. Overall, homodimer is the dominant state of wild-type SBPase by abundance in the purified recombinant form and by forming the constitutive asymmetric unit in all crystal packings. Dimer is indeed present in the tetramer state, a dimer of dimers, as pertinently stated by reviewer 1.

      This being recognized, we tried to explain the systematic co-elution of the principal dimeric form with an additional species of smaller size on SEC (supplementary figure 1, right-side shoulder of the peak), at the apparent mass of a monomer. When solving the crystal structures of SBPase we realized that residues 113-130, which form a loop and β-hairpin motif, contribute to the dimer interface. Notably, in this loop cysteine 115 (C115) maps at a bonding distance of 3.9 Å from the side chain of arginine 220 (R220) of the dimer partner subunit. In loop 113-120, the cysteine pair C115 and C120 is subject to redox switching between disulfide (closed) and dithiol (open) conformations, as shown in our structures 7B2O and 7ZUV, respectively. Given that the reduction of the C115-C120 disulfide bridge correlates with a higher flexibility of this motif that contributes to the dimer interface (figure S3), we hypothesized that reduction of SBPase would destabilize the dimer state to the benefit of a transitory monomer state, and indeed point mutagenesis of C115S or C120S caused a large modification of the oligomer equilibrium in favour of the monomer (figure S1C).

      Mechanistically, we suggest two scenarios for the tetramer formation: either monomers first interact as in the crystallographic dimer before pairing such dimers into tetramers (as proposed by reviewer 1), or monomers start tetramerization by favoring the alternative subunit interface (figure 5B, between cyan and magenta chains) before stabilizing the crystallographic homodimer interface. In this latter case, monomerization would be necessary to efficiently re-arrange SBPase dimers into tetramers.

      In physiological conditions the re-arrangement switch would be controlled by C115-C120 reduction through the ferredoxin-thioredoxin redox cascade. Structural studies in dynamic conditions, such as native mass spectrometry or mass photometry, would be necessary to settle this question unambiguously, although at this stage of our investigation there seems little doubt to us that C115-C120 disulfide-dithiol exchange is essential to control a dimer/monomer balance in the first instance.

      Reviewer #2 (Public Review):

      The central theme of the manuscript is to report on the structure of SBPase - an enzyme central to the photosynthetic Calvin-Benson-Bassham cycle. The authors claim that the structure is first of its kind from a chlorophyte Chlamydomonas reinhardtii, a model unicellular green microalga. The authors use a number of methods like protein expression, purification, enzymatic assays, SAXS, molecular dynamics simulations and xray crystallography to resolve a 3.09 A crystal structure of the oxidized and partially reduced state. The results are supported by the claims made in the manuscript. One of the main weakness of the work is the lack of wider discussion presented in the manuscript. While the structure is the first from a chlorophyte, it is not unique. Several structures of SBPase are available. As the manuscript currently reads, the wider context of SBPase structures available and comparisons between them is missing from the manuscript. Another important point is that the reported structure of crSBPase is 0.453A away from the alphafold model. Though fleetingly mentioned in the methods section, it should be discussed to place it in the wider context.

      We thank reviewer 2 for his/her assessment of our manuscript. In response to his/her suggestion to better compare our SBPase structure from the model microalga Chlamydomonas reinhardtii to that of the ortholog from Physcomitrium patens previously reported by an author of this manuscript (Stéphane D. Lemaire) and collaborators (Gütle et al. 2018), we wish to point out that paragraph 3 of the introduction was dedicated to this reference, along with a mention of the related Thermosynechococcus elongatus dual-function fructose-1,6-bisphosphatase/sedoheptulose-1,7-bisphosphatase (F/SBPase). We nevertheless followed his/her suggestion to detail the comparison between chloroplastic SBPase structures in the first result section 'Crystal structure of CrSBPase', consistent with response 1 to reviewer 1 (see above).

      Regarding the integration of AlphaFold (AF) computational models in a general discussion about SBPase molecular structure, we wish to point out that our initial 7B2O crystallographic model of CrSBPase was deposited in PDB on 2020-11-27 before AlphaFold2 was available for the scientific community (Jumper et al. publication date is 15 July 2021).

      AF2 entry AF-P46284-F1-model_v4 from AlphaFold Protein Structure Database aligns with our crystal structure 7B2O chain E with RMSD = 0.434 Å, showing excellent agreement between experiment and prediction at the level of the protein main chain. It must still be pointed out that it is the AF2 model that lies 0.434 Å away from the experiment, and not the opposite. The alignment deviates locally in the conformations of several loops and in the length of secondary structure elements. Many amino acid side chains adopt distinct orientations between the computational model and the experimental structure.

      AF3 was recently communicated (Abramson et al. 2024) along with its online prediction server hosted at https://golgi.sandbox.google.com. The CrSBPase model from AF3 aligns to our crystal structure 7B2O chain A with RMSD = 0.489 Å, again showing their strong similarity, with a smaller discrepancy between AF2 and AF3 of RMSD = 0.216 Å. The only significant deviations between 7B2O and AF3 are in the orientation of several side chains and notably in the conformation of region 114-131, which contains the redox sensor motif.

      We added the last two paragraphs to the revised version of the manuscript, after the results section presenting our crystallographic work.

      Recommendations for the authors:

      We made all recommended modifications as detailed below.

      Reviewer #1 (Recommendations For The Authors):

      I have outlined a number of minor points below.

      We addressed all minor points listed.

      Line 220: The asymmetric unit only contains three dimers. The dimer of dimer or tetramer can only be reconstituted by displaying the symmetry mates.

      We corrected our sentence to 'The asymmetric unit is composed of six polypeptide chains packing as three dimers'.

      I also suggest that the authors separate the description of the asymmetric unit content from the modeled water molecules and rephrase e.g. „..and four water molecules could be modeled."

      We rephrased as suggested.

      I appreciate that the authors uploaded the structure in advance of this article, which allowed to evaluate the quality of the structure. Although this does not add valuable information, I have identified several unmodeled blobs, which possibly also account for waters.

      Unmodeled blobs were tentatively assigned to water but had to be removed during later refinements. We used Coot Validate tools 'Unmodelled blobs' and 'Check/Delete water' to progress towards the current optimal refinement statistics. We admit that the resolution of the crystallographic dataset (3.09 Å) is limiting to reliably model mobile or less resolved elements like water molecules. Overall, we estimate that the functional elements of the structure are modeled to the best of our knowledge and with minimal subjectivity.

      Line 222: Please write 309 instead of spelling the number.

      We corrected for 309 instead of spelling the number.

      Line 223: The structure representation in Figure 1A/B has to be improved. The authors might consider labeling the two domains & color them in two colors instead of the rainbow color coding. Furthermore, the 90° rotation does not add much information. Here, turning the model in a different direction that allows to see the central b-sheet of domain 2 might be better suited. Furthermore, instead of describing b-strands first, followed by a-helices, I suggest describing which secondary structure elements form the two domains.

      We improved Figure 1A as suggested while keeping Figure 1B with 90° rotation and rainbow color gradient in order to display clearly the secondary structure content and connectivity. The orientation was tilted to better display the central β-sheet. This new version of Figure 1A/B should facilitate the text description of SBPase architecture, which we amended as suggested.

      Line 229: The information on A113-120 should be depicted in a closeup in Figure 1A.

      We made a close-up view of sequence 113-120 as added figures 1C-D and modified the rest of the figure and legend accordingly.

      Line 234: Please provide an r.m.s.d here.

      We now provide r.m.s.d. for all structural alignments.

      Line 242: Please introduce the domain labeling in Fig 1C to make it easier to track the exact region within SBP here. Is the residue numbering according to SBP or the human FBP?

      The modified version of figure 1 now shows SBPase in the same orientation for panels A, E, F, G, H for simplicity. Domain labeling is indicated in panel A with distinct NTD/CTD colors as suggested. We explicitly marked the position of W401 on all panels as a guide to the eye. We indicated in the figure legend that residue numbering is according to Chlamydomonas SBPase Uniprot entry P46284.

      Line 244: Is Figure 1D in the same orientation as C? I suggest making the surface transparent and showing the cartoon below, which will allow to easier see the solvent accessibility of the residues. Also, clearly label W401 (although it's the only water shown/modeled in this region).

      We modified figure 1 to show all equivalent panels (ie. A-E-F-G-H) with the same orientation. In this new form we think that solvent accessibility and the relative position of significant residues is easier to interpret for the reader. W401 is consistently labeled throughout figure 1 panels.

      Line 263: Please provide a close-up of the C222 and C231 including measured distance. It's clearly not visible from this view. It might even be helpful to provide close-ups of all cysteine residues that are mentioned in the text.

      In the modified version of figure 1 we estimate that C222 and C231 are more easily visible. We added a close-up view of the C222-C231 environment in a new supplementary figure 2. Since we do not explore further the functional relevance of this redox pair, we chose not to include the C222-C231 close-up view in main figure 1. We added legends and modified supplementary figure numbering accordingly.

      Line 276: As already mentioned earlier, none of the panels in Figure 1 provide a close-up of this loop. This should be added.

      This loop is now displayed as a close-up view in panels C and D of main figure 1.

      Line 284: It is difficult to follow the relative positions of the potential modification sites if the model is always depicted from a different angle in Figure 1. The authors might want to change this across Figure 1 or show the rotation angle.

      This problem was addressed in the revised figure 1, panels A-E-F-G-H are in the same orientation now. Panel B was kept at a rotation of 90° with corresponding annotation.

      Line 290: Please label W401. Also stick to one nomenclature (W or H20).

      We labeled W401 and kept nomenclature consistent throughout the manuscript.

      For comparative reasons, a full kinetic measurement (determination of Km and kcat) of the SBPase would also be helpful here.

      We did not perform a full kinetic measurement of CrSBPase because we could neither identify a reliable commercial supplier of the physiological substrate sedoheptulose-1,7-bisphosphate (SBP) nor synthesize it ourselves, and we only characterized the reaction with fructose-1,6-bisphosphate (FBP). However, in the revised form of the manuscript we added in main text paragraph 'CrSBPase enzymatic activity' the kinetic constants from the previous reference study conducted on spinach SBPase (Cadet and Meunier, Biochem. J. 1988), with KM(SBP) = 0.05 mM and kcat(SBP) = 81 sec⁻¹ for the fully active enzyme with SBP as a substrate. For comparison, the authors of this study report that activity of SBPase on FBP is in the same range but lower, with KM(FBP) = 0.38 mM and kcat(FBP) = 21 sec⁻¹. We also added a comparison of specific activities of our CrSBPase and spinach SBPase in the main text, showing that our enzyme behaves like the previously reported ortholog from a land plant.
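
      To make the comparison between the two substrates concrete, the short sketch below derives the catalytic efficiency (kcat/KM) from the spinach SBPase constants quoted above and evaluates a Michaelis-Menten rate. The function names are hypothetical, simple Michaelis-Menten behavior is assumed, and no CrSBPase constants are implied.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Catalytic efficiency kcat/KM in s^-1 mM^-1."""
    return kcat_per_s / km_mM


def michaelis_menten_rate(substrate_mM, kcat_per_s, km_mM):
    """Turnover per active site (s^-1) at a given substrate concentration."""
    return kcat_per_s * substrate_mM / (km_mM + substrate_mM)


# Constants quoted above for spinach SBPase (Cadet and Meunier, 1988).
eff_sbp = catalytic_efficiency(81, 0.05)   # SBP as substrate
eff_fbp = catalytic_efficiency(21, 0.38)   # FBP as substrate
preference = eff_sbp / eff_fbp             # how strongly SBP is favored

# At a substrate concentration equal to KM, the rate is half of kcat.
half_max_rate = michaelis_menten_rate(0.05, 81, 0.05)
```

      With these published constants the efficiency is about 1620 s⁻¹ mM⁻¹ for SBP against about 55 s⁻¹ mM⁻¹ for FBP, a roughly 29-fold preference for the physiological substrate.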

      Line 303: How much MgSO4 was used for the experiment shown in Figure 2A?

      10 mM of MgSO4 was used for the experiment shown in Figure 2A. We added this information in the figure legend. We also added in the legend that 10 mM DTT is present in the experiment of Figure 2B and that 10 mM of MgSO4 and 1 mM of DTT are present in the experiment of Figure 2C.

      Line 321: In my opinion it is not necessary to show the regions of all molecules here. I was rather expecting a superposition of the two structures (oxidized and reduced) with a close-up of the respective disulfide in the two states.

      We agree that the panels of the initial version of Figure 3, which showed all conformational variants of the redox motif side by side, appear redundant. We moved the initial Figure 3 to supplementary data and replaced it with the crystallographic B-factor mapping of the redox motif, in the variable conditions resolved by the crystals. We would like to stress that all these conformations were experimentally determined through X-ray crystallography, whether from the crystal of pure inactive enzyme that proved to be oxidized on the redox motif, or from the equivalent crystals submitted to activating treatment by the chemical reductant TCEP. In an attempt at clarification, we added visual boxes to better appreciate this reduction-induced conformational plasticity, which we interpreted as a local conditional disorder.

      Line 331: Could the authors provide movies of the MD simulation? Otherwise, interpretation of the MD simulation results might be difficult for non-experts.

      We added two movies of 20-µsec MD simulations as supplementary data to help non-expert readers.

      Line 343: It might be helpful to label the structure elements in Figure 4 accordingly (e.g. residues, etc.)

      We added secondary structure labeling in Figure 4.

      Line 381: Should be changed to Figure 5A.

      We changed reference to figure 6 that is a renumbering of figure 5 with changes included from suggestions below. Figure 6 now includes chromatograms of recombinant SBPase in panel A and chromatogram and western blot analysis of Chlamydomonas extracts in panel B.

      Line 383: See above, figure 5B. Which structure is shown in the figure? 7zuv or 7b2o? Maybe include both structures in the figure in a side-by-side view. The authors might also want to include the SEC chromatograms in the main figure. Especially the purification from Chlamydomonas is helpful to estimate whether post-translational modifications have an impact on the oligomerization. This should also be mentioned in the text.

      7b2o and 7zuv are illustrated side-by-side in panels A and B of Figure 5. This was indicated in the figure legend; we have now added the information on the figure itself. As suggested above, we included the chromatograms initially presented as supplementary material in a new main Figure 6, with panel A for recombinant proteins and panel B for proteins extracted from Chlamydomonas. The initial Figures 5D-E, showing surface conservation of the dimeric SBPase, have been moved to supplementary Figure 5.

      Line 385: I don't find the cultivation of Chlamydomonas in the method section. It should be added.

      We added a methods paragraph dedicated to « Cultivation of Chlamydomonas for native SBPase analysis ».

      Line 390-392: This information is not really helpful. Concentrated purified proteins might precipitate after a week storage without physiologically relevant effects being the reason.

      We agree that the observation of a precipitate building up in vitro after a week of storage bears no particular physiological implications. We rather intended to report that an aggregated form of purified protein can be turned to droplets under the redox conditions that activate the enzyme. We reformulated these lines for clarification.

      Line 397: I would appreciate having the SEC-chromatograms of the mutants also in the main figure.

      Size-exclusion chromatograms that were initially in supplementary figures are now shown in main text Figure 6 panel A, with the WT and mutant profiles aligned.

      Line 402: Where are these data shown? They should be included in Figure 5.

      We added a figure to present these data, which were not shown in the initial version of the manuscript. We preferred to place it in the supplementary material because the catalytic activity of the C115S and C120S mutants is essentially the same as that of WT and does not reveal a direct mechanistic effect of C115-C120 reduction on the catalytic pocket.

      Line 427: Did the authors look into a possible cooperativity of their SBPase?

      We did not observe direct positive cooperativity that could be ascribed to allostery in our enzymatic assays. It was previously reported for spinach SBPase that SBP saturation functions were hyperbolic with no evidence of homotropic interactions in the enzyme oligomer (Cadet and Meunier Biochem J. 1988 253, 249-254). The authors of this kinetic study, however, present a clear sigmoid response of SBPase to Mg2+ concentration, suggestive of an activating cross-talk between active sites in the oligomer. We consider this hypothesis of interest and hope to further investigate allosteric conformational changes once the physiological substrate SBP becomes available.

      Line 428-434: I don't really understand how the proteome mapping fits in here. Do the authors speculate that SBPase is recruited by some of the identified enzymes or directly interacts with them or that rather the spatial distribution optimizes the reaction kinetics?

      We indeed want to correlate our in vitro observations of the conditions of CrSBPase activity with those recently published by the group of Dr. Martin Jonikas in a physiological, in vivo setup of Chlamydomonas reinhardtii (Wang, Patena et al. Cell 2023 186, 3499–3518). We have no experimental evidence demonstrating the first suggestion, that SBPase is recruited by or directly interacts with partner enzymes, and we favor the second suggestion, that local spatial distribution in the chloroplast stroma optimizes enzyme reaction kinetics thanks to the proximity of Calvin-Benson-Bassham enzymes. We rephrased these lines to clarify our hypothesis and express its speculative character.

      Reviewer #2 (Recommendations For The Authors):

      To make the manuscript stronger, the authors are recommended to do the following:

      We followed given recommendations.

      (1) include a wider discussion on the other SBPase structures that are available. A detailed comparison should be made between the oxidized and reduced structures present in the PDB with the structures that are being reported in the manuscript.

      Consistent with reviewer #1's suggestion, and as detailed in our response to the public review above, we followed the recommendation to better report previous structural studies of SBPase in the results section. We also added comparisons with computational models from AlphaFold2 and AlphaFold3.

      (2) The authors mention co-operativity between the subunits. With excellent sampling from molecular dynamics simulations, the authors should demonstrate co-operativity between the subunits.

      Our molecular dynamics (MD) simulations span 20 µsec of SBPase in the dimeric state, starting from the experimental structures determined by XRC. In the considered time window, the only significant event that we observed is the local reorganization of the LBH motif, which is a prerequisite for dimer rearrangement. We infer that local disorder contributes to a separation of the pair of subunits that later allows the active homotetramer to be built, at longer time scales that are outside the capacities used in this work. Moreover, demonstrating cooperativity with MD simulations would require more than a single event to ensure that the results are significant, and performing series of 20-µsec MD simulations of SBPase is also outside the available capacities.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides a useful strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) with serum derived from mCSCC-exposed mice. The exploration of serum-derived antibodies as a potential therapy for curing cancer is particularly promising but the study provides inadequate evidence for specific effects of mCSCC-binding serum antibodies. This study will be of interest to scientists seeking a novel immunotherapeutic strategy in cancer therapy.

      Joint Public Review:

      Summary:

      This study presents an immunotherapeutic strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) using serum from mice inoculated with mCSCC. The author hypothesizes that antibodies in the generated serum could aid the immune system in tumor volume reduction. The study results showed a reduction in tumor volume and altered expression of several cancer markers (p53, Bcl-xL, NF-κB, Bax) suggesting the potential effectiveness of this approach.

      Strengths:

      The approach shows potential effect on preventing tumor progression, from both the tumor size and the cancer biomarker expression levels bringing attention to the potential role of antibodies and B cell responses in cancer therapy.

      We greatly appreciate your positive feedback on our study.

      Weaknesses:

      These are some of the specific things that the author could consider to strengthen the evidence supporting the claims in their study.

      (1) The study fails to provide evidence of the specific effect of mCSCC-antibodies on mCSCC. The study utilized serum which also contains many immune response factors like cytokines that could contribute to tumor reduction. There is no information on serum centrifugation conditions, which makes it unclear whether immune components like antigen-specific T cells, activated NK cells, or other immune cells were removed from the serum. The study does not provide evidence of neutralizing antibodies through isolation, analysis of B cell responses, or efficacy testing against specific cancer epitopes. To affirm the specific antibodies' role in the observed immune response, isolating antibodies rather than employing whole serum could provide more conclusive evidence. Purifying the serum to isolate mCSCC-binding antibodies, such as through protein A purification, and ELISA would have been more useful to quantify the immune response. It would be interesting to investigate the types of epitopes targeted following direct tumor cell injection. A more thorough characterization of the antibodies, including B cell isolation and/or hybridoma techniques, would strengthen the claim.

      I am deeply appreciative of the reviewer's highly professional comments. Tumor development involves the coexistence of cancer cells at different developmental stages, each harboring a variety of known and unknown mutated proteins. These mutated proteins expose multiple known and unknown epitopes, each capable of stimulating the production of corresponding antibodies in healthy mice. Identifying all these antibodies presents a significant challenge. Current research methodologies, such as ELISA, WB, and ChIP, can only identify known antibodies based on existing antigens. A prerequisite for using these techniques is that both antigens and antibodies are identified. At present, there is no technology available to identify antibodies produced by an unknown mutated protein and epitope. However, I find the reviewer's comments insightful. Perhaps we can initially identify some known mCSCC-antibodies on mCSCC. However, studying the specific effect of these known mCSCC-antibodies on mCSCC is uncertain because we believe that tumor shrinkage results from the combined action of both known and unknown antibodies.

      We concur with the reviewer's observations regarding the use of serum, which is rich in immune response factors such as cytokines that could potentially contribute to tumor reduction. In our future research, we plan to systematically analyze the individual roles of these antibodies and cytokines in tumor reduction. In 1973, Nature published a report indicating that serum demonstrated promising results in tumor treatment (Immunotherapy of Cancer with Antibody in Rats. Nature 243, 492 (1973). https://doi.org/10.1038/243492b0). Since then, there have been scarcely any reports on serum therapy for tumors. The primary focus of our study is to evaluate the efficacy of serum therapy in treating tumors. We hypothesize that antibodies and cytokines form a complex interactive network, working in synergy to reduce tumors. Consequently, we believe that studying these antibodies and cytokines in isolation may not yield effective results.

      In this study, the methodology section outlines the process of serum preparation. It is important to note that serum is devoid of blood cells. I hypothesized that whole blood might have superior therapeutic effects compared to serum. This is because antibodies could potentially synergize with immune cells (including T cells, B cells, and NK cells), thereby enhancing the effectiveness of the treatment. As previously discussed, these antibodies, cytokines, and immune cells form a complex interactive network aimed at tumor reduction. Consequently, there are numerous factors that could influence the experimental outcomes, which presents a challenge for analyzing the results. Furthermore, the implementation of whole blood transfusion therapy introduces additional considerations, such as potential side effects and reactions associated with blood transfusions.

      We thank the reviewers for their suggestion to purify the serum in order to isolate mCSCC-binding antibodies. As we previously mentioned, separating a large number of both known and unknown serum antibodies presents a significant technical challenge. We are eager to discuss and consider suggestions from the reviewers regarding methods to identify a large variety and number of unknown antibodies on cells. Perhaps, as the reviewer suggested, we could begin with known antibodies and employ Protein A purification to purify these antibodies and subsequently detect immune responses. We could also categorize the types of epitopes targeted following direct tumor cell injection and examine these epitopes in further studies. The suggestion to study the response of B cells is valuable, and we plan to conduct comprehensive research on the response and status of B cells in our future studies.

      The purification of antibodies to enhance the specificity of their effectiveness against tumors is a critical aspect of our study. However, we would like to address some concerns raised. (1) The separation of all antibodies and cytokines presents a significant technical challenge. In particular, there is a risk of overlooking antibodies that are present in low concentrations but play crucial roles. (2) Our primary concern is that studying these components in isolation could compromise the holistic understanding of the study. This is akin to current research on traditional medicine, where the separation and individual study of compounds often result in a loss of overall therapeutic efficacy. For instance, consider a scenario where 100 antibodies collectively work to shrink a tumor. These antibodies interact with 20 cytokines, forming a complex network that enhances the cytokines' activity against tumor cells. Furthermore, many important antibodies and cytokines are currently unknown. Studying these antibodies in isolation could potentially result in the loss of this therapeutic effect. Therefore, in the discussion section, we have emphasized that our study considers a tumor mass, including tumor cells at various stages of development, as a single entity. As a practicing clinician, my primary focus is on the therapeutic outcomes in tumor treatments, even though the mechanisms of serum therapy remain largely elusive, like a black box.

      (2) In the study design, the control group does not account for the potential immunostimulatory effects of serum injection itself. A better control would be tumor-bearing mice receiving serum from healthy non-mCSCC-exposed mice. Additionally, employing a completely random process for allocating the treatment groups would be preferable. Also, the study does not explain why intravenous injection of tumor cells would produce superior antibodies compared to those naturally generated in mCSCC-bearing mice.

      I concur with the reviewer's perspective that using serum from healthy, non-mCSCC exposed mice as a control could potentially improve our study. Initially, our primary concern was to minimize harm to the mice and avoid excessive blood reactions, which led us to exclude the use of serum from healthy, non-mCSCC exposed mice in our control group. The main objective of our study was to investigate tumor shrinkage through serum treatment, specifically serum-derived antibodies. We anticipated that tumor-bearing mice receiving serum from healthy, non-mCSCC exposed mice would exhibit a response to the injected serum, which would manifest as a blood reaction. However, we did not expect this to result in a tumor treatment effect. If it turns out that normal serum (from healthy, non-mCSCC-exposed mice) possesses tumor-reducing properties, it would indeed be a novel discovery. We appreciate the reviewer's insightful suggestion and will consider incorporating it into our future research.

      We concur with the reviewer's observations that the use of a completely random process for assigning treatment groups would be more desirable. Indeed, the complete randomization of the entire process further underscores the efficacy and universality of serum therapy. In this study, we utilized paired mice to mitigate the risk of cross-infection and adverse reactions associated with blood transfusions. We deeply value the reviewer's expert feedback.  

      Lastly, we explain why intravenously injected tumor cells produce antibodies superior to those naturally generated in mCSCC-bearing mice. As tumor cells grow, they produce a variety of mutated proteins to adapt to the immune microenvironment and evade the immune system of mCSCC-bearing mice. However, these tumor cells with mutated proteins are exceptionally sensitive and recognizable to healthy mice. This recognition triggers an immune response in healthy mice, leading to the production of specific therapeutic antibodies. This simultaneous production of diverse and abundant antibodies is only achievable by living organisms.

      (3) In Figure 2B, it would be more helpful if the author could provide raw data/figures of the tumor than just the bar graph. Similarly in Figure 3, the author should show individual data points in addition to the error bar to visualize the actual distribution.

      Raw data (numerical values) have been incorporated into Figures 2B and 3, placed in a table below each graph. Placing the values above the error bars would require a small font that may not be legible.

      (4) The author mentioned that different stages of tumor cells have different surface biomarkers. Therefore, experimenting with injecting tumor cells at various stages could reveal the most immunogenic stage. Such an approach would allow for a comparative analysis of immune responses elicited by tumor cells at different stages of development.

      Yes, throughout the course of tumor development, tumor cells at various stages will exhibit distinct markers or possess different mutated proteins. The concept of segregating tumor cells from different stages and independently comparing their immune responses is indeed commendable. Future research could involve isolating cells that express identical biomarkers at each stage for a comparative analysis of the immune responses triggered by the tumor cells. However, this approach diverges from the original intent of this study.

      Most tumor cells exist within the same developmental stage. However, this does not imply that all tumor cells within the tumor mass are at the same stage. For instance, a stage III liver cancer tumor may contain both stage I and stage IV tumor cells. Moreover, due to the complexity of tumor development, not all tumor cell surface markers are identical, even for tumors at the same stage. For instance, 20 major proteins and 100 minor proteins are implicated in tumor formation. In fact, random mutations in just 5 of these major proteins and 10 minor proteins can instigate the development of tumors. This implies that the protein pattern (tumor cell surface markers) associated with each individual's tumor is unique. While studying tumor cells at different stages separately allows for the observation of the immune response of tumor cells at each stage, it lacks a comprehensive research and treatment effect. For this reason, the design of this study treats a tumor mass as a whole, encompassing both the primary stage tumor cells and those not in that stage. These tumor cells are then injected to produce corresponding therapeutic antibodies. Furthermore, if tumor cells from only one stage are isolated and specific antibodies are produced against these cells, it could lead to immune escape of tumor cells at other stages, preventing the tumor from shrinking. Therefore, our approach aims to address this issue by considering the tumor mass as a whole.

      (5) In the abstract the author mentioned that using mCSCC is a proof-of-concept for this potential cancer treatment strategy. The discussion session should extend to how this strategy might apply to other cancer types beyond carcinoma.

      We have incorporated an additional paragraph in the discussion section where we delve into the concepts and experimental principles underpinning this study. This, we believe, addresses the reviewer's query regarding the applicability of our study's methodology to other types of tumors. The process for other tumors also involves isolating cells from the tumor, stimulating therapeutic antibody production in healthy mice using these cells, and ultimately reintroducing these antibodies into mice with tumors to facilitate tumor elimination.

      Recommendations For The Authors:

      The author is encouraged to refine the study's design in future studies considering the weaknesses highlighted above, summarize the results more effectively, and seek opportunities to expand on this promising idea and enhance the research's impact and applicability.

      We greatly appreciate the valuable suggestions provided by the editor and reviewers. These insights will certainly be addressed in our future research endeavors.

      Suggestions for title modification:

      Following the scope of the study, the term 'specific homologous neutralizing-antibodies' may be misleading as neutralizing antibodies typically refer to antibodies preventing viral cell entry. In cancer therapy, 'neutralization' is not a relevant concept, as cancer cells do not infect host cells. Using whole tumor cells as immunogens diverges from the specificity of traditional vaccination approaches that utilize well-defined proteins or antigens. Furthermore, the term "homologous" suggests a precision in targeting that is not demonstrated by reintroducing serum without isolating its specific components. Therapeutic effects should not be attributed to "neutralizing antibodies" without isolating or characterizing the antibody response or verifying their efficacy against specific cancer epitopes. Additionally, it is suggested that you indicate the biological system that your study utilised in the title. More so, this approach is not entirely novel, as seen with the use of adjuvants in some flu vaccines, or in Moderna's cancer vaccine mRNA-4157, which encodes up to 34 patient-specific tumor neoantigens. You can consider the title below or a variant of the same.

      Suggested title: Generating serum-based antibodies from tumor-exposed mice: a potential strategy in cutaneous squamous cell carcinoma treatment

      I concur with your suggestion and have modified the title to "Generating serum-based antibodies from tumor-exposed mice: a new potential strategy for cutaneous squamous cell carcinoma treatment". I believe this research remains somewhat new, hence the addition of the word "new". Furthermore, the term "novel" in the paper has been either removed or substituted.

      Moreover, I propose that this study shares similarities with Moderna's cancer vaccine mRNA-4157, albeit with certain differences. Moderna's cancer vaccine mRNA-4157 encodes up to 34 recognized neoantigens to stimulate an immune response by eliciting specific T cell responses. This is similar to the strategy of some companies developing a protein set for diagnosing lung cancer and liver cancer, among others. Without a doubt, these methods have improved the effectiveness of tumor diagnosis and treatment. However, I think that these methods currently face challenges in completely eradicating tumors because they treat tumor development as a static process and tumor cells as expressing certain mutated proteins in a fixed manner. I believe that small molecule antibodies, cytokines, and immune cells present in serum that are difficult to detect, have low concentrations, or are unknown are essential for maintaining the expression of important mutant proteins and the escape of tumor cells. This is also the primary reason why tumors are difficult to treat and prone to recurrence at present.

      From my perspective, different tumors, as well as different stages of the same tumor, express varying mutated proteins or surface markers. Targeting some may result in others escaping or even creating a more conducive growth environment for those that do escape. Our study adopts a comprehensive view of a tumor block, encompassing tumor cells at different stages and tumor cells at the same stage but expressing different biomarkers. This approach generates a multitude of known and unknown antibodies that work in concert with cytokines and immune cells. While our method may not be capable of generating all mutated proteins and epitope antibodies due to the weakness of some antigens (epitopes of mutated proteins), it can still be effective. As long as the number of tumor cells is reduced below a certain threshold following multiple rounds of treatment with various antibodies produced at different stages, these cancer cells can be eradicated by the body's immune system. This is a process that is real-time and dynamic. Undoubtedly, if it becomes evident that alterations in a set of proteins can bolster the immune system and eradicate tumor cells, then the implications are significant. The immunotherapy proteins, which have demonstrated positive therapeutic effects, developed by certain companies are also predicated on this very principle.

      Finally, I greatly appreciate your suggestions, which will be considered and gradually addressed in future research.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review): 

      In the manuscript "Mechanistic target of rapamycin (mTOR) pathway in Sertoli cells regulates age-dependent changes in sperm DNA methylation", the authors proposed to test if the balance of mTOR complexes in Sertoli cells may play a significant role in age-dependent changes in the sperm epigenome. The paper could be of interest and has a good scientific aim but there are too many drawbacks that hamper the initial enthusiasm. All sections need extensive revision. The paper is mostly descriptive without a mechanistic-orientated explanation for the observed results. 

      Comments on revised version: 

      I am not sure that the authors have made an attempt to clearly answer the reviewers comments that aimed to improve the quality of the manuscript. It stands as mostly descriptive and with limited interest as it is. 

      We are thankful to the reviewer for agreeing to review our revised manuscript. Unfortunately, we completely disagree with the evaluation provided by the reviewer. Research on sperm DNA methylation experienced a significant rise of interest in the current century and by now more than 2000 papers have been published. Although it was demonstrated that the sperm DNA methylome may be affected by almost every factor analyzed, no study was published to identify molecular mechanisms that may link these factors with the sperm epigenome. Our study is the FIRST to identify such a mechanism (the balance of mTOR complexes in Sertoli cells). More so, we demonstrated experimentally that manipulations of this mechanism allow regulation of the rates of epigenetic aging of sperm in both directions (accelerate aging or rejuvenate). Thus, our study provides a mechanistic background for the development of therapeutic interventions that may target the sperm epigenome.

      We acknowledge that our study does not provide the full cascade of events linking the balance of mTOR complexes in Sertoli cells with the sperm DNA methylome. It suggests, however, the most plausible event next in a cascade (BTB permeability changes). Our group is working on this question now and we hope to provide the answer soon in a separate study. Even after that, we will be far from understanding the complete chain of molecular events that link mTOR and sperm methylome. It may take many years and significant effort of many research groups to dissect the whole cascade. It is worth mentioning that understanding of a complete cascade involved in pathology is not needed to develop efficient therapies if the critical nodes are known. For many common drugs (e.g. metformin) we do not know the full chain of molecular mechanisms but use them successfully.

      Thus, we believe that our study is mechanistic as it identified a critical mechanism manipulation of which allows experimental aging and rejuvenation of the sperm methylome. Additionally, it generates new mechanistic questions and hypotheses to be answered in the future.

      Reviewer #3 (Public Review): 

      Summary and Strength: 

      The manuscript by Amir et al. describes that Sertoli-specific inactivation of the mTORC1 and mTORC2 complex by KO of either Raptor or Rictor, respectively, resulted in progressive changes in blood-testis-barrier (BTB) function, testis weight, and sperm parameters, including counts, morphology, mtDNA content and sperm DNA methylation. 

      The described studies are based on the hypothesis that a decline of BTB function with increasing chronological age of a male contributes to the DNA methylation changes that are known to occur in sperm DNA of old males when compared to sperm DNA from isogenic young males. In order to demonstrate the relevance of a functioning BTB for the maintenance of sperm methylation patterns, the authors generated mice with genetically disrupted mTORC2 complex or mTORC1 complex in Sertoli cells and determined sperm methylation patterns in comparison to isogenic wild-type males. In line with previously published scientific literature (e.g. Mok et al., 2013; Dong et al, 2015; and others), the manuscript corroborates that a Sertoli-cell specific deletion of mTORC2 caused a loss of BTB function and a progressive spermatogenic defect. The authors further show that sperm DNA is differentially methylated (DMRs) as a consequence of either a mTORC2 disruption (associated with a loss of BTB function) or following a mTORC1 disruption (BTB function either increased or not leaky) when compared to their isogenic age-matched wt controls. Those DMRs overlap partially with changes in sperm DNA methylation that were found when comparing sperm from 8-week males with sperm isolated from 22-week-old male mice. 

      The authors interpret the observed changes as representative of the sperm DNA methylation changes that occur during normal chronological aging of the male. For an aged control group, the authors use sperm DNA of 22-week-old wild-type mates from the mTORC1 and mTORC2 KO breeding and compare the sperm methylation patterns found in sperm from those 22-week males to 8-week young males, that are intended to represent an old and a young cohort, respectively. DNA methylation analysis indicates that a disruption of mTORC2 (& decrease of BTB function) results in increased DNA methylation of sperm DNA, while a disruption of mTORC1 (and proposed increase of BTB tightness, not shown in the manuscript, though) resulted in increased hypomethylation. 

      Weaknesses: 

      While the hypothesis and experimental system are interesting and the data demonstrating the relevance of the mTORC2 complex for BTB function is convincing, several open questions limit the evidence that supports the hypothesis that the sperm DNA methylation changes seen in old males are caused by BTB failure following an imbalance of mTOR signaling complexes. The major critique points are the lack of a chronologically old group and the choice of 8 and 22 weeks of age: 

      - Data illustrating the degree of BTB decline and sperm DNA methylation changes from chronologically "old" male mice is missing. 22-week-old mice are not considered old but are of good and mature breeding age, equivalent to humans in their mid-late twenties. (In the manuscript, the 22-week-old wildtype mice show no evidence of BTB breakdown (Figure 3), so why are their sperm used to represent "aged" sperm?) 

      - Adding a group of "old" wild-type mice of 12-14 months of age (which is closer to the end of effective reproduction in mice, more equivalent to 45-59 year-old humans) could be used to illustrate (a) that aging causes a marked decrease in BTB function at this time in mouse life, and (b) that this BTB breakdown chronologically aligns with the age-associated DNA hypermethylation seen in old sperm. Age-matched "old" mTORC1 KO, with a (supposedly) tighter BTB barrier, could then be expected to have a sperm DNA methylation profile closer to that of younger wild-type animals. Such data are currently missing. While the progressive testicular decline observed in the mTORC1 KO (Fig.5) could make it difficult to obtain the appropriately aged mTORC1 KO tissues, it is completely feasible to obtain data from chronologically old wild-type males. (The progressive testicular decline further raises the question of what additional defects the KO causes, and how such additional defects would influence the sperm DNA methylation profile.) The addition of data from an old group to the currently included groups could strengthen the interpretation that the observations in the BTB-defective mTORC2 KO mice are modelling an age-related testicular decline, provided that the DMRs seen in the chronologically old group significantly overlap with the BTB-defective changes. 

      - In the current form, the described differences in sperm DNA methylation are based on comparisons between pubertal mice (8 weeks) and mature but not old adult males (22 weeks), while a chronologically "old" group is missing from the data sets and comparisons. Thus, it appears that the described sperm methylation changes reflect developmental changes associated with normal maturation and not necessarily declining sperm quality due to aging. (Sperm obtained from 8-week-old mice likely were generated, at least in part, during the 1st wave of spermatogenesis, which is known to differ from the continuously proceeding spermatogenesis during the remained of the mature life. During the 1st wave of spermatogenesis, Sertoli cells are known to undergo gene expression changes which could contribute to varying degrees of BTB function, and thus have effects on the sperm DNA methylation profiles of such 1st wave sperm.) 

      - It is unclear why the aging-related DMRs between the 8 and 22-week-old wild-type mice vary so dramatically between the two wild-type groups derived from the mTORC1 and the mTORC2 breeding (Fig. S4). If the main difference was due to mTORC1 or mTORC2 activity, both wildtype groups should behave very similarly. Changes seen in a truly "old" mouse (e.g. 20 weeks to 56 weeks), changes in "young mTORC1" and in "old mTORC2" are missing.

      How do those numbers and profiles compare to the shown samples? 

      Comments on latest version: 

      The rebuttal letter and public response indicate the authors' reluctance to consider the limitations of their study, i.e. having chosen chronologically young animals to demonstrate a sperm aging effect and indicate that they are not willing to include adequate controls. 

      Since there is no evidence that mice at this young age have a deteriorating blood-testis-barrier (indeed, normal intact BTB is clearly visible in the figures included in this study from animals of the relevant age group), the whole central hypothesis that the study is built upon (i.e. that increasing age causes deteriorating BTB integrity which in turn causes age-related changes in sperm DNA methylation), appears irrelevant or invalid. 

      The authors' claim that age-related DNA methylation changes in sperm occur in a linear fashion and that the changes are somewhat proportional to chronological age is in stark contrast to the claim that a decline of the BTB in old animals is causative for age-related sperm epigenetic changes, putting the relevance of the whole study in question.

      We are thankful to the reviewer for agreeing to review our revised manuscript. We disagree with the evaluation provided by the reviewer, however.

      First, the reviewer misinterpreted the hypothesis of the study, although it is formulated in the last sentence of the Introduction:  “ … we hypothesized that the balance of mTOR complexes in Sertoli cells may also play a significant role in age-dependent changes in the sperm epigenome.” Instead, the reviewer assigned a different hypothesis to our study (that BTB integrity changes are responsible for age-dependent changes in sperm DNA methylation) and criticized us for not providing clear testing of this hypothesis.

      To clarify, we believe that our study provides high-quality testing of OUR hypothesis, as we demonstrated experimentally that manipulating the balance of mTOR complexes in Sertoli cells allows acceleration and deceleration of epigenetic aging of sperm. Additionally, our study generated a hypothesis that BTB permeability may mediate the effects of the mTOR pathway on the sperm methylome. This second hypothesis is to be tested in future research.

      We also disagree with the reviewer's interpretation of the aging process as an abrupt transition from a young, healthy, and undamaged state to an old, moribund, and damaged state. The whole body of biogerontological knowledge suggests instead a steady accumulation of damage over long periods of time. For example, this understanding of steady change at the molecular level allowed the development and successful use of epigenetic clocks and other molecular clock models, including several variants of sperm epigenetic clocks. These models clearly demonstrate linear or semi-linear accumulation of DNA methylation changes in various tissues and biological species across the whole lifespan. It is reasonable to assume that BTB integrity also declines steadily with age and that in younger animals this decline may not be easily detected by existing analytical methods. To our knowledge, experimental data showing the dynamics of BTB deterioration with age do not exist, although it has been demonstrated that older animals have a looser BTB than young animals. We agree with the reviewer that future studies testing the role of BTB deterioration in sperm methylome aging will need to provide such evidence. It was not the subject of the current study, however.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the manuscript "Mechanistic target of rapamycin (mTOR) pathway in Sertoli cells regulates age-dependent changes in sperm DNA methylation", the authors proposed to test if the balance of mTOR complexes in Sertoli cells may play a significant role in age-dependent changes in the sperm epigenome. The paper could be of interest and has a good scientific aim but there are too many drawbacks that hamper the initial enthusiasm. All sections need extensive revision. The paper is mostly descriptive without a mechanistic-orientated explanation for the observed results.

      Specific comments:

      (1) The abstract is poorly written. There is a lot of unnecessary introduction that does not provide a rationale for the work. It is not possible to understand the experimental approach or the major data just by reading the abstract. It does not clearly represent the work.

      - We have added details of experimental design and results to the abstract and reduced the introductory part of the abstract.

      (2) The introduction is somewhat vague and does not provide a clear rationale for the hypothesis. There should be more focus on the role of mTOR in Sertoli cells that goes far beyond BTB. That will give more focus on mTOR. Then it is important to focus on BTB and mTOR: what is known? What is the gap and how can it be solved? Several relevant references are missed concerning mTOR and Sertoli cells.

      - The goal of this study was not to explore all potential roles of the mTOR pathway in Sertoli cells, but to test whether shifts in the balance of mTOR complexes regulate (accelerate/decelerate) epigenetic aging of sperm. As such, we disagree with the reviewer and consider that the current Introduction provides a focused rationale for the study.

      (3) The Material and Methods section needs improvement. There is much important information missing. For instance: how many animals were used per group and how was the breeding done? At what age? Statistical analysis should be explained in detail.

      - The number of animals was clearly stated in the original manuscript. We have added details of breeding and statistical analysis. 

      (4) The results description could be improved. It is vague without highlighting how much difference was detected. The results should be numerically described when possible and the differences should be highlighted. A 10% difference may be significant but not biologically relevant. To correctly evaluate the differences it is important to describe them with some degree of detail.

      - For all DNA methylation experiments we provide numerical characteristics of methylation changes, including numbers of DMRs, % change, significance, correlation coefficients. We believe that only age- and genotype-associated changes in reproductive parameters were not characterized in our manuscript in detail. We have added Table 1 to provide these numbers.

      (5) There is no discussion of the data. The authors just summarize their findings without a comprehensive analysis of the literature and how the effects can be mediated. mTOR interacts with different pathways (mTORC1 and mTORC2 are even mediators of distinct pathways). This would be very relevant to discuss. In addition, there are many study limitations not discussed. There is no clear mechanistic explanation of the way by which the mTOR pathway in Sertoli cells regulates age-dependent changes in sperm DNA methylation. The paper seems preliminary.

      - We have added an additional paragraph to the discussion to highlight a potential molecular mechanism that links mTOR pathway with the sperm epigenome.

      (6) Figure 1 is too simple and does not provide any schematic support for the text.

      - We disagree with the reviewer and believe that the figure represents a good visualization of our hypothesis useful for the perception of the study.

      (7) Figure 2 lacks some detail. For instance, how many animals were used for each step?

      - Numbers of animals are provided in the text of the paper.

      (8) Taking into consideration the roles of mTOR on sperm, particularly mTORC1, it is not clear whether there were any differences in sperm motility.

      - We did not assess sperm motility in this study. 

      Reviewer #2 (Public Review):

      In this study, the authors hypothesized that the balance of mTOR complexes in Sertoli cells may also play a significant role in age-dependent changes in the sperm epigenome. To test this hypothesis, the authors use transgenic mice with manipulated activity of mTOR complexes in Sertoli cells. These results suggest that the mTOR pathway in Sertoli cells may be used as a novel target of therapeutic interventions to rejuvenate the sperm epigenome in advanced-age fathers.

      The authors attempt to demonstrate that the balance of mTOR complexes in Sertoli cells regulates the rate of sperm epigenetic aging. The authors have effectively met their research objectives, and their conclusions are supported by the data presented.

      - We are very thankful for the positive evaluation of our study.

      Reviewer #3 (Public Review):

      Summary and Strength:

      The manuscript by Amir et al. describes that Sertoli-specific inactivation of the mTORC1 and mTORC2 complex by KO of either Raptor or Rictor, respectively, resulted in progressive changes in blood-testis-barrier (BTB) function, testis weight, and sperm parameters, including counts, morphology, mtDNA content and sperm DNA methylation.

      The described studies are based on the hypothesis that a decline of BTB function with increasing chronological age of a male contributes to the DNA methylation changes that are known to occur in sperm DNA of old males when compared to sperm DNA from isogenic young males. In order to demonstrate the relevance of a functioning BTB for the maintenance of sperm methylation patterns, the authors generated mice with genetically disrupted mTORC2 complex or mTORC1 complex in Sertoli cells and determined sperm methylation patterns in comparison to isogenic wild-type males. In line with previously published scientific literature (e.g. Mok et al., 2013; Dong et al, 2015; and others), the manuscript corroborates that a Sertoli-cell specific deletion of mTORC2 caused a loss of BTB function and a progressive spermatogenic defect. The authors further show that sperm DNA is differentially methylated (DMRs) as a consequence of either a mTORC2 disruption (associated with a loss of BTB function) or following a mTORC1 disruption (BTB function either increased or not leaky) when compared to their isogenic age-matched wt controls. Those DMRs overlap partially with changes in sperm DNA methylation that were found when comparing sperm from 8-week males with sperm isolated from 22-week-old male mice.

      The authors interpret the observed changes as representative of the sperm DNA methylation changes that occur during normal chronological aging of the male. For an aged control group, the authors use sperm DNA of 22-week-old wild-type mates from the mTORC1 and mTORC2 KO breeding and compare the sperm methylation patterns found in sperm from those 22-week males to 8-week young males, which are intended to represent an old and a young cohort, respectively. DNA methylation analysis indicates that a disruption of mTORC2 (and decrease of BTB function) results in increased DNA methylation of sperm DNA, while a disruption of mTORC1 (and proposed increase of BTB tightness, not shown in the manuscript, though) resulted in increased hypomethylation.

      Weaknesses:

      While the hypothesis and experimental system are interesting and the data demonstrating the relevance of the mTORC2 complex for BTB function are convincing, several open questions limit the evidence that supports the hypothesis that the sperm DNA methylation changes seen in old males are caused by BTB failure following an imbalance of mTOR signaling complexes. The major critique points are the lack of a chronologically old group and the choice of 8 weeks and 22 weeks of age:

      - Data illustrating the degree of BTB decline and sperm DNA methylation changes from chronologically "old" male mice are missing. 22-week-old mice are not considered old but are of good and mature breeding age, equivalent to humans in their mid-late twenties. In the manuscript, the 22-week-old wild-type mice show no evidence of BTB breakdown (Figure 3), so why are their sperm used to represent "aged" sperm?

      - Adding a group of "old" wild-type mice of 12-14 months of age (which is closer to the end of effective reproduction in mice, more equivalent to 45-59 year-old humans) could illustrate that (a) aging causes a marked decrease in BTB function at this time in mouse life, and (b) this BTB breakdown chronologically aligns with the age-associated DNA hypermethylation seen in old sperm. Age-matched "old" mTORC1 KO, with a (supposedly) tighter BTB barrier, could then be expected to have a sperm DNA methylation profile closer to that of younger wild-type animals. Such data are currently missing. While the progressive testicular decline observed in the mTORC1 KO (Fig.5) could make it difficult to obtain the appropriately aged mTORC1 KO tissues, it is completely feasible to obtain data from chronologically old wild-type males. (The progressive testicular decline further raises the question of what additional defects the KO causes, and how such additional defects would influence the sperm DNA methylation profile.) The addition of data from an old group to the currently included groups could strengthen the interpretation that the observations in the BTB-defective mTORC2 KO mice are modelling an age-related testicular decline, provided that the DMRs seen in the chronologically old group significantly overlap with the BTB-defective changes.

      - In the current form, the described differences in sperm DNA methylation are based on comparisons between pubertal mice (8 weeks) and mature but not old adult males (22 weeks), while a chronologically "old" group is missing from the data sets and comparisons. Thus, it appears that the described sperm methylation changes reflect developmental changes associated with normal maturation and not necessarily declining sperm quality due to aging. (Sperm obtained from 8-week-old mice likely were generated, at least in part, during the 1st wave of spermatogenesis, which is known to differ from the continuously proceeding spermatogenesis during the remainder of the mature life. During the 1st wave of spermatogenesis, Sertoli cells are known to undergo gene expression changes which could contribute to varying degrees of BTB function, and thus have effects on the sperm DNA methylation profiles of such 1st wave sperm.)

      - It is unclear why the aging-related DMRs between the 8 and 22-week-old wild-type mice vary so dramatically between the two wild-type groups derived from the mTORC1 and the mTORC2 breeding (Fig. S4). If the main difference was due to mTORC1 or mTORC2 activity, both wildtype groups should behave very similarly. Changes seen in a truly "old" mouse (e.g. 20 weeks to 56 weeks), changes in "young mTORC1" and in "old mTORC2" are missing. How do those numbers and profiles compare to the shown samples?

      Some general comments regarding the chosen age of animals:

      - As mentioned, sperm from 8-week-old mice represent many sperm that were produced in the 1st wave of spermatogenesis; 22-week-old mice are not considered chronologically old mice, but mature and "relatively" young animals. 18-24 month-old mice are considered to be equivalent to 56-69 year-old humans, and might be more suitable to detect aging effects. "Old mice" for study purposes should be at least 12-14 months of age, ideally >18 months of age. 22 weeks (5 months of age) are mice at good breeding age, but still considered mature adults, not old males, and therefore are not expected to show typical aging health problems (like declining fertility).

      Even the cited reference (Flurkey et al. 2007) states that mice used as a reference group for "young mice" should be at least 3 months of age (~13 weeks), i.e. fully sexually mature. The authors specifically state: "The young adult group should be at least 3 months old because, although mice are sexually mature by 35 days, relatively rapid maturational growth continues for most biologic processes and structures until about 3 months. The upper age range for the young adult group is typically about 6 months. ... For the middle-aged group, 10 months is typically the lower limit.... The upper age limit for the middle-aged group is typically 14-15 months, because at this age, most biomarkers still have not changed to their full extent, and some have not yet started changing. For the old group, the lower age limit is 18 months because age-related change for almost all biomarkers of aging can be detected by then. The upper limit is 22-26 months, depending on the genotype." According to this reference, mice up to 6 months of age are generally considered "mature adults" (equivalent to humans of 20-30 years), mice of 10-14 months are "middle-aged adults" (equivalent to ~38-47 human years), and 18-24-month-old mice are "old" (equivalent to humans of 56-69 years).

      Going on these commonly used age ranges, it is unclear why the authors used 8-week-old mice (generally considered pubertal to late adolescent age) as young mice and 5-month-old mice as "old mice".

      Differences seen between these cohorts most likely do not reflect aging, but more likely reflect changes associated with normal developmental maturation, since testis and epididymides continue to grow until about 10-11 weeks of age.

      - The DMRs identified between 8 and 22-week-old animals could represent DMRs that are dependent on developmental maturation more than being changed in an "age-dependent" manner (in the sense of increased chronological age). This interpretation is congruent with the fact that those DMRs are enriched for developmental categories.

      - We are thankful to the reviewer for a detailed explanation of their disagreement with the ages of mice used in this study. In short, the reviewer suggests that our older group (22 weeks) is not old enough to represent aged animals and that our young group (8 weeks) may still have spermatozoa from the first wave of spermatogenesis, and as such the observed differences between the 2 ages cannot be considered aging-related but rather may represent different stages of maturation of the reproductive system. At first glance this criticism looks valid.

      However, to design our experiments we used our data that were not initially included in this manuscript. These data demonstrated that age-dependent changes in sperm DNA methylation are linearly or semi-linearly associated with age in the range from 56 to 334 days. Thus, within this interval any 2 ages, distant enough to register the difference in DNA methylation, can be used to assess age-dependent changes in DNA methylation and changes in the rates of epigenetic aging of sperm in response to genetic manipulations. We have now added these results; see the “Identification of age-dependent patterns in sperm DNA methylation” section in Material and Methods and “Patterns of age-dependent changes in sperm DNA methylation” in Results. We also consider that the reviewer’s suggestion that sperm from 8-week-old mice represent the first wave of spermatogenesis is not well founded. Indeed, C57BL/6 mice first have fertile sperm in the cauda epididymis at 37 days of age [1], 19 days earlier than the age of 56 days (8 weeks) at which sperm were collected from the youngest group of mice in our study. Given that young C57BL/6 mice ejaculate spontaneously around 3 times per 5 days [2], 8-week-old mice have ejaculated > 10 times (19 days × 3/5 ≈ 11) since the first wave of spermatogenesis before the sperm were collected for our study, making the chances of any first-wave sperm surviving in their cauda epididymides to the age of 8 weeks negligibly small. We have added this information to the text.

      (1) Mochida, K.; Hasegawa, A.; Ogonuki, N.; Inoue, K.; Ogura, A. Early Production of Offspring by in Vitro Fertilization Using First-Wave Spermatozoa from Prepubertal Male Mice. J. Reprod. Dev. 2019, 65, 467–473, doi:10.1262/jrd.2019-042.

      (2) Huber, M.H.; Bronson, F.H.; Desjardins, C. Sexual Activity of Aged Male Mice: Correlation with Level of Arousal, Physical Endurance, Pathological Status, and Ejaculatory Capacity. Biol. Reprod. 1980, 23, 305–316, doi:10.1095/biolreprod23.2.305.

    1. Author response:

      We thank the editors and reviewers for their enthusiasm for this work and helpful suggestions. In summary, the reviewers provided suggestions for additional discussion items and clarifications for the text and figures, especially in relation to the cryo-EM structures and suppressor screen sections of the manuscript. We will consider each of these and make edits as needed. In particular, reviewers asked for further details about the structural model in addition to analysis of our new structure with respect to previously reported intron lariat spliceosome (ILS) complexes. For the latter point, we present additional evidence for the correct assignment of Yju2 in the S. cerevisiae ILS structure and note that docking of the 3’ splice site is not observed in any ILS structure from yeast, worms, or humans. This is consistent with our proposed mechanism. We will clarify these points in the text as well as highlight some caveats of prior studies of the ILS complex. We feel that these changes will add nuance to the manuscript as well as clarify the findings and their context and significance for the reader.

    1. Author response:

      We would like to thank all reviewers for their valuable comments that help us to improve our manuscript. We will make the following modifications in the revised manuscript:

      (1) To reduce the complexity of the experiments we carried out, we will summarize trimeric G proteins in Ciona in the first paragraph of the Results section and explain how we focused on Gas and Gaq in the initial phase of this study.

      (2) As reviewer 1 suggested, the polymodal roles of papilla neurons are interesting. We will add a discussion of this aspect. The sentences will read as follows:

      “The recent study (Hoyer et al., 2024) provided several lines of evidence suggesting that papilla neurons can serve as the sensors of several chemicals in addition to the mechanical stimuli. This finding and our model seem mutually related because these chemicals could modify Ca2+ and cAMP signaling. The use of G protein signaling may allow Ciona to reflect various environmental stimuli to initiate metamorphosis in the appropriate situation, both mechanically and chemically.”

      (3) As both reviewers suggested, imaging cAMP in the background of some G protein knockdowns and pharmacological treatments is important, and we will carry out some of these experiments.

      (4) According to reviewer 2's comment, we will carefully modify the text about interpreting the results so that the descriptions suitably reflect the results.

    1. Author response:

      Response to reviewers (Public review):

      We thank all three reviewers for their opinion on our work on Candida albicans β-1,6-glucan, which highlights the importance of this cell wall component in the biology of fungi. Here are our responses to their comments for public reviews:

      (1) Indeed, the data presented for the immunological studies are preliminary. The reviewers have acknowledged that our analysis, which provides insights into the biosynthetic pathways involved, is comprehensive in dealing with the organization and dynamics of the β-1,6-glucan polymer in relation to other cell wall components and environmental conditions (temperature, stress, nutrient availability, etc.). However, we anticipated that there would be immediate curiosity as to what the immunological contribution of β-1,6-glucan is, and we therefore felt we needed to initiate these studies and include them. We therefore performed immunological studies to assess whether β-1,6-glucan acts as a pathogen-associated molecular pattern (PAMP) and, if so, what its immunostimulatory potential is. Our data clearly suggest that β-1,6-glucan is a PAMP, which leads to several questions: (a) what are the host immune receptors involved in the recognition of this polysaccharide, and thereby the downstream signaling pathways, (b) how is β-1,6-glucan differentially recognized by the host when C. albicans switches from a commensal to an opportunistic pathogen, and (c) how does the host environment impact the exposure of this polysaccharide on the fungal surface. We believe addressing these questions is beyond the scope of the present manuscript and aim to present new data in a future manuscript. Nonetheless, in the revised manuscript, we suggest approaches that we can take to identify the receptor that could be involved in the recognition of β-1,6-glucan. Moreover, we have modified the discussion so that it is based on the data rather than being descriptive.

      (2) It will be interesting to assess the organization of β-1,6-glucan and other cell wall components in the opaque cells. It is documented that the opaque cells are induced at acidic pH and in the presence of N-acetylglucosamine and CO2. Our data show that pH has an impact on β-1,6-glucan, which suggests that there will be differential organization of this polysaccharide in the cell wall of opaque cells. As suggested by the reviewer, we will include analysis of opaque cells (and other C. albicans cell types) in future studies.

      With the exception of these major new avenues for this research, our revision can address each of the comments provided by the reviewers.

    1. Author response:

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Masroor Ahmad Paddar and his/her colleagues explore the noncanonical roles of ATG5 and membrane ATG8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage. 

      Strengths: 

      This study provides novel insights into the noncanonical function of ATG8ylation in endosomal cargo sorting process. 

      Weaknesses: 

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved. 

      We agree with the reviewer. We do, however, show how at least one aspect of ATG8ylation contributes to proper retromer function, which occurs via lysosomal membrane maintenance and repair. Understanding the more direct effects on the retromer will require a separate study. We will emphasize this in the revised manuscript and point out the limitations of the present work.

      Reviewer #2 (Public Review): 

      Summary:

      Paddar et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes.

      Strengths: 

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking. 

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      We thank the reviewer for the overall assessment of the strengths of the study.

      We have discussed in the manuscript the elegant study by Roy et al., PMID 28602638. To accommodate the reviewer’s comment, we will additionally emphasize in the text that our study is focused on basal conditions and conditions that perturb endolysosomal compartments. We agree with the reviewer that under metabolic stress conditions (such as glucose limitation) more complex pathways may be engaged and will acknowledge that in the discussion.

      Weaknesses: 

      (1) Additional controls are needed to clarify the role of CASM in the control of retromer function. Because the manuscript proposes both CASM-dependent and independent pathways in the ATG5 mediated regulation of the retromer, it is important to provide robust evidence that CASM is required for retromer-dependent GLUT1 sorting to the plasma membrane vs. lysosome. The experiments with monensin in Fig. 7C-E are consistent with but not unequivocally corroborative of a role for CASM.

      We fully agree with the reviewer. In fact, our data with bafilomycin A1 treatment causing GLUT1 mis-sorting (manuscript line 317) show that it is the perturbation of lysosomes and not CASM per se that leads to mis-sorting of GLUT1 (Fig. 7D,E). Note that it has been shown (PMIDs: 28296541, 25484071 and 37796195) that although bafilomycin A1 deacidifies lysosomes, it does not induce but instead inhibits CASM. This is because bafilomycin A1 causes dissociation of the V1 and V0 sectors of V-ATPase, unlike CASM-inducing agents, which promote V1 V0 association. Complementing this, our data with ATG2AB DKO and ESCRT VPS37A KO (Fig. 8A-F) indicate that the repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G). This may be one of the effector mechanisms downstream of membrane atg8ylation in general and hence also downstream of CASM. We will revise the title of Fig. 7 to read “Lysosomal damage causes GLUT1 mis-sorting” and will explain these relationships in the text.

      Based on the results shown with ATG16KO in Fig 4A-D, rescue experiments of these 16KO cells with WT vs. C-terminal WD40 mutant versions of ATG16 will specifically assess the requirement for CASM and potentially provide more rigorous support for the conclusions drawn. 

      We will carry out the experiment proposed by the reviewer for the planned revision.

      (2) Also, the role of TBC1D5 should be further clarified. In Fig S7, are there any changes in the interactions between TBC1D5 and VPS35 in response to LLOMe or other agents utilized to induce CASM?

      We thank the reviewer for pointing this out. We do have data with VPS35 in co-IPs shown in Fig. S7.  There is no change in the amounts of VPS35 or TBC1D5 in GFP-LC3A co-IPs. We will include a graph with quantification in the revised manuscript and emphasize this point.

      Does TBC1D5 loss-of-function modulate the numbers of GLUT1 and Gal3 puncta observed in ATG5 deficient cells in response to LLOMe? 

      We agree that TBC1D5 is an interesting aspect. However, because TBC1D5 does not change its interactions in the experiments in our study, we consider this topic (i.e. whether TBC1D5 phenocopies VPS35 and ATG5 KOs in its effects on Gal3) to be beyond the scope of the present work. We underscore that LLOMe (lysosomal damage) mis-sorts GLUT1 even without any genetic intervention (e.g., in WT cells in the absence of ATG5 KO; Fig. 7). Thus, in our opinion the effects of TBC1D5 inactivation may be a moot point.

      (3) Finally, the studies here are motivated by experiments in Fig. S1 (as well as other studies from the Deretic and Stallings labs) suggesting unique autophagy-independent functions for ATG5 in myeloid cells and neutrophils in susceptibility to Mycobacterium tuberculosis infection. However, it is curious that no attempt is made to relate the mechanistic data regarding the retromer or GLUT1 receptor mis-sorting back to the infectious models. Do myeloid cells or neutrophils lacking ATG5 have deficiencies in glucose uptake or GLUT1 cell surface levels? 

      The reviewer’s point is well taken. Glucose uptake, its metabolism, and diabetes underlie the resurgence of TB in certain populations and are important factors in a range of other diseases. This was alluded to in our discussion (lines 461-469). However, these are complex topics for future studies. We will expand this section of the discussion.

      Reviewer #3 (Public Review): 

      In this manuscript, Padder et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1. Nevertheless, there are several issues that the authors need to address to further clarify their conclusions. 

      (1) The authors confirmed the interaction between Atg5 and the Retromer complex through Co-IP experiments. Is the interaction between Atg5 and the Retromer direct? If it is direct, which Retromer complex protein regulates the interaction with Atg5? Additionally, does ATG5 K130R mutant enhance its interaction with the Retromer? 

      AlphaFold modeling in the initial submission of our study to eLife (absent from the current version) suggested the possibility of a direct interaction between ATG5 and VPS35 with ATG12—ATG5 complex facing outwards, in which case K130R would not matter. However, mutational experiments in putative contact residues did not alter association in co-IPs. So either ATG5 interacts with other retromer subunits or more likely is in a larger protein complex containing retromer. It will take a separate study to dissect associations and find direct interaction partners. We can provide our data on the currently available modeling and mutational analyses in a full point-for-point rebuttal but believe that since they are inconclusive, they should not be included in the study.

      (2) To more directly elucidate how ATG5 regulates Retromer function by interacting with the Retromer and participates in the trafficking of GLUT1 to the plasma membrane, the authors should identify which region or crucial amino acid residues of ATG5 regulate its interaction with the Retromer. Additionally, they should test whether mutations in ATG5 that disrupt its interaction with the Retromer affect Retromer function (such as participating in the trafficking of GLUT1 to the plasma membrane) and whether they affect Atg8ylation. They also need to assess whether these mutations influence canonical autophagy and lysosomal sensitivity to damage. 

      Please see the response to point 1.

      We thank the editors and reviewers for their assessment, constructive criticisms and recommendations.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. Data on histone H3K4me3 in humans and mice have been published, and our previous data revealed the unique pattern of H3K4me3 in early human embryos and oocytes (Xia et al., 2019). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main focus, but we do not rule out that it may yield new discoveries about the relationship between these two histone modifications. Our data so far tend to show that H3K4me2 not only acts as a precursor of H3K4me3 but also plays an independent role.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. Previous studies indicated that TCP is an irreversible inhibitor of both LSD1 and LSD2 (Binda et al., 2010; Fang et al., 2010). According to our data, however, the level of LSD1 is very low in the early stages of mouse embryos, so TCP mainly inhibits the function of LSD2 at these stages.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment area of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Would it be a reason that H3K4me2 has fewer H3K4me2 peaks in MII oocytes than GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage, which is interesting. We think we may have used different parameters during confocal laser imaging (Ancelin et al., 2016). We used the same parameters to image all stages continuously, from the GV stage to the blastocyst stage. If we had imaged only the fertilized egg and the 2-cell stage, we might also have seen weak fluorescence at the 2-cell stage under different parameters. We will refer to this reference and discuss it in the resubmitted version.

      Moreover, you suggested that MII oocytes have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has expelled the first polar body. There is no problem with this logic. However, the first polar body expelled at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks in the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications allow the chromatin structure to change appropriately across the different stages of meiosis. At present, it has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during the maturation of human oocytes, whereas H3K27me3 does not change from the GV to the MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show the parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development. The authors should have cited the paper and described the role of KDM1a in early embryos.

      In the analysis of this experiment, we believe that in the early embryonic development of mice the expression of KDM1A is lower than that of KDM1B in relative terms. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared to immature oocytes. In addition, the effects of loss of maternal KDM1a on embryonic development were not discussed. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, TCP treatment of the whole fertilized egg in this study increased the H3K4me2 content, and its retarding effect on embryo development was more pronounced than that obtained when females with maternal KDM1B knocked down were crossed with normal paternal lines.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thanks a lot for your question. In the experimental group, the fluorescence value of H3K4me2 in immunofluorescence (IF) increased 1,000-fold (Figure 4E), whereas in the CUT&RUN (CR) data a total of 445 H3K4me2-related genes were up- or down-regulated (Figure 6A). In our opinion, immunofluorescence, as a semi-quantitative analysis, cannot be directly compared with the quantitative CR analysis because of the different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. eLife, 5, e08851. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      Xia W, Xu J, Yu G, Yao G, Xu K, Ma X, Zhang N, Liu B, Li T, Lin Z, Chen X, Li L, Wang Q, Shi D, Shi S, Zhang Y, Song W, Jin H, Hu L, Bu Z, Wang Y, Na J, Xie W, Sun YP. Resetting histone modifications during human parental-to-zygotic transition. Science. 2019 Jul 26;365(6451):353-360. doi: 10.1126/science.aaw5118. Epub 2019 Jul 4. PMID: 31273069.

      Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.

      Fang R, Barbera AJ, Xu Y, Rutenberg M, Leonor T, Bi Q, Lan F, Mei P, Yuan GC, Lian C, Peng J, Cheng D, Sui G, Kaiser UB, Shi Y, Shi YG. Human LSD2/KDM1b/AOF1 regulates gene transcription by modulating intragenic H3K4me2 methylation. Mol Cell. 2010 Jul 30;39(2):222-33. doi: 10.1016/j.molcel.2010.07.008. PMID: 20670891; PMCID: PMC3518444.

      Ancelin K, Syx L, Borensztein M, Ranisavljevic N, Vassilev I, Briseño-Roa L, Liu T, Metzger E, Servant N, Barillot E, Chen CJ, Schüle R, Heard E. Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. Elife. 2016 Feb 2;5:e08851. doi: 10.7554/eLife.08851. PMID: 26836306; PMCID: PMC4829419.

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors did a great job addressing the weaknesses I raised in the previous round of review, except on the generalizability of the current result in the larger context of multi-attribute decision-making. It is not really a weakness of the manuscript but more of a limitation of the studied topic, so I want to keep this comment for public readers.

      The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, where a multiplicative rule is hard to formulate?

      We thank the reviewer for the comment. With regard to whether the current type of stimuli may have biased participants to use an additive rule rather than a multiplicative one, we believe many other forms of stimuli for representing choice attributes would be equally likely to cause a similar bias. This is because the additive strategy is an inherently simple and natural way to integrate different pieces of non-interacting information. More importantly, even though it is easy to employ an additive strategy, most participants still employed the multiplicative rule to some degree. However, it would indeed be interesting for future studies to explore whether the current composite model remains dominant in situations where the optimal solutions require an additive or subtractive rule, such as those concerning quality and price.

      “The same would apply even with a different choice of cues as long as the information is conveyed by two independent visual features.”

      “While the additive strategy is a natural and simple approach for integrating non-interacting pieces of information, to some extent, participants also used the multiplicative strategy that was optimal in the current experiment. A general question for such composite models is whether people mix two strategies in a consistent manner on every trial or whether there is some form of probabilistic selection occurring between the two strategies on each trial such that only one strategy is used on any given trial while, on average, one strategy is more probable than the other. It would also be interesting to examine whether a composite model is appropriate in contexts where the optimal solution is additive or subtractive, such as those concerning quality and price.”


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The current study provided a follow-up analysis using published datasets focused on the individual variability of both the distraction effect (size and direction) and the attribute integration style, as well as the association between the two. The authors tried to answer the question of whether the multiplicative attribute integration style concurs with a more pronounced and positively oriented distraction effect.

      Strengths:

      The analysis extensively examined the impacts of various factors on decision accuracy, with a particular focus on using two-option trials as control trials, following the approach established by Cao & Tsetsos (2022). The statistical significance results were clearly reported.

      The authors meticulously conducted supplementary examinations, incorporating the additional term HV+LV into GLM3. Furthermore, they replaced the utility function from the expected value model with values from the composite model.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #1 Comment 1

      Weaknesses:

      There are several weaknesses in terms of theoretical arguments and statistical analyses.

      First, the manuscript suggests in the abstract and at the beginning of the introduction that the study reconciled the "different claims" about "whether distraction effect operates at the level of options' component attributes rather than at the level of their overall value" (see line 13-14), but the analysis conducted was not for that purpose. Integrating choice attributes in either an additive or multiplicative way only reflects individual differences in combining attributes into the overall value. The authors seemed to assume that the multiplicative way generated the overall value ("Individuals who tended to use a multiplicative approach, and hence focused on overall value", line 20-21), but such implicit assumption is at odds with the statement in line 77-79 that people may use a simpler additive rule to combine attributes, which means overall value can come from the additive rule.

      We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent. Within this manuscript, our primary focus is on the different methods of value integration in which the overall value is computed (i.e., additive, multiplicative, or both), rather than the interaction at the individual level of attributes. However, we do not exclude the possibility that the distractor effect may occur at multiple levels. Nevertheless, in light of the reviewer’s comment, we agree that we should focus the argument on whether distractors facilitate or impair decision making and downplay the separate argument about the level at which distractor effects operate. We have now revised the abstract:

      “It is widely agreed that people make irrational decisions in the presence of irrelevant distractor options. However, there is little consensus on whether decision making is facilitated or impaired by the presence of a highly rewarding distractor or whether the distraction effect operates at the level of options’ component attributes rather than at the level of their overall value. To reconcile different claims, we argue that it is important to incorporate consideration of the diversity of people’s ways of decision making. We focus on a recent debate over whether people combine choice attributes in an additive or multiplicative way. Employing a multi-laboratory dataset investigating the same decision making paradigm, we demonstrated that people used a mix of both approaches and the extent to which each approach was used varied across individuals. Critically, we identified that this variability was correlated with the effect of the distractor on decision making. Individuals who tended to use a multiplicative approach to compute value showed a positive distractor effect. In contrast, in individuals who tended to use an additive approach, a negative distractor effect (divisive normalisation) was prominent. These findings suggest that the distractor effect is related to how value is constructed, which in turn may be influenced by task and subject specificities. Our work concurs with recent behavioural and neuroscience findings that multiple distractor effects co-exist.” (Lines 12-26)

      Furthermore, we acknowledge that the current description of the additive rule could be interpreted in several ways. The current additive utility model is described as:

      $$U = \eta M + (1 - \eta) P$$

      where $U$ is the option’s utility, $M$ is the reward magnitude, $P$ is the probability, and $\eta$ is the magnitude/probability weighing ratio. If we perform a comparison between values according to this model (i.e., HV against LV), we arrive at the following comparison:

      $$U_{HV} - U_{LV} = \left[\eta M_{HV} + (1 - \eta) P_{HV}\right] - \left[\eta M_{LV} + (1 - \eta) P_{LV}\right] \qquad (1)$$

      If we rearrange (1), we arrive at:

      $$U_{HV} - U_{LV} = \eta \left(M_{HV} - M_{LV}\right) + (1 - \eta)\left(P_{HV} - P_{LV}\right) \qquad (2)$$

      While equations (1) and (2) are mathematically equivalent, equation (1) illustrates the interpretation in which the comparison of utilities occurs after value integration has formed an overall value. On the other hand, equation (2) can be broadly interpreted as the comparison of individual attributes in the absence of an overall value estimate for each option. Nonetheless, while we do not exclude the possibility that the distractor effect may occur at multiple levels, we have modified the main manuscript to employ more consistently a terminology referring to different methods of value estimation, while recognizing that our empirical results are compatible with both interpretations.
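
      The equivalence discussed above can be checked numerically. The following is a minimal sketch, assuming the additive rule is a weighted sum of magnitude and probability with a single weight; the function names, the weight name `eta`, and the example values are hypothetical and are not taken from the manuscript.

```python
# Illustrative sketch only: names and values are assumptions, not the authors' code.

def additive_utility(magnitude, probability, eta):
    """Weighted-sum (additive) utility of one option; eta weights magnitude."""
    return eta * magnitude + (1.0 - eta) * probability


def compare_after_integration(hv, lv, eta):
    """Form an overall value for each option first, then compare the two values."""
    return additive_utility(hv[0], hv[1], eta) - additive_utility(lv[0], lv[1], eta)


def compare_by_attribute(hv, lv, eta):
    """Compare magnitudes and probabilities separately, then weight the differences."""
    magnitude_diff = hv[0] - lv[0]
    probability_diff = hv[1] - lv[1]
    return eta * magnitude_diff + (1.0 - eta) * probability_diff


if __name__ == "__main__":
    hv = (0.8, 0.3)  # hypothetical (magnitude, probability) of the better option
    lv = (0.4, 0.6)  # hypothetical (magnitude, probability) of the worse option
    for eta in (0.0, 0.3, 0.7, 1.0):
        a = compare_after_integration(hv, lv, eta)
        b = compare_by_attribute(hv, lv, eta)
        print(f"eta={eta:.1f}  integrated={a:+.4f}  attribute-wise={b:+.4f}")
```

      Under these assumptions the two functions return the same decision variable for any weight, which is why choice data alone cannot separate the two readings of the additive rule unless extra mechanisms are added to one of them.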

      Reviewer #1 Comment 2

      The second weakness is sort of related but is more about the lack of coherent conceptual understanding of the "additive rule", or "distractor effect operates at the attribute level". In an assertive tone (lines 77-80), the manuscript suggests that a weighted sum integration procedure of implementing an "additive rule" is equal to assuming that people compare pairs of attributes separately, without integration. But they are mechanistically distinct. The additive rule (implemented using the weighted sum rule to combine probability and magnitude within each option and then applying the softmax function) assumes value exists before comparing options. In contrast, if people compare pairs of attributes separately, preference forms based on the within-attribute comparisons. Mathematically these two might be equivalent only if no extra mechanisms (such as inhibition, fluctuating attention, evidence accumulation, etc) are included in the within-attribute comparison process, which is hardly true in the three-option decision.

      We thank the reviewer for the comment. As described in our response to Reviewer #1 Comment 1, we acknowledge that there may be multiple possible interpretations of the additive rule. We also agree with the reviewer that additional mechanisms may be involved in three- or even two-option decisions, but these would require additional studies to tease apart. Another motivation for the approach used here, which does not explicitly model the extra mechanisms the reviewer refers to, was our intention to address and integrate findings from previous studies using the same dataset [i.e. (Cao & Tsetsos, 2022; Chau et al., 2020)]. Lastly, regardless of the mechanistic interpretation, our results show a systematic difference in the process of value estimation. Modifications to the manuscript text have been made consistent with our motivation (please refer to the reply and the textual changes proposed in response to the reviewer’s previous comment: Reviewer #1 Comment 1).
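
      To make the reviewer's point about "no extra mechanisms" concrete, the sketch below applies a plain softmax to option utilities. It is an illustration under stated assumptions, not the model fitted in the manuscript: the inverse temperature `beta`, the function names, and the utility values are hypothetical. A plain softmax keeps the relative preference between HV and LV unchanged when a third option is added, so any distractor effect has to come from an additional mechanism such as normalisation or attribute-level interactions.

```python
import math

# Illustrative sketch only: beta, names and values are assumptions.

def softmax(utilities, beta=1.0):
    """Choice probabilities from utilities; beta is an inverse temperature."""
    highest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (u - highest)) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def relative_accuracy(utilities, beta=1.0):
    """P(HV) / (P(HV) + P(LV)), with utilities ordered as [HV, LV, optional D]."""
    probs = softmax(utilities, beta)
    return probs[0] / (probs[0] + probs[1])


if __name__ == "__main__":
    two_option = [0.6, 0.4]             # HV and LV only
    low_distractor = [0.6, 0.4, 0.1]    # same pair plus a poor distractor
    high_distractor = [0.6, 0.4, 0.55]  # same pair plus a good distractor
    for label, utilities in (("two-option", two_option),
                             ("low D", low_distractor),
                             ("high D", high_distractor)):
        print(label, round(relative_accuracy(utilities, beta=5.0), 6))
```

      The three printed values coincide, which is the baseline against which positive or negative distractor effects would be measured in a sketch of this kind.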

      Reviewer #1 Comment 3

      Could the authors comment on the generalizability of the current result? The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, where a multiplicative rule is hard to formulate?

      We thank the reviewer for the comment. We agree with the observation that the stimulus space, with colour linearly correlated with magnitude, and orientation linearly correlated with probability, may bias subjects towards an additive rule. But that’s indeed the point: in order to maximise reward, subjects should have focused on the outcome space without being driven by the stimulus space. In practice, people are more or less successful in such an endeavour. Nevertheless, we argue that the specific choice of visual stimuli we used is no more biased towards an additive rule than any other. In fact, as long as two or more pieces of information are provided for each option, as opposed to a single cue whose value was previously learned, there will always be a bias towards an additive heuristic (a linear combination), regardless of whether the cues are shapes, colours, graphs, numbers, or words.

      As the reviewer suggested, the dataset analyzed in the current manuscript suggests that the participants were leaning towards the additive rule. Although there was a general tendency using the additive rule while choosing between the rectangular bars, we can still observe a spread of individuals using either, or both, additive and multiplicative rules, suggesting that there was indeed diversity in participants’ decision making strategies in our data.

      In previous studies, it was observed that human and non-human individuals used a mix of multiplicative and additive rules when they were tested on experimental paradigms different from ours (Bongioanni et al., 2021; Farashahi et al., 2019; Scholl et al., 2014). It was also observed that positive and negative distractor effects can both be present in the same data set when human and non-human individuals made decisions about food and social partners (Chang et al., 2019; Louie et al., 2013). It was less clear in the past whether the precise way a distractor affects decision making (i.e., positive/negative distractor effect) is related to the decision strategy used (i.e., multiplicative/additive rules), and this is exactly what we are trying to address in this manuscript. A follow-up study looking at neural data (such as functional magnetic resonance imaging data) could provide a better understanding of the mechanistic nature of the relationship between distractor effects and decision strategy that we identified here.

      We agree with the reviewer that a multiplicative strategy may not be applicable to some decision contexts. Here it is important to look at the structure of the optimal solution (the one maximizing value in the long run). Factors modulating value (such as probability and temporal delay) require a non-linear (e.g., multiplicative) solution, while factors of the cost-benefit form (such as effort and price) require a linear solution (e.g., subtraction). In the latter scenario the additive heuristic would coincide with the optimal solution, and the effect addressed in this study may not be revealed. Nonetheless, the present data support the notion of distinct neural mechanisms at least for probabilistic decision-making, and this notion is likely applicable to decision-making in general.
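
      The contrast between the two rules can be made concrete with a minimal sketch. The function names, the equal attribute weighting, and the two example options are illustrative assumptions, not values taken from the task. The sketch shows a pair of options for which the multiplicative (expected value) rule and an additive heuristic rank the options in opposite order.

```python
# Illustrative sketch only: names, weights and option values are assumptions.

def multiplicative_value(magnitude: float, probability: float) -> float:
    """Expected value: magnitude scaled by probability."""
    return magnitude * probability


def additive_value(magnitude: float, probability: float, weight: float = 0.5) -> float:
    """Linear heuristic: weighted sum of the two attributes."""
    return weight * magnitude + (1.0 - weight) * probability


# Attributes rescaled to [0, 1]: (magnitude, probability).
option_a = (0.9, 0.2)  # large but unlikely reward
option_b = (0.5, 0.5)  # moderate reward, moderate probability

ev_a = multiplicative_value(*option_a)
ev_b = multiplicative_value(*option_b)
add_a = additive_value(*option_a)
add_b = additive_value(*option_b)

print("multiplicative rule prefers", "A" if ev_a > ev_b else "B")
print("additive rule prefers", "A" if add_a > add_b else "B")
```

      For attribute pairs of the cost-benefit form (benefit minus price), the optimal rule is itself linear, so the two rankings would not diverge in this way.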

      Our findings, in conjunction with the literature, also suggest that a positive distractor effect could be a general phenomenon in decision mechanisms that involve the medial prefrontal cortex. For example, it has been shown that the positive distractor effect is related to a decision mechanism linked to medial prefrontal cortex [especially the ventromedial prefrontal cortex (Chau et al., 2014; Noonan et al., 2017)]. It is also known that a similar brain region is involved not only when individuals are combining information using a multiplicative strategy (Bongioanni et al., 2021), but also when they are combining information to evaluate new experience or generalize information (Baram et al., 2021; Barron et al., 2013; Park et al., 2021). We have now revised the Discussion to explain this:

      “In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 260-274)

      Reviewer #1 Comment 4

      The authors did careful analyses on quantifying the "distractor effect". While I fully agree that it is important to use the matched two-option trials and examine the interaction terms (DV-HV)T as a control, the interpretation of the results becomes tricky when looking at the effects in each trial type. Figure 2c shows a positive DV-HV effect in two-option trials whereas the DV-HV effect was not significantly stronger in three-option trials. Further in Figure 5b,c, in the Multiplicative group, the effect of DV-HV was absent in the two-option trials and present in the three-option trials. In the Additive group, however, the effect of DV-HV was significantly positive in the two-option trials but was significantly lowered in the three-option trials. Hence, it seems the different distractor effects were driven by the different effects of DV-HV in the two-option trials, rather than the three-option trials?

      We thank the reviewer for the comment. While it may be a bit more difficult to interpret, the current method of examining the (DV−HV)T term rather than (DV−HV) term was used because it was the approach used in a previous study (Cao & Tsetsos, 2022).

      During the design of the original experiments, trials were generated pseudo-randomly until the DV was sufficiently decorrelated from HV−LV. While this method allows for better group-level examination of behaviour, Cao and Tsetsos were concerned that this approach may have introduced unintended confounding covariations to some trials. In theory, one of the unintended covariations could occur between the DV and specific sets of reward magnitude and probability of the HV and LV. The covariation between parameters can lead to an observable positive distractor effect in the DV−HV as a consequence of the attraction effect or an unintended byproduct of using an additive method of integrating attributes [for further elaboration, please refer to Figure 1 in (Cao & Tsetsos, 2022)]. While it may have some limitations, the approach suggested by Cao and Tsetsos has the advantage of leveraging the DV−HV term to absorb any variance contributed by possible confounding factors such that true distractor effects, if any, can be detected using the (DV−HV)T term.
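
      The trial-generation procedure described above can be sketched as a simple rejection loop. This is an assumption-laden illustration rather than the original experiment code: the attribute ranges, the correlation threshold, the number of trials, and the function name are all hypothetical.

```python
import numpy as np


def generate_decorrelated_trials(n_trials=150, max_abs_corr=0.05, seed=0,
                                 max_attempts=10_000):
    """Resample whole trial sets until DV is decorrelated from HV - LV.

    Columns 0 and 1 are the two chooseable options; column 2 is the distractor.
    All ranges and thresholds are hypothetical.
    """
    rng = np.random.default_rng(seed)
    for _ in range(max_attempts):
        magnitude = rng.uniform(0.1, 1.0, size=(n_trials, 3))
        probability = rng.uniform(0.1, 1.0, size=(n_trials, 3))
        value = magnitude * probability
        hv = value[:, :2].max(axis=1)  # higher-value chooseable option
        lv = value[:, :2].min(axis=1)  # lower-value chooseable option
        dv = value[:, 2]               # distractor value
        corr = np.corrcoef(dv, hv - lv)[0, 1]
        if abs(corr) < max_abs_corr:
            return hv, lv, dv, corr
    raise RuntimeError("no sufficiently decorrelated trial set found")


hv, lv, dv, corr = generate_decorrelated_trials()
print(f"corr(DV, HV-LV) = {corr:.3f}")
```

      A loop of this kind controls only the correlation it checks. Covariation between DV and specific magnitude/probability combinations of HV and LV is left unconstrained, which is the kind of unintended confound discussed above.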

      Reviewer #1 Comment 5

      Note that the pattern described above was different in Supplementary Figure 2, where the effect of DV-HV on the two-option trials was negative for both Multiplicative and Additive groups. I would suggest considering using Supplementary Figure 2 as the main result instead of Figure 5, as it does not rely on multiplicative EV to measure the distraction effect, and it shows the same direction of DV-HV effect on two-option trials, providing a better basis to interpret the (DV-HV)T effect.

      We thank the reviewer for the comments and suggestion. However, as mentioned in the response to Reviewer #1 Comment 4, the method of analysis adopted in the manuscript, and the interpretation of only the (DV−HV)T term, aims to address the possibility that the (DV−HV) term may be capturing some confounding effects due to covariation. Given that the debate being addressed specifically concerns the (DV−HV)T term, we elected to display Figure 5 within the main text and keep the results of the regression after replacing the utility function with the composite model as Supplementary Figure 5 (previously labelled as Supplementary Figure 2).
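
      The logic of using the (DV−HV)T interaction can be illustrated with a small simulation. The sketch below is not the manuscript's GLM2: it uses a reduced, assumed set of regressors (HV−LV, DV−HV, trial type T coded 0/1, and their (DV−HV)T product), simulated rather than real choices, and a plain Newton-Raphson logistic fit. The simulated DV−HV effect is present in both trial types, standing in for a confound, while an additional effect exists only on ternary trials.

```python
import numpy as np


def zscore(x):
    return (x - x.mean()) / x.std()


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = (X * w[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# Simulated (hypothetical) trials: binary (T = 0) and ternary (T = 1).
rng = np.random.default_rng(1)
n = 6000
hv_minus_lv = zscore(rng.uniform(0.0, 0.5, n))
dv_minus_hv = zscore(rng.uniform(-0.8, 0.0, n))
ternary = rng.integers(0, 2, n).astype(float)

names = ["intercept", "HV-LV", "DV-HV", "T", "(DV-HV)T"]
X = np.column_stack([np.ones(n), hv_minus_lv, dv_minus_hv, ternary,
                     dv_minus_hv * ternary])

# Assumed generative weights: 0.3 is a confound-like DV-HV effect shared by
# both trial types; 0.5 is the extra effect present only on ternary trials.
true_beta = np.array([0.5, 1.0, 0.3, 0.0, 0.5])
p_choose_hv = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
chose_hv = (rng.random(n) < p_choose_hv).astype(float)

estimates = dict(zip(names, fit_logistic(X, chose_hv)))
for name, value in estimates.items():
    print(f"{name:>10s}: {value:+.3f}")
```

      In this toy setting the DV−HV coefficient absorbs the effect shared by binary and ternary trials, and the (DV−HV)T coefficient recovers the part specific to ternary trials.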

      Reviewer #2 (Public Review):

      This paper addresses the empirical demonstration of "distractor effects" in multi-attribute decision-making. It continues a debate in the literature on the presence (or not) of these effects, which domains they arise in, and their heterogeneity across subjects. The domain of the study is a particular type of multi-attribute decision-making: choices over risky lotteries. The paper reports a re-analysis of lottery data from multiple experiments run previously by the authors and other laboratories involved in the debate.

      Methodologically, the analysis assumes a number of simple forms for how attributes are aggregated (adaptively, multiplicatively, or both) and then applies a "reduced form" logistic regression to the choices with a number of interaction terms intended to control for various features of the choice set. One of these interactions, modulated by ternary/binary treatment, is interpreted as a "distractor effect."

      The claimed contribution of the re-analysis is to demonstrate a correlation in the strength/sign of this treatment effect with another estimated parameter: the relative mixture of additive/multiplicative preferences.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #2 Comment 1

      Major Issues

      (1) How to Interpret GLM 1 and 2

      This paper, and others before it, have used a binary logistic regression with a number of interaction terms to attempt to control for various features of the choice set and how they influence choice. It is important to recognize that this modelling approach is not derived from a theoretical claim about the form of the computational model that guides decision-making in this task, nor an explicit test for a distractor effect. This can be seen most clearly in the equations after line 321 and its corresponding log-likelihood after 354, which contain no parameter or test for "distractor effects". Rather the computational model assumes a binary choice probability and then shoehorns the test for distractor effects via a binary/ternary treatment interaction in a separate regression (GLM 1 and 2). This approach has already led to multiple misinterpretations in the literature (see Cao & Tsetsos, 2022; Webb et al., 2020). One of these misinterpretations occurred in the datasets the authors studied, in which the lottery stimuli contained a confound with the interaction that Chau et al., (2014) were interpreting as a distractor effect (GLM 1). Cao & Tsetsos (2022) demonstrated that the interaction was significant in binary choice data from the study, therefore it can not be caused by a third alternative. This paper attempts to address this issue with a further interaction with the binary/ternary treatment (GLM 2). Therefore the difference in the interaction across the two conditions is claimed to now be the distractor effect. The validity of this claim brings us to what exactly is meant by a "distractor effect."

      The paper begins by noting that "Rationally, choices ought to be unaffected by distractors" (line 33). This is not true. There are many normative models that allow for the value of alternatives (even low-valued "distractors") to influence choices, including a simple random utility model. Since Luce (1959), it has been known that the axiom of "Independence of Irrelevant Alternatives" (that the probability ratio between any two alternatives does not depend on a third) is an extremely strong axiom, and only a sufficiency axiom for a random utility representation (Block and Marschak, 1959). It is not a necessary condition of a utility representation, and if this is our definition of rational (which is highly debatable), not necessary for it either. Countless empirical studies have demonstrated that IIA is falsified, and a large number of models can address it, including a simple random utility model with independent normal errors (i.e. a multivariate Probit model). In fact, it is only the multinomial Logit model that imposes IIA. It is also why so much attention is paid to the asymmetric dominance effect, which is a violation of a necessary condition for random utility (the Regularity axiom).

      So what do the authors even mean by a "distractor effect." It is true that the form of IIA violations (i.e. their path through the probability simplex as the low-option varies) tells us something about the computational model underlying choice (after all, different models will predict different patterns). However we do not know how the interaction terms in the binary logit regression relate to the pattern of the violations because there is no formal theory that relates them. Any test for relative value coding is a joint test of the computational model and the form of the stochastic component (Webb et al, 2020). These interaction terms may simply be picking up substitution patterns that can be easily reconciled with some form of random utility. While we can not check all forms of random utility in these datasets (because the class of such models is large), this paper doesn't even rule any of these models out.

      We thank the reviewer for the comment. In this study, one objective is to address an issue raised by Cao and Tsetsos (2022), suggesting that the distractor effect claimed in the Chau et al. (2014) study was potentially confounded by unintended correlation introduced between the distractor and the chooseable options. They suggested that this could be tested by analyzing the control binary trials and the experimental ternary trials in a single model (i.e., GLM2) and introducing an interaction term (DV−HV)T. The interaction term can partial out any unintended confound and test the distractor effect that was present specifically in the experimental ternary trials. We adopted these procedures in our current study and employed the interaction term to test the distractor effects. The results showed that overall there was no significant distractor effect in the group. We agree with the reviewer’s comment that if we were only analysing the ternary trials, a multinomial probit model would be suitable because it allows noise correlation between the choices. Alternatively, had a multinomial logistic model been applied, a Hausman-McFadden Test could be run to test whether the data violates the assumption of independence of irrelevant alternatives (IIA). However, in our case, a binomial model is preferred over a multinomial model because of: (1) the inclusion of the binary trials, and (2) the small number of trials in which the distractor was chosen (the median was 4% of all ternary trials).

      However, another main objective of this study is to consider the possibility that the precise distractor effect may vary across individuals. This is exactly why we employed the composite model to estimate each individual’s decision-making strategy and investigated how that varied with the precise way the distractor influenced decision making.

      In addition, we think that the reviewer here is raising a profound point and one with which we are in sympathy; it is true that random utility models can predict deviations from the IIA axiom. Central to these approaches is the notion that the representations of the values of choice options are noisy. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion as if each sample were being drawn from a distribution. As a consequence, the value of a distractor that is “drawn” during a decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Our understanding is that Webb, Louie and colleagues (Louie et al., 2013; Webb et al., 2020) suggest an explanation approximately along these lines when they reported a negative distractor effect during some decisions, i.e., they follow the predictions of divisive normalization suggesting that decisions become more random as the distractor’s value is greater.
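
      The IIA property mentioned above can be illustrated with a multinomial logit (softmax) choice rule. In the sketch below the option values and function names are arbitrary assumptions. It shows that under this rule the ratio of choice probabilities between two options equals exp(v_A − v_B), whatever the value of a third option.

```python
import math


def logit_choice_probabilities(values):
    """Multinomial logit (softmax) choice probabilities."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def ratio_a_to_b(third_value, value_a=1.0, value_b=0.5):
    """P(A) / P(B) when a third option with the given value is present."""
    p = logit_choice_probabilities([value_a, value_b, third_value])
    return p[0] / p[1]


low_ratio = ratio_a_to_b(-2.0)   # weak third option
high_ratio = ratio_a_to_b(0.4)   # strong third option

print(low_ratio, high_ratio, math.exp(0.5))
```

      A test for IIA violations therefore asks whether this ratio changes in the data when the third option changes, something this particular choice rule cannot produce by construction.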

      An alternative approach, however, assumes that rather than noise in the representation of the option itself, there is noise in the comparison process when the two options are compared. This is exemplified in many influential decision making models including evidence accumulation models such as drift diffusion models (Shadlen & Shohamy, 2016) and recurrent neural network models of decision making (Wang, 2008). It is this latter type of model that we have used in our previous investigations (Chau et al., 2020; Kohl et al., 2023). However, these two approaches are linked both in their theoretical origin and in the predictions that they make in many situations (Shadlen & Shohamy, 2016). We therefore clarify that this is the case in the revised manuscript as follows:

      “In the current study and in previous work we have used or made reference to models of decision making that assume that a noisy process of choice comparison occurs such as recurrent neural networks and drift diffusion models (Shadlen & Shohamy, 2016; Wang, 2008). Under this approach, positive distractor effects are predicted when the comparison process becomes more accurate because of an impact on the noisy process of choice comparison (Chau et al., 2020; Kohl et al., 2023). However, it is worth noting that another class of models might assume that a choice representation itself is inherently noisy. According to this approach, on any given decision a sample is drawn from a distribution of value estimates in a noisy representation of the option. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion. As a consequence, the value of a distractor that is “drawn” during decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Louie and colleagues (Louie et al., 2013) suggest an explanation approximately along these lines when they reported a positive distractor effect during some decisions. Such different approaches share theoretical origins (Shadlen & Shohamy, 2016) and make related predictions about the impact of distractors on decision making.” (Lines 297-313)

      Reviewer #2 Comment 2

      (2) How to Interpret the Composite (Mixture) model?

      On the other side of the correlation are the results from the mixture model for how decision-makers aggregate attributes. The authors report that most subjects are best represented by a mixture of additive and multiplicative aggregation models. The authors justify this with the proposal that these values are computed in different brain regions and then aggregated (which is reasonable, though raises the question of "where" if not the mPFC). However, an equally reasonable interpretation is that the improved fit of the mixture model simply reflects a misspecification of two extreme aggregation processes (additive and EV), so the log-likelihood is maximized at some point in between them.

      One possibility is a model with utility curvature. How much of this result is just due to curvature in valuation? There are many reasonable theories for why we should expect curvature in utility for human subjects (for example, limited perception: Robson, 2001, Khaw, Li Woodford, 2019; Netzer et al., 2022) and of course many empirical demonstrations of risk aversion for small stakes lotteries. The mixture model, on the other hand, has parametric flexibility.

      There is also a large literature on testing expected utility jointly with stochastic choice, and the impact of these assumptions on parameter interpretation (Loomes & Sugden, 1998; Apesteguia & Ballester, 2018; Webb, 2019). This relates back to the point above: the mixture may reflect the joint assumption of how choice departs from deterministic EV.

      We thank the reviewer for the comment. They are indeed right to mention the vast literature on curvature in subjective valuation; however, it is important to stress that the predictions of the additive model with linear basis functions are quite distinct from the predictions of a multiplicative model with non-linear basis functions. We have tested the possibility that participants’ behaviour was better explained by the latter and we showed that this was not the case. Specifically, we have added and performed model fitting on an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) with the weighted probability function suggested by Prelec (1998). In this model, the reward magnitude and probability (both rescaled to the interval between 0 and 1) are transformed into a weighted magnitude and a weighted probability, respectively, each with its own distortion parameter. This prospect theory (PT) model is included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720). We have now included these results in the main text and Supplementary Figure 2:
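
      The exact equations are not recoverable from this text, so the following is only a sketch of one common way to write such a model. It assumes a power function for the weighted magnitude and the one-parameter Prelec form w(p) = exp(−(−ln p)^γ) for the weighted probability, and it assumes the two are combined by multiplication. The parameter names alpha and gamma are placeholders, not the manuscript's symbols.

```python
import math


def weighted_magnitude(magnitude: float, alpha: float) -> float:
    """Assumed power-law distortion of reward magnitude in [0, 1]."""
    return magnitude ** alpha


def prelec_weight(probability: float, gamma: float) -> float:
    """One-parameter Prelec weighting: w(p) = exp(-(-ln p) ** gamma)."""
    if probability <= 0.0:
        return 0.0
    return math.exp(-((-math.log(probability)) ** gamma))


def pt_value(magnitude: float, probability: float, alpha: float, gamma: float) -> float:
    """Assumed prospect-theory value: weighted magnitude times weighted probability."""
    return weighted_magnitude(magnitude, alpha) * prelec_weight(probability, gamma)


# With alpha = gamma = 1 the model reduces to plain expected value.
print(pt_value(0.5, 0.4, alpha=1.0, gamma=1.0))
# With gamma < 1, small probabilities are overweighted.
print(prelec_weight(0.05, gamma=0.6))
```

      Under these assumptions the undistorted multiplicative (expected value) rule is the special case in which both distortion parameters equal 1, so a fit of this model asks whether curvature alone accounts for the choices.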

      “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)

      Reviewer #2 Comment 3

      3) So then how should we interpret the correlation that the authors report?

      On one side we have the impact of the binary/ternary treatment which demonstrates some impact of the low value alternative on a binary choice probability. This may reflect some deep flaws in existing theories of choice, or it may simply reflect some departure from purely deterministic expected value maximization that existing theories can address. We have no theory to connect it to, so we cannot tell. On the other side of the correlation, we have a mixture between additive and multiplicative preferences over risk. This result may reflect two distinct neural processes at work, or it may simply reflect a misspecification of the manner in which humans perceive and aggregate attributes of a lottery (or even just the stimuli in this experiment) by these two extreme candidates (additive vs. EV). Again, this would entail some departure from purely deterministic expected value maximization that existing theories can address.

      It is entirely possible that the authors are reporting a result that points to the more exciting of these two possibilities. But it is also possible (and perhaps more likely) that the correlation is more mundane. The paper does not guide us to theories that predict such a correlation, nor reject any existing ones. In my opinion, we should be striving for theoretically-driven analyses of datasets, where the interpretation of results is clearer.

      We thank the reviewer for their clear comments. As our responses to the previous comments should make apparent, our results are consistent with several existing theories of choice, so we are not claiming that these theories are deeply flawed. Rather, we argue that the results reveal distinct neural processes (additive and multiplicative) and do not simply reflect a misspecification in the modelling. We have revised our manuscript in the light of the reviewer’s comments in the hope of clarifying the theoretical background which informed both our data analysis and our data interpretation.

      First, we note that there are theoretical reasons to expect a third option might impact on choice valuation. There is a large body of work suggesting that a third option may have an impact on the values of two other options (indeed Reviewer #2 refers to some of this work in their Reviewer #2 Comment 1), but the body of theoretical work originates partly in neuroscience and not just in behavioural economics. In many sensory systems, neural activity changes with the intensity of the stimuli that are sensed. Divisive normalization in sensory systems, however, describes the way in which such neural responses are altered also as a function of other adjacent stimuli (Carandini & Heeger, 2012; Glimcher, 2022; Louie et al., 2011, 2013). The phenomenon has been observed at neural and behavioural levels as a function not just of the physical intensity of the other stimuli but as a function of their associated value (Glimcher, 2014, 2022; Louie et al., 2011, 2015; Noonan et al., 2017; Webb et al., 2020).
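
      A minimal sketch of divisive normalization follows, using the commonly written form in which each option's value is divided by a constant plus the summed value of all options present. The parameter names (sigma, weight) and the example values are assumptions for illustration. The sketch shows that the normalized difference between the two chooseable options shrinks as the distractor value grows, which is the basis of a negative distractor effect.

```python
def divisive_normalization(values, sigma=1.0, weight=1.0):
    """Normalized value_i = value_i / (sigma + weight * sum of all values)."""
    total = sum(values)
    return [v / (sigma + weight * total) for v in values]


def normalized_gap(hv, lv, dv):
    """Difference between the normalized HV and LV signals given distractor DV."""
    normalized = divisive_normalization([hv, lv, dv])
    return normalized[0] - normalized[1]


gap_low_dv = normalized_gap(hv=0.8, lv=0.6, dv=0.1)
gap_high_dv = normalized_gap(hv=0.8, lv=0.6, dv=0.7)

print(gap_low_dv, gap_high_dv)
```

      With a fixed level of decision noise, a smaller normalized gap implies less accurate choices between HV and LV when the distractor is more valuable.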

      Analogously there is an emerging body of work on the combinatorial processes that describe how multiple representational elements are integrated into new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). These studies have originated in neuroscience, just as was the case with divisive normalization, but they may have implications for understanding behaviour. For example, they might be linked to behavioural observations that the values assigned to bundles of goods are not necessarily the sum of the values of the individual goods (Hsee, 1998; List, 2002). One neuroscience fact that we know about such processes is that, at an anatomical level, they are linked to the medial frontal cortex (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). A second neuroscientific fact that we know about medial frontal cortex is that it is linked to any positive effects that distractors might have on decision making (Chau et al., 2014; Noonan et al., 2017). Therefore, we might make use of these neuroscientific facts and theories to predict a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. This is precisely what we did; we predicted the correlation on the basis of this body of work and when we tested to see if it was present, we found that indeed it was. It may be the case that other behavioural economics theories offer little explanation of the associations and correlations that we find. However, we emphasize that this association is predicted by neuroscientific theory and in the revised manuscript we have attempted to clarify this in the Introduction and Discussion sections:

      “Given the overlap in neuroanatomical bases underlying the different methods of value estimation and the types of distractor effects, we further explored the relationship. Critically, those who employed a more multiplicative style of integrating choice attributes also showed stronger positive distractor effects, whereas those who employed a more additive style showed negative distractor effects. These findings concur with neural data demonstrating that the medial prefrontal cortex (mPFC) computes the overall values of choices in ways that go beyond simply adding their components together, and is the neural site at which positive distractor effects emerge (Barron et al., 2013; Bongioanni et al., 2021; Chau et al., 2014; Fouragnan et al., 2019; Noonan et al., 2017; Papageorgiou et al., 2017), while divisive normalization was previously identified in the posterior parietal cortex (PPC) (Chau et al., 2014; Louie et al., 2011).” (Lines 109-119)

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.

      In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)

      Reviewer #2 Comment 4

      (4) Finally, the results from these experiments might not have external validity for two reasons. First, the normative criterion for multi-attribute decision-making differs depending on whether the attributes are lotteries or not (i.e. multiplicative vs additive). Whether it does so for humans is a matter of debate. Therefore if the result is unique to lotteries, it might not be robust for multi-attribute choice more generally. The paper largely glosses over this difference and mixes literature from both domains. Second, the lottery information was presented visually and there is literature suggesting this form of presentation might differ from numerical attributes. Which is more ecologically valid is also a matter of debate.

      We thank the reviewer for the comment. Indeed, they are right that the correlation we find between value estimation style and distractor effects may not be detected in all contexts of human behaviour. The reviewer’s suggestion goes along the same lines as our response to Reviewer #1 Comment 3: multi-attribute value estimation may have different structures. In some cases, the optimal solution may require a non-linear (e.g., multiplicative) response, as in probabilistic or delayed decisions, but in other cases (e.g., when estimating the value of a snack based on its taste, size, healthiness, and price) a linear integration would suffice. In the latter kind of scenario, both the optimal and the heuristic solutions may be additive, and people’s value estimation “style” may not be teased apart. However, if different neural mechanisms associated with different estimation processes are observed in certain scenarios, it suggests that these mechanisms are always present, even in scenarios where they do not alter the predictions. Probabilistic decision-making is also pervasive in many aspects of daily life and not just limited to the case of lotteries.

      While behaviour has been found to differ depending on whether lottery information is presented graphically or numerically, there is insufficient evidence to suggest biases towards additive or multiplicative evaluation, or towards positive or negative distractor effects. As such, we may expect that the correlation that we reveal in this paper, grounded in distinct neural mechanisms, would still hold even under different circumstances.

      Taking previous literature as examples, similar patterns of behaviour have been observed in humans when making decisions during trinary choice tasks. In studies conducted by Louie and colleagues (Louie et al., 2013; Webb et al., 2020), human participants performed a snack choice task where their behaviour could be modelled by divisive normalization with a biphasic response (i.e., both positive and negative distractor effects). While these two studies only use a single numerical value of price for behavioural modelling, these prices should originate from an internal computation of various attributes related to each snack that are not purely related to lotteries. Expanding towards the social domain, studies of trinary decision making have considered face attractiveness and averageness (Furl, 2016), desirability of hiring (Chang et al., 2019), as well as desirability of candidates during voting (Chang et al., 2019). These choices involve considering various attributes unrelated to lotteries or numbers and yet still display a combination of positive distractor and negative distractor (i.e. divisive normalization) effects, as in the current study. In particular, the experiments carried out by Chang and colleagues (Chang et al., 2019) involved decisions in a social context that resemble real-world situations. These findings suggest that both types of distractor effects can co-exist in other value-based decision making tasks (Li et al., 2018; Louie et al., 2013) as well as decision making tasks in social contexts (Chang et al., 2019; Furl, 2016).

      Reviewer #2 Comment 5

      Minor Issues:

      The definition of EV as a normative choice baseline is problematic. The analysis requires that EV is the normative choice model (this is why the HV-LV gap is analyzed and the distractor effect defined in relation to it). But if the binary/ternary interaction effect can be accounted for by curvature of a value function, this should also change the definition of which lottery is HV or LV for that subject!

      We thank the reviewer for the comment. While the initial part of the paper discussed results that were defined by the EV model, the results shown in Supplementary Figure 2 were generated by replacing the EV-based utility function with values obtained from the composite model. Here, we have also redefined HV and LV for each subject according to the updated values generated by the composite model prior to the regression.

      References

      Apesteguia, J. & Ballester, M. Monotone stochastic choice models: The case of risk and time preferences. Journal of Political Economy (2018).

      Block, H. D. & Marschak, J. Random Orderings and Stochastic Theories of Responses. Cowles Foundation Discussion Papers (1959).

      Khaw, M. W., Li, Z. & Woodford, M. Cognitive Imprecision and Small-Stakes Risk Aversion. Rev. Econ. Stud. 88, 1979-2013 (2020).

      Loomes, G. & Sugden, R. Testing Different Stochastic Specifications of Risky Choice. Economica 65, 581-598 (1998).

      Luce, R. D. Individual Choice Behaviour. (John Wiley and Sons, Inc., 1959).

      Netzer, N., Robson, A. J., Steiner, J. & Kocourek, P. Endogenous Risk Attitudes. SSRN Electron. J. (2022) doi:10.2139/ssrn.4024773.

      Robson, A. J. Why would nature give individuals utility functions? Journal of Political Economy 109, 900-914 (2001).

      Webb, R. The (Neural) Dynamics of Stochastic Choice. Manage Sci 65, 230-255 (2019).

      Reviewer #3 (Public Review):

      Summary:

      The way an unavailable (distractor) alternative impacts decision quality is of great theoretical importance. Previous work, led by some of the authors of this study, had converged on a nuanced conclusion wherein the distractor can both improve (positive distractor effect) and reduce (negative distractor effect) decision quality, contingent upon the difficulty of the decision problem. In very recent work, Cao and Tsetsos (2022) reanalyzed all relevant previous datasets and showed that once distractor trials are referenced to binary trials (in which the distractor alternative is not shown to participants), distractor effects are absent. Cao and Tsetsos further showed that human participants heavily relied on additive (and not multiplicative) integration of rewards and probabilities.

      The present study by Wong et al. puts forward a novel thesis according to which interindividual differences in the way of combining reward attributes underlie the absence of detectable distractor effect at the group level. They re-analysed the 144 human participants and classified participants into a "multiplicative integration" group and an "additive integration" group based on a model parameter, the "integration coefficient", that interpolates between the multiplicative utility and the additive utility in a mixture model. They report that participants in the "multiplicative" group show a negative distractor effect while participants in the "additive" group show a positive distractor effect. These findings are extensively discussed in relation to the potential underlying neural mechanisms.

      Strengths:

      - The study is forward-looking, integrating previous findings well, and offering a novel proposal on how different integration strategies can lead to different choice biases.

      - The authors did an excellent job of connecting their thesis with previous neural findings. This is a very encompassing perspective that is likely to motivate new studies towards a better understanding of how humans and other animals integrate information in decisions under risk and uncertainty.

      - Although some aspects of the paper are very technical, the methodological details are well explained and the paper is very well written.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #3 Comment 1

      Weaknesses:

      The authors quantify the distractor variable as "DV - HV", i.e., the relative distractor variable. Do the conclusions hold when the distractor is quantified in absolute terms (as "DV", see also Cao & Tsetsos, 2023)? Similarly, the authors show in Suppl. Figure 1 that the inclusion of a HV + LV regressor does not alter their conclusions. However, the (HV + LV)*T regressor was not included in this analysis. Does including this interaction term alter the conclusions considering there is a high correlation between (HV + LV)*T and (DV - HV)*T? More generally, it will be valuable if the authors assess and discuss the robustness of their findings across different ways of quantifying the distractor effect.

      We thank the reviewer for the comment. In the original manuscript we had already demonstrated that the distractor effect was related to the integration coefficient using a number of complementary analyses. They include Figure 5 based on GLM2, Supplementary Figure 3 based on GLM3 (i.e., adding the HV+LV term to GLM2), and Supplementary Figure 4 based on GLM2 but applying the utility estimate from the composite model instead of expected value (EV). These three sets of analyses produced comparable results. The reason why we elected not to include the (HV+LV)T term in GLM3 (Supplementary Figure 3) was due to the collinearity between the regressors in the GLM. If this term is included in GLM3, the variance inflation factor (VIF) would exceed an acceptable level of 4 for some regressors. In particular, the VIF for the (HV+LV) and (HV+LV)T regressors is 5.420, while the VIF for (DV−HV) and (DV−HV)T is 4.723.

      Here, however, we consider the additional analysis suggested by the reviewer and test whether similar results are obtained. We constructed GLM4 including the (HV+LV)T term but replacing the relative distractor value (DV-HV) with the absolute distractor value (DV) in the main term and its interactions, as follows:

      GLM4:

      A significant negative (DV)T effect was found for the additive group [t(72)=−2.0253, p=0.0465] while the multiplicative group had a positive trend despite not reaching significance. Between the two groups, the (DV)T term was significantly different [t(142)=2.0434, p=0.0429]. While these findings suggest that the current conclusions could be partially replicated, simply replacing the relative distractor value with the absolute value in the previous analyses resulted in non-significant findings. Taking these results together with the main findings, it is possible to conclude that the positive distractor effect is better captured using the relative DV-HV term rather than the absolute DV term. This would be consistent with the way in which option values are envisaged to interact with one another in the mutual inhibition model (Chau et al., 2014, 2020) that generates the positive distractor effect. The model suggests that evidence is accumulated as the difference between the excitatory input from the option (e.g. the HV option) and the pooled inhibition contributed partly by the distractor. We have now included these results in the manuscript:

      “Finally, we performed three additional analyses that revealed comparable results to those shown in Figure 5. In the first analysis, reported in Supplementary Figure 3, we added an (HV+LV) term to the GLM, because this term was included in some analyses of a previous study that used the same dataset (Chau et al., 2020). In the second analysis, we added an (HV+LV)T term to the GLM. We noticed that this change led to inflation of the collinearity between the regressors and so we also replaced the (DV−HV) term by the DV term to mitigate the collinearity (Supplementary Figure 4). In the third analysis, reported in Supplementary Figure 5, we replaced the utility terms of GLM2. Since the above analyses involved using HV, LV, and DV values defined by the normative Expected Value model, here, we re-defined the values using the composite model prior to applying GLM2. Overall, in the Multiplicative Group a significant positive distractor effect was found in Supplementary Figures 3 and 4. In the Additive Group a significant negative distractor effect was found in Supplementary Figures 3 and 5. Crucially, all three analyses consistently showed that the distractor effects were significantly different between the Multiplicative Group and the Additive Group.” (Lines 225-237)

      Reviewer #3 Comment 2

      The central finding of this study is that participants who integrate reward attributes multiplicatively show a positive distractor effect while participants who integrate additively show a negative distractor effect. This is a very interesting and intriguing observation. However, there is no explanation as to why the integration strategy covaries with the direction of the distractor effect. It is unlikely that the mixture model generates any distractor effect as it combines two "context-independent" models (additive utility and expected value) and is fit to the binary-choice trials. The authors can verify this point by quantifying the distractor effect in the mixture model. If that is the case, it will be important to highlight that the composite model is not explanatory; and defer a mechanistic explanation of this covariation pattern to future studies.

      We thank the reviewer for the comment. Indeed, the main purpose of applying the mixture model was to identify the way each participant combined attributes and, as the reviewer pointed out, the mixture model per se is context independent. While we acknowledge that the mixture model is not a mechanistic explanation, there is a theoretical basis for the observation that these two factors are linked.

      Firstly, studies that have examined the processes involved when humans combine and integrate different elements to form new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023) have implicated the medial frontal cortex as a crucial region (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). Meanwhile, previous studies have also identified that positive distractor effects are linked to the medial frontal cortex (Chau et al., 2014; Noonan et al., 2017). Therefore, the current study utilized these two facts to establish the basis for a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. Nevertheless, we agree with the reviewer that it will be an important future direction to look at how the covariation pattern emerges in a computational model. We have revised the manuscript in an attempt to address this issue.

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.

      In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)

      Reviewer #3 Comment 3

      -  Correction for multiple comparisons (e.g., Bonferroni-Holm) was not applied to the regression results. Is the "negative distractor effect in the Additive Group" (Fig. 5c) still significant after such correction? Although this does not affect the stark difference between the distractor effects in the two groups (Fig. 5a), the classification of the distractor effect in each group is important (i.e., should future modelling work try to capture both a negative and a positive effect in the two integration groups? Or just a null and a positive effect?).

      We thank the reviewer for the comment. We have performed Bonferroni-Holm correction and, as the reviewer surmised, the negative distractor effect in the additive group becomes non-significant. However, we have to emphasize that our major claim is that there was a covariation between decision strategy (of combining attributes) and distractor effect (as seen in Figure 4). That analysis does not involve multiple comparisons. The analysis in Figure 5 that splits participants into two groups was mainly designed to illustrate the effects for an easier understanding by a more general audience. In many cases, the precise ways in which participants are divided into subgroups can have a major impact on whether each individual group’s effects are significant or not. It may be possible to identify an optimal way of grouping, but we refrained from taking such a trial-and-error approach, especially for the analysis in Figure 5 that simply supplements the point made in Figure 4. The key notion we would like the readers to take away is that there is a spectrum of distractor effects (ranging from negative to positive) that will vary depending on how the choice attributes were integrated.
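
      For illustration only, the Holm step-down procedure referred to above can be sketched as follows. This is a minimal sketch, not the analysis code behind the response; the function name `holm_bonferroni` and the example p-values are hypothetical and are not taken from the manuscript.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null is rejected
    under Holm's step-down procedure at family-wise level alpha."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = 0, 1, ...) is compared with alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, every larger p-value also fails
    return reject


# Hypothetical p-values for three regression terms (illustrative only).
example_p = [0.004, 0.030, 0.047]
print(holm_bonferroni(example_p))
```

      In this sketch, a p-value just below 0.05 survives only if every smaller p-value in the family has already passed its stricter threshold, which is why an effect that is significant when uncorrected can become non-significant after correction.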

      Reviewer #1 (Recommendations For The Authors):

      Reviewer #1 Recommendations 1

      Enhancements are necessary for the quality of the scientific writing. Several sentences have been written in a negligent manner and warrant revision to ensure a higher level of rigor. Moreover, a number of sentences lack appropriate citations, including but not restricted to:

      - Line 39-41.

      - Line 349-350 (also please clarify what is meant by "parameter estimate is very accurate": correlation?).

      We thank the reviewer for the comment. We have made revisions to various parts of the manuscript to address the reviewer’s concerns.

      “Intriguingly, most investigations have considered the interaction between distractors and chooseable options either at the level of their overall utility or at the level of their component attributes, but not both (Chau et al., 2014, 2020; Gluth et al., 2018).” (Lines 40-42)

      “Additional simulations have shown that the fitted parameters can be recovered with high accuracy (i.e., with a high correlation between generative and recovered parameters).” (Lines 414-416)

      Reviewer #1 Recommendations 2

      Some other minor suggestions:

      - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).

      - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).

      - Adding some figure titles on Figure 2 so it is clear what each panel stands for.

      - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?

      - Line 298: binomial linking function (instead of binomial distribution).

      - Line 100: composite, not compositive.

      - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?

      We thank the reviewer for the suggestions. We have made revisions to the title and various parts of the manuscript to address the reviewer’s concerns.

      - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).

      We have now revised the manuscript:

      “Distractor effects in decision making are related to the individual’s style of integrating choice attributes” (title of the manuscript)

      “More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 99-100)

      “While these results may seem to suggest that a distractor effect was not present at an overall group level, we argue that the precise way in which a distractor affects decision making is related to how individuals integrate the attributes.” (Lines 164-167)

      - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).

      We have also modified all Figures to remove the intercept.

      - Adding some figure titles on Figure 2 so it is clear what each panel stands for.

      We have added titles accordingly.

      - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?

      In conjunction with addressing Reviewer #3 Recommendation 6, we have adapted the violin plots into histograms for a better representation of the values.

      - Line 298: binomial linking function (instead of binomial distribution).

      - Line 100: composite, not compositive.

      - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?

      We have made revisions accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #2 Recommendations 1

      Line 294. The definition of DV, HV, LV is not sufficient. Presumably, these are the U from the following sections? Or just EV? But this is not explicitly stated, rather they are vaguely referred to as "values." The computational modelling section refers to them as utilities. Are these the same thing?

      We thank the reviewer for the suggestion. We have clarified the exact method for calculating each of the values and updated the section accordingly.

      “where HV, LV, and DV refer to the values of the chooseable higher value option, chooseable lower value option, and distractor, respectively. Here, values (except those in Supplementary Figure 5) are defined as Expected Value (EV), calculated by multiplying magnitude and probability of reward.” (Lines 348-350)
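
      As a small worked illustration of these definitions, the sketch below computes EV for each option and derives the difference terms that appear as regressors in the GLMs. The function names and the `(magnitude, probability)` tuple layout are hypothetical, and the example attribute values are made up.

```python
def expected_value(magnitude, probability):
    """EV = reward magnitude multiplied by reward probability."""
    return magnitude * probability


def trial_regressors(option_a, option_b, distractor):
    """Label the two chooseable options by EV and derive the difference terms.

    Each argument is a (magnitude, probability) pair; this layout is an
    assumption made for the example.
    """
    ev_a = expected_value(*option_a)
    ev_b = expected_value(*option_b)
    hv, lv = max(ev_a, ev_b), min(ev_a, ev_b)
    dv = expected_value(*distractor)
    return {
        "HV": hv,
        "LV": lv,
        "DV": dv,
        "HV-LV": hv - lv,   # difficulty term
        "HV+LV": hv + lv,   # total value of the chooseable options
        "DV-HV": dv - hv,   # relative distractor value
    }


# Hypothetical trial, attributes already rescaled to the 0-1 interval.
terms = trial_regressors((0.5, 0.5), (1.0, 0.2), (0.5, 0.2))
print(terms)
```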

      Reviewer #2 Recommendations 2

      The analysis drops trials in which the distractor was chosen. These trials are informative about the presence (or not) of relative valuation or other factors because they make such choices more (or less) likely. Ignoring them is another example of the analysis being misspecified.

      We thank the reviewer for the suggestion and this is related to Major Issue 1 raised by the same reviewer. In brief, we adopted the same methods implemented by Cao and Tsetsos (Cao and Tsetsos, 2022) and that constrained us to applying a binomial model. Please refer to our reply to Major Issue 1 for more details.

      Reviewer #2 Recommendations 3

      Some questions and suggestions on statistics and computational modeling:

      Have the authors looked at potential collinearity between the regressors in each of the GLMs?

      We thank the reviewer for the comment. For each of the following GLMs, the average variance inflation factor (VIF) has been calculated as follows:

      GLM2 using the Expected Value model:

      Author response table 1.

      GLM2 after replacing the utility function based on the normative Expected Value model with values obtained by using the composite model:

      Author response table 2.

      GLM3:

      Author response table 3.

      None of the average VIF values exceeds 4, suggesting that the estimated coefficients were not inflated due to collinearity between the regressors in each of the GLMs.
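
      For readers unfamiliar with the statistic, a minimal sketch of a VIF calculation is given below. It uses the standard identity that the VIF of regressor j equals the j-th diagonal element of the inverse of the regressors' correlation matrix, i.e. 1 / (1 − R²) from regressing that regressor on the others. The synthetic regressors, and the 0/1 coding of the trial-type variable, are assumptions made for the example and are not the study data.

```python
import random
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        z.append([(v - mean) / sd for v in col])
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each regressor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Synthetic example: a regressor x, a 0/1 trial-type indicator t, and x*t.
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
t = [random.randint(0, 1) for _ in range(n)]
xt = [a * b for a, b in zip(x, t)]
print([round(v, 2) for v in vif([x, t, xt])])
```

      In this toy design the main term and its interaction with the indicator share variance, so both receive a VIF above 1; adding further correlated interaction terms raises the values in the same way.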

      Reviewer #2 Recommendations 4

      - Correlation results in Figure 4. What is the regression line displayed on this plot? I suspect the regression line came from Pearson's correlation, which would be inconsistent with the Spearman's correlation reported in the text. A reasonable way would be to transform both x and y axes to the ranked data. However, I wonder why it makes sense to use ranked data for testing the correlation in this case. Those are both scalar values. Also, did the authors assess the influence of the zero integration coefficient on the correlation result? Importantly, did the authors redo the correlation plot after defining the utility function by the composite models?

      We thank the reviewer for the suggestion. The plotted line in Figure 4 was based on the Pearson’s correlation, and we have modified the text to report the Pearson’s correlation result as well.

      If we exclude the 32 participants with integration coefficients smaller than 1×10⁻⁶ from the analysis, we still observe a significant positive Pearson’s correlation [r(110)=0.202, p=0.0330].

      Author response image 1.

      Figure 4 after excluding 32 participants with integration coefficients smaller than 1×10⁻⁶.

      “As such, we proceeded to explore how the distractor effect (i.e., the effect of (DV−HV)T obtained from GLM2; Figure 2c) was related to the integration coefficient (η) of the optimal model via a Pearson’s correlation (Figure 4). As expected, a significant positive correlation was observed [r(142)=0.282, p=0.000631]. We noticed that there were 32 participants with integration coefficients that were close to zero (below 1×10⁻⁶). The correlation remained significant even after removing these participants [r(110)=0.202, p=0.0330].” (Lines 207-212)

      The last question relates to results already included in Supplementary Figure 5, in which the analyses were conducted using the utility function of the composite model. We notice that although there was a difference in integration coefficient between the multiplicative and additive groups, a correlational analysis did not generate significant results [r(142)=0.124, p=0.138]. It is possible that the relationship became less linear after applying the composite model utility function. However, it is noticeable that in a series of complementary analyses (Figure 5: r(142)=0.282, p=0.000631; Supplementary Figure 3: r(142)=0.278, p=0.000746) comparable results were obtained.
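
      The distinction the reviewer draws between Pearson’s and Spearman’s correlation can be made concrete with a short sketch: Spearman’s coefficient is Pearson’s coefficient computed on ranks, with tied values (such as a cluster of integration coefficients at zero) sharing an average rank. The helper names and the data below are hypothetical and are not the study data.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data with several coefficients tied at zero.
coefficients = [0.0, 0.0, 0.0, 0.1, 0.4, 0.9]
effects = [-0.3, 0.1, -0.1, 0.0, 0.2, 0.5]
print(round(pearson(coefficients, effects), 3), round(spearman(coefficients, effects), 3))
```

      A regression line drawn through the raw values corresponds to the Pearson coefficient, whereas a line consistent with the Spearman coefficient would have to be drawn through the ranked values.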

      Reviewer #2 Recommendations 5

      - From lines 163-165, were the models tested on only the three-option trials or both two- and three-option trials? It is ambiguous from the description here. It might be worth checking the model comparison based on different trial types, and the current model fitting results do not tell an absolute sense of the goodness of fit. I would suggest including the correctly predicted trial proportions in each trial type from different models.

      We thank the reviewer for the suggestion. We have only modeled the two-option trials. The key reason is that the two-option trials can arguably provide a better estimate of participants’ style of integrating attributes, as they are independent of any distractor effects. This was also why Cao and Tsetsos took the same approach when re-analyzing our data (Cao and Tsetsos, 2022). We have clarified the statement accordingly.

      “We fitted these models exclusively to the Two-Option Trial data and not the Distractor Trial data, such that the fitting (especially that of the integration coefficient) was independent of any distractor effects, and tested which model best describes participants’ choice behaviours.” (Lines 175-178)

      Reviewer #2 Recommendations 6

      - Along with displaying the marginal distributions of each parameter estimate, a correlation plot of these model parameters might be useful, given that some model parameters are multiplied in the value functions.

      We thank the reviewer for the suggestion. We have also generated the correlation plot of the model parameters. The Pearson’s correlation between the magnitude/probability weighting and integration coefficient was significant [r(142)=−0.259, p=0.00170]. The Pearson’s correlation between the inverse temperature and integration coefficient was not significant [r(142)=−0.0301, p=0.721]. The Pearson’s correlation between the inverse temperature and magnitude/probability weighting was not significant [r(142)=−0.0715, p=0.394].

      “Our finding that the average integration coefficient η was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule. However, it also shows that, rather than being fully additive (η=0) or multiplicative (η=1), people’s choice behaviour is best described as a mixture of both. Supplementary Figure 1 shows the relationships between all the fitted parameters.” (Lines 189-193)
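
      To make the role of the three fitted parameters concrete, the sketch below shows one way a composite utility of this kind could be written. It assumes that utility is an η-weighted mixture of a multiplicative term (magnitude × probability) and an additive term (a weighted sum of magnitude and probability), and that choices follow a softmax rule with an inverse temperature. The exact functional form, the function names, and the example options are assumptions made for illustration; only the parameter roles and the value η = 0.325 come from the text.

```python
import math


def composite_utility(magnitude, probability, eta, gamma):
    """Mixture of multiplicative and additive integration.

    eta   : integration coefficient (0 = fully additive, 1 = fully multiplicative)
    gamma : magnitude/probability weighting used by the additive rule
    Attributes are assumed to be rescaled to the 0-1 interval.
    """
    multiplicative = magnitude * probability
    additive = gamma * magnitude + (1 - gamma) * probability
    return eta * multiplicative + (1 - eta) * additive


def choice_probability(u_a, u_b, inverse_temperature):
    """Softmax probability of choosing option A over option B."""
    return 1 / (1 + math.exp(-inverse_temperature * (u_a - u_b)))


# Hypothetical options: A is large but unlikely, B is moderate on both attributes.
option_a = (1.0, 0.2)
option_b = (0.5, 0.5)
for eta in (0.0, 0.325, 1.0):
    u_a = composite_utility(*option_a, eta=eta, gamma=0.5)
    u_b = composite_utility(*option_b, eta=eta, gamma=0.5)
    print(eta, round(choice_probability(u_a, u_b, inverse_temperature=10.0), 3))
```

      With these made-up options the additive rule favours A while the multiplicative rule favours B, so the predicted preference shifts as η moves from 0 to 1.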

      Reviewer #2 Recommendations 7

      Have the authors tried any functional transformations on amounts or probabilities before applying the weighted sum? The two attributes are on entirely different scales and thus may not be directly summed together.

      We thank the reviewer for the comment. Amounts and probabilities were indeed both rescaled to the 0-1 interval before being summed, as explained in the methods (Line XXX). Additionally, we have now fitted an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) and a weighted probability function (Prelec, 1998).

      In this model, the reward magnitude and probability (both rescaled to the interval between 0 and 1) are transformed into a weighted magnitude and a weighted probability, each with its own distortion parameter. This prospect theory (PT) model was included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).

      “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)
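
      As a rough illustration of what such a model involves, the sketch below combines a power function for magnitude with the one-parameter Prelec probability weighting function, w(p) = exp(−(−ln p)^γ). The choice of the one-parameter Prelec form, the power form for magnitude, the multiplicative combination, and the parameter names `alpha` and `gamma` are all assumptions made for this example; the parameterisation actually fitted may differ.

```python
import math


def weighted_magnitude(x, alpha):
    """Power-function curvature on a magnitude rescaled to the 0-1 interval."""
    return x ** alpha


def prelec_weight(p, gamma):
    """One-parameter Prelec weighting: w(p) = exp(-(-ln p) ** gamma)."""
    if p <= 0.0:
        return 0.0  # limit of the function as p approaches 0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** gamma))


def pt_utility(x, p, alpha, gamma):
    """Weighted magnitude multiplied by weighted probability."""
    return weighted_magnitude(x, alpha) * prelec_weight(p, gamma)


# With alpha = gamma = 1 the model reduces to expected value.
print(pt_utility(0.8, 0.3, alpha=1.0, gamma=1.0))
# With gamma < 1, small probabilities are overweighted.
print(prelec_weight(0.05, gamma=0.6) > 0.05)
```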

      Reviewer #3 (Recommendations For The Authors):

      Reviewer #3 Recommendations 1

      - In the Introduction (around line 48), the authors make the case that distractor effects can co-exist in different parts of the decision space, citing Chau et al. (2020). However, if the distractor effect is calculated relative to the binary baseline this is no longer the case.

      - Relating to the above point, it might be useful for the authors to make a distinction between effects being non-monotonic across the decision space (within individuals) and effects varying across individuals due to different strategies adopted. These two scenarios are conceptually distinct.

      We thank the reviewer for the comment. Indeed, the ideas that distractor effects may vary across decision space and across different individuals are slightly different concepts. We have now revised the manuscript to clarify this:

      “However, as has been argued in other contexts, just because one type of distractor effect is present does not preclude another type from existing (Chau et al., 2020; Kohl et al., 2023). Each type of distractor effect can dominate depending on the dynamics between the distractor and the chooseable options. Moreover, the fact that people have diverse ways of making decisions is often overlooked. Therefore, not only may the type of distractor effect that predominates vary as a function of the relative position of the options in the decision space, but also as a function of each individual’s style of decision making.” (Lines 48-54)

      Reviewer #3 Recommendations 2

      - The idea of mixture models/strategies has strong backing from other Cognitive Science domains and will appeal to most readers. It would be very valuable if the authors could further discuss the potential level at which their composite model might operate. Are the additive and EV quantities computed and weighted (as per the integration coefficient) within a trial giving rise to a composite decision variable? Or does the integration coefficient reflect a probabilistic (perhaps competitive) selection of one strategy on a given trial? Perhaps extant neural data can shed light on this question.

      We thank the reviewer for the comment. The idea is related to whether the observed mixture in integration models derives from value being actually computed in a mixed way within each trial, or each trial involves a probabilistic selection between the additive and multiplicative strategies. We agree that this is an interesting question and to address it would require the use of some independent continuous measures to estimate the subjective values in quantitative terms (instead of using the categorical choice data). This could be done by collecting pupil size data or functional magnetic resonance imaging data, as the reviewer has pointed out. Although the empirical work is beyond the scope of the current behavioural study, it is worth bringing up this point in the Discussion:

      “The current finding involves the use of a composite model that arbitrates between the additive and multiplicative strategies. A general question for such composite models is whether people mix two strategies in a consistent manner on every trial or whether there is some form of probabilistic selection occurring between the two strategies on each trial such that only one strategy is used on any given trial while, on average, one strategy is more probable than the other. To test which is the case requires an independent estimation of subjective values in quantitative terms, such as by pupillometry or functional neuroimaging. Further understanding of this problem will also provide important insight into the precise way in which distractor effects operate at the single-trial level.” (Lines 275-282)
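      The two readings of a composite model described above can be written as a minimal sketch. The functional forms used here (a weighted sum for the additive value, a product for the multiplicative value, a logistic softmax choice rule), the parameter names `eta`, `theta` and `beta`, and the example options are illustrative assumptions, not the manuscript's exact equations. Attributes are assumed to be rescaled to the range 0 to 1.

```python
import math

# Illustrative sketch only: functional forms and parameter names are assumptions.

def additive_value(magnitude, probability, theta):
    """Additive rule: weighted sum of the two attributes (theta = magnitude weight)."""
    return theta * magnitude + (1.0 - theta) * probability

def multiplicative_value(magnitude, probability):
    """Multiplicative rule: expected-value-like product of the two attributes."""
    return magnitude * probability

def choice_prob(value_a, value_b, beta):
    """Softmax probability of choosing A over B with inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))

def p_within_trial_blend(a, b, eta, theta, beta):
    """Reading 1: both values are computed and blended into one decision variable."""
    def blended(option):
        m, p = option
        return eta * multiplicative_value(m, p) + (1.0 - eta) * additive_value(m, p, theta)
    return choice_prob(blended(a), blended(b), beta)

def p_strategy_selection(a, b, eta, theta, beta):
    """Reading 2: one strategy is picked per trial; eta is the probability of the
    multiplicative strategy, so choice probabilities (not values) are mixed."""
    p_mult = choice_prob(multiplicative_value(*a), multiplicative_value(*b), beta)
    p_add = choice_prob(additive_value(*a, theta), additive_value(*b, theta), beta)
    return eta * p_mult + (1.0 - eta) * p_add

# Hypothetical options given as (magnitude, probability).
option_a, option_b = (0.9, 0.2), (0.4, 0.5)
blend = p_within_trial_blend(option_a, option_b, eta=0.325, theta=0.5, beta=10.0)
select = p_strategy_selection(option_a, option_b, eta=0.325, theta=0.5, beta=10.0)
print(round(blend, 3), round(select, 3))
```

      Under these assumptions the two readings coincide when `eta` is 0 or 1 and generally differ in between, although the difference in predicted choice probabilities can be small. This is in line with the point that categorical choice data alone may not separate them.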

      Reviewer #3 Recommendations 3

      Line 80 "compare pairs of attributes separately, without integration". This additive rule (or the within-attribute comparison) implies integration, it is just not multiplicative integration.

      We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent.

      “For clarity, we stress that the same mathematical formula for additive value can be interpreted as meaning that 1) subjects first estimate the value of each option in an additive way (value integration) and then compare the options, or 2) subjects compare the two magnitudes and separately compare the two probabilities without integrating dimensions into overall values. On the other hand, the mathematical formula for multiplicative value is only compatible with the first interpretation. In this paper we focus on attribute combination styles (multiplicative vs additive) and do not make claims on the order of the operations. More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 92-100)

      Reviewer #3 Recommendations 4

      - Not clear why the header in line 122 is phrased as a question.

      We thank the reviewer for the suggestion. We have modified the header to the following:

      “The distractor effect was absent on average” (Line 129)

      Reviewer #3 Recommendations 5

      - The discussion and integration of key neural findings with the current thesis are outstanding. It might help the readers if certain statements such as "the distractor effect is mediated by the PPC" (line 229) were further unpacked.

      We thank the reviewer for the suggestion. We have made modifications to the original passage to further elaborate the statement.

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016).” (Lines 250-253)

      Reviewer #3 Recommendations 6

      - In Fig. 3c, there seem to be many participants having the integration coefficient close to 0 but the present violin plot doesn't seem to best reflect this highly skewed distribution. A histogram would be perhaps better here.

      We thank the reviewer for the suggestion. We have modified the descriptive plots to use histograms instead of violin plots.

      “Figures 3c, d and e show the fitted parameters of the composite model: , the integration coefficient determining the relative weighting of the additive and multiplicative value ( , ); , the magnitude/probability weighing ratio ( , ); and , the inverse temperature ( , ). Our finding that the average integration coefficient  was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule.” (Lines 186-191)

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      We thank the reviewer for his careful reading, which enabled us to improve the quality of this manuscript. We have addressed some major criticisms, and in particular, we have now included the characterization of the impact of BMP2 on other lines as well as the study of the impact of reversion of the H3.3K27M mutation (Figure 3 - figure supplement 1C-D). This control, judiciously proposed by the reviewer, seems more relevant than using mutant H3.1K27M / ACVR1 lines, given the possibility of BMP2 action via other receptors.


      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      Mutational analysis of diffuse midline glioma (DMG) found that ACVR1 mutations, which up-regulate the BMP signaling pathway are found in most H3.1K27M, but not H3.3K27M DMG cases. In this manuscript, Huchede et al attempted to determine whether the BMP signaling pathway has any role in H3.3K27M DMG tumors. They found that the BMP signaling is activated to a similar level in H3.3K27M DMG cells with wild-type ACVR1 compared to ACVR1 DMG cells, likely due to the expression of BMP7 or BMP2. They went on to test whether cells treated with BMP7 or BMP2 treatments affected the gene expression and cell fitness of tumor cells with H3.3K27M mutation. They concluded that BMP2/7 synergizes with H3.3K27M to induce a transcriptomic rewiring associated with a quiescent but invasive cell state. The major issue for this conclusion is that the authors did not use the right models/controls to obtain results to support this conclusion as detailed below. Therefore, in order to strengthen the conclusion, the authors need to address the major concerns below.

      Strength:

      This paper addresses an important question in the DMG field.

      Major concerns/weakness:

      (1) All the results in Fig. 2 utilized two glioma lines SF188 and Res259. The authors should repeat all these experiments in a couple of H3.3K27M DMG lines by deleting the H3.3K27M mutation first.

      We thank the referee for his/her comments that have helped us to strengthen our conclusions. Although we were rather interested in studying how the BMP pathway can participate in installing a particular cell state at the time of expression of the K27M mutation, we have now included the characterization of the native H3.3K27M BT245 and SU-DIPGXIII cell lines, and their counterparts in which the mutation was reverted by CRISPR-Cas9 (Harutyunyan et al., 2019). As shown in Figure 3-figure supplement D, the growth arrest induced by BMP2 does indeed seem to be specific to the K27M epigenetic context, which could also be required to establish a positive regulatory loop that activates the BMP pathway, as mentioned in the Discussion.

      (2) Fig. 3. The experiments of BMP2 treatment should be repeated in other H3.3K27M DMG lines using H3.1K27M ACVR1 mutant tumor lines as controls.

      The use of mutant ACVR1 lines is interesting, but their status as controls seems questionable, as the addition of BMPs could add to the effect of the mutation, notably by activating other receptors in the pathway. We have nevertheless now included three different cell lines (HSJD-DIPG-014, BT245 and SU-DIPGXIII) and observed a similar impact of BMP2, with growth arrest as a readout (Figure 3-figure supplement C-D).

      Minor concerns

      Fig.2A. BMP2 expression increased in H3.3K27M SF188 cells. Therefore, the statement "whereas BMP2 and BMP4 expressions are not significantly modified (Figure 2A and Figure 2-figure supplement A-B)" is not accurate.

      The referee is absolutely right, and we have corrected this statement.

      Reviewer #2 (Public Review):

      The manuscript by Huchede et al investigates the BMP pathway in H3K27M-mutant gliomas carrying or not activating mutations in ALK2 (ACVR1). Their results in cell lines and in datasets acquired from the literature on patient tumors indicate that the BMP signaling pathway is activated at similar levels between ACVR1 wild-type and mutant tumors. The group further identifies BMP2 and BMP7 as possibly the main activators of the pathway in cells. They then show that BMP2 and 7 crosstalk with the H3 mutation and synergize to induce transcriptomic rewiring leading to an invasive cell state.

      The paper is well-written and easy to follow with a robust experimental plan and datasets supporting the claims. While previous work (acknowledged by the authors) indicated activation of BMP in H3K27M tumors, wild type for the ACVR1 mutation this paper is a nice addition and provides further mechanistic cues as to the importance of the BMP pathway and specific members in these deadly brain cancers. The effect of these BMPs in quiescence and invasion is of particular interest.

      We thank the referee for his/her supportive comments.

      A few suggestions to clarify the message are provided below.

      (1) In thalamic diffuse midline gliomas, the BMP pathway should not be activated as it is in the pons. The authors should identify thalamic tumors in the datasets they explored and patient-derived cell lines from thalamic tumors available to investigate whether this pathway is active across all H3.3K27M mutants in the brain midline or specifically in tumors from the pons.

      The inter-patient variability observed in the level of activation of the BMP pathway may indeed be due, at least in part, to different tumor locations. However, we could not find this information in the publicly available datasets that we used. We have nevertheless raised this point in the Discussion.

      (2) There are ~20% H3.3K27M tumors that carry an ACVR1 mutation and similar numbers of H3.1K27M that are wild type for this gene. Can the authors identify these outliers in their datasets and assess the activation of BMP2 and 7 or other BMP pathway members in this context?

      We have now included the outliers present in our datasets in the legends of Figure 1B and Figure 1-figure supplement B and F. From the few samples available to document these outliers in the cohorts that we used, we have not observed major differences regarding the expression levels of BMP2/7 or BMP pathway members and have discussed the fact that it may result from the establishment in all cases of a feedback loop of activation.

      In all this is an interesting paper that provides meaningful data to pursue clinical targeting of the BMP pathway, which would be a nice addition to the field.

      We thank the reviewer for his/her supportive comments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Vengayil et al. presented a role for Ubp3 for mediating inorganic phosphate (Pi) compartmentalization in cytosol and mitochondria, which regulates metabolic flux between cytosolic glycolysis and mitochondrial processes. Although the exact function of increased Pi in mitochondria is not investigated, findings have valuable implications for understanding the metabolic interplay between glycolysis and respiration under glucose-rich conditions. They showed that UBP3 KO cells regulated decreased glycolytic flux by reducing the key Pi-dependent-glycolytic enzyme abundances, consequently increasing Pi compartmentalization to mitochondria. Increased mitochondria Pi increases oxygen consumption and mitochondrial membrane potential, indicative of increased oxidative phosphorylation. In conclusion, the authors reported that the Pi utilization by cytosolic glycolytic enzymes is a key process for mitochondrial repression under glucose conditions.

      Comments on revised version:

      This reviewer appreciates the author's responses addressing some of the concerns.

      (1) However, the concern of reproducibility and experimental methods applied to the study is still valid, particularly considering that many conclusions were drawn from western blot analysis. The authors used separate gel loading controls for western blot analysis, which is not a valid method. Considering loading and other errors/discrepancies during the transfer phase of the assay, the direct control should be analyzing the membrane after transfer or using an internal control antibody on the same membrane. None of the western blots are indicated with marker sizes, and it isn't very clear how many repeats there are and whether those repeats are biological or technical repeats.

      We thank the reviewer for raising this concern. It requires detailed clarification on two key points: first, the use of Coomassie-stained gels rather than internal ‘housekeeping gene’ antibodies, and second, the challenges in performing controls for western blots in the case of high-abundance proteins such as glycolytic enzymes.

      (1) We have used a Coomassie-stained gel as the loading control for all our western blots. This is done by cutting the gel in two, using one half for transfer followed by blotting and the other half for Coomassie staining. That is, these are not two separately loaded gels but the same gel. Practically, this is no different from cutting a membrane to blot with different antibodies. This is a valid method for normalizing western blot data and is used by multiple studies, for the reasons given below. The historical use of a ‘housekeeping’ gene as a loading control for western blotting assumes that the levels of that protein do not change under different conditions. However, this approach has multiple, severe limitations (since a ‘housekeeping gene’ is entirely contextual), and it is therefore correct to use total protein as a loading control. This is indeed recommended by multiple studies (Collins et al., 2015). Coomassie staining for total protein is far more reliable than using housekeeping genes as a loading control in western blots (Welinder and Ekblad, 2011). A notable example is GAPDH itself, which is widely used as a loading control in many studies. As is clear from our data in this manuscript, GAPDH levels themselves decrease in ubp3Δ cells. Had we used GAPDH as a loading control, we would not have identified the decrease in glycolytic enzymes in ubp3Δ cells, and this story would have met with a tragic fate very early on in its inception. We have in fact been very careful with these quantitations: even before samples are loaded on gels, they are first normalized using a standard protein estimation assay (Bradford), followed by normalized loading, followed by cutting the gel into two parts, one for Coomassie staining and protein normalization, and the other for the western blot for the respective proteins. 
However, in point (2) below, we clarify why we sometimes have to load a separate gel with normalized protein, which should resolve this point.
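      The normalization logic described in point (1) can be sketched as follows. This is a minimal illustration that assumes densitometry values are already available as numbers. The function names and the example intensities are hypothetical and are not taken from the manuscript.

```python
# Illustrative sketch only: function names and intensities are hypothetical.

def normalize_to_total_protein(band_intensity, total_protein_intensity):
    """Divide a western blot band intensity by the total-protein signal
    (e.g. the summed Coomassie lane intensity) for the same sample."""
    if total_protein_intensity <= 0:
        raise ValueError("total protein intensity must be positive")
    return band_intensity / total_protein_intensity

def fold_change_vs_control(sample_band, sample_total, control_band, control_total):
    """Fold change of a sample relative to a control, each normalized to total protein."""
    sample = normalize_to_total_protein(sample_band, sample_total)
    control = normalize_to_total_protein(control_band, control_total)
    return sample / control

# Equal loading: the band is halved in the mutant.
equal_loading = fold_change_vs_control(50.0, 100.0, 100.0, 100.0)

# Uneven loading (20% more mutant sample): the raw band ratio is 0.6,
# but normalizing to total protein still recovers the twofold decrease.
uneven_loading = fold_change_vs_control(60.0, 120.0, 100.0, 100.0)

print(equal_loading, uneven_loading)
```

      The design point is that the denominator is the total protein signal rather than a single reference protein, so the estimate does not depend on any one protein remaining constant across conditions.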

      (2) Glycolytic enzymes are highly abundant proteins and to achieve a signal in the linear range of western blot, the protein extracts have to be diluted (up to 25 or 50 times). As discussed under point 1, an internal control ‘housekeeping gene’ antibody is not a reliable method to use as loading control. Even if we want to use an antibody for an internal protein as a control, there are not many proteins that are as abundant as metabolic enzymes and because of this simple reason, the sample dilution results in these proteins not getting detected in the western blot since the signal will be below the limit of detection. This leaves using a separate gel loading control as the only easy to perform, reliable option.

      We would like to further highlight that the changes in metabolic enzymes and ETC proteins that we observe in the ubp3 mutant by western blot were also independently observed in a large-scale untargeted quantitative proteomics study (Isasa et al., 2015), which we cite extensively in this manuscript. Since an entirely independent study, using a completely different (untargeted) method, has shown very similar changes in the same proteins (mitochondrial proteins and glycolytic enzymes), there should be no room for doubt regarding the altered glycolytic enzyme and ETC protein levels that we report in this study.

      None of the western blots are indicated with marker sizes

      We have clearly indicated the marker sizes in all our western blots. Separately, raw images of the blots and Coomassie-stained gels have been provided with the manuscript raw data, and are therefore easily available to any interested reader.

      It isn't very clear how many repeats there are and whether those repeats are biological or technical repeats.

      We have already clearly indicated the details of each blot in the figure legends. For example “A representative blot (out of three biological replicates, n=3) and their quantifications are shown. Data represent mean ± SD.” We kindly request the reviewer to thoroughly go through the figure legends for details regarding the western blots, or any other data. We hope this addresses all the reviewer concerns regarding the credibility of our western blot results and the method of using Coomassie stained gels as loading controls in this study.

      (2) Concern regarding citing the Ouyang et al. paper is still valid. This paper is an essential implication in phosphate metabolism and is directly related to some of the findings associated with mitochondrial function, along with conflicting results, which should be discussed in the discussion section. As a reviewer, I do not request citing any paper from the authors in general; however, considering some of the conflicting results here, citing and discussing the paper from Ouyang et al. will improve the interpretation/value of their findings.

      As mentioned in detail in our previous response letter, we do not believe that the study from Ouyang et al. presents ‘conflicting results’ of any kind. Nevertheless, in response to the reviewer's suggestion, we have revised the discussion section of our manuscript and added a few points that incorporate the insights from Ouyang et al. These are in the discussion section (“It is important to highlight that our experiments, whether involving Pi supplementation or Pi limitations, maintain the cellular Pi concentration within the millimolar range and are conducted within a short timeframe (~ 1 hour). This differs significantly from Pi starvation studies, where cells are subjected to prolonged and complete Pi deprivation, triggering extensive metabolic adjustments to sustain available Pi pools, such as an increase in mitochondrial membrane potential, independent of respiration”). We trust that this modification will improve interested readers' understanding of our study's overarching conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Cells cultured in high glucose tend to repress mitochondrial biogenesis and activity, a prevailing phenotype called the Crabtree effect that is observed in different cell types and in cancer. Many signaling pathways have been put forward to explain this effect. Vengayil et al. proposed a new mechanism involving Ubp3/Ubp10 and phosphate that controls the glucose repression of mitochondria. The central hypothesis is that ∆ubp3 shifts glycolysis towards trehalose synthesis, leading to an increase in Pi availability in the cytosol; mitochondria then receive more Pi, and glucose repression is therefore reduced.

      Strengths:

      The strength is that the authors used an array of different assays to test their hypothesis. Most assays were well designed and controlled.

      Weaknesses:

      I think the main conclusions are not strongly supported by the current dataset. Here are my comments on authors' response and model.

      (1) The authors addressed some of my concerns related to ∆ubp3. But based on the results they observed and discussed, the ∆ubp3 redirect some glycolytic flux to gluconeogenesis while the 0.1% glucose in WT does not. Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to the WT cells cultured in low glucose situation. This should be discussed in the manuscript to make sure readers are not misled to think ∆ubp3 mimic low glucose. It is likely that ∆ubp3 induce proteostasis stress, which is known to activate respiration and trehalose synthesis.

      But based on the results they observed and discussed, the ∆ubp3 redirect some glycolytic flux to gluconeogenesis while the 0.1% glucose in WT does not. Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to the WT cells cultured in low glucose situation.

      We would like to clarify that we do not observe a redirection of glycolytic flux to gluconeogenesis in the ubp3 mutant. What we observe is a rewiring of glycolytic flux towards increased trehalose synthesis and PPP, and decreased glycolysis. Also, the shift of glycolysis to trehalose synthesis is relevant to WT cells cultured in low glucose. It is well known that trehalose synthesis increases as media glucose decreases. In 0.1% glucose, this increase in trehalose is not due to an increase in gluconeogenesis (since the pathways utilizing alternate carbon sources remain repressed in 0.1% glucose (Yin et al., 2003)), but to an increase in glycolytic flux towards trehalose. This is also supported by the increase in Tps2 protein levels upon decreasing glucose concentration (Shen et al., 2023). We also note that very few studies actually estimate gluconeogenic flux in cells (and they rely only on steady-state measurements). Estimating gluconeogenic flux appropriately is challenging in itself (e.g. see Niphadkar et al 2024).

      At glucose concentrations lower than 0.1%, the shift to trehalose synthesis might not be as relevant. We observe that the glycolysis-defective tdh2tdh3 mutant cells do not show an increase in trehalose synthesis (Figure 3-figure supplement 1E). However, in this context, the decrease in the rate of the GAPDH-catalyzed reaction alone appears to be sufficient to increase Pi levels (Figure 3F), even without an increase in trehalose. Therefore, the relative contributions of these two arms towards Pi balance might differ, depending on whether the cause is low glucose in the environment or a mutant such as ubp3Δ that modulates glycolytic flux. In ubp3Δ cells, a low rate of the GAPDH-catalyzed reaction is combined with high trehalose (based on how glycolytic flux is modulated), whereas tdh2tdh3 cells show only the low rate of the GAPDH-catalyzed reaction. The end point, an increase in Pi, is the same in both cases, but it is reached by slightly different routes. Also note: in terms of free Pi sources, a low-glucose condition (with a low glycolytic rate) is very different from a no-glucose, respiratory condition (where cells perform very high gluconeogenesis, at a rate that is an order of magnitude higher than in low glucose). In respiration-reliant conditions such as growth in ethanol, cells switch to high gluconeogenesis, with a large increase in trehalose synthesis as a default (e.g. see Varahan et al 2019). In this condition, trehalose synthesis could become a major source of Pi (e.g. see Gupta 2021). This could also support the increased mitochondrial respiration. In an ethanol-only medium, the directionality of the GAPDH reaction is itself reversed (i.e. G-1,3-BP → G-3-P). Therefore, this reaction now becomes an added source of Pi, instead of a net consumer of Pi (see illustration in Figure 3G). 
Therefore, a very reasonable inference is that a combination of increased trehalose and increased 1,3 BPG to G3P conversion can become a Pi source, supporting increased mitochondrial respiration in a non-glucose, respiratory medium.

      We have now clarified these points in the discussion section in the updated version of our manuscript. Lines xxx. We hope that this updated discussion section satisfies the reviewer’s concern regarding how relevant the increase in trehalose synthesis is for altered Pi balance and increased mitochondrial respiration in WT cells.

      It is likely that ∆ubp3 induce proteostasis stress, which is known to activate respiration and trehalose synthesis.

      Apart from some general changes in metabolism, there are no reports whatsoever suggesting that general proteostasis stress can result in an extensive, precise metabolic rewiring, with an increase in respiration, mitochondrial de-repression, a precise decrease in the levels of two limiting glycolytic enzymes, and a precise reduction in glycolytic flux, as observed in the ubp3 mutant. If this were the case, deletion of any deubiquitinase should result in an increase in trehalose and respiration, which clearly does not happen (as is already clear from the large screen shown in Figure 1).

      However, in response to this query, we performed experiments to assess the extent of proteostasis stress in ubp3 mutants. For this, we have now estimated the changes in global ubiquitination in WT vs ubp3 mutant cells, and compared this with conditions of moderate proteostasis stress (mild heat shock at 42 °C for ~1 hr). These data are now included in the revised manuscript as Figure 1- figure supplement 1J. Notably, our analysis reveals only a very minor alteration in global ubiquitination levels in ubp3 mutants compared to WT cells. This is in stark contrast to limited heat stress, where a clear increase in global ubiquitination can be easily observed. Given these data, we can conclude that there is no significant general proteostatic stress in ubp3 mutants that could induce substantial metabolic rewiring of such a precise nature.

      (2) Pi flux: it is known that vacuole can compensate the reduction of Pi in the cytosol. The paper they cited in the response, especially the Van Heerden et al., 2014 showed that the pulse addition of glucose caused transient Pi reduction and then it came back to normal level after 10min or so. If the authors mean the transient change of glycolysis and respiration, they should point that out clearly in the abstract and introduction. If the authors are trying to put out a general model, then the model must be reconsidered.

      In Van Heerden et al., the pulse addition of glucose causes transient Pi reduction due to rapid Pi consumption in glycolysis. The phosphate levels came back to normal level because of the glucose flux into trehalose synthesis releasing free Pi. This is the entire crux of the study and this is the reason why tps2 mutants which cannot synthesize trehalose exhibit a growth defect and have decreased Pi levels. As explained in detail in our early response, the cellular Pi levels are maintained by a relative balance of reactions that consume and release Pi and therefore a change in this balance can change Pi as well. Indeed, if this were not the case, the tps2 mutants would simply maintain the Pi levels similar to WT cells by increasing Pi transport from the medium, which is clearly not the case (eg see Gupta 2021).

      The cytosol has ~50mM Pi (van Eunen et al., 2010 FEBSJ), while only 1-2mM of glycolysis metabolites, not sure why partial reduction of several glycolysis enzymes will cause significant changes in cytosolic Pi level and make Pi the limiting factor for mitochondrial respiration. In response to this comment, the authors explained the metabolic flux that the rapid, continuous glycolysis will drain the Pi pool even each glycolytic metabolite is only 1-2mM. However, the metabolic flux both consume and release Pi, that's why there is such measurement of overall free Pi concentration amid the active metabolism. One possibility is that the observed cytosolic Pi level changes was caused by the measurement fluctuation.

      The measurement fluctuations that we mentioned in our previous response letter concerned cells grown in high and low glucose, where multiple factors, such as mitochondrial amount, complicate the Pi measurements. In the case of ubp3 mutants, which have a similar amount of total mitochondria to WT cells, there is minimal fluctuation in Pi measurement. We have done extensive standardization of mitochondrial isolation and of Pi measurement in the isolated mitochondria (as explained in detail in the manuscript) to minimize any such fluctuations.

      However, the metabolic flux both consume and release Pi, that's why there is such measurement of overall free Pi concentration amid the active metabolism

      The reviewer is correct in pointing out that metabolic flux both consumes and releases Pi. However, in glucose-grown yeast cells, the rate of glycolysis, which is a Pi-consuming process, is higher than that of any other metabolic pathway. In fact, the glycolytic rate in glucose-grown S. cerevisiae is one of the highest ever observed in any living system. A decrease in glycolysis and an increase in trehalose therefore shift the balance in Pi utilization and result in increased free Pi in ubp3 cells. For a more detailed theoretical reasoning on the consumption and production of Pi, see Gupta 2021.

      Importantly, the authors measured Pi inside mito for ethanol and glucose, but not the cytosolic Pi, which is the key hypothesis in their model. The model here is that the glycolysis competes with mito for free cytosolic Pi, so it needs to inhibit glycolysis to free up cytosolic Pi for mitochondrial import to increase respiration. I don't see measurement of cytosolic Pi upon different conditions, only the total Pi or mito Pi. The fact is that in Fig.3C they saw WT+Pi in the medium increase total free Pi more than the ∆ubc3, while WT decrease mito Pi compared to WT control and ∆ubc3 and therefore decrease basal OCR upon Pi supplement. A simple math of Pitotal = Pi cyto + Pi mito tells us that if WT has more Pitotal (Fig.3C) but less Pi mito (fig.5 supp 1C), then it has higher Pi cyto. This is contradictory to what the authors tried to rationalize. Furthermore, as I pointed out previously, the isolated mitochondria can import more Pi when supplemented, so if there is indeed higher Picyto, then the mito in WT should import more Pi. So, to address these contradictory points, the authors must measure Pi in the cytosol, which is a critical experiment not done for their model. For example, they hypothesized that adding 2-DG, or ∆ubp3, suppress glycolysis and thus increase the supply of cytosolic Pi for mito to import, but no cytosolic Pi was measured (need absolute value, not the relative fold changes). It is also important to specific how the experiments are done, was the measurement done shortly after adding 2-DG. Given that the cells response to glucose changes/pulses differently in transient vs stable state, the authors are encouraged to specify that.

      (1) Importantly, the authors measured Pi inside mito for ethanol and glucose, but not the cytosolic Pi, which is the key hypothesis in their model. The model here is that the glycolysis competes with mito for free cytosolic Pi, so it needs to inhibit glycolysis to free up cytosolic Pi for mitochondrial import to increase respiration. I don't see measurement of cytosolic Pi upon different conditions, only the total Pi or mito Pi.

      As clearly described in the manuscript, the key hypothesis that emerges is the role of the availability/accessibility of Pi for the mitochondria, in the context of activity. As discussed in detail in the discussion section, this can come from a combination of available Pi pools in the cytosol and increased transport of this Pi to the mitochondria. While it is true that the decreased glycolysis in ubp3 mutants frees up available Pi pools in the cytosol, measurement of cytosolic Pi in these mutants growing in log phase might not necessarily show an increased cytosolic Pi if the Pi is being actively transported to the mitochondria at a rate higher than in the WT, as indicated by the ~6-fold increase in mitochondrial Pi in ubp3 cells. Testing this would require tools such as intracellular fluorescence-based Pi sensors that could accurately capture temporal changes in cytosolic and mitochondrial Pi following glycolytic inhibition. However, such tools are not available to date for use in yeast, and measuring cytosolic Pi over time following glycolytic inhibition using colorimetric Pi assays is extremely difficult.

      However, the reviewer does correctly state that we had not included measurement of cytosolic Pi. Since the mitochondrial Pi estimate was itself a very challenging (and critical) experiment, we had originally thought that those data were sufficient. We have therefore now performed a series of new experiments, where we first enriched the cytosolic fraction (without mitochondrial contamination), and then estimated cytosolic Pi amounts in WT and ubp3 cells. Our Pi measurements indicate a cytosolic Pi concentration in the range of ~35 mM, which is similar to the earlier reported values in yeast. We further observe that the cytosolic Pi is ~25% lower in ubp3 mutants (~25-27 mM) compared to WT cells (Figure 4B). As mentioned earlier, this would be consistent with higher transport of Pi from the cytosol to the mitochondria in these cells. Effectively, ubp3 cells have a total increase in cellular Pi, with a Pi pool distribution such that there is increased Pi availability in mitochondria (Figure 4B). This further substantiates the hypothesis of an increased Pi allocation to mitochondria in ubp3 mutants. The reason for the increased rate of Pi transport to mitochondria is not immediately clear, but it could also come from changes in cytosolic pH, a possibility that we suggest in our discussion and that is discussed in a later section of this response letter as well.

      (2) The fact is that in Fig.3C they saw WT+Pi in the medium increase total free Pi more than the ∆ubc3, while WT decrease mito Pi compared to WT control and ∆ubc3 and therefore decrease basal OCR upon Pi supplement. A simple math of Pitotal = Pi cyto + Pi mito tells us that if WT has more Pitotal (Fig.3C) but less Pi mito (fig.5 supp 1C), then it has higher Pi cyto. This is contradictory to what the authors tried to rationalize. Furthermore, as I pointed out previously, the isolated mitochondria can import more Pi when supplemented, so if there is indeed higher Picyto, then the mito in WT should import more Pi.

      a) “The fact is that in Fig.3C they saw WT+Pi in the medium increase total free Pi more than the ∆ubc3, while WT decrease mito Pi compared to WT control and ∆ubc3 and therefore decrease basal OCR upon Pi supplement. A simple math of Pitotal = Pi cyto + Pi mito tells us that if WT has more Pitotal (Fig.3C) but less Pi mito (fig.5 supp 1C), then it has higher Pi cyto.”

      In WT cells supplemented with external Pi (WT+Pi), there is an increased total Pi, but a decreased mitochondrial Pi. As discussed in the discussion section in the manuscript, this could be due to the supplemented Pi not being transported to mitochondria. The reviewer is correct in pointing out that as per simple math this should mean that the cytosolic Pi in WT+Pi should be high. We have now assessed cytosolic Pi upon external Pi supplementation, and this is exactly what we observe in our cytosolic Pi measurements now included in the revised manuscript (Figure 5-figure supplement 5C). There is a higher cytosolic Pi in WT+Pi (~52 mM) compared to WT cells (~35 mM) and ubp3 cells (~27 mM). We have now pointed this out in the discussion section in the revised manuscript: “Notably, this increased respiration does not happen upon direct Pi supplementation to highly glycolytic WT cells, where the Pi accumulates in cytosol, without increasing mitochondrial Pi (Figure 5-figure supplement 1C).” We hope that these new data completely address the reviewer’s concern regarding the Pi allocations in the case of WT+Pi cells.
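
      The mass-balance argument discussed here can be written out explicitly. The following is only a sketch: it assumes two compartments (cytosol and mitochondria), treats the quantities as amounts rather than concentrations, and ignores other pools such as vacuolar Pi or polyphosphate. The symbols are introduced here for illustration and do not appear in the manuscript.

```latex
\begin{align}
P_{\mathrm{total}} &= P_{\mathrm{cyto}} + P_{\mathrm{mito}} \\
\Delta P_{\mathrm{cyto}} &= \Delta P_{\mathrm{total}} - \Delta P_{\mathrm{mito}} \\
\Delta P_{\mathrm{total}} > 0,\ \Delta P_{\mathrm{mito}} < 0 &\implies \Delta P_{\mathrm{cyto}} > 0
\end{align}
```

      Under these assumptions, a rise in total Pi together with a fall in mitochondrial Pi can only be accommodated by a rise in cytosolic Pi, which is the direction of the change reported for WT+Pi.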

      b) This is contradictory to what the authors tried to rationalize. Furthermore, as I pointed out previously, the isolated mitochondria can import more Pi when supplemented, so if there is indeed higher Picyto, then the mito in WT should import more Pi.

      We would like to clarify that the Pi measurements in WT+Pi absolutely do not contradict our hypothesis. Furthermore, nowhere do we claim that an increase in cytosolic Pi will increase mitochondrial Pi. On the contrary, we explain in detail that supplementing Pi to WT cells (which increases cytosolic Pi) will not increase respiration if the increased Pi is not being transported to mitochondria. This is exactly what happens in WT+Pi, where Pi accumulates in the cytosol but does not result in increased mitochondrial Pi. The reviewer argues that if there is higher cytosolic Pi, mitochondria should import more Pi. This is true in the case of transport via diffusion, where the external concentration dictates the direction of metabolite transport, but is fundamentally wrong in the case of transport of metabolites where active transporters and additional regulators are involved. This is the entire basis of the idea of metabolic compartmentalisation, where cells maintain pools of metabolites in different organelles which regulate the cellular metabolic state. A well-studied example is pyruvate, whose cytosolic concentration is high in glycolytic cells, but its transport to mitochondria is reduced in glycolysis to maintain cytosolic fermentation. As discussed in the manuscript, a logical explanation for Pi supplementation not increasing respiration and mitochondrial Pi is that there might be mechanisms in highly glycolytic cells that restrict the transport of Pi to mitochondria, thereby compartmentalizing Pi in the cytosol. One such possible mechanism is pH (discussed in a later section), and it is possible that there are other mechanisms involved.

      In the case of isolated mitochondria, Pi supplementation results in increased respiration simply because it is an in vitro setup where we supplement metabolites such as pyruvate, malate and ADP along with phosphate to ensure that the mitochondria are actively respiring, and in this case Pi will be consumed since it is being used for ATP synthesis. This is entirely different from an in vivo scenario where cells are glycolytic, and mechanisms to prevent mitochondrial transport of metabolites such as pyruvate and phosphate are active.

      c) It is also important to specific how the experiments are done, was the measurement done shortly after adding 2-DG?

      Cells were treated with 2-DG for one hour and respiration was measured. We have mentioned these details clearly in the figure legends and methods.  

      d) The most likely model to me is that, which is also the consensus in the field, is that no matter 2-DG or ∆ubp3, the cells re-wiring metabolism in both cytosol and mitochondria, and it is the total network shift that cause the mitochondrial respiration increase, which requires the increase of mito import of Pi, ADP, O2, and substrates, but not caused/controlled by the Pi that singled out by the authors in their model.

      The aim of our study is only to highlight the importance of mitochondrial Pi availability as a critical factor in controlling mitochondrial respiration. Of course, this would require sufficient other factors such as ADP, substrates and oxygen. It cannot be otherwise. However, as we point out in the discussion, a major limiting factor might be Pi availability. While the altered glycolysis in ubp3 mutants might control availability of other factors such as pyruvate and ADP, this is not the focus of our study. We would also like to point out that prior studies show that even though cytosolic ADP decreases in the presence of glucose, this does not limit mitochondrial ADP uptake, or decrease respiration, due to the very high affinity of the mitochondrial ADP transporter. This is discussed in our discussion section as well. Further, we show that the levels of ETC proteins can be altered by changing Pi levels, which places Pi as a major regulator of respiration. We would like to point out once again that studies in other systems have also highlighted a major role of mitochondrial Pi availability in controlling respiration. These references are included in our manuscript (Scheibye-Knudsen et al., 2009, Seifer et al., 2015). This includes a recent study in T cells that clearly shows increased mitochondrial respiration upon overexpressing the mitochondrial Pi transporter SLC25A3 alone (Wu et al., 2023). Our manuscript now in fact provides a contextual explanation of these diverse observations from other cellular systems where mitochondrial Pi transport appears to regulate respiration.

      (3) The explanation that cytosolic pH reduction upon glucose depletion/2DG is a mistake. There are a lot of data in the literature showing the opposite. If the authors do think this is true, then need to show the data. Again, it is important to distinguish transient vs stable state for pH changes.

      We observe that directly supplementing Pi to WT cells growing in high glucose does not result in higher mitochondrial Pi or increased respiration. However, supplementing Pi to WT cells increases mitochondrial respiration in the presence of the glycolytic inhibitor 2-DG. We therefore merely suggest that cytosolic pH could be an additional regulator of mitochondrial Pi transport, since this would be consistent with the differences in mitochondrial Pi transport between highly glycolytic cells and cells with decreased glycolysis (such as 2-DG addition and the ubp3 mutant). This is because in mitochondria, Pi is co-transported along with protons. Therefore, changes in cytosolic pH (which change the proton gradient) will control mitochondrial Pi transport (Hamel et al., 2004). The glycolytic rate is itself a major factor that controls cytosolic pH. The cytosolic pH in highly glycolytic cells is maintained at ~7, and decreasing glycolysis results in cytosolic acidification (Orij et al., 2011). Therefore, under conditions of decreased glycolysis (such as loss of Ubp3), cytosolic pH becomes acidic. Since mitochondrial Pi transport depends on the proton gradient, a low cytosolic pH would favour mitochondrial Pi transport. Therefore, under conditions of decreased glycolysis (2-DG treatment, or loss of Ubp3), where cytosolic pH would be acidic, increasing cytosolic Pi might indirectly increase mitochondrial Pi transport, thereby leading to increased respiration. But we certainly do leave alternate interpretations to the imagination of any reader, and are indeed open to them. These are all exciting future directions that this study will help to interpret in context.

      The explanation that cytosolic pH reduction upon glucose depletion/2DG is a mistake.

      We have cited two independent studies which suggest that cytosolic pH decreases upon a decrease in glycolysis (Orij et al., 2011; Dechant et al., 2010). This control of cytosolic pH by the glycolytic rate has been extensively shown using glycolytic mutants, cells in low glucose and cells grown in the presence of glycolytic inhibitors. According to the reviewer, this is a mistake and

      there are a lot of data in the literature showing the opposite.

      In our literature review we did not come across any relevant studies that actually show the opposite. If the reviewer still thinks this is a mistake, the reviewer is welcome to include in the comments some of the relevant literature that clearly shows the opposite, with actual measurements of cytosolic pH. Additionally, the possible role of cytosolic pH in this context does not affect the conclusions of our study, and we only include this as a possibility in the discussion. Therefore, this is well beyond the scope of experiments in our current study, and considering the extensive data from multiple studies showing that cytosolic pH decreases under low glycolysis, there is no relevance to including experiments to address the same in this study. We leave this as a point for an interested reader to think about, and it certainly can nucleate new directions of future study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of the changes

      Changes in the manuscript were made to clarify some ambiguities raised by the reviewers and to improve the report following their recommendations. A summary of the main changes is listed below:

      - The title was changed to better reflect the results of this study.

      - Re-training the model on log-transformed FACS scores.

      - Testing the specificity of the FEPS to facial expression of pain within this experimental setup by comparing it to the activation maps obtained from the Warm stimulation condition.

      - Testing for sensitization/habituation of the behavioral measures (FACS scores and pain ratings).

      - Adding a section in the discussion to better address the limitations of this study and provide potential directions for future studies.

      Other changes target areas where the original manuscript may have been ambiguous or lacked precision. To address these concerns, additional details have been incorporated, and certain terms have been revised to ensure a more precise and transparent presentation of the information.

      Public Reviews:

      Reviewer #1 (Public Review):

      Picard et al. report a novel neural signature of facial expressions of pain. In other words, they provide evidence that a specific set of brain activations, as measured by means of functional magnetic resonance imaging (fMRI), can tell us when someone is expressing pain via a concerted activation of distinctive facial muscles. They demonstrate that this signature provides a better characterization of this pain behaviour when compared with other signatures of pain reported by past research. The Facial Expression of Pain Signature (FEPS) thus enriches this collection and, if further validated, may allow scientists to identify the neural structures subserving important non-verbal pain behaviour. I have, however, some reservations about the strength of the evidence, relating to insufficient characterization of the underlying processes involved.

      We are thankful for the summary of our work. We are hopeful that the modifications made in the latest version effectively address these concerns. The changes are outlined in the summary above, and detailed in the following point-by-point response.

      Strengths:

      The study relies on a robust machine-learning approach, able to capitalise on the multivariate nature of the fMRI data, an approach pioneered in the field of pain by one of the authors (Dr. Tor Wager). This paper extends Wager's and other colleagues' work attempting to identify specific combinations of brain structures subserving different aspects of the pain experience while examining the extent of similarity/dissimilarity with the other signatures. In doing so, the study provides further methodological insight into fine-grained network characterization that may inspire future work beyond this specific field.

      We are thankful for the positive comments.

      Weaknesses:

      The main weakness concerns the lack of a targeted experimental design aimed to dissect the shared variance explained by activations both specific to facial expressions and to pain reports. In particular, I believe that two elements would have significantly increased the robustness of the findings:

      (1) Control conditions for both the facial expressions and the sensory input. An efficient signature should not be predictive of neutral and emotional facial expressions (e.g., disgust) other than pain expressions, as well as it should not be predictive of sensations originating from innocuous warm stimulation or other unpleasant but non-painful stimulation.

      We do recognize the lack of specificity testing for the FEPS, especially towards negative emotional facial expressions. This would be relevant to test given the behavioural overlap between the facial expressions of pain and disgust, fear, anger, and sadness (Kunz et al., 2013; Williams, 2003). The experimental design used in this study did not include other negative states. However, we fully support the necessity of collecting data throughout those conditions, and we believe that the present study highlights the importance of such a demonstration. Future research should involve recording facial expressions while exposing participants to stimuli that elicit a range of negative emotions but, to our knowledge, such combination of fMRI and behavioural data is currently unavailable. As raised by the reviewer, this approach would allow us to assess the specificity of the FEPS to the facial expression evoked by pain compared to different affective states. We would like to emphasise that specificity and generalizability testing is a massive amount of work, requiring multiple studies to address comprehensively. A Limitations paragraph addressing this research direction has been added to the Discussion. A conclusion was added to the abstract as follows: “Future studies should explore other pain-relevant manifestations and assess the specificity of the FEPS against other types of aversive or emotional states.”

      (2) Graded intensity of the sensory stimulation: different intensities of the thermal stimulation would have caused a graded facial expression (from neutral to pain) and graded verbal reports (from no pain to strong pain), thus offering a sensitive characterisation of the signal associated with this condition (and the warm control condition).

      However, these conditions are missing from the current design, and therefore we cannot make a strong conclusion about the generalisability of the signature (regardless of whether it can predict better than other signatures - which may/may not suffer from similar or other methodological issues - another potential interesting scientific question!). The authors seem to work on the assumption that the trials where warm stimulation was delivered are of no use. I beg to disagree. As per my previous comment, warm trials (and associated neutral expressions) could be incorporated into the statistical model to increase the classification sensitivity and precision of the FEPS decoding.

      The experience of pain can fluctuate for a fixed intensity or after controlling statistically for the intensity of the stimulation (Woo et al., 2017). Consistent with this, the current study focused on spontaneous facial expression in response to noxious thermal stimuli delivered at a constant intensity that produced moderate to strong pain in every participant. As the reviewer points out, this does not allow us to characterise and compare the stimulus-response function of facial expression and pain ratings. The advantage of the approach adopted is to maximise the number of trials where facial expression is more likely to occur, while ensuring that changes in facial expression and pain ratings are not confounded with changes in stimulus intensity. The manuscript has been revised to clarify that point. However, we do agree that it would be interesting to conduct more studies focusing on facial expression in response to a range of stimulus intensities. This discussion has been added to the Limitations paragraph.

      Furthermore, following the reviewer’s suggestion, we performed complementary analyses on the warm trials in the proposed revisions. The dot product (FEPS scores) between the FEPS and the activation maps associated with the warm condition was computed. A linear mixed model was conducted to investigate the association between FEPS scores and the experimental condition (warm vs pain). The trials in the pain condition were divided into two conditions: null FACS scores (painful trials with no facial response; FACS scores = 0) and non-null FACS scores (painful trials with a facial response; FACS > 0). The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm section in the Methods; lines 427 to 439) as well as the corresponding results (see Results and Discussion; lines 138 to 158). The FEPS scores were larger in the pain condition where a facial response was expressed, compared to both the pain condition without facial expression and the warm condition. These results confirmed the sensitivity of the FEPS to facial expression of pain.
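
      The pattern-expression step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the three category labels, and the toy numbers are all hypothetical, each activation map is assumed to be already flattened to the same voxel order as the signature weights, and the linear mixed model itself is not fitted here.

```python
from statistics import mean


def pattern_expression(weights, activation):
    """Dot product between a signature weight map and one trial's activation map."""
    if len(weights) != len(activation):
        raise ValueError("weight map and activation map must have the same length")
    return sum(w * a for w, a in zip(weights, activation))


def trial_category(condition, facs_score):
    """Assign a trial to one of the three categories compared in the analysis."""
    if condition == "warm":
        return "warm"
    if condition == "pain":
        # Painful trials are split by whether any facial response was scored.
        return "pain_expression" if facs_score > 0 else "pain_no_expression"
    raise ValueError(f"unknown condition: {condition!r}")


def mean_score_by_category(weights, trials):
    """Average pattern expression per category.

    `trials` is a list of (condition, facs_score, activation_map) tuples.
    """
    scores = {}
    for condition, facs_score, activation in trials:
        category = trial_category(condition, facs_score)
        scores.setdefault(category, []).append(pattern_expression(weights, activation))
    return {category: mean(values) for category, values in scores.items()}


# Toy values for illustration only; these are not data from the study.
toy_weights = [0.5, -1.0, 2.0]
toy_trials = [
    ("warm", 0, [0.1, 0.2, 0.0]),
    ("pain", 0, [0.4, 0.1, 0.2]),
    ("pain", 3, [0.6, 0.0, 0.9]),
]
print(mean_score_by_category(toy_weights, toy_trials))
```

      In a full analysis the per-trial scores, rather than the category means, would enter the mixed model with participant as a grouping factor, so that repeated trials from the same person are not treated as independent.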

      Reviewer #2 (Public Review):

      Summary:

      The objective of this study was to further our understanding of the brain mechanisms associated with facial expressions of pain. To achieve this, participants' facial expressions and brain activity were recorded while they received noxious heat stimulation. The authors then used a decoding approach to predict facial expressions from functional magnetic resonance imaging (fMRI) data. They found a distinctive brain signature for pain facial expressions. This signature had minimal overlap with brain signatures reflecting other components of pain phenomenology, such as signatures reflecting subjective pain intensity or negative effects.

      We appreciate this concise and accurate summary of our study.

      Strength:

      The manuscript is clearly written. The authors used a rigorous approach involving multivariate brain decoding to predict the occurrence and intensity of pain facial expressions during noxious heat stimulation. The analyses seem solid and well-conducted. I think that this is an important study of fundamental and clinical relevance.

      Weaknesses:

      Despite those major strengths, I felt that the authors did not sufficiently explain their own interpretation of the significance of the findings. What does it mean, according to them, that the brain signature associated with facial expressions of pain shows a minimal overlap with other pain-related brain signatures?

      We express our sincere gratitude for the valuable insights and constructive comments on the strengths and weaknesses of the current study. We thank reviewer 2 for the encouragement to reinforce our interpretation of the significance of the findings, while acknowledging the limitations raised by the three reviewers.

      A few questions also arose during my reading.

      Question 1: Is the FEPS really specific to pain expressions? Is it possible that the signature includes a facial expression signal that would be shared with facial expressions of other emotions, especially since it involves socio-affective regulation processes? Perhaps this question should be discussed as a limit of the study?

      We acknowledge this limitation as outlined in response to Reviewer #1. We have incorporated a Limitations paragraph to provide a more in-depth discussion of this limitation and to explore potential future avenues (lines 225 to 268). Again, please note that the demonstration of specificity is an incremental process that requires a systematic comparison with other conditions where facial expressions are produced without pain. A concluding sentence was added to the abstract to encourage specificity testing in future studies, as indicated above.

      Question 2: All AUs are combined together in a composite score for the regression. Given that the authors have other work showing that different AUs may be associated with different components of pain (affective vs. sensory), is it possible that combining all AUs together has decreased the correlation with other pain signatures? Or that the FEPS actually reflects multiple independent signatures?

      The question raised is consistent with the work of Kunz, Lautenbacher, LeBlanc and Rainville (2012), and Kunz, Chen and Rainville (2020). In the current study, the pain-relevant action units were combined in order to increase the number of trials where a facial response to pain was expressed, thus enhancing the robustness of our analyses. Given the limited sample size, our current dataset is unfortunately insufficient to perform such analysis as there would not be enough trials to look at the action units separately or in subgroups. While the approach of combining the different AUs has proven to be valid and useful, we recognize the value of investigating potential independent signatures associated with the different AUs within the FEPS, and examining whether those signatures can lead to more similar patterns compared to previously developed pain signatures. This discussion has been included in the Limitations paragraph in the Discussion (lines 225 to 268).

      Question 3: Is facial expressivity constant throughout the experiment? Is it possible that the expressivity changes between the beginning and the end of the experiment? For instance, if there is a habituation, or if the participant is less surprised by the pain, or in contrast if they get tired by the end of the experiment and do not inhibit their expression as much as they did at the beginning. If facial expressivity changes, this could perhaps affect the correlation with the pain ratings and/or with the brain signatures; perhaps time (trial number) could be added as one of the variables in the model to address this question.

      The concern raised by the reviewer is legitimate. We conducted a mixed-effects model to assess the impact of successive trials and runs on facial expressivity. Results indicate that the FACS scores did not change significantly throughout the experiment, suggesting no notable effect of habituation or sensitization on the facial expressivity in our study. Details about the analysis and the results have been added to the Facial Expression section in the Methods (lines 335 to 346).
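
      One simple way to look for habituation or sensitization is to estimate, for each participant, the slope of FACS score over trial number and then inspect the distribution of slopes. The sketch below uses this two-stage summary approach as a plainly simpler stand-in for the mixed-effects model described above; it is not the authors' analysis. The function names and the toy scores are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def per_participant_slopes(facs_by_participant):
    """Slope of FACS score over trial number (1, 2, ...) for each participant."""
    return {
        participant: ols_slope(list(range(1, len(scores) + 1)), scores)
        for participant, scores in facs_by_participant.items()
    }


# Toy scores for illustration only; these are not data from the study.
toy_scores = {
    "sub-01": [2, 2, 3, 2],
    "sub-02": [0, 1, 0, 1],
    "sub-03": [4, 3, 3, 2],
}
slopes = per_participant_slopes(toy_scores)
print(slopes)
print(mean(list(slopes.values())))
```

      A mean slope reliably below zero would point to habituation and one above zero to sensitization. A mixed-effects model estimates the same trend while pooling information across participants and runs, which is why it is the better choice when trial counts per participant are small.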

      Reviewer #3 (Public Review):

      In this manuscript, Picard et al. propose a Facial Expression Pain Signature (FEPS) as a distinctive marker of pain processing in the brain. Specifically, they attempt to use functional magnetic resonance imaging (fMRI) data to predict facial expressions associated with painful heat stimulation. The main strengths of the manuscript are that it is built on an extensive foundation of work from the research group, and that experience can be observed in the analysis of fMRI data and the development of the machine learning model. Additionally, it provides a comparative account of the similarities of the FEPS with other proposed pain signatures. The main weaknesses of the manuscript are the absence of a proper control condition to assess the specificity of the facial pain expressions, a few relevant omissions in the methodology regarding the original analysis of the data and its purpose, and a biased interpretation of the results.

      I believe that the authors partially succeed in their aims, as described in the introduction, which are to assess the association between pain facial expression and existing pain-relevant brain signatures, and to develop a predictive brain activation model of the facial responses to painful thermal stimulation. However, I believe that there is a clear difference between those aims and the claim of the title, and that the interpretation of the results needs to be more rigorous.

      We wish to express our appreciation for the insightful and constructive critique provided. The limitation pertaining to the absence of specificity testing has been addressed in response to Reviewer #1, and it has been incorporated into the manuscript (lines 251 to 258).

      The commentary made by Reviewer #3 has drawn our attention to a critical concern, namely the potential misalignment between the study findings and our original title. Consequently, we have changed the title to “A distributed brain response predicting the facial expression of acute nociceptive pain”. We also revised the interpretation of the results in the discussion section and we have added a section on limitations.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      I hope the following comments will be useful to improve the manuscript.

      Abstract

      I felt the abstract could be more clear in terms of experimental or scientific questions, hypotheses/expectations, and findings. I also feel the abstract should briefly support the conclusive claim ("is better than...": how better? Or according to what criterion? This may be more relevant than the final conclusive general sentence that does not specifically address the significance of the findings).

      The abstract was revised to reinforce the functional perspective adopted to interpret brain activity produced by noxious stimuli and predicting various pain-relevant manifestations. We also mention explicitly the other pain-relevant signatures against which the FEPS is compared in this report, and we added a concluding sentence highlighting the importance of assessing the specificity of the FEPS in future studies.

      Introduction - background and rationale

      I would postpone the discussion around pain signature and anticipate the one about the brain mechanisms of facial expressions of pain. This will allow you to reinforce the logical flow of rationale, literature gap/question, why the problem is important, and study aims. Only then go for a review of relevant literature on signatures before providing a more specific final paragraph about the study-specific questions, expectations, and implementation. At the moment this is limited to a single very descriptive short paragraph at the end of the intro.

      The introduction was structured to guide the readers through a comprehensive understanding of different pain neurosignatures. The introduction aimed to establish a robust rationale for the subsequent analyses detailed in the results section. Indeed, the presentation of that literature ensured that the discussion around pain signatures is contextualised within a broader continuous framework. We acknowledge the reviewer’s comment on the limited description of the brain mechanisms of facial expression of pain. However, this was addressed in several previous reports of our laboratory (Kunz et al. 2011; Vachon-Presseau et al. 2016; Kunz, Chen, and Rainville 2020). We have added some more details about the brain mechanisms of facial expression, and highlighted those references in the first paragraph of the introduction.

      Methods and Results

      (1) Was there any indication of power based on the previous work or the other signature papers? If yes, how that would inform the present analysis?

      The Neurologic Pain Signature (NPS) was trained on 20 participants who experienced 12 trials at each of four different intensities. Effect sizes for the NPS were assessed in Han et al. (2022). That study revealed a moderate effect size for predicting between-subject pain reports, and a large one for predicting within-subject pain reports. We trained our model on 34 participants who underwent 16 trials. We expected our results to show a smaller effect size, as the current experimental design only allowed us to examine spontaneous changes in the facial expression, as noted in the comments made by Reviewer #1. However, the best way to calculate the unbiased effect size of the results presented in the current study would be to test the unchanged model on new independent datasets (see Reddan, Lindquist, and Wager, 2017). Unfortunately, such datasets do not currently exist.

      (2) I would clarify to the reader what is meant by normal range of thermal pain and why is this relevant. Also, I did not find data about this assessment nor about the assessment of facial expressiveness (or reference to where it can be found).

      We changed this formulation to “All participants included in this study had normal thermal pain sensitivity” and we added a few references. By targeting a healthy population with normal thermal pain sensitivity, our study sought to identify a predictive brain pattern related to facial expression evoked by typical responses to pain that could eventually be generalised to other individuals from the same population. Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods.

      (3) That pain ratings are only weakly associated with facial responses is, in its own right, an interesting finding, as a naïve reader would expect the two to be highly positively correlated. I'd suggest discussing this aspect (in reference to previous research) as it is interesting on both theoretical and empirical grounds.

      The likelihood and the strength of pain facial expression generally increase with pain ratings in response to acute noxious stimuli of increasing physical intensity, thereby leading to a positive association between the two responses that is driven by the stimulus. However, the poor correlation, or dissociation, between facial pain expression and pain rating is a well-known phenomenon that can be demonstrated easily using experimental methods in which the stimulus intensity is held constant and spontaneous fluctuations are observed in both facial expression and pain ratings. This result was not discussed in the original manuscript because it had already been addressed in the work of Kunz et al. (2011) and Kunz, Karos and Vervoort (2018). We added the references to these studies in the revised manuscript (lines 330 to 334).

      (4) It may be worth having CIs throughout the whole set of analyses.

      Thank you for the suggestion; this was an oversight. Confidence intervals have been added to the manuscript where applicable.

      (5) I would clarify if there are two measures of the brain signature: dot-product and activation map. Relatedly, I cannot find where the authors explained what "FEPS pattern expression scores". Can the authors please clarify?

      The clarification has been added in the manuscript (lines 413 to 414).

      (6) There seems to be the assumption that the relationship between pain-relevant brain signatures and facial expressions of pain would be parametric and linear. However, this might not hold true. Did the authors test these assumptions?

      We indeed decided to use a linear regression technique (i.e., LASSO regression) to model the association between brain activity and the facial expression of pain. The choice of algorithm was mainly based on the simplicity and interpretability of that approach, and on our limited number of observations. The choice was also consistent with previous studies in the domain (e.g., Wager et al., 2011; Wager et al., 2013; Krishnan et al., 2016; Woo et al., 2017). Using a linear model, we were able to predict the facial expression evoked by pain from the fMRI activation at above-chance level. However, it is legitimate to think that more complex nonlinear models could better capture the brain patterns predictive of that behavioural manifestation of pain.

      (7) Did the authors assess whether the FACS were better to be transformed/normalised? More generally, I would report any data assessment/transformation that has not been reported.

      Thank you for this highly relevant suggestion. FACS scores were indeed not normally distributed, and the analyses were conducted again to predict the log-transformed FACS scores. This transformation was effective in normalizing the distribution (skewness = 0.75, kurtosis = -0.84). The predictive model was confirmed on the transformed data.
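
      The check described above can be illustrated with a minimal sketch. It assumes a log(1 + x) transform (so that trials with a FACS score of zero remain defined) and population moment estimators for skewness and excess kurtosis; the response does not state which transform variant or estimator was used, and the scores below are hypothetical toy values, not study data.

```python
import math


def log_transform(scores):
    """Assumed variant: log(1 + x), which keeps zero scores defined."""
    return [math.log1p(s) for s in scores]


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment over variance^2, minus 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


# Hypothetical right-skewed scores (many low values, a few high ones).
raw_scores = [0, 0, 1, 1, 2, 3, 5, 8, 20, 40]
log_scores = log_transform(raw_scores)

print("raw skewness:", round(skewness(raw_scores), 2))
print("log skewness:", round(skewness(log_scores), 2))
print("log excess kurtosis:", round(excess_kurtosis(log_scores), 2))
```

      On right-skewed toy data of this kind, the transform pulls the long upper tail towards the bulk of the distribution, which is the behaviour the skewness value is meant to capture.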

      (8) Page 12: I am not clear on whether all the signatures are included in the same model (like a multiple regression) or if separate regressions are calculated per signature. The authors seem to imply that several regressions have been computed (possibly one per comparison with each signature?).

      The correlation between the FACS scores and the pain-related signatures was computed separately for each signature. This information has been clarified.

      (9) MVPA: See my main comment about warm trials and experimental/statistical design. For example, the LASSO regression model for the pain trials could be compared with a model using warm trials besides (or instead of) the unfitted model. Otherwise, add the warm trials as another predictor or within the subject level in a dummy fixed factor comprising pain and warm trials.

      The inclusion of warm trials in the model training would be inconsistent with the goal of the main analysis, which is to predict the facial expression of pain when a noxious stimulus is presented. Secondary analyses were conducted to compare the response of the FEPS to warm trials with its response to noxious pain trials. The dot product between the FEPS and the activation maps (FEPS scores) associated with the warm condition was computed. A linear mixed model was fitted to investigate the association between FEPS scores and the experimental condition (warm vs pain). Additional contrasts compared the warm trials with the pain trials with and without pain facial expression. The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm in the Methods), as have the corresponding results (see Results and Discussion).

      (10) I would clarify for the reader why the separate M1 analysis has been run. Although obvious, I feel the reader would benefit from the specific hypothesis about this control analysis being spelled out together with the other statistical hypotheses within the statistical design in a more streamlined manner.

      We extended the discussion of the rationale for that analysis and of its interpretation, taking into account the most recent results obtained with the log-transformed FACS scores (lines 125 to 133).

      (11) The mixed model aimed to assess the relationship between pain ratings FEPS scores and facial scores is a crucial finding. I believe it speaks to the importance of a more complete design, which I already highlighted. I have a couple of technical questions: did the authors assess random slopes too? And, what was the strategy used to determine the random effects structure?

      The linear mixed model treated the participants as a random effect, with random intercepts, to account for the grouping structure in our data (i.e., each participant completed multiple trials). The results reported in the original manuscript assumed fixed slopes. However, following the reviewer’s comment, we re-computed the linear mixed models allowing the slopes to vary according to the intensity ratings. The results in the manuscript were updated to reflect the output of those models.

      (12) The text from lines 63 to 67 could go in the methods.

      We decided to keep those lines in the Results and Discussion section to give the reader more detail about the FACS scores, as this term is referenced later in that section. We are concerned that placing this information only in the Methods section would disrupt the reading.

      Reviewer #2 (Recommendations For The Authors):

      p. 4-5. When you report the positive weight clusters, you follow up with a sentence specifying which cognitive processes those brain regions are typically associated with. However, when you report the negative weight clusters, you do not specify the cognitive processes typically associated with those brain areas. I think that providing that information would be helpful to the readers.

      Thanks for noticing this omission. The information has been added in the most recent version of the manuscript (lines 119 to 121).

      p. 9. You specify that the degree of expressiveness of participants was evaluated. How did you evaluate expressiveness? Did you use this variable in your analyses? Were participants excluded based on their degree of expressiveness?

      Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods (lines 285 to 289).

      p. 10. You explain that two certified FACS-coders evaluated the video recordings to rate the frequency of AUs. Could you please provide more details about the frequency measure? I think that there are different ways in which this could have been done. For instance, were the videos decomposed into frames, and then the frequency measured by summing the number of frames in which the AU occurred? Or was it "expression-based", so one occurrence of an AU (frequency of 1) would correspond to the whole period between its activation onset and offset? Both ways have pros and cons. For example, if the frequency represents the number of frames, then it controls for the total duration of the AU activation within a trial (pro); but if there were multiple activations/deactivations of the AU within one trial, this will not be controlled for (con). And vice-versa with the second way of calculating frequency.

      Details about the frequency scores have been added to the manuscript (lines 315 to 319).

      p. 11. When you explained how you calculated the association between the facial expression of pain and pain-related brain signatures, I felt that there was some information missing. Did you use the thresholded maps (available in the published articles), or did you somehow have access to the complete, voxel-by-voxel, raw regression coefficient maps?

      The unthresholded maps were used. The information has been clarified in the latest version of the manuscript, as well as the details about the availability of the maps (see Data Availability section at the end of the manuscript).

      Reviewer #3 (Recommendations For The Authors):

      Format

      The authors will notice that many observations about the manuscript are related to missing information and a lack of graphical representations. I believe the topic and the content of the manuscript are too complex to condense into a short report.

      Title

      The claim of the title is simply not substantiated by the content of the manuscript. Demonstrating that the FEPS is a distinctive (i.e., specific) marker of pain processing requires a substantially different experimental design, with more rigorous controls and a broader set of painful stimulations. The manuscript would benefit from a more accurate title.

      We agree that the title could better align with our findings. We modified the title accordingly: “A distributed brain response predicting the facial expression of acute nociceptive pain”.

      Abstract

      I find it puzzling that the authors claim that there is limited knowledge of the neural correlates of facial expression of pain given what they describe in the first paragraph of the introduction. Besides, they propose to reanalyze a dataset that has been extensively described in Kunz et al. (2011), which is unlikely to provide any new significant information.

      We respectfully disagree with that comment. We consider that three articles on the topic (i.e., Kunz et al., 2011; Vachon-Presseau et al., 2016; Kunz, Chen and Rainville, 2020) do constitute limited knowledge, especially compared with the very large body of literature on the neural correlates of pain ratings. Except for these three studies, all the other citations pertain to behavioral studies on the facial expression of pain and do not examine the brain activity related to it. Furthermore, we believe that the complementary nature of the analyses performed in Kunz et al. (2011) and in this manuscript offers new insights into our understanding of facial expression in the context of pain. Indeed, the multivariate approach used in this study addresses some limitations of the univariate analyses in Kunz et al. (2011), mainly in that it provides a quantifiable way to compare the similarity between different predictive patterns (Reddan and Wager, 2017). We submit that the assessment of the FEPS against several other pain-relevant signatures provides new and important information.

      Furthermore, the abstract does not clearly state the aim, and the first line of the results does not match what the authors claim in the preceding line. The take-home message (last sentence) introduces the concept of a biomarker, which, as stated before, cannot be validated with the current data/experimental design. To put it in plain words, a given facial expression (or a composite score derived from a combination of expressions) cannot be a specific biomarker for pain, because a person can always mimic the same expression without feeling pain. Whether a given facial expression can be predicted from brain activity is a different issue, and whether that prediction can differentiate between painful and non-painful origins of the facial expression is another different issue. Unfortunately, neither of those issues can be tested with the current data/experimental design. The abstract would improve if the authors would circumscribe to what they actually tested, which is accurately described in the last sentence of the Introduction.

      The abstract was revised accordingly. The term ‘biomarker’ was used in accordance with preceding studies in the field (see Reddan and Wager, 2017; Lee et al., 2021). Please note that we applied the same reasoning to fluctuations in pain expression as previous studies have applied to pain ratings. Of course, we cannot dismiss the possibility of someone mimicking facial expressions. Similar reasoning applies to subjective reports, as individuals can intentionally overstate their pain experience in verbal reports. This is another case of specificity testing that cannot be addressed in the present study (see the new conclusion of the abstract and the discussion of limitations). The challenge of pain assessment is a classical problem within both the scientific and the clinical literature. Here, we suggest that the consideration of multiple manifestations of pain is necessary to address this challenge and will provide a more comprehensive portrait of pain-related brain function.

      Introduction

      I believe that the Introduction would benefit from a strict definition of what is a marker/biomarker/neuromarkers (all those terms are used in the manuscript) and what are its desirable features (validity, reliability, specificity, etc.). I also believe that the Introduction (and the rest of the text) would benefit from a critical assessment of the term "signature". The Introduction describes four existing "signatures", all of them differing in the experimental condition in which acute nociceptive pain is studied, and proposes a fifth one. Keeping with the analogy, I'm wondering whether they should be called (pain) "signatures" if there is a different one for each experimental acute pain condition, and they are so dissimilar between them when they are tested on the same condition (this dataset).

      The last part of that comment raises fundamental potential methodological limitations that go beyond the scope of this research article and should be addressed in more depth in another article. Regarding the stability of the signatures, most of them have not been studied extensively. It is thus currently difficult to assess their reliability. However, Han et al. (2022) showed high within-individual test-retest reliability for the NPS across eight different studies. Given that pain is a multidimensional experience, it is not surprising to find different patterns of activation predictive of different aspects or dimensions of the pain experience (see Čeko et al., 2022 for a similar discussion applied to negative affect).

      The authors state that "As an automatic behavioral manifestation, pain facial expression might be an indicator of activity in nociceptive systems, perceptual and evaluative processes, or general negative affect." Doesn't it reflect all three of them? (and instead of or?) Why "might"?

      The original sentence has been modified as follows: “As an automatic behavioral manifestation, pain facial expression is considered to be an indicator of activity in nociceptive systems, and to reflect perceptual and affective-evaluative processes” (lines 65 to 67).

      Methods

      The pain scale should be described. Kunz et al. used a 0-100 scale, where 50 was the pain threshold. This is crucial to interpret the 75-80/100 score for the painful thermal intensity.

      The description of the pain scale has been added to the manuscript (lines 299 to 300).

      Ratings for warm and painful temperatures should be reported (ideally plotted with individual-trial/subject data). In the same line of reasoning, FACS scores should be reported as well (ideally plotted with individual-trial/subject data). It would be interesting to explore the across-trial variability of pain ratings and FACS scores. That is, do people keep giving the same ratings and making the same facial expression after 16 trials? How much variability is between trials and between subjects?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1 (also see the new Figures S2 and S4; see also lines 335 to 346).

      How come only painful trials are analyzed? What if the FEPS signature was the same for warm and painful stimulation, thus reflecting the settings (fMRI experiment, stimulation, etc.) rather than the brain response to the stimuli?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1. There was no pain expression in the warm trials and the FEPS shows no response to warm trials. This is now illustrated in the new Figure S4B (see also lines 138 to 158).

      The authors propose to predict the trial-by-trial FACS composite score from the pain ratings using a LMM. However, it is interesting that they aim for an almost constant within- and between-subject pain score (75-80/100) as stated in the Methods. This should theoretically render the linear model invalid since its first (and main) assumption would be that FACS should vary linearly with the pain score. Even if patients were not aware that the temperatures were constant across trials, the variation in pain scores should be explained by random noise for a constant stimulation intensity.

      Reviewer #3 raises an important point that we need to clarify. Contrary to the expectation that FACS responses should be strongly correlated with pain ratings, we posited that these response channels depend at least in part on separate brain networks that may be differentially sensitive to a variety of modulatory mechanisms (attention, emotion, expectancy, motor priming, social context, etc.). This implies that part of the variance in FACS is independent of pain ratings. We therefore consider what Reviewer #3 refers to as random noise to be relevant and meaningful fluctuations reflecting endogenous processes that influence one’s experience of pain and differentially affect various output responses.

      I noticed that fMRI data was analyzed with SPM5 in the original paper (Kunz et al., 2011) and with SPM8 in this manuscript. Was fMRI data re-processed for this manuscript? Were there any differences between the original analysis and this one that might induce changes in the interpretation of results?

      The data were indeed re-processed using SPM8, which was the most recent version available when we started the analyses reported here. We used trial-by-trial activation maps for MVPA, which differs from what was used in the previous study (contrast maps at the level of the conditions, not the trials). We have no reason to believe that the difference in versions changes the message of this manuscript, since those versions do not differ significantly in terms of the fMRI preprocessing pipeline (see SPM8 release notes; https://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Furthermore, the aim of the present study is not to compare the different analysis parameters implemented in SPM5 vs SPM8.

      What is the rationale for including PVP in the comparison among signatures? The experimental settings in which it was devised are distant from those described here.

      The inclusion of the PVP was aimed at enhancing our comparative analysis with the FEPS, as we sought to investigate the potential functional meaning of the FEPS. The PVP was developed to capture the aversive value of pain, a dimension that is conceptually proximal to the interpretation of the facial expression as a manifestation of the affective response to nociceptive pain.

      The LASSO-PCR approach is, in my opinion, not a procedure for (brain) decoding in this context. It is accurately described in the section title as a method for multivariate pattern analysis, or as a variable selection and regularization method for a prediction model. Here, brain activity in specific areas related to pain processing can hardly be described as "encoded", and the method just helps select those activations relevant for explaining a certain outcome (in this case, facial expressions).

      We understand the point made by Reviewer #3. The term “brain decoding” was replaced with “multivariate pattern analysis” in the latest version of the manuscript.

      Details are missing with regards to the dataset split into training, validation, and testing.

      Details about the training and testing procedure were added in the manuscript (lines 383 to 385).

      This might just be ignorance from me, so I apologize in advance, but what are "contrast" fMRI images? They are mentioned three times in the text but not really described. Are they the "Pain > Warm" contrasts from the original paper?

      We apologize for any confusion caused by the use of the term “contrast images” which suggests a direct comparison between two experimental conditions. We have replaced “contrast images” with “activation maps” to provide a more accurate description of the nature of the data used in the multivariate pattern analysis (lines 388 to 389).

      In the "Facial expression" section, the authors run an LMM to test the association between pain ratings (response variable) and facial responses (explanatory variable). If I understand correctly, in the "Multivariate pattern analysis" section they test the association between facial composite scores (response variable) and pain ratings (explanatory variable), but they obtain different results.

      The analyses were recomputed on the log-transformed data, as mentioned previously in the responses to Reviewers #1 and #2. The first model (in the “Facial expression” section) used the log-transformed FACS scores as the dependent variable, the pain ratings as the fixed effect, and the participants as the random effect. The results of that analysis suggested that the transformed facial expression scores were not significantly associated with the pain ratings (p = .07). The second model used both the FEPS pattern expression scores and the pain ratings as fixed effects to predict facial responses. This analysis showed a significant contribution of the FEPS to the prediction of FACS scores (p < .001) and no significant effect of the pain ratings. However, a significant interaction was found (p = .03), suggesting that the prediction of the pain facial expression by the FEPS may vary with pain ratings (i.e., a moderator effect). Those results have been clarified in the “Multivariate pattern analysis” section in the Methods (lines 416 to 426).

      In this same section, what are "FEPS pattern expression scores"? They are used three times in the text, but I could not find their description.

      The FEPS pattern expression scores correspond to the dot product between the trial-by-trial activation maps and the unthresholded FEPS signature. This information has been added to the manuscript (lines 413 to 414).
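
      As a minimal sketch of that definition, the score for one trial is the sum, over voxels, of the trial's activation value multiplied by the signature weight at the same voxel. The function name, the flattened-list representation of the maps, and the toy numbers below are illustrative assumptions; real maps would contain many thousands of voxels restricted to a common brain mask.

```python
def pattern_expression(activation_map, signature_weights):
    """Dot product between one trial's activation map and the signature.

    Both inputs are flattened lists of voxel values in the same voxel order.
    """
    if len(activation_map) != len(signature_weights):
        raise ValueError("map and signature must cover the same voxels")
    return sum(a * w for a, w in zip(activation_map, signature_weights))


# Hypothetical unthresholded signature weights for four voxels.
signature = [0.5, -0.25, 0.0, 1.0]

# Hypothetical trial-by-trial activation maps (one list per trial).
trials = [
    [1.0, 2.0, 3.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [2.0, -4.0, 0.0, 1.0],
]

# One scalar pattern expression score per trial.
scores = [pattern_expression(trial, signature) for trial in trials]
print(scores)
```

      Because the unthresholded map is used, every voxel contributes to the score in proportion to its weight, including voxels with small weights that a thresholded display map would hide.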

      It would not be far-fetched to hypothesize that FACS scores could be predicted using solely activity from the motor cortex. The authors attempted to do this, but only with information from M1. Why did they not use the entire motor cortex, or better, regions of the motor cortex directly linked with the AUs described in the manuscript?

      The selection of the primary motor area (M1) was based on the results of Kunz et al. (2011). In that study, M1 showed the strongest correlation with the facial expression of pain. Many combinations of brain regions could be considered, using a variety of criteria based on the distributed networks involved in motor, affective, or pain-related processes. For reasons of practical feasibility, we limited our exploration to the region supported by the strongest hypothesis.

      Results and Discussion

      As a general recommendation, results should present individual data whenever possible. For example, the association between signatures and facial expression should be plotted using scatterplots.

      We have added figures showing individual data where applicable (Figure S2; Figure S4).

      The authors state that the LASSO-PCR model accounts for the facial responses to pain. I believe this is an overstatement, considering:

      - A Pearson's r of 0.49 is usually considered low/weak correlation (moderate at best). In the same line, an R2 of 0.17 means that only 17% of the variance is explained by the model.

      A more nuanced interpretation of the results has been added to the discussion. A section has been added to highlight the limitations of the study.

      - Figure 1 needs to display individual subject data and the ideal regression line.

      The model was trained using a k-fold cross-validation procedure. The regression lines thus represent the model’s prediction for each one of the 10 folds (i.e. each fold is trained and tested on a different subset of the data). A scatter plot including the ideal regression line computed across all trials and subjects was added in supplementary material to illustrate the relation between the FACS scores and the FEPS pattern expression scores (Figure S4).
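
      The fold structure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: observations are split at random into 10 folds, each fold serves once as the test set, and the remaining folds form the training set. Whether the original folds were formed over trials or over participants is not stated in the response; the sample count and the seed are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 50 observations split into 10 folds.
splits = list(k_fold_splits(n_samples=50, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

      With repeated measures, an alternative design choice is to assign whole participants to folds so that trials from one person never appear in both the training and the test set.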

      - Looking at Figure 1, it is clear that the model has an intercept different from zero. This means that when the FACS score was zero (i.e., volunteers did not make any distinguishable facial expression), the model predicted a score larger than zero. This is not discussed in the manuscript, and in simple terms, it means that there are brain activation patterns when no discernible facial expression is being made by the volunteers. In the original paper by Kunz et al., two groups of subjects were categorized, and one of them was a facially low- or non-expressive group (n=13). This fact is not even mentioned in the manuscript.

      The categorization in the previous report (Kunz et al., 2012) was based on a pre-experimental session. All subjects were included in the current analysis. This is now indicated in the Methods (lines 287 to 289).

      - On the other end of the range in Figure 1, differences between the FACS scores near the maximum range (40) are underestimated by 23 to 33 points! I guess that the RMSE is smaller (6-7 points), because many FACS scores are concentrated on the low end of the scale.

      This is a very interesting comment. A section discussing the limits of the model to predict the lower and higher FACS scores has been added in the manuscript (lines 232 to 250).

      It is of course acceptable to interpret the low similarity between signatures as a sign that each signature describes a different mechanism related to pain processing. However, I believe that a complete discussion should contemplate other competing hypotheses. Considering that all signatures were developed using a similar painful thermal stimulation protocol, it is reasonable to expect larger similarities between signatures. The fact that they are so dissimilar could be a reflection of model overfit, i.e., all these signatures are just fitted to these particular experimental protocols and data, and do not generalize to brain mechanisms of pain processing.

      We appreciate the pertinent observation. We have included a limitations section in which we discussed, among other considerations, the possible overfitting of models and the necessity of pursuing generalizability studies (lines 225 to 268).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study on the regulation of chlorophyll biosynthesis in rice embryos. It provides insights into the genetic and molecular interactions that underlie chlorophyll accumulation, highlighting the inhibition of OsGLK1 by OsNF-YB7 and the broader implications for understanding chloroplast development and seed maturation in angiosperms. The results presented, including mutation analysis, gene expression profiles, and protein interaction studies, provide convincing evidence for the function of OsNF-YB7 as a repressor in the chlorophyll biosynthesis pathway.

      Thank you very much for your positive assessment of our manuscript. We have carefully revised the manuscript according to the reviewers’ valuable suggestions and comments. For more details, please see the point-by-point response to the reviewers below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript investigates the regulation of chlorophyll biosynthesis in rice embryos, focusing on the role of OsNF-YB7. The rigorous experimental approach, combining genetic, biochemical, and molecular analyses, provides a robust foundation for these findings. The research achieves its objectives, offering new insights into chlorophyll biosynthesis regulation, with the results convincingly supporting the authors' conclusions.

      Strengths:

      The major strengths include the detailed experimental design and the findings regarding OsNF-YB7's inhibitory role.

      Weaknesses:

      However, the manuscript's discussion on the practical implications for agriculture and the evolutionary analysis of regulatory mechanisms could be expanded.

      Thank you for your insightful comments and suggestions. In the revised manuscript, we discussed the potential application of the chlorophyllous embryo (please see lines 270-274). The presence of chlorophyll in the embryo facilitates photosynthesis at early developmental stages, potentially leading to improved seedling growth and vigor (Smolikova and Medvedev, 2016). In crops such as soybean and canola, the green embryo is considered a valuable trait because of its association with enhanced photosynthetic capacity, which consequently promotes fatty acid biosynthesis (Ruuska et al., 2004). However, chlorophyll degradation must be carefully managed during seed maturation to avoid negative effects on seed viability and meal quality (Chung et al., 2006). Interestingly, the green embryo of lotus (Nelumbo nucifera) is widely used as a food ingredient in Asia, Australia, and North America. It is also employed in herbal medicine to treat nervous disorders, insomnia, and other conditions (Zhu et al., 2017; Ha et al., 2022), highlighting the significant potential value of the green embryo.

      In many chloroembryophytes, such as Arabidopsis, the embryo occupies a large proportion of the seed. From an evolutionary perspective, the presence of chlorophyll in the embryo may promote adaptation in such chloroembryophytes because more reserves can be accumulated in the seed through active photosynthesis, better supporting embryo development and subsequent seedling growth (Sela et al., 2020). On the other hand, some leucoembryophytes, such as rice, have a persistent endosperm rich in storage reserves to nourish embryo development (Liu et al., 2022). Gaining the ability to accumulate chlorophyll in the embryo is unnecessary for such species. In agreement with this hypothesis, chlorophyllous embryos are more prevalent in non-endospermous seeds (Dahlgren, 1980). However, we would like to emphasize that the evolutionary force driving the divergence of chloroembryophytes and leucoembryophytes is currently almost completely unknown and deserves in-depth investigation in the future. We discussed the possible evolution of the ability to accumulate chlorophyll in the embryo; please find the details in lines 276-295.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to establish the role of the rice LEC1 homolog OsNF-YB7 in embryo development, especially as it pertains to the development of photosynthetic capacity, with chlorophyll production as a primary focus.

      Strengths:

      The results are well supported, and the approaches used complement one another. There are no major questions left unanswered and the central hypothesis is addressed in every figure.

      Weaknesses:

      There are a handful of sections that could use clarifying for readers, but overall this is a solidly composed manuscript.

      The authors clearly achieved their aims; the results compellingly establish a disparity between how this system operates in rice and Arabidopsis. Conclusions are thoroughly supported by the provided data and interpretations. This work will force a reconsideration of the value of Arabidopsis as a model organism for embryo chlorophyll biosynthesis and possibly photosynthesis during embryo maturation more broadly, as rice is a major crop organism and it very clearly does not follow the Arabidopsis model. It will thus be useful to carry out similar tests in other organisms rather than relying on Arabidopsis and attempting to more fully establish the regulatory mechanism in rice.

      Thank you very much for your positive comments. We have carefully revised the manuscript according to your and the other reviewers’ comments and suggestions. In particular, we emphasized the necessity of carrying out similar tests in other organisms, rather than relying on Arabidopsis, to better understand the regulatory mechanism in rice.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to understand the mechanisms behind chlorophyll biosynthesis in rice, focusing in particular on the role of OsNF-YB7, an ortholog of Arabidopsis LEC1, which is a positive regulator of chlorophyll (Chl) biosynthesis in Arabidopsis. They showed that OsNF-YB7 loss-of-function mutants in rice have chlorophyll-rich embryos, in contrast to Arabidopsis LEC1 loss-of-function mutants. This contrasting phenotype led the authors to carry out extensive molecular studies on OsNF-YB7, including in vitro and in vivo protein interaction studies, gene expression profiling, and protein-DNA interaction assays. The evidence provided well supported the core arguments of the authors, emphasising that OsNF-YB7 is a negative regulator of Chl biosynthesis in rice embryos by mediating the expression of OsGLK1, a transcription factor that regulates downstream Chl biosynthesis genes. In addition, they showed that OsNF-YB7 interacts with OsGLK1 to negatively regulate the expression of OsGLK1, demonstrating the broad involvement of OsNF-YB7 in rice Chl biosynthetic pathways.

      Strengths:

      This study clearly demonstrated how OsNF-YB7 regulates its downstream pathways using several in vitro and in vivo approaches. For example, gene expression analysis of OsNF-YB7 loss-of-function and gain-of-function mutants revealed the expression of selected downstream chl biosynthetic genes. This was further validated by EMSA on the gel. The authors also confirmed this using luciferase assays in rice protoplasts. These approaches were used again to show how the interaction of OsNF-YB7 and OsGLK1 regulates downstream genes. The main idea of this study is very well supported by the results and data.

      Weaknesses:

      From an evolutionary perspective, it is interesting to see how two similar genes have come to play opposite roles in Arabidopsis and rice. It would have been more interesting if the authors had carried out a cross-species analysis of AtLEC1 and OsNF-YB7. For example, overexpressing AtLEC1 in an osnf-yb7 mutant to see if the phenotype is restored or enhanced. Such an approach would help us understand how two similar proteins can play opposite roles in the same mechanism within their respective plant species.

      We appreciate your insightful comments and suggestions. It is a very interesting question whether AtLEC1 can fully restore osnf-yb7, given the possible functional divergence between the genes in terms of regulation of chlorophyll biosynthesis in the embryo. We have previously expressed OsNF-YB7 in the lec1-1 background in Arabidopsis, driven by the native promoter of LEC1 (Niu et al., 2021). We found that OsNF-YB7 could almost completely rescue the embryo defects in Arabidopsis, indicating that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis (Niu et al., 2021). We sought to determine whether AtLEC1 can complement the chlorophyll defect in osnf-yb7. However, osnf-yb7 shows a severe callus induction defect, which is not surprising because many studies have shown that LEC1 is indispensable for somatic embryo development in various plant species, and we are therefore struggling to obtain the genetic materials for analysis. We have to transform OsNF-YB7pro::AtLEC1 into the WT background first, and then cross the transformant with the osnf-yb7 mutant. This is a time-consuming process in rice, but hopefully we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the resulting segregating population.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A minor comment regarding the chlorophyll contents quantification in the study. Line 87: "The results showed that WT had an achlorophyllous embryo throughout embryonic development,...." In the TEM result, chloroplast was not observed in the WT embryo sections, indicating a lack of chlorophyll-containing structures, contrary to what was found in the osnf-yb7 embryos where chloroplasts were observed.

      The authors stated that the embryo morphologies and Chl autofluorescence data showed that WT had an achlorophyllous embryo throughout embryonic development. However, the quantification of Chl levels in Figure 1D and Figure 4C showed that WT does produce some chlorophylls, albeit at lower levels than osnf-yb7 or OSGLK-OX embryos (WT values in the two figures are slightly different). This discrepancy warrants clarification to ensure consistency and accuracy in the manuscript's findings.

      We re-evaluated the Chl content in the embryos of WT and OsGLK1-OX mature seeds. The result confirmed our previous finding that WT embryos produce a small amount of chlorophyll (please see the updated Fig. 4C). Notably, we observed that the dark-grown etiolated plants still have measurable chlorophyll content as reported in many studies (for example, Wang et al., 2017; Yoo et al., 2019), suggesting that there is potential bias in measuring chlorophyll content using an absorbance-based approach. We assume this possibly explains the concern you have raised.

      Reviewer #2 (Recommendations For The Authors):

      Mild editing for grammar is needed throughout, e.g. line 73, "It is still a mysterious why plant species".

      We have carefully edited the grammar.

      As a minor point, the placement of figure panels, such as in Figure 1, is not always intuitive.

      Thank you for your suggestion. This figure has been revised as suggested. Please see the updated Fig. 1.

      What is the significance of the two GFP mutants in Figures 2C and 2D? Is one of those the mislabeled Flag mutant?

      The lines shown in Fig. 2C and D were not mislabeled. They were two independent transgenic events, both of which showed that OsNF-YB7 inhibited the expression of OsPORA and OsLHCB4 in rice. Transgenic lines overexpressing OsNF-YB7 tagged with 3× Flag (NF-YB7-Flag) were also used for this experiment. In agreement, OsPORA and OsLHCB4 were significantly downregulated in the three independent NF-YB7-Flag lines (Fig. S4C), confirming the results shown in Fig. 2C and D.

      In Figures 2G and 2H, what is that enormous band at the bottom of the gel?

      The bands at the bottom of the gel were free probes. We indicated this in the revised figure.

      Not until the Materials and Methods section did I realize that any of this study was being done in tobacco; the Introduction implies it's rice vs. Arabidopsis and it might be a good idea to mention the organism of study somewhere before Figure 6.

      We apologize for any confusion caused by our previous writing. While the majority of this study was performed with rice plants or protoplasts, the split complementary LUC assays and BiFC assays were performed with tobacco. We have specified these in the revised manuscript as suggested.

      Reviewer #3 (Recommendations For The Authors):

      It would be nice if the author could show what the phenotype is in AtLEC1 OX in osnf-yb7 and also OsNF-YB7 OX in atlec1 mutants.

      Thank you for your suggestion. We have previously expressed OsNF-YB7 in the lec1-1 background of Arabidopsis, driven by the native promoter of Arabidopsis LEC1 (Niu et al., 2021). Since OsNF-YB7 could rescue the embryo morphogenesis defects in Arabidopsis (Niu et al., 2021), we assumed that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis. However, it remains unknown whether expression of LEC1 in osnf-yb7 can restore the chlorophyllous embryo phenotype in rice. Because generating the genetic material is time-consuming, and especially because osnf-yb7 has a severe callus induction defect, we are struggling to obtain the complementation line for analysis. We have to transform OsNF-YB7pro::AtLEC1 into a WT background first, and then cross the transformant with the osnf-yb7 mutant. Hopefully, we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the derived segregating population. We discussed the reviewer’s concern in the revised manuscript; please see lines 369-376.

      Line 46, I think it is vague to mention that 'Like most plant species'. Some species might have different copy numbers, for example, a single GLK in liverwort M. polymorpha.

      The statement has been revised. Please see Line 46.

      Figures 2F and 5B, why was only one promoter region used for OsLHCB4? It would be better to have more regions like OsPORA.

      Thank you for your comments. We have examined more promoter regions (P1, P2 and P3) in the revised manuscript as suggested. Among these, the previously selected promoter region (P3) contains both the G-box and CCAATC motifs that can potentially be recognized by GLK1. Consistent with our previous report, the results showed that OsNF-YB7 (left) and OsGLK1 (right) were associated with the P3 region, but showed no significant differences for the other probes. Please see the results in Fig. 2F and Fig. 5B of the revised manuscript.

      Legend of Figures 2G, H, OsPORA (I), and OsLHCB (J) should be (G) and (H) respectively.

      Corrected.

      References

      Chung, D.W., Pruzinska, A., Hortensteiner, S., and Ort, D.R. (2006). The role of pheophorbide a oxygenase expression and activity in the canola green seed problem. Plant Physiol 142, 88-97.

      Ha, T., Kim, M.S., Kang, B., Kim, K., Hong, S.S., Kang, T., Woo, J., Han, K., Oh, U., Choi, C.W., and Hong, G.S. (2022). Lotus Seed Green Embryo Extract and a Purified Glycosyloxyflavone Constituent, Narcissoside, Activate TRPV1 Channels in Dorsal Root Ganglion Sensory Neurons. J Agric Food Chem 70, 3969-3978.

      Liu, J., Wu, M.W., and Liu, C.M. (2022). Cereal Endosperms: Development and Storage Product Accumulation. Annu Rev Plant Biol 73, 255-291.

      Niu, B., Zhang, Z., Zhang, J., Zhou, Y., and Chen, C. (2021). The rice LEC1-like transcription factor OsNF-YB9 interacts with SPK, an endosperm-specific sucrose synthase protein kinase, and functions in seed development. Plant J 106, 1233-1246.

      Ruuska, S.A., Schwender, J., and Ohlrogge, J.B. (2004). The capacity of green oilseeds to utilize photosynthesis to drive biosynthetic processes. Plant Physiol 136, 2700-2709.

      Sela, A., Piskurewicz, U., Megies, C., Mene-Saffrane, L., Finazzi, G., and Lopez-Molina, L. (2020). Embryonic Photosynthesis Affects Post-Germination Plant Growth. Plant Physiol 182, 2166-2181.

      Smolikova, G.N., and Medvedev, S.S. (2016). Photosynthesis in the seeds of chloroembryophytes. Russ J Plant Physiol 63, 1-12.

      Wang, Z., Hong, X., Hu, K., Wang, Y., Wang, X., Du, S., Li, Y., Hu, D., Cheng, K., An, B., and Li, Y. (2017). Impaired Magnesium Protoporphyrin IX Methyltransferase (ChlM) Impedes Chlorophyll Synthesis and Plant Growth in Rice. Front Plant Sci 8, 1694.

      Yoo, C.Y., Pasoreck, E.K., Wang, H., Cao, J., Blaha, G.M., Weigel, D., and Chen, M. (2019). Phytochrome activates the plastid-encoded RNA polymerase for chloroplast biogenesis via nucleus-to-plastid signaling. Nat Commun 10, 2629.

      Zhu, M., Liu, T., Zhang, C., and Guo, M. (2017). Flavonoids of Lotus (Nelumbo nucifera) Seed Embryos and Their Antioxidant Potential. J Food Sci 82, 1834-1841.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      As a reviewer for this manuscript, I recognize its significant contribution to understanding the immune response to saprophytic Leptospira exposure and its implications for leptospirosis prevention strategies. The study is well-conceived, addressing an innovative hypothesis with potentially high impact. However, to fully realize its contribution to the field, the manuscript would benefit greatly from a more detailed elucidation of immune mechanisms at play, including specific cytokine profiles, antigen specificity of the antibody responses, and long-term immunity. Additionally, expanding on the methodological details, such as immunophenotyping panels, qPCR normalization methods, and the rationale behind animal model choice, would enhance the manuscript's clarity and reproducibility. Implementing functional assays to characterize effector T-cell responses and possibly investigating the microbiota's role could offer novel insights into the protective immunity mechanisms. These revisions would not only bolster the current findings but also provide a more comprehensive understanding of the potential for saprophytic Leptospira exposure in leptospirosis vaccine development. Given these considerations, I believe that after substantial revisions, this manuscript could represent a valuable addition to the literature and potentially inform future research and vaccine strategy development in the field of infectious diseases.

      Reviewer #2 (Public Review):

      Summary:

      The authors try to achieve a method of protection against pathogenic strains using saprophytic species. It is undeniable that the saprophytic species, despite not causing the disease, activates an immune response. However, based on these results, using the saprophytic species does not significantly impact the animal's infection by a virulent species.

      Strengths:

      Exposure to the saprophytic strain before the virulent strain reduces animal weight loss, reduces tissue kidney damage, and increases cellular response in mice.

      Weaknesses:

      Even after the challenge with the saprophyte strain, kidney colonization and the release of bacteria through urine continue. Moreover, the authors need to determine the impact on survival if the experiment ends on the 15th.

      Reviewer #3 (Public Review):

      Summary:

      Kundu et al. investigated the effects of pre-exposure to a non-pathogenic Leptospira strain in the prevention of severe disease following subsequent infection by a pathogenic strain. They utilized a single or double exposure method to the non-pathogen prior to challenge with a pathogenic strain. They found that prior exposure to a non-pathogen prevented many of the disease manifestations of the pathogen. Bacteria, however, were able to disseminate, colonize the kidneys, and be shed in the urine. This is an important foundational work to describe a novel method of vaccination against leptospirosis. Numerous studies have attempted to use recombinant proteins to vaccinate against leptospirosis, with limited success. The authors provide a new approach that takes advantage of the homology between a non-pathogen and a pathogen to provide heterologous protection. This will provide a new direction in which we can approach creating vaccines against this re-emerging disease.

      Strengths:

      The major strength of this paper is that it is one of the first studies utilizing a live non-pathogenic strain of Leptospira to immunize against severe disease associated with leptospirosis. They utilize two independent experiments (a single and double vaccination) to define this strategy. This represents a very interesting and novel approach to vaccine development. This is of clear importance to the field.

      The authors use a variety of experiments to show the protection imparted by pre-exposure to the non-pathogen. They look at disease manifestations such as death and weight loss. They define the ability of Leptospira to disseminate and colonize the kidney. They show the effects infection has on kidney architecture and a marker of fibrosis. They also begin to define the immune response in both of these exposure methods. This provides evidence of the numerous advantages this vaccination strategy may have. Thus, this study provides an important foundation for future studies utilizing this method to protect against leptospirosis.

      Weaknesses:

      Although they provide some evidence of the utility of pretreatment with a non-pathogen, there are some areas in which the paper needs to be clarified and expanded.

      The authors draw their conclusions based on the data presented. However, they state the graphs only represent one of two independent experiments. Each experiment utilized 3-4 mice per group. In order to be confident in the conclusions, a power analysis needs to be done to show that there is sufficient power with 3-4 mice per group. In addition, it would be important to show both experiments in one graph which would inherently increase the power by doubling the group size, while also providing evidence that this is a reproducible phenotype between experiments. Overall, this weakens the strength of the conclusions drawn and would require additional statistical analysis or additional replicates to provide confidence in these conclusions.
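      As a rough illustration of the power analysis requested above, the sketch below estimates the number of mice per group needed for a two-sided comparison of two group means. It uses the standard normal-approximation formula and is not taken from the manuscript; the function name `mice_per_group` and the effect sizes are hypothetical, and the approximation tends to underestimate the requirement at very small group sizes, where a t-based calculation would be more appropriate.

```python
from math import ceil
from statistics import NormalDist


def mice_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (mean difference / pooled SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes, for illustration only (not values from the study).
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: about {mice_per_group(d)} mice per group")
```

      Under this approximation, groups of 3-4 mice reach 80% power only when the standardized effect is very large (around d = 2 or more), which is the kind of check the comment above asks for.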

      A direct comparison between single and double exposure to the non-pathogen is not able to be determined. The ages of mice infected were different between the single (8 weeks) and double (10 weeks) exposure methods, thus the phenotypes associated with LIC infection are different at these two ages. The authors state that this is expected, but do not provide a reasoning for this drastic difference in phenotypes. It is therefore difficult to compare the two exposure methods, and thus determine if one approach provides advantages over the other. An experiment directly comparing the two exposure methods while infecting mice at the same age would be of great relevance to and strengthen this work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments

      (1) Elucidation of Immune Mechanisms: The manuscript intriguingly suggests that exposure to saprophytic Leptospira primes the host for a Th1-biased immune response, contributing to survival and mitigation of disease severity upon subsequent pathogenic challenge. However, the underlying mechanisms remain broadly defined. A more detailed investigation into the cytokine profiles, particularly the levels of IFN-γ, IL-12, and other Th1-associated cytokines, could clarify the mechanism of Th1 bias. Moreover, exploring the role of antigen-presenting cells (APCs) in priming T cells towards a Th1 phenotype would add valuable insights.

      In this study we continue to elucidate the immune mechanisms engaged by pathogenic and non-pathogenic Leptospira as a follow-up to our previous work (Shetty et al., 2021, PMID: 34249775; Kundu et al., 2022, PMID: 35392072). We, and others, have shown that saprophytic L. biflexa and pathogenic L. interrogans induce major chemo-cytokines associated with Th1-biased immune responses (Shetty et al. 2021; Cagliero et al. 2022; Krangvichian et al. 2023) and engage myeloid immune cells such as macrophages and dendritic cells. The role of antigen-presenting cells such as dendritic cells in priming T cells and activating the adaptive response is a separate question that can be addressed in the future. In partial answer to it, a recent mechanistic study (Krangvichian et al. 2023) showed that non-pathogenic leptospires (L. biflexa) promote MoDC maturation and stimulate the proliferation of IFN-γ-producing CD4+ T cells and potentially elicit a Th1-type response in mice; this supports our current claim and is referenced in our manuscript.

      (2) Quantitative Analysis of Kidney Colonization: The manuscript reports that pre-exposure to L. biflexa did not prevent the colonization of kidneys by L. interrogans but led to a more regulated immune response and reduced fibrosis. A more nuanced quantification of bacterial loads in the kidneys, using techniques such as CFU counting or more sensitive qPCR methods, could provide a clearer picture of how saprophytic exposure affects the ability of pathogenic Leptospira to establish infection. Additionally, a time-course study showing the kinetics of bacterial colonization and clearance post-infection would be informative.

      We are currently validating digital PCR to use in the future and plan to do time course studies.

      (3) Characterization of B Cell and T Cell Responses: While the manuscript mentions increased B cell frequencies and effector T helper cell responses, specifics regarding the nature of these responses are lacking. For instance, detailing the isotype and specificity of antibodies produced, the proliferation rates of specific B and T cell subsets, and their functional capabilities (e.g., cytotoxicity, help for B cells) would significantly enrich the understanding of the immune response elicited by pre-exposure to saprophytic Leptospira.

      Indeed, additional experiments need to be conducted to flesh out the immune responses engaged after pre-exposure to saprophytic Leptospira followed by LIC challenge.

      (4) Comparative Analysis with Other Models of Pre-exposure: The study primarily focuses on pre-exposure to a live saprophytic Leptospira. Including a comparison with pre-exposure to killed saprophytic bacteria, or even to other non-pathogenic microbes, could help discern whether the observed protective effect is unique to live saprophytic Leptospira exposure or if it represents a more general phenomenon of trained immunity.

      Regarding the use of other non-pathogenic microbes, our lab has previously shown that oral administration of the probiotic strain Lactobacillus plantarum (Potula et al., 2017) also reduces the severity of leptospirosis by recruiting myeloid cells. Thus, a general phenomenon of trained immunity may be involved. We added this to the discussion.

      (5) Assessment of Long-term Immunity: The study provides valuable insights into the short-term outcomes following saprophytic Leptospira exposure and subsequent pathogenic challenge. Extending these observations to assess long-term immunity, including memory B and T cell responses several months post-infection, would be crucial for understanding the potential of saprophytic Leptospira exposure in providing lasting protection against leptospirosis.

      Long-term immunity is a complex and separate question that we plan to address later.

      Minor Comments

      (1) Technical Specifics of Flow Cytometry Analysis: The manuscript could benefit from including more details on the flow cytometry gating strategy and the specific markers used to identify different immune cell subsets. This addition would aid in the reproducibility of the results and allow for a clearer interpretation of the immune profiling data.

      We included the technical specifics of the flow cytometry analysis in the Materials and Methods section. The gating strategy (Fig S1) and the specific markers (Table S1) used to identify different immune cell subsets were incorporated in the supplementary datasheet. The cell-specific markers were added to the figures (Fig 5 and 6) under each representative cell subset, which improves the clarity and reproducibility of the immune profiling.

      (2) Statistical Methodology for IgG Subtyping: The analysis of IgG subtypes in response to Leptospira exposure is intriguing but would be strengthened by specifying the statistical tests used to compare IgG1, IgG2a, and IgG3 levels between groups. Additionally, discussing the biological significance of the observed differences in IgG subtype levels would provide a more comprehensive understanding of the immune response.

      We applied an ordinary one-way ANOVA to compare the IgG subtypes between groups, followed by Tukey’s multiple comparisons test (included in the figure legend of Fig 4). We addressed the biological relevance of the observed differences in IgG subtype levels in the discussion section.

      (3) Details on Animal Welfare and Ethical Approval: While the manuscript mentions compliance with institutional animal care and use committee protocols, providing the specific ethical guidelines followed, such as the 3Rs (Replacement, Reduction, Refinement), would reinforce the commitment to ethical research practices.

      This is addressed in our approved institutional IACUC protocol, which is listed in the Methods.

      (4) Clarification of Figure Legends: Some figure legends are brief and could be expanded to more thoroughly describe what the figures show, including details on what specific data points, error bars, and statistical symbols represent.

      We updated and expanded the figure legends (Fig 1-4).

      (5) Revision of Introduction and Background: The introduction provides a good overview of leptospirosis and the rationale behind the study. However, it could be further improved by briefly summarizing current challenges in vaccine development against leptospirosis and how understanding the immune response to saprophytic Leptospira could address these challenges.

      We revised the introduction keeping this comment in mind.

      Reviewer #2 (Recommendations For The Authors):

      - Perform the same challenge experiment with a hamster.

      We clarified throughout the manuscript that all the work was done using the C3H/HeJ mouse model, which was developed in our lab for the purpose of measuring differences in sublethal and lethal LIC infections. We leave the hamster experiments to the investigators who have thoroughly validated the hamster model of lethal Leptospira infection.

      - Review the written part where it is understood that the challenge with saprophyte strain before virulence prevents the disease.

      We revised the manuscript to make clear that inoculation of mice with a saprophytic Leptospira before pathogenic challenge prevents severe leptospirosis and promotes kidney homeostasis and increased shedding of Leptospira in urine, which is an interesting finding. The last 2 sentences of the abstract read: “Thus, mice exposed to live saprophytic Leptospira before facing a pathogenic serovar may withstand infection with far better outcomes. Furthermore, a status of homeostasis may have been reached after kidney colonization that helps LIC complete its enzootic cycle.”

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 83: The authors refer to the classification of Leptospira by old nomenclature. The bacteria are now categorized into clades P1, P2, S1 and S2. See Vincent et al. Revisiting the taxonomy and evolution of pathogenicity of the genus Leptospira through the prism of genomics. PLoS Negl Trop Dis. 2019 May 23;13(5):e0007270. doi: 10.1371/journal.pntd.0007270. PMID: 31120895; PMCID: PMC6532842.

      We have included the categories (S1 for L. biflexa and P1+ for L. interrogans) in the Introduction and Methods, but we did not update the figures because we want to be specific about the species used in these experiments. We also added a few sentences on the evolution of Leptospira species to the Discussion, citing Thibeaux 2018, Vincent 2019 and Giraud-Gatineau 2024.

      (2) Line 133: Please remove the extra line to be consistent with the rest of the method section format.

      We addressed all formatting issues.

      (3) Line 137: Are these primers specific to pathogenic L. interrogans? Or do they cross react with L. biflexa? If not specific, how long does L. biflexa stick around after infection?

      The primers are specific to the genus Leptospira. Surdel et al. (2022) used a 16S rRNA target sequence to amplify L. biflexa Patoc in mice at 6 hours post infection. We did not detect any sample positive for L. biflexa with the 16S rRNA primer set because we performed our analysis at 30 and 45 days post inoculation with L. biflexa. We clarified this issue in the Methods and Results.

      (4) Statistical analysis:

      (a) Some of your graphs have more than 4 points on them (such as Figure 4), while the legend still reads "represents one of two independent experiments". Are these actually combined replicates in the same graph? Combining them would provide strength to your conclusions throughout your manuscript and may provide stronger power for comparisons. If they are not included, why are they not included together? Please clarify what is included in each graph, and why the two experiments were not included together.

      We updated the legends with the total number of mice used in the experiment represented in each figure. Figures 1, 2, 4 and S2 contain the combined results from two independent experiments. Figures 3, 5 and 6 represent data from one of two independent experiments. For Fig 3, it would be redundant to show HE images from both experiments. Regarding Figs 5 and 6, the flow cytometer acquires data at different voltages in each run, and biological samples vary between experiments even if all the markers and procedures are the same. We therefore reproduced the experiment and show results from one experiment after confirming that the trends in the individual experiments are the same.

      (b) If ANOVA was used, were all columns compared to each other? Why in some graphs are "ns" labeled only for certain comparisons? I would suggest removing the "ns" comparisons and only highlighting the significant differences.

      Although we tested significance between all groups, in both studies we show the comparisons between control (PBS) and PBS-LIC, between LB and LB-LIC, and between PBS-LIC and LB-LIC.

      (5) Line 165: Bacteria were not plated, extract was plated. Perhaps you mean "extract corresponding to 10^7-10^8 bacteria"?

      We addressed it as follows: “Nunc MaxiSorp flat-bottom 96 well plates (eBioscience, San Diego, CA) were coated with extracts prepared from 10^7-10^8 bacteria per well and incubated at 4℃ overnight” …

      (6) Line 260: The authors claim that "Exposure to non-pathogenic L. biflexa before pathogenic L. interrogans challenge provided a significant immune cell boost with an increase in overall B and helper T cell frequencies..." However, in Figure 5A, the number of B cells in both the PBS2LIC2 and the LB2LIC2 are not significantly different. Thus, the claim is not supported by the evidence provided. It appears that infection with LIC led to similar increases in B cells regardless of pretreatment.

      We rephrased that title to reflect the finding that increased differences were measured in effector Helper T cells between PBS2LIC2 and LB2LIC2 (Figs 5D and 6B, 6C) and we re-wrote this section for clarity.

      (7) Lines 314-315: The authors claim that it protected against kidney fibrosis, however, the data only supports that only a single exposure to LB reduced levels of a marker associated with kidney fibrosis. Fibrosis was never directly measured.

      Indeed, we didn’t do Masson’s trichrome stain to get supporting data for kidney fibrosis and only measured a fibrosis marker, ColA1. We toned down this section: “ …. it may confer protection against kidney fibrosis.”

      (8) Line 317: Authors state that pre-exposure induced higher antibodies in serum, however, this was never shown. Only an increase in IgG2a was shown. Please word this statement to make it clear total antibodies were never measured.

      We did measure total anti-Leptospira interrogans IgM and IgG antibodies. We added the following sentence to description of these results: “In both experiments, total IgM and IgG were significantly increased in PBS-LIC and LB-LIC when compared to the respective controls, but not between PBS-LIC and LB-LIC.  Regarding IgG isotypes, IgG1…”

      (9) Line 323: The authors state that the exposure "induced antibody responses that provided heterologous protection." There is no evidence that the protection is due to the antibody response in these experiments. In fact, they also showed that it induced increased T cell responses.

      We toned down this statement: “In our study, exposure to a saprophytic Leptospira induced antibody responses that may provide heterologous protection against the pathogenic strain of Leptospira.”

      (10) Line 328: The authors use the term "stark difference", however, only slight differences are seen.

      We toned down that statement as follows:  “Differences in antibody titer among the L. interrogans infected….”

      (11) Line 490: reword this sentence to provide clarity and easier to read: "inoculated once with 10^8 L. biflexa at 6 weeks and they were challenged with 10^8 L. interrogans SEROVAR Copenhageni FioCruz (LIC) at 8 weeks."

      We revised the sentence.

      (12) Figure 1 and 2: Quantifying bacteria in culture after infection is not meaningful, as there are numerous factors that can affect the replication in culture after infection, such as how the organ perhaps was cut before placing it in culture. The comparisons in Figure 2E and F therefore are not interpretable. I would suggest presenting this data as Culture Positive or Culture Negative.

      We added these data to the figure under DFM (dark field microscopy).

      (13) Figure 3A: H&E staining often leads to different qualities of stains. But is there a better image that can be chosen for the PBS1LIC1 that provides a better comparison with the other images chosen? This is not worth repeating the experiment to get one, just make the figure look better if you have one available.

      We screened the images again, but the one incorporated in Figure 3A for PBS1LIC1 is the best available.

      (14) Figure 3D: I agree that the PBS-LIC treatment is significant, but please include P value, as it looks very similar to the LB-LIC group. The two LIC groups are not significantly different, so the conclusion would be pre-exposure does not mitigate renal fibrosis marker ColA in the double-exposure study.

      We included the p-values in this figure. The two LIC groups are significantly different (ColA1) in the single exposure experiment, and in the double exposure experiment we don’t expect to be able to measure ColA1 differences because the mice are older (10 wk) when we do the LIC challenge.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      The "optorepressilator", an optically controllable genetic oscillator based on the famous E. coli 3-repressor (LacI, TetR, CI) oscillator "repressilator", was developed. An individual repressilator shows a stable oscillation of the protein levels with a relatively long period that extends a few doubling times of E. coli, but when many cells oscillate, their phases tend to desynchronize. The authors introduced an additional optically controllable promoter through a conformational change of CcaS protein and let it control how much additional CI is produced. By tightly controlling the leak from the added promoter, the authors successfully kept the original repressilator oscillation when the added promoter was not activated. In contrast, the oscillation was stopped by expressing the additional CI. Using this system, the authors showed that it is possible to synchronise the phase of the oscillation, especially when the activation happens as a short pulse at the right phase of the repressilator oscillation. The authors further show that, by changing the frequency of the short pulses, the repressilator was entrained to various ratios to the pulse period, and the authors could reconstruct the so-called "Arnold tongues", the signature of entrainment of the nonlinear oscillator to externally added periodic perturbation. The behaviour is consistent with the simplified mathematical model that simulates the protein concentration using ordinary differential equations.

      Strengths:

      Optical control of the oscillation of the protein clock is a powerful and clean tool for studying the synthetic oscillator's response to perturbation in a well-controlled and tunable manner. The article utilizes the plate reader setup for population average measurements and the mother machine setup for single-cell measurements, and the two complement each other nicely to acquire the necessary information.

      Weakness:

      The current paper added the optogenetically controlled perturbation to control the phase of oscillation and entrainment, but there are a few other works that add external perturbation to a collection of cells that individually oscillate to study phase shift and/or entrainment. The current paper lacks discussion about the pros and cons of the current system compared to previously analyzed systems.

      Recommendations

      Even if the main purpose of the current paper is to develop a toolbox, it is beneficial to emphasize the pros and cons of the current system compared to the existing work. In addition to the ref [36] that authors cite but do not discuss concretely, example literature about entrainment includes:

      - Sanchez, P.G.L., Mochulska, V., Denis,  C.M., Mönke, G., Tomita, T., Tsuchida-Straeten, N., Petersen, Y., Sonnen, K., François, P. and Aulehla, A., 2022. Arnold tongue entrainment reveals dynamical principles of the embryonic segmentation clock. Elife, 11, p.e79575.

      - Heltberg M, Kellogg RA, Krishna S, Tay S, Jensen MH. Noise induces hopping between NF-κB entrainment modes. Cell systems. 2016 Dec 21;3(6):532-9.

      There is surely more literature. It is recommended that a solid discussion be added on the relation between existing works and current work.

      We thank the Reviewer for their positive comments on our manuscript. Their main recommendation is to expand literature and discuss how our method compares to previously reported entrainment of genetic oscillators. In summary, we believe that the main advantages of the optorepressilator are the simplicity of the transcriptional network combined with the flexibility of optical control. In the “Discussion” section of the revised manuscript, we now try to highlight this also in connection to the suggested literature.

      Reviewer #2 (Public Reviews):

      Summary

      In this manuscript by Cannarsa et al., the authors describe the engineering of a light-entrainable synthetic biological oscillator in bacteria. It is based on an upgraded version of one of the first synthetic circuits to be constructed, the repressilator. The authors sought to make this oscillator entrainable by an external forcing signal, analogous to the way natural biological oscillators (like the circadian clock) are synchronized. They reasoned that an optogenetic system would provide a convenient and flexible means of manipulation. To this end, the authors exploited the CcaS-CcaA light-switchable system, which allows activation and deactivation of transcription by green and red light, respectively. They used this system to make the expression of one of the repressilator's transcription factors (lacI) light-controlled, from a construct separated from the main repressilator plasmid. This way, under red light the oscillator runs freely, but exposure to green light causes overexpression of the lacI, pushing the system into a specific state. Consequently, returning to red light will restore the oscillations from the same phase in all cells, effectively synchronizing the cell population.

      After demonstrating the functionality of the basic concept, the authors combined modeling and experiments to show how periodic exposure to green light enables efficient entrainment, and how the frequency of the forcing signal affects the oscillatory behavior (detuning).

      This work provides an important demonstration of engineering tunability into a foundational genetic circuit, expands the synthetic biology toolbox, and provides a platform to address critical questions about synchronization in biological oscillators. Due to the flexibility of the experimental system, it is also expected to provide a fertile ground for future testing of theoretical predictions regarding non-linear oscillators.

      Strengths:

      The study provides a simple and elegant mechanism for the entrainment of a synthetic oscillator. The design relies on optogenetic proteins, which enable efficient experimentation compared to alternative approaches (like using chemical inducers). This way, a static culture (without microfluidics or change of growth media) can be easily exposed to flexible temporal sequences of the zeitgeber, and continuously measured through time.

      The study makes use of both plate-reader-based population-level readout and mother-machine single-cell measurements. Synchronization through entrainment is a single cell level phenomenon, but with a clear population-level manifestation. Thus, this experimental approach combination provides a strong validation to their system. At the same time, differences between the readout from the two systems have emerged, and provided a further opportunity for model refinement and testing.

      The authors correctly identified the main optimization goal, namely the effective leakiness of their construct even under red light. Then, they successfully overcame this issue using synthetic biology approaches.

      The work is supported by a simplified model of the repressilator, which provides a convenient analytical and numerical means to draw testable predictions. The model predictions are well aligned with the experimental evidence.

      Weaknesses:

      Even after optimizing the expression level of the light-sensitive gene, the system is very sensitive, i.e., a very short exposure is sufficient to elicit the strongest entrainment. This limited dynamic range might hamper some model testing and future usage.

      As a result of the previous point, the system is entrained by transiently "breaking" the oscillator: each pulse of green light represents a Hopf bifurcation into a single attractor. It means that the system cannot oscillate in constant green light. In comparison, this is generally not the case for natural zeitgebers like light and temperature for the circadian rhythms. Extreme values might prevent oscillations (not necessarily due to breaking the core oscillator), but usually, free running is possible in a wide range of constant conditions. In some cases, the free-running period length will vary as a function of the constant value. While the approach presented in this manuscript is valid, a comprehensive analysis of more subtle modes of repressilator entrainment could also be of value.

      The entire work makes use of a single intensity and single duration of the green pulse to force entrainment. While the model has clear predictions for how those modalities should affect entrainment, none of the experiments attempted to validate those predictions.

      While we agree with the Reviewer that all reported experiments were performed with pulses of constant amplitude and duration, we do not see this as a necessary limitation for future studies on the optorepressilator. Using pulse-width modulation, green light intensity could be easily and continuously modulated from zero to a maximum value (as in Fig. 4), exploring a wide range of intermediate intensity levels and therefore of mean LacI production rates from the optogenetic promoter. We do not include additional experiments in the revised manuscript but we have greatly expanded the theoretical discussion on the low amplitude regime, both for a constant illumination (new Supplementary Materials Section 5) and the pulsed case (new Supplementary Fig. 8).
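
      The pulse-width modulation idea in the response above can be sketched in a few lines. This is only an illustration, not the authors' control code: the function name and the 25% duty cycle are hypothetical, and 5.64 W/m^2 is simply the green-pulse intensity quoted for the multiwell experiments in this response. The sketch assumes that the cells respond to the time-averaged intensity, which is reasonable only when the switching period is short compared with the response time of the optogenetic system.

```python
def pwm_mean_intensity(max_intensity, on_time, period):
    """Time-averaged intensity of a square-wave (PWM) light signal.

    max_intensity: intensity while the LED is on (e.g. W/m^2)
    on_time, period: same time unit; on_time / period is the duty cycle
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= on_time <= period:
        raise ValueError("on_time must lie between 0 and period")
    duty_cycle = on_time / period
    return duty_cycle * max_intensity


# Hypothetical setting: LED on for a quarter of each switching period.
mean_green = pwm_mean_intensity(5.64, on_time=0.25, period=1.0)
print(f"mean green intensity: {mean_green:.2f} W/m^2")
```

      Sweeping the duty cycle from 0 to 1 moves the mean intensity continuously from zero to the maximum value, which is the range of intermediate forcing amplitudes the response refers to.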

      Recommendations for the Authors:

      (1) The introduction emphasized the utility of entrainment as a means to achieve population-wide synchrony. It is worth mentioning also that it enables synchronization of the internal oscillator with an external zeitgeber, to achieve a specific phase-locking between them. Often, this is the main utility attributed to entrainment, e.g., in circadian clocks.

      Following the Reviewer’s suggestion, we now say in the introduction:

      These oscillations maintain a constant phase relation to the external light cue that can act as a zeitgeber.

      (2) It is sometimes unclear at first glance which of the figure panels show simulation data and which show experimental data (e.g., Figure 5a,b; Figure 6a,b). More explicitly labeling the panels could help.

      We thank the reviewer for pointing this out; we now explicitly label all the panels.

      (3) Figure 3b - please add a color bar to indicate the meaning of the red-green scale, and enlarge the markers so their color is more visible. Also, the authors could add additional controls of (i) sfGFP expression without the ccaR, and (ii) the autofluorescent signal from wild type. Please also provide the raw data (not the time derivatives) in a supplementary figure.

      A colorbar has been added and markers enlarged.

      (i) Unfortunately we do not have a control for GFP expression without ccaR.

      (ii) The autofluorescence signal from “a negative control consisting of DHL708 with plasmids pNO286-3 and pSR58-0 (optogenetic plasmids without sfGFP cassette)” has been added to Fig. 3b for comparison. This modification was actually very helpful in understanding that the sensitivity threshold in our experiments is mainly determined by autofluorescence. OD600 and fluorescence raw data are now provided in Supplementary Fig. 6.

      (4) Figure 3d - the claim in the text is that the purple optorepressilator and the wildtype repressilator have identical periods and amplitude. However, it seems from the figure that there is a small difference in the period length. This deviation is not problematic in any way, but I wondered whether it might actually be explained by the model, assuming that there is still a very weak leak from the new construct. In other words, would the model predict a bifurcation diagram in which an increasing x' concentration causes a gradual decrease in amplitude and increase in period, before the loss of rhythmicity? If so, Figure 3d can serve not only as a technical optimization demonstration but also as a nice validation of the model.

      We thank the reviewer for raising this interesting point. We now report, in Supplementary Materials Section 5, a theoretical prediction of the period with respect to a constant concentration of x'. For our choice of parameters (adjusted to reproduce the main experimental quantitative features) we find a period that decreases with x'. Leakage would therefore lead to a shorter period, contrary to what is observed experimentally. To explain the longer period observed in the optorepressilator we went back to extract the average growth rates of bacteria in the purple optorepressilator and repressilator curves in Fig.3d. As we now discuss in the main text:

      “The slight difference in period can be explained by the presence of additional plasmids in the optorepressilator strain, which results in a lower growth rate (Supplementary Figures 4 and 5). As found in the digital approximation, the repressilator period is mainly controlled by the inverse growth rate (see Figure 1a and Supplementary Figure 9) meaning a lower growth rate results in a longer oscillation period. When we normalize the time with the growth rate the two oscillations overlap nicely (Supplementary Figure 4).”
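
      The time normalization described in the quoted passage can be illustrated with a toy calculation. This is a sketch under a stated assumption, not the authors' analysis: it takes the period to be exactly proportional to the inverse growth rate, and the growth rates and the proportionality constant below are hypothetical numbers chosen only for illustration.

```python
import math


def reporter_signal(t_hours, growth_rate, period_times_growth_rate=7.0):
    """Toy oscillation whose period (h) is period_times_growth_rate / growth_rate."""
    period = period_times_growth_rate / growth_rate
    return math.cos(2 * math.pi * t_hours / period)


def normalized_time(t_hours, growth_rate):
    """Dimensionless time: hours multiplied by the growth rate (h^-1)."""
    return t_hours * growth_rate


fast, slow = 0.50, 0.40  # hypothetical growth rates in h^-1

# The slower-growing strain has the longer period in hours (17.5 h vs 14 h here),
# but at equal normalized time the two toy traces coincide.
for s in (0.0, 1.0, 2.5, 5.0):
    t_fast, t_slow = s / fast, s / slow
    print(
        normalized_time(t_fast, fast),
        reporter_signal(t_fast, fast),
        reporter_signal(t_slow, slow),
    )
```

      In real data the overlap after rescaling is only approximate, since growth rate is not the sole determinant of the period.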

      (5) Supplementary Figure 10 has no reference from the main text. It is unclear what the difference from Figure 3 is. In general, many items in the supplementary materials are not referenced from the text. In addition, on many occasions, there is a reference to "supplementary information" without a specific address, which is not so useful to the reader. Wherever possible, please be more specific. Also, note that there's inconsistency in referring to the supplemental section as "supplementary materials" vs "supplemental information".

      We now explicitly reference all Supplementary Figures in the main text and use consistent reference to Supplementary Materials.

      (6) The discussion at the bottom of p.7 ("Optogenetic entrainment") is missing a reference to the duration and intensity of the zeitgeber: in the example from human circadian rhythms, the light intensity is not indicated; in the modeling of the PRC, both modalities are absent. It is important at least to indicate the parameters used for the simulation and experiments. It would be even better to explore in the model how these modalities affect the PRC and entrainment. And it would be incredible if the authors could show this also experimentally.

      We now report the light intensity values for:

      - our experiments:

      “We first demonstrate this by monitoring the population signal from CFP (reporting TetR or 𝑦 in the model) in multiwell cultures under constant red illumination (9.82 W/m^2) interrupted by green light pulses (5.64 W/m^2) with a duration of 2 h and period 𝑇 = 18 h.”

      For mother machine experiments “Green and red light stimuli were provided by the two LEDs (Thorlabs M530L4, Thorlabs M660L4) with respective intensities 6 W/m^2 and 26 W/m^2 for the synchronization experiments, and 1.1 W/m^2 and 4.5 W/m^2 for the entrainment experiments.”

      - and simulations:

      “In Fig. 5a we report the phase shift produced by a single pulse (with duration tau=2 h and intensity beta’=80 h^-1 fixed for all the simulations) as a function of the pulse arrival phase ϕ.”

      We also added an additional supplementary figure (Supplementary Fig. 7) that explores how the duration and intensity of the light pulses affect the PRC in the model. An approximate analytic result is also derived for the PRC in the digital approximation that compares very well with simulation, providing physical insight into PRC shape (Supplementary Materials Section 7).

      (7) The experimental validation of the PRC can be much more thorough. Notably, an entrainment experiment with repeated pulses does not provide the same level of validation as a proper PRC experiment. This is because many differently shaped PRCs can give rise to the same entrainment pattern, as long as their fixed-point phases are the same.

      Luckily, there might already be a decent amount of data from the mother machine experiments to fit with the PRC prediction, given the authors have pulsed a non-synchronized population that spans the entire x-axis of the PRC. It is possible that a proper PRC experiment wouldn't be too difficult with the plate reader either, given the throughput of the author's system.

      This is a very interesting suggestion but unfortunately, in our mother machine data, the first pulse arrives before the cells have completed a full cycle, so although different cells receive the first pulse at a sufficiently randomized phase, we can’t extract their individual phases at the pulse arrival time.

      Indeed it would be possible to design a plate reader experiment for the specific purpose of directly measuring the PRC. However, our current protocol involves continuous manual dilutions, which makes it rather laborious. We are currently working on an automated procedure that will allow us to systematically address this and other interesting suggestions in the future.

      An indirect experimental validation of the PRC is however still possible using available data. See added red points in Fig.5a and reply to point 10 below.

      (8) The discrepancy between the mother machine and plate reader experiments in Figure 5 is explained by a difference in growth rate variability in the two systems. It is not readily obvious how a difference in variability rather than the mean value of the period length can cause a shifted mean phase. It is only hinted in the text that growth rate has two different effects - on the period as well as the amplitude. I hypothesize that because of this period and amplitude correlation, there is a bias contribution to the sum of trajectories that has resulted in a shifted mean phase. Maybe there is another contribution from the asymmetric waveform of the signal, or from the distribution that alpha is sampled from? A direct discussion on that point will make the results much clearer. If the period-amplitude speculation above is right, please also add a panel that shows it. It will also be helpful to show the predicted PRC for the two parameter regimes.

      We thank the reviewer for highlighting this point. In the previous version of the manuscript we omitted the fact that in order to better match experimental signals we chose slightly different values for T_L/T_0 for simulations in Fig. 5d and 5e. We now report the values of all simulation parameters in the revised manuscript. This difference could also contribute to the shift in the mean phase for the two cases. We added this information in the main text.

      “The bottom panel in Fig. 5d shows the result of a numerical simulation with the same parameters as in Fig. 1b and the addition of a periodic light stimulation, with period T_L/T_0 = 1 [...] For the simulations in the lower panel of Fig. 5e, all parameters remained the same as in Fig. 5d with the exception of the period of the light pulses (T_L/T_0 = 0.97) and the standard deviation of the growth rate distribution, which was increased from 0.034 h^-1 to 0.071 h^-1 to better reproduce the experimental observations in the mother machine.”

      Additionally, we added a supplementary figure (Supplementary Fig. 9) demonstrating the correlation between period and amplitude of the oscillations, for simulations with varying growth rate.

      (9) The results from the detuning experiments are really nice, especially the decomposition in high frequency shown in Figure 6c. However, the experiments explore only the very high forcing amplitude conditions. Is there any way to test the weaker forcing regimes, as these are expected to uncover the interesting areas in between the Arnold's tongues? If this is experimentally difficult, it would be interesting to include at least the model prediction.

      We thank the reviewer for stimulating us to go in this direction. We have performed simulations to explore model predictions for areas between the Arnold’s tongues. We find onset of entrainment as the amplitude increases and also the existence of intermediate plateaus at fractional frequency ratios. These results are now included in the Supplementary Fig. 8.
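
      The qualitative picture described here (locking that sets in as the forcing amplitude grows, with plateaus at rational frequency ratios) can be reproduced with the textbook sine circle map. The sketch below uses that generic map as a stand-in; it is not the authors' repressilator model, and the parameter names are illustrative. Here `omega` is the number of free-running cycles per forcing period, `coupling` is the forcing strength (kept at or below 1 so the map stays invertible), and the winding number is the average number of oscillator cycles completed per forcing period.

```python
import math


def winding_number(omega, coupling, n_transient=500, n_iter=5000):
    """Estimate the winding number of the sine circle map.

    theta_{n+1} = theta_n + omega - (coupling / 2*pi) * sin(2*pi*theta_n),
    iterated on the lift (no modulo), so theta counts full cycles.
    """
    def step(theta):
        return theta + omega - coupling / (2 * math.pi) * math.sin(2 * math.pi * theta)

    theta = 0.0
    for _ in range(n_transient):  # discard the approach to the attractor
        theta = step(theta)
    start = theta
    for _ in range(n_iter):
        theta = step(theta)
    return (theta - start) / n_iter


# Without forcing the winding number simply equals omega; with forcing,
# omega = 0.1 is pulled onto the 0:1 plateau and omega = 0.5 sits on the 1:2 plateau.
for omega, coupling in [(0.1, 0.0), (0.1, 0.9), (0.5, 0.9)]:
    print(omega, coupling, round(winding_number(omega, coupling), 3))
```

      Scanning `omega` at fixed `coupling` traces out the staircase of plateaus, and repeating the scan for several `coupling` values outlines the tongues.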

      (10) Another prediction from the Arnold's tongue would be the relative phase of entrainment in different f/v0 conditions. The text refers to it very briefly, but this is a quantitative prediction that can be demonstrated clearly in a figure - how well do they match? It can be shown, for example, by a plot with f/v0 on the x-axis, the phase difference between the pulse and peak expression on the y-axis, a curve representing the model prediction for that function, and dots (with error bars) representing the calculated values from the experimental data.

      Generally, when suitable, this kind of direct comparison is more useful to the reader than the way the authors chose to compare simulation and experiments throughout the manuscript.

      We thank the reviewer for this very interesting suggestion. We have completely rewritten the discussion on entrainment, commenting on how the same PRC (phase shift vs pulse arrival phase) can be interpreted as a T_L/T_0-1 vs phase difference plot. Indeed, in the new Fig. 5a we plot, over the theoretical PRC curve, the values of the relative phase of entrainment for three values of the period of the light pulses (from the data in Fig. 6b). The agreement is remarkably good, providing a further experimental validation of the predicted PRC.
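
      The link between the PRC and the entrained phase can be made concrete with a pulse-to-pulse phase map. The sketch below rests on assumptions that are not taken from the manuscript: the sinusoidal PRC and its amplitude are hypothetical, phases are measured in cycles, and the phase shift is counted as a delay, so that at 1:1 locking the delay per pulse equals T_L/T_0 - 1.

```python
import math


def prc_delay(phase, amplitude=0.1):
    """Hypothetical PRC: phase delay (in cycles) for a pulse arriving at `phase`."""
    return amplitude * math.sin(2 * math.pi * phase)


def entrained_phase(period_ratio, phase0=0.5, n_pulses=200):
    """Iterate the phase map and return the pulse arrival phase after n_pulses.

    period_ratio is T_L / T_0. Between two pulses the oscillator advances by
    period_ratio cycles, minus the delay caused by the pulse itself.
    """
    phase = phase0
    for _ in range(n_pulses):
        phase = (phase - prc_delay(phase) + period_ratio) % 1.0
    return phase


phase_star = entrained_phase(period_ratio=1.05)
# At the locked state the delay read off the PRC matches the detuning T_L/T_0 - 1.
print(phase_star, prc_delay(phase_star))
```

      Reading the fixed-point condition in reverse gives the plot described in the response: each measured entrainment phase, paired with its value of T_L/T_0 - 1, should fall on the PRC curve.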

      (11) The raw data can be valuable for the community for reanalysis and further hypothesis testing. Hence, it will be very useful to make all of the data (e.g., the fluorescence signal quantification tables from all the experiments) publicly available.

      We prepared files with all raw data, to be made available to the community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Reviews):

      Summary: 

      The authors use a combination of biochemistry and cryo-EM studies to explore a complex between the cap-binding complex and an RNA binding protein, ALYREF, that coordinates mRNA processing and export.

      Strengths: 

      The biochemistry and structural biology are supported by mutagenesis which tests the model in vitro. The structure provides new insight into how key events in RNA processing and export are likely to be coordinated.

      Weaknesses: 

      The authors provide biochemical studies to confirm the interactions that they identify; however, they do not perform any studies to test these models in cells or explore the consequences of mRNA export from the nucleus. In fact, several of the amino acids that they identified in ALYREF that are critical for the interaction, as determined by their own biochemical studies, are conserved in budding yeast Yra1 (residues E124/E128 are E/Q in budding yeast and residues Y135/V138/P139 are F/S/P), where the impact on poly(A) RNA export from the nucleus could be readily evaluated. The authors could at least mention this point as part of the implications and the need for future studies. No one seems to have yet targeted any of these conserved residues, so this would be a logical extension of the current work.

      We thank the reviewer for the feedback on our work. ALYREF coordinates pre-mRNA processing and export through interactions with a plethora of mRNA biogenesis factors including the DDX39B subunit of the TREX complex, CBC, EJC, and 3’ processing factors. ALYREF mediates the recruitment of the TREX complex on nascent transcripts which depends on its interactions with both CBC and EJC. Our work and studies by others indicate that ALYREF uses overlapping interfaces including both the N-terminal WxHD motif and the RRM domain to bind CBC and EJC. Thus, ALYREF mutants deficient in CBC interaction will also disrupt the ALYREF-EJC interaction and are not ideal for functional studies. In addition, the CBC plays important roles in multiple steps of mRNA metabolism through interactions with a plethora of factors, which often interact competitively with CBC. Identification of separation-of-function mutations on CBC or ALYREF that specifically disrupt their interaction but not other cellular complexes containing CBC or ALYREF would be an important future area to test the model in cells. 

      We appreciate the reviewer’s insightful comments regarding yeast Yra1. Thus far, the physical and functional connection between Yra1 and CBC in yeast has not been demonstrated. There are major differences between yeast Yra1 and human ALYREF. Given the lack of an EJC in S. cerevisiae, it is unclear whether Yra1 acts in a similar manner as human ALYREF. In addition, Yra1 does not contain a WxHD motif in its N-terminal unstructured region, which is involved in CBC and EJC interactions in ALYREF. Characterization of the Yra1-CBC interaction will be an interesting future direction. We now include a discussion about yeast Yra1 in the newly added “Conclusion and perspectives” section. 

      Specific suggestions:

      The authors could put their work in context by speculating how some of the amino acids that they identify as being critical for the interactions could contribute to cancer. For example, they mention that mutations of interacting residues in NCBP2 are associated with human cancers, pointing out that the NCBP2 R105C amino acid substitution has been reported in colorectal cancer and the NCBP2 I110M mutation has been found in head and neck cancer. Do the authors speculate that these changes would decrease the interaction between NCBP2 and ALYREF and, if so, how would this contribute to cancer? They also mention a K330N mutation in NCBP1 in human uterine corpus endometrial carcinoma; Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. How do they speculate loss of this interaction would contribute to cancer?

      In the revised manuscript, we include a discussion about these CBC mutants found in human cancers in the “Conclusion and perspectives” section. We think some of these CBC mutants, such as NCBP1 K330N, could reduce interaction with ALYREF. Compromised CBC-ALYREF interaction will affect the recruitment of the TREX complex on nascent transcripts and cause dysregulation of mRNA export. In addition, it could also change the partition of CBC and ALYREF in different cellular complexes and cause perturbation of various steps in mRNA biogenesis that are regulated by CBC and ALYREF. Thus far, it is unclear whether and how loss of the CBC-ALYREF interaction directly contributes to cancer. Our work and that of others provide molecular insights to test in future studies. 

      Reviewer #2 (Public Reviews):

      Summary: 

      In this manuscript, Bradley and his colleagues present the cryo-EM structure of the nuclear cap-binding complex (CBC) in complex with an mRNA export factor, ALYREF, providing a structural basis for understanding how CBC regulates gene expression.

      Strengths: 

      The authors successfully modeled the N-terminal region and the RRM domain of ALYREF (residues 1-183) within the CBC-ALYREF structure, which revealed that both the NCBP1 and NCBP2 subunits of the CBC interact with the RRM domain of ALYREF. Further mutagenesis and pull-down studies provided additional evidence for the observed CBC-ALYREF interface. Additionally, the authors engaged in a comprehensive discussion regarding other cellular complexes containing CBC and/or ALYREF components. They proposed potential models that elucidated coordinated events during mRNA maturation. This study provided good evidence to show how CBC effectively recruits mRNA export factor machinery, enhancing our understanding of how CBC regulates gene expression during mRNA transcription, splicing, and export. 

      Weaknesses: 

      No in vivo or in vitro functional data to validate and support the structural observations and the proposed models in this study. Cryo-EM data processing and structural representation need to be strengthened. 

      We appreciate the reviewer’s comments and suggestions. The fact that ALYREF uses highly overlapped binding interfaces for CBC and EJC interactions prevents us from a clear functional dissection of the ALYREF-CBC interaction using in vitro assays or in cells at the current stage. Please also see our response to Reviewer 1. 

      In this revised manuscript, we have reprocessed the cryo-EM data using a different strategy which yields significantly improved maps. We have made improvements to the presentation of the structural work based on the reviewer’s specific comments. 

      Reviewer #3 (Public Reviews):

      Summary: 

      The authors carried out structural and biochemical studies to investigate the multiple functions of CBC and ALYREF in RNA metabolism.

      Strengths: 

      For the structural study part, the authors successfully revealed how NCBP1 and NCBP2 subunits interact with mALYREF (residues 1-155). Their binding interface was then confirmed by biochemical assays (mutagenesis and pull-down assays) presented in this study. 

      Weaknesses: 

      The authors did not provide functional data to support their proposed models. The authors should include more details regarding the workflow of their cryo-EM data processing in the figure. 

      We thank the reviewer for the comments. We completely agree that testing the proposed models in cells would be ideal. However, as we also respond to the other reviewers, functional studies are premature at the current stage because both ALYREF and CBC are components of many cellular complexes that regulate mRNA metabolism. Separation-of-function mutations on CBC or ALYREF first need to be identified in future studies for further investigation. Please also see our response to Reviewer 1. 

      As suggested by the reviewer, we have included more details of the cryo-EM workflow in this revised manuscript. We have also included various validation measures including 3DFSC analyses, map vs model FSC curves, and representative density maps at various protein-protein binding interfaces. 

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Major points:

      The authors should take advantage of Figure 1, which shows the domain structures of NCBP1, NCBP2, and ALYREF to indicate for the reader specifically which protein domains are included in the biochemical and structural analyses. In the current version of the manuscript, there is plenty of space to indicate below each domain structure precisely what regions are included.

      In this revised manuscript, we have revised Figure 1A to indicate the protein constructs used in this work. 

      Although it is fine to combine the Results and Discussion, the authors should really offer a concluding paragraph to highlight the novel results from this study and put the results in context.

      We thank the reviewer for the recommendation. We now include a “Conclusion and perspectives” section in this revised manuscript.  

      Minor comments:

      Page 5, last sentence (and others) starts a sentence with the word "Since" when likely "As" which does not imply a time element to the phrase, is the correct word.

      "Since the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      Would be more correct as: "As the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      We thank the reviewer for the comments. We have made the corrections. 

      The word 'data' is plural so the sentence at the bottom of p.9 that includes the phrase "...in vivo data shows.." should read "..in vivo data show.." 

      Corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the Authors):

      Major points:

      (1) The authors claimed the improved solubility of mouse ALYREF2 (mALYREF2, residues 1-155) compared to the previously employed ALYREF construct. However, human ALYREF has already been purified successfully for pull down assay, indicating soluble human ALYREF obtained, why not use human ALYREF directly? Please clarify. 

      Pull-down studies were performed with GST-tagged ALYREF. For cryo-EM studies, untagged ALYREF is preferred to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not amenable for structural studies. We have revised the text to clarify this point. 

      (2) The authors confirmed CBC-ALYREF interfaces through mutagenesis and pull-down assays in vitro. However, it would be more informative and interesting to include functional assays in vitro or/and in vivo with mutagenesis. 

      We completely concur with the reviewer that testing the proposed models in vitro and in vivo would be important. However, as we pointed out in our response to public reviews, the highly overlapping binding interfaces on ALYREF for CBC and EJC interactions pose a great challenge for functional studies. Furthermore, both ALYREF and CBC are multifunctional factors and interact with a number of partners. Ideally, separation-of-function mutants that specifically disrupt the CBC-ALYREF interaction but not others need to be identified in future studies in order to perform functional studies. 

      (3) About cryo-EM data processing and structural representation:

      (1) In the description of the cryo-EM data processing, the authors claimed they did heterogeneous refinement, homogeneous refinement, and then local refinement. This reviewer is puzzled by this process because the normal procedure should be non-uniform refinement following homogeneous refinement. If the authors did not perform non-uniform refinement, they should do it because it would significantly improve the quality and resolution of cryo-EM maps. In addition, the right local refinement should include mask files and only show the density/map of the local region. 

      We thank the reviewer for the suggestions. In response to the reviewer’s comment on the preferred orientation issue (point 5 below), we reprocessed the cryo-EM data and obtained significantly improved cryo-EM maps. In this revised manuscript, the CBC-mALYREF2 map was refined using homogeneous refinement; the CBC map was refined using homogeneous refinement followed by non-uniform refinement. Refinement masks are included in Figure 2-figure supplement 1. 

      (2) Further local refinements with signal subtraction should be performed to improve the density and resolution of mALYREF2. 

      We tested local refinement with or without signal subtraction using masks covering mALYREF2 and various regions of CBC. Unfortunately, this approach did not improve the density of mALYREF2. We suspect that the small size of mALYREF2 (77 residues for the RRM domain) and the intrinsic flexibility of CBC are the limiting factors in these attempts. 

      (3) Figures with the cryo-EM map showing the side chains of the residues on the CBC-mALYREF2 interface should be included to strengthen the claims. Authors could add the map to Figure 3b/c or present it as a supplementary figure.

      We include new supplementary figures (Figure 3-figure supplement 1) to show the electron densities corresponding to the views in Figure 3B and 3C. Residues labeled in Figure 3B and 3C are shown in sticks in these supplementary figures.

      (4) For cryo-EM data processing, the authors have omitted lots of important details. Could the authors elaborate on the data processing with more details in the corresponding Figure and Methods Sections? Only one ab-initio model from the picked good particles was displayed in the figure. Are there any other different conformations of 3D classes for the dataset? In addition, too few classes have been considered in 3D classification; more classes may give a class with better density and resolution.

      We thank the reviewer for the comments. We have reprocessed the cryo-EM data. A major change is to use Topaz for particle picking. We now include more details for data processing in Figure 2-figure supplement 1 and the Methods section. The cryo-EM sample is relatively uniform. Ab-initio reconstruction and heterogeneous refinement yielded only one good class and the other classes are “junk” classes (omitted in the workflow figure). No major conformational changes were observed throughout the multiple rounds of heterogeneous refinement for both CBC and CBC-mALYREF2. In this revised manuscript, we have been able to obtain significantly improved maps through the new data processing strategy employing Topaz as illustrated in Figure 2-figure supplements 1 to 5.

      (5) Angular distribution plots should be included to show if there is a preferred orientation issue. Based on the presented maps in validation reports, there may exist a preferred orientation issue for the reported two cryo-EM maps. Detailed 3D-Histogram and directional FSC plots for all the cryo-EM maps using 3DFSC web server should be presented to show the overall qualities (https://www.nature.com/articles/nmeth.4347 and https://3dfsc.salk.edu/).

      We thank the reviewer for the recommendations. In response to the reviewer’s comment on the preferred orientation issue, we reprocessed the cryo-EM data. Topaz was used for particle picking instead of template picking. 3DFSC analyses indicate that the new CBC-mALYREF2 map has a sphericity of 0.946, which is a significant improvement over the previous map, which had a sphericity of 0.815. Consistently, the maps presented in this revised manuscript show significantly improved densities. We now include angular distribution and 3DFSC analyses of the EM maps (Figure 2-figure supplement 2 and 4). 

      (6) Figures of model-to-map FSCs need to be present to demonstrate the quality of the models and the corresponding ones (model resolution when FSC=0.5) should also be included in Table 1. The accuracy of the model is important for structural explanations and description.

      The model-to-map FSCs are now included in Figure 2-figure supplement 3A and 5A. The model resolutions of CBC-mALYREF2 and CBC are estimated to be 3.5 Å and 3.6 Å at an FSC of 0.5. These numbers are now included in Table 1. 
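
      As a minimal sketch of how a resolution estimate is read off an FSC curve at a fixed threshold such as 0.5, the snippet below interpolates linearly between the two shells that bracket the threshold. The function name, the interpolation scheme, and the example curve are illustrative assumptions; they are not the authors' pipeline or data, and validation software may use a different interpolation.

```python
def resolution_at_threshold(spatial_freq, fsc, threshold=0.5):
    """Return the resolution (in Å) where the FSC curve first drops below `threshold`.

    spatial_freq: increasing spatial frequencies in 1/Å (one per resolution shell)
    fsc:          FSC value for each shell
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            freq = spatial_freq[i - 1] + frac * (spatial_freq[i] - spatial_freq[i - 1])
            return 1.0 / freq
    return None


# Made-up curve for illustration only (NOT data from the study).
freqs = [0.10, 0.20, 0.30, 0.40]
fsc_values = [0.99, 0.90, 0.40, 0.05]
res = resolution_at_threshold(freqs, fsc_values)
print(f"Resolution at FSC = 0.5: {res:.2f} Å")
```

      The same function can be called with `threshold=0.143` for half-map FSC curves, where that cutoff is conventionally used.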

      (7) In addition, figures of local density maps with different regions of the models, showing side chains, are necessary and important to justify the claimed resolutions. 

      We now include density maps overlaid with residue side chains at various regions. For the CBC-mALYREF2 map, density maps are shown at the mALYREF2-NCBP1 interfaces (Figure 3-figure supplement 1A and 1B), mALYREF2-NCBP2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 5B), and the region near m7G (Figure 2-figure supplement 5C). For the CBC map, density maps are shown at the NCBP1-NCBP2 interface (Figure 2-figure supplement 3B) and the region near m7G (Figure 2-figure supplement 3C). 

      Minor points:

      (1) A figure superimposing the models from the CBC-mALYREF2 map and mALYREF2 alone map is necessary to present that there are no other CBC binding-induced conformational changes in CBC except those claimed by the authors. In addition, a figure showing the density of m7GpppG should be included as well.  

      An overlay of the CBC and CBC-mALYREF2 models is now presented in Figure 2-figure supplement 3D. Comparing CBC and CBC-mALYREF2, NCBP1 and NCBP2 have an RMSD of 0.32 Å and 0.30 Å, respectively. The density maps near the m7G cap analog are shown in Figure 2-figure supplement 3C for CBC and Figure 2-figure supplement 5C for CBC-mALYREF2. 

      (2) Authors obtained the two maps from one dataset, so "we first determined" and "we next determined" (page 6) should be replaced with something like "One class of 3D cryo-EM map revealed" and "Another class of 3D cryo-EM map defined". 

      We have revised the text as suggested by the reviewer.  

      (3) In 'Abstract', 'a mRNA export factor' should be 'an mRNA export factor'. 

      Corrected in the revised manuscript.

      (4) In 'Abstract', the final sentence 'Comparison of CBC- ALYREF to other CBC and ALYREF containing cellular complexes provides insights into the coordinated events during mRNA transcription, splicing, and export' doesn't read smoothly, I would suggest revising it to 'Comparing CBC-ALYREF with other cellular complexes containing CBC and/or ALYREF components provides insight into the coordinated events during mRNA transcription, splicing, and export.' 

      We thank the reviewer for the recommendation and have revised accordingly. 

      (5) In paragraph 'CBC-ALYREF and viral hijacking of host mRNA export pathway', line 6, the sentences preceding and following the term 'However' indicate a progressive or parallel relationship, rather than a transitional one. To enhance the coherence, I would suggest replacing 'However' with 'Furthermore' or 'In addition'. 

      Corrected in the revised manuscript.

      (6) In both Figure 5 and Figure 6, the depicted models are proposed and constructed exclusively through the comparison of the CBC-partial ALYREF with other cellular complexes containing components of CBC and/or ALYREF, which need to be confirmed by more studies. To prevent potential confusion and misunderstandings, it is recommended to replace the term 'model' with 'proposed model'. 

      Corrected in the revised manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Major points:

      (1) In the Results and Discussion section, the authors mentioned "Recombinant human ALYREF protein was shown to interact with the CBC in RNase-treated nuclear extracts." However, they used mouse ALYREF for cryo-EM investigations. Can the authors include an explanation for this choice during the revision?  

      In our work, we used a mixture of glutamic acid and arginine to increase the solubility of GST-ALYREF. For cryo-EM studies, we use untagged ALYREF to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not suitable for structural studies in standard buffers. We have made further clarification on this point in this revised manuscript. 

      (2) In the paragraph on "CBC-ALYREF interfaces", the authors stated "For example, E97 forms salt bridges with K330 and K381 of NCBP1. Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. The importance of this interface between ALYREF and NCBP1 is highlighted by a K330N mutation found in human uterine corpus endometrial carcinoma." I fail to see a strong connection between their structural observations and previous findings regarding the role of a K330N mutation found in human uterine corpus endometrial carcinoma. The authors should add more words to thread these two parts.  

      In response to the reviewer’s comment, we now move the discussion of these CBC mutants to the newly added “Conclusion and perspectives” section. 

      (3) The authors should include side chains of the residues in their figure of Local resolution estimation and FSC curves, especially when they are presenting the binding interface between two components. 

      We have now included density maps that are overlaid with structural models showing side chains of critical residues. These maps include the NCBP1-mALYREF2 interfaces (Figure 3-figure supplement 1A and 1B), NCBP2-mALYREF2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 3B and 5B), and the m7G cap region (Figure 2-figure supplement 3C and 5C). 

      Minor points: 

      (1) Some grammatical mistakes need to be corrected. For example, it is "an mRNA" instead of "a mRNA".  

      Corrected in the revised manuscript.

      (2) The authors can provide more information for the audience to know better about ALYREF when it first appears in the 5th line in the Abstract section. For example, "It promotes mRNA export through direct interaction with ALYREF, a key mRNA export factor, ...". 

      We have revised the sentence based on the reviewer’s comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some of the data is problematic and does not always support the authors' conclusions:

      (1) Fig. 1K and H are identical.

      Thank you for pointing out this problem in the manuscript. We apologize for this unintentional mistake and have replaced Fig. 1K.

      (2) The graph in Figure 2B contradicts the text. It is not obvious how the image was quantified to produce the histological score graph.

      We thank the reviewer for pointing out this problem in the manuscript. As the reviewer suggested, we have replaced Figure 2B.

      (3) In Figures 2C and D, there is no clear pattern of changes in pro-inflammatory or anti-inflammatory cytokines, despite the authors' claims in the text.

      We appreciate the comment. We think the reason is that the level of cytokines in the tissue is low, so the pattern of changes is not obvious.

      (4) It is unclear why the anti-dsDNA antibody does not stain the nucleus in Figure 4B. The staining with anti-dsDNA and DAPI does not match well. Figure 5H shows there is still lots of cytosolic DNA in OGT-/- HCF-1-C, measured by DAPI. These data do not support the authors' conclusion that HCFC600 eliminates cytosolic DNA accumulation (line 229). There is no support for the authors' claim that HCF-1 restrains the cGAS-STING pathway (line 330).

      We thank the reviewer for these insightful comments. The most critical step in staining cytosolic DNA is to use a mild permeabilization that allows the antibody to cross the cell membrane but not the nuclear membrane; that is why the anti-dsDNA antibody does not stain the nucleus. In Figure 5H, we think that we used a high concentration of DAPI for the staining, so the nuclear DNA was strongly stained and appears to be cytosolic DNA. 

      (5) In Figure 5B, there is no increase in HCF-1 cleavage after OGT over-expression.

      We appreciate the reviewer’s comment. We think the reason is that we used a cell line stably overexpressing OGT-GFP and may have missed the time point at which the increase in HCF-1 cleavage occurred, so no large increase is visible. However, there is a significant increase in Figure 5C.

      (6) In Figure 7, the TNF-a staining does not inspire confidence.

      We thank the reviewer for his/her comment. In both Figure 7K (MC38 tumor model) and Figure 7N (LLC tumor model), we observed a significant increase in TNF-α+ CD8+ T cells in the group treated with the combination of OSMI-1 and anti-PD-L1 compared to the control group, as evidenced by the clear clustering.

      The writing needs significant improvement:

      (1) There are multiple English grammar mistakes throughout the paper. It is recommended that the authors run the manuscript through an editing service.

      We thank the reviewer for his/her suggestion. We apologize for the poor language of our manuscript. We worked on the manuscript for a long time and the repeated addition and removal of sentences and sections obviously led to poor readability. We have now worked on both language and readability and have also involved native English speakers for language corrections. We really hope that the flow and language level have been substantially improved.

      (2) Some passages are misleading -- lines 161-162, line 217, lines 241-242, 263-264, 299-300. They need to be changed substantially.

      We apologize for these mistakes and have changed these passages.

      (3) Figure legends should be rewritten. Currently, they are too abbreviated to be understood.

      We apologize for that and have rewritten the figure legends.

      (4) Discussion should also be thoroughly reworked. Currently, it is merely restating the authors' findings. The authors should put their findings in the broader context of the field.

      We apologize for that. For a better understanding of our study, we have reworked the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Previous studies (DOI: 10.1093/nar/gkw663, 10.1016/j.jgg.2015.07.002, 10.1016/j.dnarep.2022.103394) have suggested that OGT deficiency triggers DNA damage, connecting it to DNA repair and maintenance through various mechanisms. This should be acknowledged in the manuscript. Conversely, the role of HCF1 and its cleaved products in maintaining genomic integrity hasn't been previously shown. The authors investigate HCF1's role solely in the context of OGT inhibition. It is unclear whether this is also true under other stimuli that trigger DNA damage, whether fragments of HCF1 specifically reduce DNA damage, or if HCF1 is involved in the basal machinery that would be defective only in the absence of OGT.

      We have acknowledged the studies mentioned above in the manuscript. In this paper we focused on the OGT function that is related to HCF1. The role of HCF1 and its cleaved products in maintaining genomic integrity is an interesting topic, and we may focus on it in our next project.

      (2) In villin-CRE-deficient mice, the authors observe generic inflammation in the intestine unrelated to tumor development. It's unclear if this also occurs in the presence of OGT inhibitors in mice, whether these inhibitors induce a systemic inflammatory (Type I interferon) response, or if certain tissues like the intestine or proliferating tumor cells are more susceptible to such a response.

      We thank the reviewer for the comment. Investigating whether OGT inhibitors induce an inflammatory response, either systemically or tissue-specifically, would indeed be a very interesting project. However, in our current paper, we use a genetic method to identify the role of OGT deficiency in intestine inflammation-induced tumor development. This approach provides convincing evidence for our hypothesis. We may test the effect of OGT inhibitors on inflammation and tumor development in our next project.

      (3) Another critical observation is the magnitude of the interferon response triggered by DNA damage in the OGT-deficient models. While it's known that DNA damage can activate cGAS-STING, the response's extent in the absence of OGT prompts the question of whether additional OGT-specific features could explain this phenomenon. For example, Lamin A, essential for nuclear envelope integrity and shown to be O-glycosylated (DOI: 10.3390/cells7050044), and other components of the nuclear envelope or its repair might be affected by OGT. The impact of OGT inhibition on nuclear envelope integrity compared to other DNA-damaging agents could be explored.

      We appreciate the comment. In this project, we identified an OGT-binding protein, HCF1, through an LC–MS/MS assay; it was the top candidate in the binding profiles, so we focused on it. Lamin A and other components of the nuclear envelope remain good targets to examine, and we may explore them in our next project.

      (4) The authors also demonstrate a correlation between OGT expression in tumors compared to healthy tissues. However, the reason is unclear, raising questions about whether this is a consequence of proliferation or metabolic deregulation in the cancer. The authors should address this aspect.

      We appreciate the reviewer’s insightful point. These are very good questions that would make for interesting research. However, in this paper we focused on how OGT influences its downstream molecules to promote tumor development; we did not examine why OGT is increased in tumors, as this is beyond the scope of the current work. We would love to investigate it in the future.

      Minor points

      Please add the legend to Figures S2, S3 and S5.

      We thank the reviewer for the comment. We have added the legends to Figures S2, S3 and S5.

      The sentence line 137 should be clarified as OGT deficiency seems more related to increased inflammation in this model.

      We thank the reviewer for the comment. We have corrected the sentence at line 137.

      Line 732 has a ( typo before the number 34.

      We thank the reviewer for the comment. We have corrected the typo at line 732.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important study, the authors manually assessed randomly selected images published in eLife between 2012 and 2020 to determine whether they were accessible for readers with deuteranopia, the most common form of color vision deficiency. They then developed an automated tool designed to classify figures and images as either "friendly" or "unfriendly" for people with deuteranopia. While such a tool could be used by publishers, editors or researchers to monitor accessibility in the research literature, the evidence supporting the tool's utility was incomplete. The tool would benefit from training on an expanded dataset that includes different image and figure types from many journals, and using more rigorous approaches when training the tool and assessing performance. The authors also provide code that readers can download and run to test their own images. This may be of most use for testing the tool, as there are already several free, user-friendly recoloring programs that allow users to see how images would look to a person with different forms of color vision deficiency. Automated classifications are of most use for assessing many images, when the user does not have the time or resources to assess each image individually.

      Thank you for this assessment. We have responded to the comments and suggestions in detail below. One minor correction to the above statement: the randomly selected images published in eLife were from articles published between 2012 and 2022 (not 2020).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study developed a software application, which aims to identify images as either "friendly" or "unfriendly" for readers with deuteranopia, the most common color-vision deficiency. Using previously published algorithms that recolor images to approximate how they would appear to a deuteranope (someone with deuteranopia), authors first manually assessed a set of images from biology-oriented research articles published in eLife between 2012 and 2022. The researchers identified 636 out of 4964 images as difficult to interpret ("unfriendly") for deuteranopes. They claim that there was a decrease in "unfriendly" images over time and that articles from cell-oriented research fields were most likely to contain "unfriendly" images. The researchers used the manually classified images to develop, train, and validate an automated screening tool. They also created a user-friendly web application of the tool, where users can upload images and be informed about the status of each image as "friendly" or "unfriendly" for deuteranopes.

      Strengths:

      The authors have identified an important accessibility issue in the scientific literature: the use of color combinations that make figures difficult to interpret for people with color-vision deficiency. The metrics proposed and evaluated in the study are a valuable theoretical contribution. The automated screening tool they provide is well-documented, open source, and relatively easy to install and use. It has the potential to provide a useful service to the scientists who want to make their figures more accessible. The data are open and freely accessible, well documented, and a valuable resource for further research. The manuscript is well written, logically structured, and easy to follow.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The authors themselves acknowledge the limitations that arise from the way they defined what constitutes an "unfriendly" image. There is a missed chance here to have engaged deuteranopes as stakeholders earlier in the experimental design. This would have allowed [them] to determine to what extent spatial separation and labelling of problematic color combinations responds to their needs and whether setting the bar at a simulated severity of 80% is inclusive enough. A slightly lowered barrier is still a barrier to accessibility.

      We agree with this point in principle. However, different people experience deuteranopia in different ways, so it would require a large effort to characterize these differences and provide empirical evidence about many individuals' interpretations of problematic images in the "real world." In this study, we aimed to establish a starting point that would emphasize the need for greater accessibility, and we have provided tools to begin accomplishing that. We erred on the side of simulating relatively high severity (but not complete deuteranopia). Thus, our findings and tools should be relevant to some (but not all) people with deuteranopia. Furthermore, as noted in the paper, an advantage of our approach is that "by using simulations, the reviewers were capable of seeing two versions of each image: the original and a simulated version." We believe this step is important in assessing the extent to which deuteranopia could confound image interpretations. Conceivably, this could be done with deuteranopes after recoloration, but it is difficult to know whether deuteranopes would see the recolored images in the same way that non-deuteranopes see the original images. It is also true that images simulating deuteranopia may not perfectly reflect how deuteranopes see those images. It is a tradeoff either way. We have added comments along these lines to the paper.

      (2) The use of images from a single journal strongly limits the generalizability of the empirical findings as well as of the automated screening tool itself. Machine-learning algorithms are highly configurable but also notorious for their lack of transparency and for being easily biased by the training data set. A quick and unsystematic test of the web application shows that the classifier works well for electron microscopy images but fails at recognizing red-green scatter plots and even the classical diagnostic images for color-vision deficiency (Ishihara test images) as "unfriendly". A future iteration of the tool should be trained on a wider variety of images from different journals.

      Thank you for these comments. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central. We used our original model to make predictions for those images. The corresponding results are now included in the paper.

      We agree that many of the images identified as being "unfriendly" are microscope images, which often use red and green dyes. However, many other image types were identified as unfriendly, including heat maps, line charts, maps, three-dimensional structural representations of proteins, photographs, network diagrams, etc. We have uploaded these figures to our Open Science Framework repository so it's easier for readers to review these examples. We have added a comment along these lines to the paper.

      The reviewer mentioned uploading red/green scatter plots and Ishihara test images to our Web application and that it reported they were friendly. Firstly, it depends on the scatter plot. Even though some such plots include green and red, the image's scientific meaning may be clear. Secondly, although the Ishihara images were created as informal tests for humans, these images (and ones similar to them) are not in eLife journal articles (to our knowledge) and thus are not included in our training set. Thus, it is unsurprising that our machine-learning models would not classify such images correctly as unfriendly.

      (3) Focusing the statistical analyses on individual images rather than articles (e.g. in figures 1 and 2) leads to pseudoreplication. Multiple images from the same article should not be treated as statistically independent measures, because they are produced by the same authors. A simple alternative is to instead use articles as the unit of analysis and score an article as "unfriendly" when it contains at least one "unfriendly" image. In addition, collapsing the counts of "unfriendly" images to proportions loses important information about the sample size. For example, the current analysis presented in Fig. 1 gives undue weight to the three images from 2012, two of which came from the same article. If we perform a logistic regression on articles coded as "friendly" and "unfriendly" (rather than the reported linear regression on the proportion of "unfriendly" images), there is still evidence for a decrease in the frequency of "unfriendly" eLife articles over time.

      Thank you for taking the time to provide these careful insights. We have adjusted these statistical analyses to focus on articles rather than individual images. For Figure 1, we treat an article as "Definitely problematic" if any image in the article was categorized as "Definitely problematic." Additionally, we no longer collapse the counts to proportions, and we use logistic regression to summarize the trend over time. The overall conclusions remain the same.
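
      As a rough illustration of the article-level analysis described above, the following Python sketch collapses image-level labels to one label per article and fits a two-parameter logistic regression of article status on publication year. The function names, the tuple layout, and the Newton-Raphson fitting routine are assumptions made for illustration; they are not taken from the authors' code, which may use a statistics package instead.

```python
import math
from collections import defaultdict


def article_labels(images):
    """Collapse image-level labels to one (year, flagged) pair per article.

    `images` is an iterable of (article_id, year, is_problematic) tuples
    (a hypothetical layout). An article is flagged if ANY of its images is.
    """
    flagged = defaultdict(bool)
    year_of = {}
    for article_id, year, is_problematic in images:
        flagged[article_id] = flagged[article_id] or bool(is_problematic)
        year_of[article_id] = year
    return [(year_of[a], int(flagged[a])) for a in sorted(flagged)]


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            break
        # Newton step: beta <- beta + (X^T W X)^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

      Centering the year (for example, subtracting the mean year) before calling `fit_logistic` keeps the arithmetic well conditioned; a negative slope `b1` then corresponds to a decreasing probability that an article contains a problematic image.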

      Another issue concerns the large number of articles (>40%) that are classified as belonging to two subdisciplines, which further compounds the image pseudoreplication. Two alternatives are to either group articles with two subdisciplines into a "multidisciplinary" group or recode them to include both disciplines in the category name.

      Thank you for this insight. We have modified Figure 2 so that it puts all articles that have been assigned two subdisciplines into a "Multidisciplinary" category. The overall conclusions remain the same.
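
      The recoding described above can be sketched in a few lines of Python. The function name and the "Unassigned" fallback label are hypothetical and only serve to make the example self-contained.

```python
def subdiscipline_group(subdisciplines):
    """Return one grouping label for an article's list of subdisciplines.

    Articles with two or more distinct subdisciplines are pooled into a
    single "Multidisciplinary" group so that each article is counted once.
    """
    unique = sorted(set(subdisciplines))
    if not unique:
        return "Unassigned"  # hypothetical fallback, not from the paper
    if len(unique) == 1:
        return unique[0]
    return "Multidisciplinary"
```

      Because each article maps to exactly one group, no article contributes to more than one category in the per-subdiscipline summary.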

      (4) The low frequency of "unfriendly" images in the data (under 15%) calls for a different performance measure than the AUROC used by the authors. In such imbalanced classification cases the recommended performance measure is precision-recall area under the curve (PR AUC: https://doi.org/10.1371%2Fjournal.pone.0118432) that gives more weight to the classification of the rare class ("unfriendly" images).

      We now calculate the area under the precision-recall curve and provide these numbers (and figures) alongside the AUROC values (and figures). We agree that these numbers are informative; both metrics lead to the same overall conclusions.
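
      To illustrate why the two metrics can diverge when the positive class is rare, here is a minimal pure-Python sketch of AUROC and of average precision (a step-wise estimate of the area under the precision-recall curve). The function names and the toy scores are assumptions for illustration, and the average-precision routine assumes there are no tied scores.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    true_pos = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at this recall step
    return ap / total_pos


# Toy, made-up data: 2 positives among 10 images.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]
print(auroc(labels, scores), average_precision(labels, scores))
```

      In this toy case a single high-scoring negative barely moves the AUROC but lowers the average precision noticeably, which is the behavior that motivates reporting both.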

      Reviewer #2 (Public Review):

      Summary:

      An analysis of images in the biology literature that are problematic for people with a color-vision deficiency (CVD) is presented, along with a machine learning-based model to identify such images and a web application that uses the model to flag problematic images. Their analysis reveals that about 13% of the images could be problematic for people with CVD and that the frequency of such images decreased over time. Their model yields an AUC score of 0.89. It is proposed that their approach could help make the biology literature accessible to diverse audiences.

      Strengths:

      The manuscript focuses on an important yet mostly overlooked problem, and makes contributions both in expanding our understanding of the extent of the problem and in developing solutions to mitigate the problem. The paper is generally well-written and clearly organized. Their CVD simulation combines five different metrics. The dataset has been assessed by two researchers and is likely to be of high quality. The machine learning algorithm used (convolutional neural network, CNN) is an appropriate choice for the problem. The evaluation of various hyperparameters for the CNN model is extensive.

      We thank the reviewer for these comments.

      Weaknesses:

      The focus seems to be on one type of CVD (deuteranopia) and it is unclear whether this would generalize to other types.

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies.

      The dataset consists of images from eLife articles. While this is a reasonable starting point, whether this can generalize to other biology/biomedical articles is not assessed.

      This is an important point. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      "Probably problematic" and "probably okay" classes are excluded from the analysis and classification, and the effect of this exclusion is not discussed.

      We now address this in the Discussion section.

      Machine learning aspects can be explained better, in a more standard way.

      Thank you. We address this comment in our responses to your comments below.

      The evaluation metrics used for validating the machine learning models seem lacking (e.g., precision, recall, F1 are not reported).

      We now provide these metrics (in a supplementary file).

      The web application is not discussed in any depth.

      The paper includes a paragraph about how the Web application works and which technologies we used to create it. We are unsure which additional aspects should be addressed.

      Reviewer #3 (Public Review):

      Summary:

      This work focuses on accessibility of scientific images for individuals with color vision deficiencies, particularly deuteranopia. The research involved an analysis of images from eLife published in 2012-2022. The authors manually reviewed nearly 5,000 images, comparing them with simulated versions representing the perspective of individuals with deuteranopia, and also evaluated several methods to automatically detect such images including training a machine-learning algorithm to do so, which performed the best. The authors found that nearly 13% of the images could be challenging for people with deuteranopia to interpret. There was a trend toward a decrease in problematic images over time, which is encouraging.

      Strengths:

      The manuscript is well organized and written. It addresses inclusivity and accessibility in scientific communication, and reinforces that there is a problem and that in part technological solutions have potential to assist with this problem.

      The number of manually assessed images for evaluation and training an algorithm is, to my knowledge, much larger than any existing survey. This is a valuable open source dataset beyond the work herein.

      The sequential steps used to classify articles follow best practices for evaluation and training sets.

      We thank the reviewer for these comments.

      Weaknesses:

      I do not see any major issues with the methods. The authors were transparent with the limitations (the need to rely on simulations instead of what deuteranopes see), only capturing a subset of issues related to color vision deficiency, and the focus on one journal that may not be representative of images in other journals and disciplines.

      We thank the reviewer for these comments. Regarding the last point, we have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Thank you.

      Reviewer #2 (Recommendations For The Authors):

      - The web application link can be provided in the Abstract for more visibility.

      We have added the URL to the Abstract.

      - They focus on deuteranopia in this paper. It seems that protanopia is not considered. Why? What are the challenges in considering this type of CVD?

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies. Deuteranopia is the most common color-vision deficiency, so we focused on the needs of these individuals as a starting point.

      - The dataset is limited to eLife articles. More discussion of this limitation is needed. Couldn't one also include some papers from PMC open access dataset for comparison?

      We have reviewed an additional 2,000 images, which we randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      - An analysis of the effect of selecting a severity value of 0.8 can be included.

      We agree that this would be interesting, but we leave it for future work.

      - "Probably problematic" and "probably okay" classes are excluded from analysis, which may oversimplify the findings and bias the models. It would have been interesting to study these classes as well.

      We agree that this would be interesting, but we leave it for future work. However, we have added a comment to the Discussion on this point.

      - Some machine learning aspects are discussed in a non-standard way. Class weighting or transfer learning would not typically be considered hyperparameters. "corpus" is not a model. Description of how fine-tuning was performed could be clearer.

      We have updated this wording to use more appropriate terminology to describe these different "configurations." Additionally, we expanded and clarified our description of fine tuning.

      - Reporting performance on the training set is not very meaningful. Although I understand this is cross-validated, it is unclear what is gained by reporting two results. Maybe there should be more discussion of the difference.

      We used cross validation to compare different machine-learning models and configurations. Providing performance metrics helps to illustrate how we arrived at the final configurations that we used. We have updated the manuscript to clarify this point.
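
      For readers unfamiliar with the procedure, a generic k-fold split can be sketched as follows. This is an illustrative assumption about how folds might be formed (contiguous, unshuffled folds); the number of folds and any shuffling or stratification used by the authors are not stated here.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, validation
        start += size
```

      Each candidate configuration would be trained on the `train` indices and scored on the `validation` indices of every fold, and the averaged scores compared across configurations before a final evaluation on held-out data.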

      - True positives, false positives, etc. are described as evaluation metrics. Typically, one would think of these as numbers that are used to calculate evaluation metrics, like precision (PPV), recall (sensitivity), etc. Furthermore, they say they measure precision, recall, precision-recall curves, but I don't see these reported in the manuscript. They should be (especially precision, recall, F1).

      We have clarified this wording in the manuscript.
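
      The distinction the reviewer draws, between raw confusion-matrix counts and the metrics derived from them, can be made concrete with a short Python sketch. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, fn):
    """Derive precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # positive predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # sensitivity
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
print(classification_metrics(tp=8, fp=2, fn=4))
```

      True negatives do not enter any of these three metrics, which is why they are often preferred when the "unfriendly" class is rare.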

      - There are many figures in the supplementary material, but not much interpretation/insights provided. What should we learn from these figures?

      We have revised the captions and now provide more explanations about these figures in the manuscript.

      - CVD simulations are mentioned (line 312). It is unclear whether these methods could be used for this work and if so, why they were not used. How do the simulations in this work compare to other simulations?

      This part of the manuscript refers to recolorization techniques, which attempt to make images more friendly to people with color vision deficiencies. For our paper, we used a form of recolorization that simulates how a deuteranope would see a figure in its original form. Therefore, unless we misunderstand the reviewer's question, these two types of simulation have distinct purposes and thus are not comparable.

      - relu -> ReLU

      We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      The title can be more specific to denote that the survey was done in eLife papers in the years 2012-2022. Similarly, this should be clear in the abstract instead of only "images published in biology-oriented research articles".

      Thank you for this suggestion. Because we have expanded this work to include images from PubMed Central papers, we believe the title is acceptable as it stands. We updated the abstract to say, "images published in biology- and medicine-oriented research articles"

      Two mentions of existing work that I did not see are to Jambor and colleagues' assessment on color accessibility in several fields: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/, and whether this work overlaps with the 'JetFighter' tool

      (https://elifesciences.org/labs/c2292989/jetfighter-towards-figure-accuracy-and-accessibility).

      Thank you for bringing these to our attention. We have added a citation to Jambor, et al.

      We also mention JetFighter and describe its uses.

      Similarly, on Line 301: Significant prior work has been done to address and improve accessibility for individuals with CVD. This work can be generally categorized into three types of studies: simulation methods, recolorization methods, and estimating the frequency of accessible images.

      - One might mention education as prior work as well, which might in part be contributing to a decrease in problematic images (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/)

      We now suggest that there are four categories and include education as one of these.

      Line 361, when discussing resources to make figures suitable, the authors may consider citing this paper about an R package for single-cell data: https://elifesciences.org/articles/82128

      Thank you. We now cite this paper.

      The web application is a good demonstration of how this can be applied, and all code is open so others can apply the CNN in their own use cases. Still, by itself, it is tedious to upload individual image files to screen them. Future work can implement this into a workflow more typical to researchers, but I understand that this will take additional resources beyond the scope of this project. The demonstration that these algorithms can be run with minimal resources in the browser with tensorflow.js is novel.

      Thank you.

      General:

      It is encouraging that 'definitely problematic' images have been decreasing over time in eLife. Might this have to do with eLife policies? I could not quickly find if eLife has checks in place for this, but given that JetFighter was developed in association with eLife, I wonder if there is an enhanced awareness of this issue here vs. other journals.

      This is possible. We are not aware of a way to test this formally.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tang et al present an important manuscript focused on endogenous virus-like particles (eVLP) for cancer vaccination with solid in vivo studies. The author designed eVLP with high protein loading and transfection efficiency by PEG10 self-assembling while packaging neoantigens inside for cancer immunotherapy. The eVLP was further modified with CpG-ODN for enhanced dendritic cell targeting. The final vaccine ePAC was proven to elicit strong immune stimulation with increased killing effect against tumor cells in 2 mouse models. Below are my specific comments:

      Thank you very much for describing our work as “important”. We sincerely appreciate the reviewer's extremely helpful comments, which have significantly improved the quality of our manuscript.

      (1) The figures were well prepared with minor flaws, such as missed scale bars in Figures 4B, 4K, 5B, and 5C. The author should also add labels representing statistical analysis for Figures 3C, 3D, and 3E. In Figure 6G, the authors should label which cell type is the data for.

      Thanks very much for these helpful suggestions. The scale bars and statistical analysis have been added in Figures 4B, 4K, 5B, 5C, 3C, 3D, and 3E. For Figure 6G, we have added “CD44+ CD62L- in CD8+ T cells” to explain the cell type.

      (2) In Figure 3H, the antigen-presenting cells (APCs) increased significantly, but there was also a non-negligible 10% of APCs found in the control group, indicating some potential unwanted immune response; the authors need to explain this phenomenon or add a cytotoxic test on the normal liver or other cell lines for confirmation.

      Thanks very much for this extremely helpful suggestion. The antigen-presenting cells (APCs) in Figure 3H were isolated from mouse bone marrow and then cultured in vitro for about 5 days with cytokine stimulation (IL-4 and GM-CSF). Because of the stimulatory effects of IL-4 and GM-CSF, a small proportion of the APCs (~10%) tended to mature (co-expressing CD80 and CD86) in the control group, as pointed out by the reviewer. Similarly, in Figure 3I, these ~10% of activated APCs can activate T cells in vitro and exhibit some cytotoxicity. Since APCs must be induced and cultured in vitro before being used in this experiment, the background cytotoxicity induced by cytokines is unavoidable, and this has been well documented in the literature.

      (3) In Figure 3I, the ePAC seems to have a very similar effect on cytotoxic T-cell tumor killing compared to the peptides + CpG group. If the concentrations were also the same, based on that, questions will arise as to what is the benefit of using the compact vector other than just free peptide and CpG? Please explain and elaborate.

      Thanks very much for the comment. In vitro experiments indeed demonstrated that peptides + CpG had the same T cell activating ability as ePAC, as pointed out by the reviewer. However, because of the instability of the peptides and the lack of targeting, peptides + CpG activate the immune system significantly less efficiently than ePAC in vivo after subcutaneous injection, as shown in Figure 3D and Figure 2A. As expected, the antitumor efficacy induced by peptides + CpG is therefore significantly lower than that induced by ePAC in Figure 5. We have provided a detailed description in the “Results” section under “Antitumor effect of ePAC in subcutaneous HCC model” as follows: Furthermore, ePAC with the ability to target DCs and increased stability by encapsulating peptides, exhibited significantly higher tumor growth inhibition efficiency (p=0.0002) comparing with the eVLP + CpG-ODN treated group similar to the simple mixture of neoantigen peptides and adjuvant (Figures 5B and 5C). Meanwhile, the Kaplan-Meier analysis of tumor progression free survival (PFS) also clearly demonstrated the therapeutic advantages of our ePAC (p=0.0194, Figure 5B).

      (4) In the animal experiment in Figures 4F to L, the activation effect of APCs was similar between ePAC and CpG-only groups with no significance, but when it comes to the HCC mouse model in Figure 5, the anti-tumor effect was significantly increased between ePAC and CpG-only group. The authors should explain the difference between these two results.

      Thanks very much for the comment. Since PEG10 protein does not have an adjuvant effect, the adjuvant effect of ePAC mainly comes from the modified CpG. Therefore, although ePAC can effectively deliver tumor neoantigens, it does not have a significant advantage over free CpG in activating APCs. However, CpG only possesses the adjuvant effect and does not carry neoantigens. While it can promote the maturation of APCs, it cannot generate neoantigen-specific T cells. Consequently, the antitumor effect of CpG-only is much lower than that of ePAC in Figure 5.

      Reviewer #2 (Public Review):

      Summary:

      The authors provided a novel antigen delivery system that showed remarkable efficacy in transporting antigens to develop cancer therapeutic vaccines.

      Strengths:

      This manuscript was innovative, meaningful, and had a rich amount of data.

      Weaknesses:

      There are still some issues that need to be addressed and clarified.

      Thank you very much for describing our work as “innovative”. We sincerely appreciate the reviewer's extremely helpful comments, which have significantly improved the quality of our manuscript, and all of the listed weaknesses have been carefully addressed.

      (1) The format of images and data should be unified. Specifically, as follows: a. The presentation of flow cytometry results; b, The color schemes for different groups of column diagrams.

      Thanks very much. Following the reviewer’s comment, we have unified the format of all images and data as suggested.

      (2) The P-value should be provided in Figures, including Figure 1F, 1H, 3C, 3D, and 3E.

      Thanks very much. We have provided the corresponding P-values in Figure 1F, 1H, 3C, 3D, and 3E.

      (3) The quality of Figure 1C was too low to support the conclusion. The author should provide higher-quality images with no obvious background fluorescent signal. Meanwhile, the fluorescent image results of "Egfp+VSVg" group were inconsistent with the flow cytometry data. Additionally, the reviewer recommends that the authors use a confocal microscope to repeat this experiment to obtain a more convincing result.

      Thanks very much for this comment. Following the reviewer’s suggestion, we uniformly adjusted the original images in Figure 1C to reduce background interference and increase its quality. After eliminating background interference, the fluorescence image of the "Egfp+VSVg" group was consistent with the flow cytometry result.

      (4) The survival situation of the mouse should be provided in Figure 5, Figure 6, and Figure 7 to support the superior tumor therapy effect of ePAC.

      Thanks very much for the extremely helpful comment. Following the reviewer’s suggestion, we have added the progression free survival (PFS) of mice in Figure 5 and described this result in the “Results” section of “Antitumor effect of ePAC in subcutaneous HCC model” as follows: Meanwhile, the Kaplan-Meier analysis of tumor progression free survival (PFS) also clearly demonstrated the therapeutic advantages of our ePAC (p=0.0194, Figure 5B). For Figure 6 and Figure 7, to promptly detect the immune changes in the tumor microenvironment after vaccination, we were unable to conduct long-term observations on tumor-bearing mice, and therefore, we did not provide the survival curve. However, we monitored the tumor volume changes in real-time, which also can serve as an important measure for evaluating antitumor efficacy.
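
      For context, the Kaplan-Meier estimate referred to above can be sketched in Python as follows. The function name and the example follow-up times are made up for illustration and are not data from the study; the p-values quoted in the response come from the authors' own analysis, which this sketch does not reproduce.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration for each animal (e.g., days)
    events : 1 if progression was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censored subjects leave the risk set
    return curve


# Hypothetical follow-up data for five animals.
print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

      Censored animals (for example, those sacrificed for sampling before progression) reduce the number at risk without lowering the survival estimate, which is why the method suits studies with planned sampling time points.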

      (5) To demonstrate that ePAC could trigger a strong immune response, the positive control group in Figure 4K should be added.

      Thanks very much for this very helpful comment. Following the reviewer’s suggestion, the mouse anti-CD3 antibody was used as the positive control in vitro to activate splenic T cells for ELISPOT assay, and the corresponding results have been added in revised Figure 4K. To address this, we have provided a detailed description in “Figure legends” section of “Figure 4. ePAC delivery and immune activation in vivo” as follows: The mouse anti-CD3 antibody was used to activate splenic T cells in vitro as the positive control for ELISPOT assay.

      (6) In Figure 6G-I and other figures, the author should indicate the time point of detection. Meanwhile, there was no explanation for the different numbers of mice in Figure 6G-I. If the mouse was absent due to death, it may be necessary to advance the detection time to obtain a more convincing result.

      Thanks very much for the comment. The samples for the Figure 6G-I data were collected and analyzed on day 28 after the start of treatment. Following the reviewer’s suggestion, we have specifically marked the time point of “Sacrifice for sampling” in Figure 6A. We have also provided a detailed description in the “Figure legends” section under “Figure 6. Evaluation ePAC antitumor efficacy in orthotopic HCC model by αTIM-3 combination” as follows: The mice were sacrificed and sampled for analysis on the day of 28 after initiating treatment. In addition, in Figure 6G-I we have clearly indicated the sample size for each group. Although three mice in the PBS group died, we still obtained enough samples for statistical analysis (n>3).

      (7) In Figure 6B, the rainbow color bar with an accurate number of maximum and minimum fluorescence intensity should be provided. In addition, the corresponding fluorescence intensity in Figure 6B should be noted.

      Thanks very much for this very helpful comment. Following the reviewer’s suggestion, we have added the rainbow color bar with an accurate number of maximum and minimum fluorescence intensity, and the statistic results in revised Figure 6B.

      (8) The quality of images in Figure 1D and Figure S1B could not support the author's conclusion; please provide higher-quality images.

      Thanks very much. In Figure 1D and Figure S1B, to ensure the authenticity of the results, we tried our best to improve the quality of the pictures and provided the WB results with the full membrane scan. Although some non-specific bands appeared in the results, the target bands remained prominent. Additionally, we used two tags (HA and eGFP) for verification, which fully guarantees the reliability of our findings.

      (9) In Figure 2F, the bright field in the overlay photo may disturb the observation. Meanwhile, the scale bar should be provided in enlarged images.

      Thanks very much. Following the reviewer’s suggestion, we have deleted the bright field in revised Figure 2F and added the scale bar in the enlarged images.

      Reviewer #3 (Public Review):

      Summary:

      The authors harnessed the potential of mammalian endogenous virus-like proteins to encapsulate virus-like particles (VLPs), enabling the precise delivery of tumor neoantigens. Through meticulous optimization of the VLP component ratios, they achieved remarkable stability and efficiency in delivering these crucial payloads. Moreover, the incorporation of CpG-ODN further heightened the targeted delivery efficiency and immunogenicity of the VLPs, solidifying their role as a potent tumor vaccine. In a diverse array of tumor mouse models, this novel tumor vaccine, termed ePAC, exhibited profound efficacy in activating the murine immune system. This activation manifested through the stimulation of dendritic cells in lymph nodes, the generation of effector memory T cells within the spleen, and the infiltration of neoantigen-specific T cells into tumors, resulting in robust anti-tumor responses.

      Strengths:

      This study delivered tumor neoantigens using VLPs, pioneering a new method for neoantigen delivery. Additionally, the gag protein of VLP is derived from mammalian endogenous virus-like protein, which offers greater safety compared to virus-derived gag proteins, thereby presenting a strong potential for clinical translation. The study also utilized a humanized mouse model to further validate the vaccine's efficacy and safety. Therefore, the anti-tumor vaccine designed in this study possesses both innovation and practicality.

      Thank you very much for describing our work as innovative and practical. We sincerely appreciate the reviewer's extremely helpful comments, which have significantly improved the quality of our manuscript.

      Weaknesses:

      (1) CpG-ODN is an FDA-approved adjuvant with various sequence structures. Why was CpG-ODN 1826 directly chosen in this study instead of other types of CpG-ODN? Additionally, how does DEC-205 recognize CpG-ODN 1826, and can DEC-205 recognize other types of CpG-ODN?

      Thanks very much for the comment. CpG-ODNs are classified into three main types based on their structural composition: A, B, and C. Among them, only the B-class CpG-ODNs 1668, 1826, and 2006 have been directly proven to effectively bind DEC-205 and activate DC cells [1]. Therefore, in this study, B-class CpG-ODN 1826 was chosen as the ligand targeting DEC-205 on the surface of DC cells. DEC-205 primarily binds sequences containing the CpG motif core in a pH-dependent manner, thus theoretically allowing DEC-205 to bind a wide range of CpG-ODNs.

      [1] Lahoud MH et al. DEC-205 is a cell surface receptor for CpG oligonucleotides. PNAS. 2012

      (2) Why was it necessary to treat DCs with virus-like particles three times during the in vitro activation of T cells? Can this in vitro activation method effectively obtain neoantigen-responsive T cells?

      Thanks very much for the comment. DCs need to be pre-stimulated before being used to activate T cells. Although a single DC stimulation can activate T cells, the activation efficiency is insufficient. Current research suggests that three DC-T interactions can activate T cells more effectively [2]. Therefore, we stimulated T cells three times with virus-like particle-treated DCs to fully activate them. Our results in Figures 3I and 7D also demonstrate that three rounds of stimulation effectively activated antigen-specific T cells, resulting in stronger tumor cell killing effects.

      [2] Ali M et al. Induction of neoantigen-reactive T cells from healthy donors. Nature Protocols. 2019.

      (3) In the humanized mouse model, the authors used Hepa1-6 cells to construct the tumor model. To achieve the vaccine's anti-tumor function, these Hepa1-6 cells were additionally engineered to express HLA-A0201. However, in the in vitro experiments, the authors used the HepG2 cell line, which naturally expresses HLA-A0201. Why did the authors not continue to use HepG2 cells to construct the tumor model, instead of Hepa1-6 cells?

      Thanks very much for the comment. HepG2 cells are derived from human liver cancer. When directly implanted into immunocompetent mice, they are cleared by the mouse immune system and do not form tumors. Therefore, we did not continue to use HepG2 cells to construct the tumor model.

      (4) The advantages of low immunogenicity viruses as vaccines compared with conventional adenovirus and lentivirus, etc. should be discussed.

      Thanks very much for this helpful suggestion. In the Introduction, starting from line 76, we first described the structure and function of lentiviruses and discussed the design and application of virus-like particles (VLPs) based on lentiviruses. To provide a more comprehensive comparison, we included a discussion of VLPs, lentiviruses, and adenoviruses in the Discussion section (from line 441 to line 447) as follows: “Furthermore, comparing to the virus-based delivery vectors, the lentiviruses although can stably integrate into the host genome but carry risks of insertional mutagenesis; adenoviruses although have high transduction efficiency but strong immunogenicity, which leads to fast clearance by the immune system of the host and affects the efficiency of the secondary injection. Instead, our VLPs offer low immunogenicity and superior safety, making them more suitable for repeated use and vaccine development.”

      (5) In Figure 6B, the authors should provide statistical results.

      Thanks very much. We have provided the statistical results in revised Figure 6B following the reviewer’s suggestion.

      (6) The entire article demonstrates a clear logical structure and substantial content in its writing. However, there are still some minor errors, such as the misspelling of "Spleenic" in Figure 3B, and the sentence from line 234 should be revised.

      Thanks very much. We have carefully checked and corrected the typos throughout the whole manuscript as much as possible.

      (7) The authors demonstrated the efficiency of CpG-ODN membrane modification by varying the concentration of DBCO, ultimately determining the optimal modification scheme for eVLP as 3.5 nmol of DBCO. However, in Figure 2B, the author did not provide the modification efficiency when the DBCO concentration is lower than 3.5 nmol. These results should be provided.

      Thanks very much for the suggestion. We have repeated the experiment and reduced the concentration of DBCO to 2.1 nmol and 0.7 nmol, respectively. The results showed that in a 200 µl eVLP reaction system, 3.5 nmol DBCO achieved the highest modification efficiency. We have provided a detailed description in “Results” section of “Envelope decoration of neoantigen-loaded eVLP” as follows: Furthermore, by varying the concentration of DBCO-C6-NHS Ester from 0 to 14 nmol, ePAC exhibited different CpG-ODN loading efficiency as evidenced by agarose gel electrophoresis (Figure 2B and Figure S3). And the results showed that in a 200 µl eVLP reaction system, 3.5 nmol DBCO achieved the highest modification efficiency.

      (8) In Figure 3, the authors presented a series of data demonstrating that ePAC can activate mouse DC2.4 cells and BMDCs in vitro. However, in Figure 7, there is no evidence showing whether human DC cells can be activated by ePAC in vitro. This data should be provided.

      Thanks very much for this very helpful suggestion. We used ePAC to activate human DCs and the results indicate that, compared to the blank control group, both eVLP and ePAC increased the co-expression of CD80 and CD86 in DCs, and ePAC was the most efficient. We have provided a detailed description in the “Results” section of “Antitumor effect by HLA-A*0201 restricted vaccine” as follows: “After the stimulation, the DCs in ePAC treated group showed the highest level of maturation comparing to the eVLP treated group and control group (Figure S4), by using flow cytometry analysis.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2B and 2D: unlike what is written in the results part, the results are not consistent, but opposite: LSS has higher activity in 2B, less in 2D. 

      The activities in Figure 2B come from NMR kinetic experiments with pGly, whereas Figure 2D reports on activity towards whole S. aureus cells. The LytM and LSS activities in these two experiments are indeed not directly comparable, but served to highlight the fact that simple pentaglycine is a poor model substrate for M23 enzymes. We carried out a turbidity assay with a pristine enzyme preparation, and the result is indeed highly consistent both with the kinetic assay using pentaglycine (Fig. 2B) and with that using larger PG fragments (Fig. 2K), indicating that the catalytic domain of LSS is significantly more efficient than LytM in hydrolyzing cells from the community-acquired methicillin-resistant S. aureus strain USA300 as well as synthetic PG fragments. The corresponding paragraph in Results has now been updated and rephrased.

      (2) Figure 2, panel K missing statistical analysis, which makes it difficult to appreciate if the difference is significant. If it is a one-time experiment or a single value, the value should be presented as a table. The corresponding text in the results part is confusing. The fold change or drop in percentage is unclear in the figure. 

      We have added a table (panel L) to Figure 2, which shows absolute values of LSS and LytM hydrolysis rates. Indeed, most of the values are from single NMR kinetic measurements, however, PG fragment (2) for LSS and PG fragment (3) for LytM were measured as duplicates to verify the reproducibility of the data. This is now mentioned in Figure 3 legend and in the Materials and Methods. Also, the corresponding text in the Results has been updated and rephrased.

      (3) Figure 3H: the cleavage of D-ala-gly is unclear, the cleavage products need to be labeled and quantified. The experiment used purified PG treated with mutanolysin. Presumably, mixed monomers, dimers, trimers, and multimers are used. It would be helpful to show the HPLC profile of the purified muropeptide. It would be informative to analyze which fractions generate D-ala-gly. In addition, the intact murein sacculus should be included. 

      For the sake of clarity, we have moved the 13C-HMBC spectra presented in Figure 3H to Fig. S7 in the Supplementary Material. The full carbonyl carbon region of the reference (prior to addition of enzyme) 13C-HMBC spectrum together with larger expansions of spectra acquired from enzyme-treated muropeptides are now shown. Furthermore, graphical presentations of identified PG fragments due to LSS/LytM activity are included. No HPLC analysis of the muropeptides was performed at this stage. Being insoluble, the intact murein sacculus is not amenable to liquid-state NMR studies, but we envisage studies of this remarkably complex structure also with solid-state NMR.

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the experiments address the question asked by the authors and no additional experiments are required to strengthen the conclusions drawn. 

      Abstract: 

      The abstract is not well written and more specific (and accurate) information should be provided by the authors. 

      We are grateful for the constructive and helpful comments to improve our manuscript. The abstract has now been modified by taking into account the Reviewer’s suggestions.

      Introduction 

      The intro is relatively long and wordy. It could most certainly be shortened and written in a more simple way to make it more impactful.

      The introduction has now been modified by taking into account the Reviewer’s suggestions.

      (2) One of the peptide stems in Figure 1 is missing a pentaglycine side chain; I would recommend increasing the font size; the peptide stem looks like it is attached to the carbon in position 2, it may be a good idea to move it to the left? 

      We thank the Reviewer for this comment. Figure 1 has been improved: the frameshift has been fixed and the non-cross-linked pGly bridge has been added to the lysine side-chain in tetraStem.

      Results 

      Figure 2 is a bit overwhelming and its description is sketchy. Fig 2B shows a much higher activity of LSS on pGly as compared to LytM whilst 2K shows a very similar rate. 

      We have rearranged Figures 1 and 2 by moving the original panel J in Figure 2 (structures of PG fragments) to Figure 1 panel C. The bar graph in Figure 2J now shows absolute rates of substrate hydrolysis for 2 mM LSS and LytM. These indicate that LSS is much more efficient against PG fragments in vitro in comparison to LytM. Rates normalized with respect to pGly are shown in Figure 2K. Also, a table showing absolute rates of hydrolysis for 2 mM LSS and 50 mM LytM has been included in Figure 2, panel L. In this table, the values for PG fragments 2 and 3 were determined by two independent measurements to verify the reproducibility of the method. This is also now elaborated further in the Materials and Methods.

      Figure 3 is impressive and very informative but again hard to follow. 

      - Panels 3A and 3B are nicely conceived but the resolution is rather poor and it is difficult to know exactly where the arrows point. 

      We very much value suggestions given by the Reviewer to improve readability of our manuscript. In the case of Figure 3, we have now greatly enhanced the resolution and readability of the figure by horizontal scaling of panels A and B.

      Figure 4 shows a comparative analysis of catalytic rate using various substrates, the authors may want to present graphs with the same y-axis to get the most out of the comparison between substrates. 

      The scaling of the y-axis is the same for all the substrates now. In addition, we have reorganized the panels in the figure to enhance readability.

      Figure 5: - The same remark as above, please cite all panels in alphabetical order. 

      The citations of the Figure 5 panels have now been revised.

      Material and methods: 

      - How were the peptide concentrations determined? It may be useful to indicate if specific conditions were required to solubilize some peptides, pGly is particularly insoluble in aqueous solutions. 

      - Page 19, replace cpm by rpm; biological or technical replicates?

      These have now been added and edited accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      After reviewing the authors' response letter and the revised manuscript, I believe they have done a commendable job in addressing my comments.

      Additionally, I concur with the concerns raised by Reviewer #2 regarding several potential confounding factors that require better control in their experimental design. These include the differences in physical properties between vocal and nonvocal stimuli, as well as the infant's exposure to the speech/auditory environment. These concerns should be thoroughly and explicitly discussed in the manuscript, ensuring a clearer understanding for the readers.

      Thank you for the suggestion. We have discussed these limitations in our revised manuscript. In this round of revision, we have tempered our conclusion in light of these limitations.

      Reviewer #2:

      The revised manuscript does discuss the limitations of the control stimuli, as well as the limitations with regard to conclusions that can be drawn from this data set. I therefore expected the authors to temper a bit their recommendation that this could be a 'screening' signal for autism because these data are not sufficiently strong to make that recommendation. Also, in the same vein, perhaps the title might be adjusted somewhat to suggest less certainty, for example, by using the word "change" rather than "milestone"'? The data are of interest, but the limitations are genuine limitations.

      Thank you for your expert comments and considerations. We have moderated our recommendation for autism screening and softened the statement of “milestone” throughout the manuscript. Please see the updated article title, abstract, significance statement, and discussion.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we will provide additional details on the animal collections as requested.

      We agree with point (3) in theory but not in fact. As shown in Figure 3a, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We will better highlight this evidence in the revision.

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses (lines 192-194). We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction the reviewer suggests. Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We will make this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis.

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only are the bacteria prevalent among cattle, cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases, such as that shown at the very bottom of the tree in Figure 3b, rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We will add this point to the limitations. As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.
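
      The quantities discussed in the responses to points (3) and (5), namely pairwise SNP distances within a host group and the share of clinical isolates lying within a SNP threshold of any cattle isolate, can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the toy sequences, and the handling of gaps and ambiguous bases are assumptions made purely for illustration.

```python
from itertools import combinations
from statistics import median


def snp_distance(a, b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_distances(seqs):
    """SNP distance for every unordered pair of sequences."""
    return [snp_distance(a, b) for a, b in combinations(seqs, 2)]


def fraction_within(human, cattle, threshold=5):
    """Share of human isolates within `threshold` SNPs of any cattle isolate."""
    hits = sum(
        1 for h in human
        if any(snp_distance(h, c) <= threshold for c in cattle)
    )
    return hits / len(human)


# Toy alignment (hypothetical data, not from the study).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "ACGTNCGT"]
dists = pairwise_distances(toy)
print(dists, median(dists))
```

      Applied separately to the cattle and human alignments, `pairwise_distances` yields the distributions whose medians and interquartile ranges are compared above; `fraction_within` corresponds to the 5-SNP criterion.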

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). We believe transmission from cattle-to-humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, our specific data do not support the reviewer’s individual contentions. That said, we will conduct a sensitivity analysis as recommended to determine the impact of sampling and inclusion of the small clusters on our primary findings.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We will revise it.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We will expand the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies. Unfortunately, we did not have sequences available from other species, and we will add this to the limitations. Sequences from other species may exist among those collected by others, which, as we note in the limitations, do not have sufficient metadata to assign them to Alberta vs. the rest of Canada. While we have requested this data, we have been unsuccessful in obtaining it. We will continue to pursue it.

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and will expand our discussion of relevance to non-O157 STEC. Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We will add a discussion of these.

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. However, we will add a more explicit discussion of risk factors.

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. We will add a discussion of this. We will also be adding a sensitivity analysis to the manuscript simulating a different sampling approach, which should also be informative to this question.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      We will add discussions on E. tarda flagellin and examine whether E. tarda flagellin or T3SS needle/rod can activate NLRC4.

      The authors show that EseB and its homologs can activate NLRC4, but there are also other translocon proteins that are very different, such as YopB or PopB, and share little homology with EseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there 2 different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      The reviewer raises an interesting question. We will explore this question and provide relevant discussions/hypothesis in the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses:

      The functional assessment of EseB homologues is limited to inflammasome activation at the protein level but does not include the effects on cell viability as shown for E. tarda EseB. Confirmation that EseB homologues have similar effects on cell death would strengthen this portion of the manuscript.

      According to the reviewer’s suggestion, we plan to examine the effects of representative EseB homologs on cell death.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely due to the fact that it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels with immunoblotting, comparing patient fibroblast cells to the HC. No significant change in OXA1 steady state levels was observed. 

      See the results below. These results will be added and discussed in the revised manuscript.

      Author response image 1.

      (2) Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons, while TIMM23 was detected only in mouse neurons, where a decrease was observed but was not significant. This is most likely due to the fact that TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (3) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the mechanism by which the steady state level of some TIM23 substrate proteins is increased can only be explained by events that occur outside the mitochondria. These could include an increase in transcription, translation or post-translational modifications, all of which may increase their steady state level despite the decrease in the steady state level of the import complex.

      (4) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 2.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract – 

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction – 

      Line 87 - Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.

      Discussion – 

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72. 

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5. 

      (3) Dekker PJT, Keil P, Rassow J, Maarse AC, Pfanner N, Meijer M. Identification of MIM23, a putative component of the protein import machinery of the mitochondrial inner membrane. FEBS Lett. 1993;330(1):66–70. 

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell [Internet]. 2021;81(18):3677–90. Available from: https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive. However, the association of Sfp1 with cytoplasmic transcripts remains to be validated, as explained in the following comments:

      A two-hybrid based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. The revised version of the manuscript now states that the observed interaction could be indirect.

      To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular what would be the background of a similar experiment performed without UV cross-linking. This is crucial, as Figure S2G shows very localized and sharp peaks for the CRAC signal, often associated with over-amplification of weak signal during sequencing library preparation.

      (1) To rule out possible PCR artifacts, we used a UMI (Unique Molecular Identifier) scan. UMIs are short, random sequences added to each molecule by the 5’ adapter to uniquely tag them. After PCR amplification and alignment to the reference genome, groups of reads with identical UMIs represent only one unique original molecule. Thus, UMIs allow distinguishing between original molecules and PCR duplicates, effectively eliminating the duplicates.

      (2) Looking closely at the peaks using the IGV browser, we noticed that the reads are by no means identical: each carries a mutation [probably due to the cross-linking] in a different position and has a different length. Note that the reads are highly reproducible between the two replicates.

      (3) CRAC+ genes do not all fall into the category of highly transcribed genes.  On the contrary, as depicted in Figure 6A (green dots), it is evident that CRAC+ genes exhibit a diverse range of Rpb3 ChIP and GRO signals. Furthermore, as illustrated in Figure 7A, when comparing CRAC+ to Q1 (the most highly transcribed genes), it becomes evident that the Rpb4/Rpb3 profile of CRAC+ genes is not a result of high transcription levels.

      (4) Only a portion of the RiBi mRNAs binds Sfp1, despite similar expression of all RiBi.

      (5) The CRAC+ genes represent a distinct group with many unique features. Moreover, many CRAC+ genes do not fall into the category of highly transcribed genes.

      (6) The biological significance of the 262 CRAC+ mRNAs was demonstrated by various experiments; all are inconsistent with technical flaws. Some examples are:

      a) Fig. 2A and B show that most reads of CRAC+ mRNAs were mapped to a specific location, close to the pA sites.

      b) Fig. 2C shows that most reads of CRAC+ mRNAs were mapped to a specific RNA motif.

      c) Most RiBi CRAC+ promoters contain Rap1 binding sites (p = 1.9 × 10⁻²²), whereas the vast majority of RiBi CRAC- promoters do not contain a Rap1 binding site (Fig. 3C).

      d) Fig. 4A shows that RiBi CRAC+ mRNAs become destabilized due to Sfp1 deletion, whereas RiBi CRAC- mRNAs do not. Fig. 4B shows similar results due to

      e) Fig. 6B shows that the impact of Sfp1 on backtracking is substantially higher for CRAC+ than for CRAC- genes. This is most clearly visible in RiBi genes.

      f) Fig. 7A shows that the Sfp1-dependent changes along the transcription units are substantially more pronounced for CRAC+ than for CRAC- genes.

      g) Fig. S4B shows that the chromatin binding profile of Sfp1 is different for CRAC+ and CRAC- genes.
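
      The UMI-based removal of PCR duplicates described in point (1) can be sketched as follows. This is a simplified illustration, not the pipeline used for the CRAC data: the read representation (a tuple of chromosome, start, strand, UMI and sequence) and the function name are assumptions, and exact UMI matching is used, whereas dedicated tools may additionally tolerate sequencing errors within the UMI.

```python
from collections import defaultdict


def deduplicate_by_umi(reads):
    """Collapse PCR duplicates.

    `reads` is an iterable of (chrom, start, strand, umi, sequence) tuples
    (a hypothetical, simplified representation of aligned reads).
    Reads sharing the same mapping position, strand and UMI are assumed
    to derive from one original molecule; the first one seen is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        chrom, start, strand, umi, _seq = read
        groups[(chrom, start, strand, umi)].append(read)
    return [group[0] for group in groups.values()]


# Toy input (hypothetical): the first two reads are PCR duplicates.
reads = [
    ("chrI", 100, "+", "AACG", "read_1"),
    ("chrI", 100, "+", "AACG", "read_2"),  # same position and UMI: duplicate
    ("chrI", 100, "+", "TTGC", "read_3"),  # same position, different UMI: kept
    ("chrI", 250, "-", "AACG", "read_4"),  # different position: kept
]
unique_reads = deduplicate_by_umi(reads)
print(len(unique_reads))
```

      Because reads at the same position with different UMIs are retained, a sharp peak that survives this step reflects multiple independent molecules rather than over-amplification of a single one.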

      In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assess the specificity of the observed protein-RNA interactions (to complement Fig. 2D).

      GPP1, a highly expressed gene, is not pulled down by Sfp1 (Fig. 2D). GPP1 (alias RHR2) was included in our Table S2 as one of the 264 CRAC+ genes, having a low CRAC value. However, when we inspected the GPP1 results using the IGV browser, we realized that the few reads mapped to GPP1 are actually anti-sense to GPP1 (perhaps they belong to the neighboring RPL34B gene, which is transcribed convergently to GPP1) (see Fig. 1 at the bottom of the document). Thus, GPP1 is not a CRAC+ gene and now serves as a control. We changed the text accordingly (see page 11, blue sentences). In light of this observation, we checked the other CRAC+ genes and found that, except for ALG2, they all contain sense reads (some contain both sense and anti-sense reads). ALG2 and GPP1 were removed, leaving 262 CRAC+ genes.

      The CRAC-selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation. Also, whether the fraction of mRNA bound by Sfp1 is nuclear or cytoplasmic is unclear.

      Further experimental validation was provided in some of our figures (e.g., Fig. 5C, Fig. 3B).

      We argue that Sfp1 binds RNA co-transcriptionally and accompanies the mRNA until its demise in the cytoplasm. Co-transcriptional binding is shown by: (I) a drop in the Sfp1 ChIP-exo signal that coincides with the position of the Sfp1 binding site in the RNA (Fig. 5C), demonstrating a movement of Sfp1 from chromatin to the transcript; (II) the dependence of Sfp1 RNA-binding on the promoter (Fig. 3B); and (III) the binding of intron-containing RNA. Taken together, these three different experiments demonstrate that Sfp1 binds Pol II transcripts co-transcriptionally. Association of Sfp1 with cytoplasmic mRNAs is shown in the following experiments: (I) Figure 2D shows that Sfp1 pulled down full-length RNA, strongly suggesting that these RNAs are mature cytoplasmic mRNAs. (II) mRNAs encoding ribosomal proteins, which belong to the CRAC+ mRNA group, are degraded by Xrn1 in the cytoplasm (Bresson et al., Mol Cell 2020). The capacity of Sfp1 to regulate this process (Fig. 4A-D) is therefore consistent with cytoplasmic activity of Sfp1. (III) The effect of Sfp1 on deadenylation (Fig. 4D), a cytoplasmic process, is also consistent with cytoplasmic activity of Sfp1.

      To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. Removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Co-purification of reporter RNA with Sfp1 was only observed when Rap1 binding sites were included in the reporter. Negative controls for all the purification experiments might be useful.

      In the swapping experiment, the plasmid lacking RapBS serves as the control for the one with RapBS and vice versa (see Bregman et al., 2011). Remember that all these constructs give rise to identical RNA. Indeed, RapBS affects both mRNA synthesis and decay; therefore, the controls are not ideal. However, see the next section.

      More importantly, in Fig. 3B “Input” panel, one can see that the RNA level of “construct F” was higher than the level of “construct E”. Despite this difference, only the RNA encoded by construct E was detected in the IP panel. This clearly shows that the detection of the RNA was not merely a result of its expression level.

      To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate, but also an increased degradation rate. This important observation needs careful validation,

      Indeed, we do provide validations in Fig. 4C, Fig. 4D and Fig. S3A, and during the revision we included an additional validation as Fig. S3B. Of note, we strongly suspect that GRO is among the most reliable approaches to determine half-lives (see our response in the first revision letter).

      As genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). As an additional validation, a temperature shift to 42°C was used to show that, for specific ribosomal protein mRNA, the degradation was faster, assuming that transcription stops at that temperature. It would be important to cite and discuss the work from the Tollervey laboratory showing that a temperature shift to 42°C leads to a strong and specific decrease in ribosomal protein mRNA levels, probably through an accelerated RNA degradation (Bresson et al., Mol Cell 2020, e.g. Fig 5E).

      This was cited. Thank you. 

      Finally, the conclusion that mRNA deadenylation rate is altered in the absence of Sfp1, is difficult to assess from the presented results (Fig. 3D).

      This type of experiment was popular in the past. The results in the literature are similar to ours (in fact, ours are nicer). Please check the papers cited in our MS and a number of papers by Roy Parker.

      The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in Pol II presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. An effect of Sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. To what extent this result is important for the main message of the manuscript is unclear.

      Suggestions: a) please clearly indicate in the figures when they correspond to reanalyses of published results.

      This was done.

      b) In table S2, it would be important to mention what the results represent and what statistics were used for the selection of "positive" hits. 

      This was discussed in the text.

      Strengths:

      - Diversity of experimental approaches used.

      - Validation of large-scale results with appropriate reporters.

      Weaknesses:

      - Lack of controls for the CRAC results and lack of negative controls for the co-purification experiments that were used to validate specific mRNA targets potentially bound by Sfp1.

      - Several conclusions are derived from complex correlative analyses that fully depend on the validity of the aforementioned Sfp1-mRNA interactions.

      We hope that our responses to Reviewer 2's thoughtful comments have ruled out concerns regarding the lack of controls.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Please review the text for spelling errors. While not mandatory, wig or bedGraph files for the CRAC results would be very useful for the readers.

      Author response image 1.

      A snapshot of the GPP1 locus in IGV showing that all the reads are anti-sense, i.e. pointing in the opposite direction to the gene (the gene arrows [white arrows over blue, at the bottom] point to the right, whereas the reads point to the left).

    1. Author response:

      The following is the authors’ response to the current reviews.

      The concerns raised during the review have been incorporated into the discussion of the results, and the need for further research is acknowledged in the paper. This is not possible in the present study, as the clinical project has been completed and further patients cannot be enrolled without starting a new project. We are confident that the results are scientifically valid and that the methodology was scientifically sound and up to date. They were obtained on a dataset that was obviously large enough to allow 20% of it to be set aside and a machine-learned classifier to be trained on the remaining 80%, which then assigned samples to neuropathy with an accuracy better than guessing.

      Furthermore, our results are at least tentatively replicated in a completely independent data set from another patient cohort. The strengths and limitations of the study design, in particular the latter, are discussed in the necessary depth. In summary, the machine-learned results provided major hits on one side and probably unimportant lipids on the other side of the variable importance scale. Both could be verified in vitro. We are therefore confident that we have contributed to the advancement of knowledge about cancer therapy-associated neuropathy and look forward to further developments in this area.
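
      The evaluation scheme mentioned above (20% of the samples set aside, a classifier trained on the remaining 80%, and performance compared against guessing) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `split_80_20` and `balanced_accuracy` are hypothetical, the split is unstratified for brevity, and the numbers in the usage note are toy values chosen only to show the arithmetic.

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle the samples and hold out 20% of them (unstratified sketch)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_holdout = round(0.2 * len(shuffled))
    # The first n_holdout shuffled samples form the held-out set.
    return shuffled[n_holdout:], shuffled[:n_holdout]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls; 0.5 is the guess level for two classes."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

      With toy labels, `balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.75, whereas a classifier that always predicts the same class scores 0.5, which is why 0.5 serves as the guess level for a two-class problem.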


      The following is the authors’ response to the original reviews.

      Weaknesses Reviewer 1: 

      There are a number of weaknesses in the study. The small sample size is a significant limitation of the study. Out of 31 patients, only 17 patients were reported to develop neuropathy, with significant neuropathy (grade 2/3) in only 5 patients. The authors acknowledge this limitation in the results and discussion sections of the manuscript, but it limits the interpretation of the results. Also acknowledged is the limited method used to assess neuropathy. 

      We agree with the reviewer that the cohort size and assessment of neuropathy are limitations of our study, as we already described in the corresponding section of the manuscript. However, occurrence and grade of the neuropathy are in line with results reported from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70% (54.9% in our cohort), and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). In these studies, neuropathy is assessed by using questionnaires or by grading via NCI-CTCAE, as in our study. In summary, assessment and occurrence of neuropathy in our reported cohort are in line with previous reports.

      Potentially due to this small number of patients with neuropathy, the machine learning algorithms could not distinguish between samples with and without neuropathy. Only selected univariate analyses identified differences in lipid profiles potentially related to neuropathy.  

      The data analysis consistently followed a "mixture of experts" approach, as this seems to be the most successful way to deal with omics data. We have elaborated on this in the Methods section, including several supporting references. Regarding the quoted sentence from the results section, after rereading it, we realized that it was somewhat awkwardly worded. What we mean is now better worded in the results section, namely “Although the three algorithms detected neuropathy in new cases, unseen during training, at balanced accuracy of up to 0.75, while only the guess level of 0.5 was achieved when using permuted data for training, the 95% CI of the performance measures was not separated from guess level”. Therefore, multivariate feature selection was not considered a valid approach, since it requires that the algorithms from which the feature importance is read can successfully perform their task of class assignment (4). Therefore, univariate methods (Cohen's d, FPR, FWE) were preferred, as well as a direct hypothesis transfer of the top hits from the abovementioned day1/2 assessments to neuropathy. Classical statistics consisting of direct group comparisons using Kruskal-Wallis tests (5) were performed.” 

      It was our approach to investigate the data set in an unbiased manner by different machine learning algorithms and select those lipids that the majority of the algorithms considered important for distinguishing the patient groups (majority voting). This way, the inconsistencies and limitations of a single evaluation method, such as regression analysis, that occur in some datasets, can be mitigated. 
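
      The majority-voting idea described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function `majority_vote` and the lipid labels in the usage note are hypothetical placeholders, and each method is assumed to return the set of features it considers important.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Return the features selected by a majority of the methods.

    selections: one collection of feature names per method.
    min_votes:  votes required; defaults to a strict majority of the methods.
    """
    n_methods = len(selections)
    if min_votes is None:
        min_votes = n_methods // 2 + 1
    # Count each feature at most once per method.
    votes = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

      For three hypothetical methods selecting `{"lipid_A", "lipid_B", "lipid_C"}`, `{"lipid_A", "lipid_B"}` and `{"lipid_A", "lipid_D"}`, the call returns `["lipid_A", "lipid_B"]`: a feature picked by only one method is dropped, which is how the peculiarities of a single evaluation method are dampened.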

      Three sphingolipid mediators including SA1P differed between patients with and without neuropathy at the end of treatment. These sphingolipids were elevated at the end of treatment in the cohort with neuropathy, relative to those without neuropathy. However, across all samples from pre to post-paclitaxel treatment, there was a significant reduction in SA1P levels. It is unclear from the data presented what the underlying mechanism for this result would be. 

      We agree with the reviewer that our study does not identify the mechanism by which paclitaxel treatment alters sphingolipid concentrations in the plasma of patients. It has been reported before that paclitaxel may increase the expression and activity of serine palmitoyltransferase (SPT), which is the crucial enzyme and rate-limiting step in the de novo synthesis of sphingolipids. This may be associated with a shift towards increased synthesis of 1-deoxysphingolipids and a decrease of “classical” sphingolipids (6), and may explain the general reduction of SA1P and other sphingolipid levels after paclitaxel treatment in our study.

      It is also conceivable that paclitaxel reduces the release of sphingolipids into the plasma. Paclitaxel is a microtubule stabilizing agent (7) that may interfere with intracellular transport processes and release of paracrine mediators. 

      The mechanistic details of paclitaxel involvement in sphingolipid metabolism or transport are highly interesting but identifying them is beyond the scope of our manuscript.

      If elevated SA1P is associated with neuropathy development, it would be expected to increase in those who develop neuropathy from pre to post-treatment time points. 

      There is a general trend of reduced plasma SA1P concentrations following paclitaxel treatment. Nevertheless, patients experiencing neuropathy exhibit significantly elevated SA1P levels post-treatment. 

      It has been shown before in a preclinical study that paclitaxel-induced neuropathic pain requires activation of the S1P1 receptor (8). Moreover, a meta-analysis of genome-wide association studies (GWAS) from two clinical cohorts identified multiple regulatory elements and increased activity of S1PR1 associated with paclitaxel-induced neuropathy (9). These data imply that enhanced S1P receptor activity and signaling are key drivers of paclitaxel-induced neuropathy. It seems that increased levels of the sphingolipid ligands, in combination with enhanced expression and activity of S1P receptors, can potentiate paclitaxel-induced neuropathy in patients. This explains why even decreased SA1P concentrations after paclitaxel treatment can still enhance neuropathy via the S1PR-TRPV1 axis in sensory neurons.

      We added this paragraph to the discussions section of our manuscript.

      Primary sensory neuron cultures were used to examine the effects of SA1P application.

      SA1P application produced calcium transients in a small proportion of sensory neurons. It is not clear how this experimental model assists in validating the role of SA1P in neuropathy development as there is no assessment of sensory neuron damage or other hallmarks of peripheral neuropathy. These results demonstrate that some sensory neurons respond to SA1P and that this activity is linked to TRPV1 receptors. However, further studies will be required to determine if this is mechanistically related to neuropathy.

      As we detected elevated levels of SA1P in the plasma of PIPN patients, we can assume higher concentrations in the vicinity of sensory neurons. These neurons are the main drivers for neuropathy and neuropathic pain and are strongly affected by paclitaxel in their activity (10-15). Also, TRPV1 shows altered activity patterns in response to paclitaxel treatment (16). Because of its relevance for nociception and pathological pain, TRPV1 activity is a suitable and representative readout for pathological pain states in peripheral sensory neurons (17, 18), which is why we investigated them.

      We would like to point out the potency of SA1P to increase capsaicin-induced calcium transients in sensory neurons at submicromolar concentrations.

      We also agree with the reviewer that further studies need to investigate the underlying mechanisms in more detail. We added this sentence to the final paragraph in the discussion section of our manuscript.

      Weaknesses Reviewer 2: 

      The article is poorly written, hindering a clear understanding of core results. While the study's goals are apparent, the interpretation of sphingolipids, particularly SA1P, as key mediators of paclitaxel-induced neuropathy lacks robust evidence. 

      We agree that the relevance of SA1P as key mediator of paclitaxel-induced neuropathy might be overstated and changed the wording throughout the manuscript accordingly. However, we would like to point out the potency of this lipid to increase capsaicin-induced calcium-transients in sensory neurons at submicromolar concentrations. 

      Also, the lipid signature in the plasma of PIPN patients shows a unique pattern and sphingolipids are the group that showed the strongest alterations when comparing the patient groups. We also measured eicosanoids, such as prostaglandins, linoleic acid metabolites, endocannabinoids and other lipid groups that have previously been associated with influences on pain perception or nociceptor sensitization. However, none of these lipids showed significant differences in their concentrations in patient plasma. This is why we consider sphingolipids as contributors to or markers of paclitaxel-induced neuropathy in patients.

      We also revised the entire article to improve its clarity.

      The introduction fails to establish the significance of general neuropathy or peripheral neuropathy in anticancer drug-treated patients, and crucial details, such as the percentage of patients developing general neuropathy or peripheral neuropathy, are omitted. This omission is particularly relevant given that only around 50% of patients developed neuropathy in this study, primarily of mild Grade 1 severity with negligible symptoms, contradicting the study's assertion of CIPN as a significant side effect. 

      As we already described in the introduction, CIPN is a serious dose- and therapy-limiting side effect, which affects up to 80% of treated patients. This depends on the dose and combination of chemotherapeutic agents. For paclitaxel, therapeutic doses range from 80 to 225 mg/m². As CIPN symptoms are dose-dependent, the number of PIPN patients that receive a high paclitaxel dose is higher than the number of PIPN patients receiving a low dose.

      In our study, we mainly used a low dose of paclitaxel, because this therapeutic regimen is the most widely used paclitaxel monotherapy. From previous studies, the expected occurrence of neuropathy with this therapeutic regimen is around 50-70%, and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3).

      Our results are within the range reported by these studies (54.9% patients with neuropathy). Also, as we highlight in Table S1, the neuropathy symptoms persist in most cases for several years after chemotherapy, affecting quality of life of these patients which makes it far from being a negligible symptom.

      We added some more information concerning PIPN in the introduction section in which we emphasize the clinical problem.

      The lack of clarity in distinguishing results obtained by lipidomics using machine learning methods and conventional methods adds to the confusion. The poorly written results section fails to specify SA1P's downregulation or upregulation, and the process of narrowing down to sphingolipids and SA1P is inadequately explained. 

      We have tried to keep the machine learning part in the main manuscript short and moved major parts of it to a supplement. However, as this has been claimed to have led to a lack of clarity, we have expanded the description of the data analysis and added extensive explanations and supporting references for the mixed expert approach that was used throughout the analysis. We hope this is now clear.

      Integrating a significant portion of the discussion section into the results section could enhance clarity. An explanation of the utility of machine learning in classifying patient groups over conventional methods and the citation of original research articles, rather than relying on review articles, may also add clarity to the usefulness of the study. 

      As suggested by the reviewer, we moved the relevant parts from the discussion to the results section in the revised version of our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 2 should be better explained or removed. In its current form, it does not add to the interpretation of the manuscript.  

      As mentioned above, we have expanded the description of the ESOM/U-matrix method in the Methods section and rewritten the figure legend. In addition, we have annotated the U-matrix in the figure. The method has been reported extensively in the computer science and biomedical literature, and a more detailed description in the referenced papers would go beyond the current focus on lipidomics. However, we believe that this discussion is sufficiently detailed for the readers of this report: "… a second unsupervised approach was used to verify the agreement between the lipidomics data structure and the prior classification, implemented as self-organizing maps (SOM) of artificial neurons (19). In the special form of an “emergent” SOM (ESOM (20)), the present map consisted of 4,000 neurons arranged on a two-dimensional toroidal grid with 50 rows and 80 columns (21, 22). ESOM was used because it has been repeatedly shown to correctly detect subgroup structures in biomedical data sets comparable to the present one (20, 22, 23). The core principle of SOM learning is to adjust the weights of neurons based on their proximity to input data points. In this process, the best matching unit (BMU) is identified as the neuron closest to a given data point. The adaptation of the weights is determined by a learning rate (η) and a neighborhood function (h), both of which gradually decrease during the learning process. Finally, the groups are projected onto separate regions of the map. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix (24) which is the canonical tool for displaying the distance structures of input data on ESOM (21). 

      The visual presentation facilitates data group separation by displaying the distances between BMUs in high-dimensional space in a color-coding that uses a geographical map analogy, where large "heights" represent large distances in feature space, while low "valleys" represent data subsets that are similar. "Mountain ranges" with "snow-covered" heights visually separate the clusters in the data. Further details about ESOM can be found in (24)."
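
      The SOM learning principle quoted above (find the best matching unit, then pull neuron weights toward the data point, scaled by a learning rate η and a neighborhood function h on a toroidal grid) can be sketched as follows. This is a minimal illustration, not the ESOM/Umatrix implementation used in the study: the function names are hypothetical, and a Gaussian neighborhood with width `sigma` is an assumption, since the quoted text does not specify the form of h.

```python
import math


def toroidal_distance(a, b, rows, cols):
    """Grid distance between neurons a and b on a map that wraps at its edges."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # wrap around vertically
    dc = min(dc, cols - dc)  # wrap around horizontally
    return math.hypot(dr, dc)


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)),
    )


def som_update(weights, x, rows, cols, eta, sigma):
    """One learning step: move every neuron toward x by eta * h(distance to BMU)."""
    bmu = best_matching_unit(weights, x)
    for pos, w in weights.items():
        d = toroidal_distance(pos, bmu, rows, cols)
        h = math.exp(-(d ** 2) / (2 * sigma ** 2))  # assumed Gaussian neighborhood
        weights[pos] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu
```

      Here `weights` maps grid positions `(row, column)` to weight vectors. On a 50 × 80 toroidal grid, the neurons at `(0, 0)` and `(49, 79)` are diagonal neighbors, which is the property that avoids border effects on the map. In a full training run, `eta` and `sigma` would be decreased gradually over the iterations, as the quoted description states.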

      The second patient cohort is only included in the discussion - with cohort details in the supplementary material and figures included in the main text. Perhaps these data should be removed entirely. The findings are described as trends and not statistically significant and multiple issues with this second cohort are mentioned in the discussion. 

      We agree with the reviewer that including the second patient cohort in the discussion is inadequate. Of course, there are differences between the patient cohorts that do not allow direct comparison and that are highlighted in the section on limitations of the study. However, we still think it is interesting and relevant to show these data, because we used our algorithms trained on the first patient cohort to analyze the second cohort. And these data support the main results. 

      We therefore moved the entire paragraph to the results section to improve the coherence of our manuscript. The passage was introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      The title does not reflect the content of the paper and should be changed to better reflect the content and its significance. 

      We changed the title to “Machine learning and biological validation identify sphingolipids as potential mediators of paclitaxel-induced neuropathy in cancer patients” to avoid overstating the results, as suggested by the reviewer.

      Further, the discussion should be modified to avoid overstating the results. 

      As the reviewer suggests, we changed the wording to avoid overstating the results. 

      Reviewer #2 (Recommendations For The Authors): 

      Please address the absence of clear neuropathy in the majority of patients after treatment with paclitaxel in your discussion. 

      As stated above, occurrence and grade of the neuropathy are in line with the results from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70%, (the variability is due to differences in the assessment methods) and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). 

      We added this information in the discussion section of the revised manuscript.

      Line 65: Kindly replace review articles with original research articles for proper citation. 

      We replaced the review articles with original publications, focusing on clinical observations. We added the following publications: Jensen et al., Front Neurosci 2020; Chen et al., Neurobiol Aging 2018; Igarashi et al., J Alzheimers Dis. 2011; Kim et al., Oncotarget 2017 as references 17-20 in the revised version of our manuscript.

      Line 260: The mention of SA1P is introduced here without prior reference (do not use words like "again", or "see above", if it is not previously mentioned). Adjust the text for coherence.

      We agree with the reviewer that the introduction of SA1P in this passage is incoherent. We replaced the sentence in line 260 with:

      “The small set of lipid mediators emerging from all three methods as informative for neuropathy included the sphingolipid sphinganine-1-phosphate (SA1P), also known as dihydrosphingosine-1-phosphate (DH-S1P)…”

      Lines 301-315: Consider relocating several lines from this section to the results section for improved clarity. 

      We moved lines 309-312, which explain the algorithm selection and their validation success, to the corresponding results section (Lipid mediators informative for assigning post-paclitaxel therapy samples to neuropathy).

      Lines 382-396: Move this content to the results section to enhance the organization and coherence of the manuscript. 

      We moved the entire paragraph to the results section of our manuscript to improve coherence. The passage was introduced with the subheading:  “Support of the main results in an independent second patient cohort”.

      References

      (1) Barginear M, Dueck AC, Allred JB, Bunnell C, Cohen HJ, Freedman RA, et al. Age and the Risk of Paclitaxel-Induced Neuropathy in Women with Early-Stage Breast Cancer (Alliance A151411): Results from 1,881 Patients from Cancer and Leukemia Group B (CALGB) 40101. Oncologist. 2019;24(5):617-23.

      (2) Mauri D, Kamposioras K, Tsali L, Bristianou M, Valachis A, Karathanasi I, et al. Overall survival benefit for weekly vs. three-weekly taxanes regimens in advanced breast cancer: A meta-analysis. Cancer Treat Rev. 2010;36(1):69-74.

      (3) Budd GT, Barlow WE, Moore HC, Hobday TJ, Stewart JA, Isaacs C, et al. SWOG S0221: a phase III trial comparing chemotherapy schedules in high-risk early-stage breast cancer. J Clin Oncol. 2015;33(1):58-64.

      (4) Lötsch J, and Ultsch A. Pitfalls of Using Multinomial Regression Analysis to Identify Class-Structure-Relevant Variables in Biomedical Data Sets: Why a Mixture of Experts (MOE) Approach Is Better. BioMedInformatics. 2023;3(4):869-84.

      (5) Kruskal WH, and Wallis WA. Use of Ranks in One-Criterion Variance Analysis. J Am Stat Assoc. 1952;47(260):583-621.

      (6) Kramer R, Bielawski J, Kistner-Griffin E, Othman A, Alecu I, Ernst D, et al. Neurotoxic 1-deoxysphingolipids and paclitaxel-induced peripheral neuropathy. FASEB J. 2015;29(11):4461-72.

      (7) Field JJ, Diaz JF, and Miller JH. The binding sites of microtubule-stabilizing agents. Chem Biol. 2013;20(3):301-15.

      (8) Janes K, Little JW, Li C, Bryant L, Chen C, Chen Z, et al. The development and maintenance of paclitaxel-induced neuropathic pain require activation of the sphingosine 1-phosphate receptor subtype 1. J Biol Chem. 2014;289(30):21082-97.

      (9) Chua KC, Xiong C, Ho C, Mushiroda T, Jiang C, Mulkey F, et al. Genomewide Meta-Analysis Validates a Role for S1PR1 in Microtubule Targeting Agent-Induced Sensory Peripheral Neuropathy. Clin Pharmacol Ther. 2020;108(3):625-34.

      (10) Kawakami K, Chiba T, Katagiri N, Saduka M, Abe K, Utsunomiya I, et al. Paclitaxel increases high voltage-dependent calcium channel current in dorsal root ganglion neurons of the rat. J Pharmacol Sci. 2012;120(3):187-95.

      (11) Pittman SK, Gracias NG, Vasko MR, and Fehrenbacher JC. Paclitaxel alters the evoked release of calcitonin gene-related peptide from rat sensory neurons in culture. Exp Neurol. 2013.

      (12) Luo H, Liu HZ, Zhang WW, Matsuda M, Lv N, Chen G, et al. Interleukin-17 Regulates Neuron-Glial Communications, Synaptic Transmission, and Neuropathic Pain after Chemotherapy. Cell Rep. 2019;29(8):2384-97 e5.

      (13) Pease-Raissi SE, Pazyra-Murphy MF, Li Y, Wachter F, Fukuda Y, Fenstermacher SJ, et al. Paclitaxel Reduces Axonal Bclw to Initiate IP3R1-Dependent Axon Degeneration. Neuron. 2017;96(2):373-86 e6.

      (14) Duggett NA, Griffiths LA, and Flatters SJL. Paclitaxel-induced painful neuropathy is associated with changes in mitochondrial bioenergetics, glycolysis, and an energy deficit in dorsal root ganglia neurons. Pain. 2017.

      (15) Li Y, Adamek P, Zhang H, Tatsui CE, Rhines LD, Mrozkova P, et al. The Cancer Chemotherapeutic Paclitaxel Increases Human and Rodent Sensory Neuron Responses to TRPV1 by Activation of TLR4. J Neurosci. 2015;35(39):13487-500.

      (16) Hara T, Chiba T, Abe K, Makabe A, Ikeno S, Kawakami K, et al. Effect of paclitaxel on transient receptor potential vanilloid 1 in rat dorsal root ganglion. Pain. 2013;154(6):882-9.

      (17) Jardin I, Lopez JJ, Diez R, Sanchez-Collado J, Cantonero C, Albarran L, et al. TRPs in Pain Sensation. Front Physiol. 2017;8:392.

      (18) Julius D. TRP Channels and Pain. Annu Rev Cell Dev Biol. 2013;29:355-84.

      (19) Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern. 1982;43(1):59-69.

      (20) Lötsch J, Lerch F, Djaldetti R, Tegeder I, and Ultsch A. Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix). Big Data Analytics. 2018;3(1):5.

      (21) Ultsch A. 2003.

      (22) Lötsch J, Geisslinger G, Heinemann S, Lerch F, Oertel BG, and Ultsch A. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B-induced local skin hypersensitization in healthy subjects: a machine-learned analysis. Pain. 2018;159(1):11-24.

      (23) Lötsch J, Thrun M, Lerch F, Brunkhorst R, Schiffmann S, Thomas D, et al. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects. Int J Mol Sci. 2017;18(6).

      (24) Lötsch J, and Ultsch A. Cham: Springer International Publishing; 2014:249-57.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Strengths:

      The data are convincing and supported by appropriate, validated methodology. These results are both technically and scientifically exciting and are likely to appeal to retinal specialists and neuroscientists in general.

      Weaknesses:

      There are some data gaps that need to be addressed.

      (1) Please label the time points of AAV injection, EdU labeling, and harvest in Figure 1B.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (2) What fraction of Müller cells were transduced by AAV under the experimental conditions?

      We apologize for not clearly conveying the transduction efficiency. The retinal region adjacent to the injection site, typically near the central retina, exhibits a transduction efficiency of nearly 100%. In contrast, the peripheral retina shows a lower transduction efficiency compared to the central region. We will include the quantification of AAV transduction efficiency in the revised manuscript.

      Quantification of EdU+ MG and other markers was performed in the area with the highest transduction efficiency.

      (3) It seems unusually rapid for MG proliferation to begin as early as the third day after CCA injection. Can the authors provide evidence for cyclin D1 overexpression and p27 Kip1 knockdown three days after CCA injection?

      In our pilot study, we tested the onset time of GFP expression from AAV-GFAP-GFP following intravitreal injection. We observed GFP expression in MG as early as two days post-infection. These findings will be included in the revised manuscript. Additionally, we plan to perform qPCR or Western blot analysis to confirm cyclin D1 overexpression and p27kip1 knockdown at the onset of Müller glia proliferation, which will also be included in the revised manuscript.

      (4) The authors reported that MG proliferation largely ceased two weeks after CCA treatment. While this is an interesting finding, the explanation that it might be due to the dilution of AAV episomal genome copies in the dividing cells seems far-fetched.

      We believe that the lack of durability in high Cyclin D1 and low p27kip1 levels in MG contributes to the cessation of their proliferation. A potential reason for the loss of high Cyclin D1 overexpression and p27kip1 knockdown during MG proliferation could be the dilution of the AAV episomal genome. However, testing this hypothesis is challenging. Instead, we plan to provide direct evidence in the revised manuscript by examining the levels of Cyclin D1 and p27kip1 in the retina treated with CCA before and after the peak of MG proliferation.

      Reviewer #2 (Public Review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Muller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian muller glia into a source for neurogenic progenitors, in the hopes that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Muller glia can be neurogenically reprogrammed and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Muller glia to re-enter the cell cycle with the hopes they will differentiate into neurons. While the evidence for stimulating proliferation in this study is convincing, the evidence for neurogenesis in this study is not convincing or robust, suggesting that stimulating cell cycle-reentry may not be associated with increasing regeneration without another proneural stimulus.

      Below are concerns and suggestions.

      Intro:

      (1) The authors cite past studies showing "direct conversion" of MG into neurons. However, these studies (PMID: 34686336; 36417510) show EdU+ MG-derived neurons suggesting cell cycle re-entry does occur in these strategies of proneural TF overexpression.

      We thank the reviewer for pointing this out. We will revise the statement to "MG neurogenesis," which encompasses both direct conversion and Müller glia proliferation followed by neuronal differentiation.

      (2) Multiple citations are incorrectly listed, using the authors' first names only (i.e. Yumi, et al; Levi, et al;). Studies are also incompletely referenced in the references.

      We apologize for the errors in the references. We will correct them in the revised version.

      Figure 1:

      (3) When are these experiments ending? On Figure 1B it says "analysis" on the end of the paradigm without an actual day associated with this. This is the case for many later figures too. The authors should update the paradigms to accurately reflect experimental end points.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (4) Are there better representative pictures between P27kd and CyclinD OE, the EdU+ counts say there is a 3 fold increase between Figure 1D&E, however the pictures do not reflect this. In fact, most of the Edu+ cells in Figure 1E don't seem to be Sox9+ MG but rather horizontally oriented nuclei in the OPL that are likely microglia.

      We thank the reviewer for pointing this out. We will replace the Cyclin D1 image with a more representative one.

      (5) Is the infection efficacy of these viruses different between different combinations (i.e. CyclinD OE vs. P27kd vs. control vs. CCA combo)? As the counts are shown in Figure 1G only Sox9+/Edu+ cells are shown not divided by virus efficacy. If these are absolute counts blind to where the virus is and how many cells the virus hits, if the virus efficacy varies in efficiency this could drive absolute differences that aren't actually biological.

      Because the AAV-GFAP-Cyclin D1 and AAV-GFAP-Cyclin D1-p27kip1 shRNA viruses do not carry a fluorescent reporter gene, we cannot easily measure viral efficacy in the same experiment. We believe that variations in viral efficacy cannot account for the significant differences in MG proliferation for two reasons: 1) We injected the same titer for all three viruses, and 2) Viral infection efficacy is very high, approaching 100% in the central retina. Nonetheless, to rule out the possibility that the differences in MG proliferation among the Cyclin D overexpression, p27kip1 knockdown, and CCA groups are due to variations in viral efficacy, we will include the p27kip1 knockdown and Cyclin D1 overexpression efficiencies for all four groups using qPCR and/or Western blot analysis in the revised manuscript.

      (6) According to the Jax laboratories, mice aren't considered aged until they are over 18months old. While it is interesting that CCA treatment does not seem to lose efficacy over maturation I would rephrase the findings as the experiment does not test this virus in aged retinas.

      We thank the reviewer for bringing this to our attention. We will avoid using “aged mice” in our revised manuscript.

      (7) Supplemental Figure 2c-d. These viruses do not hit 100% of MG, however 100% of the P27Kip staining is gone in the P27sh1 treatment, even the P27+ cell in the GCL that is likely an astrocyte has no staining in the shRNA 1 picture. Why is this?

      For Supplementary Figure 2c-d, we focused on the central area where knockdown efficiency was high, approaching 100%. We will replace this image with one that includes both high and low Müller glia transduction efficiency regions, clearly demonstrating the complete loss of p27kip1 staining in the area of high transduction efficiency.

      Figure 2

      (8) Would you expect cells to go through two rounds of cell cycle in such a short time? The treatment of giving Edu then BrdU 24 hours later would have to catch a cell going through two rounds of division in a very short amount of time. Again the end point should be added graphically to this figure.

      We thank the reviewer for raising this important point. While the typical cell cycle time for human cells is approximately 24 hours, we hypothesized that 24 hours would be the most likely timepoint to capture cells continuously progressing through the cell cycle. However, we acknowledge that we cannot exclude the possibility of some cells entering a second cell cycle at much later timepoints.

      In the revised manuscript, we will carefully qualify our conclusion to state that the majority of MG do not immediately undergo another cell division, rather than making a definitive statement. This more cautious phrasing will better reflect the limitations of the 24-hour timepoint and allow for the potential of a small subset of cells proceeding through additional rounds of division at later stages.

      Figure 3

      (9) I am confused by the mixing of ratios of viruses to indicate infection success. I know mixtures of viruses containing CCA or control GFP or a control LacZ was injected. Was the idea to probe for GFP or LacZ in the single cell data to see which cells were infected but not treated? This is not shown anywhere?

      The virus infection was not uniform across the entire retina. To mark the infection hotspots, we added 10% GFP virus to the mixture. Regions of the retina with low infection efficiency were removed by dissection and excluded from the scRNA-seq analysis. We apologize for not clearly explaining this methodological detail in the original text, and will update the Methods section accordingly.

      (10) The majority of glia sorted from TdTomato are probably not infected with virus. Can you subset cells that were infected only for analysis? Otherwise it makes it very hard to make population judgements like Figure 3E-H if a large portion are basically WT glia.

      This question is related to the last one. Since the regions with high virus infection efficiency were selectively dissected and isolated for analysis, the percentage of CCA-infected MG should constitute the majority in the scRNA-seq data.

      (11) Figure 3C you can see Rho is expressed everywhere which is common in studies like this because the ambient RNA is so high. This makes it very hard to talk about "Rod-like" MG as this is probably an artifact from the technique. Most all scRNA-seq studies from MG-reprogramming have shown clusters of "rods" with MG hybrid gene expression and these had in the past just been considered an artifact.

      We agree that the low levels of Rho in other MG clusters (such as quiescent, reactivated, and proliferating MG) are likely due to RNA contamination. However, the level of Rho in the rod-like MG is significantly higher than in the other clusters, indicating that this is unlikely to be solely due to contamination.

      As shown in Supplementary Figure 7A-C, a cluster of MG-rod hybrid cells (cluster C4) was present in all three experimental groups at similar ratios, and this hybrid cluster was excluded from further analysis. In contrast, the rod-like Müller glia (cluster C3) were predominantly found in the CCA and CCANT groups, suggesting a genuine response to CCA treatment.

      Furthermore, we will conduct Rho and Gnat1 RNA in situ hybridization on the dissociated retinal cells to further support the conclusion that rod-specific genes are upregulated in a subset of MG in the revised manuscript.

      (12) It is mentioned the "glial" signature is downregulated in response to CCA treatment. Where is this shown convincingly? Figure H has a feature plot of Glul, which is not clear it is changed between treatments. Otherwise MG genes are shown as a function of cluster not treatment.

      We will add box plots of several MG-specific genes to better illustrate the downregulation of the glial signature in the relevant cell cluster in the revised manuscript.

      Figure 4

      (13) The authors should be commended for being very careful in their interpretations. They employ the proper controls (Er-Cre lineage tracing/EdU-pulse chasing/scRNA-seq omics) and were very careful to attempt to see MG-derived rods. This makes the conclusion from the FISH perplexing. The few puncta dots of Rho and GNAT in MG are not convincing to this reviewer, Rho and GNAT dots are dense everywhere throughout the ONL and if you drew any random circle in the ONL it would be full of dots. The rigor of these counts also comes into question because some dots are picked up in MG in the INL even in the control case. This is confusing because baseline healthy MG do not express RNA-transcripts of these Rod genes so what is this picking up? Taken together, the conclusion that there are Rod-like MG are based off scRNA-seq data (which is likely ambient contamination) and these FISH images. I don't think this data warrants the conclusion that MG upregulate Rod genes in response to CCA.

      We performed RNA in situ hybridization on retinal sections because we aimed to correlate cell localization with rod gene expression. We understand the reviewer’s concern that the punctate signals of Rho and GNAT1 in the ONL MG may actually originate from neighboring rods. In the revised manuscript, we will conduct RNAscope on dissociated retinal cells to avoid this issue.

      Figure 5

      (14) Similar point to above but this Glul probe seems odd, why is it throughout the ONL but completely dark through the IPL, this should also be in astrocytes can you see it in the GCL? These retinas look cropped at the INL where below is completely black. The whole retinal section should be shown. Antibodies exist to GS that work in mouse along with many other MG genes, IHC or western blots could be done to better serve this point.

      Indeed, the GCL was cropped out in Figure 5A-B. We have other images showing all retinal layers, which we will use in the revised manuscript. Additionally, we will perform GS antibody staining to demonstrate partial MG dedifferentiation following CCA treatment.

      Figure 6

      (15) Figure 6D is not a co-labeled OTX2+/ TdTomato+ cell, Otx2 will fill out the whole nucleus as can be seen with examples from other MG-reprogramming papers in the field (Hoang, et al. 2020; Todd, et al. 2020; Palazzo, et al. 2022). You can clearly see in the example in Figure 6D the nucleus extending way beyond Otx2 expression as it is probably overlapping in space. Other examples should be shown, however, considering less than 1% of cells were putatively Otx2+, the safer interpretation is that these cells are not differentiating into neurons. At least 99.5% are not.

      We have additional examples of Otx2+ Tdt+ Edu+ cells, which suggest that MG neurogenesis to Otx2+ cells does occur, despite the low efficiency. We will include these images in the revised manuscript.

      (16) Same as above Figure 6I is not convincingly co-labeled HuC/D is an RNA-binding protein and unfortunately is not always the clearest stain but this looks like background haze in the INL overlapping. Other amacrine markers could be tested, but again due to the very low numbers, I think no neurogenesis is occurring.

      We have additional examples of HuC/D+ Tdt+ Edu+ cells, which we will show in the revised manuscript.

      (17) In the text the authors are accidentally referring to Figure 6 as Figure 7.

      We thank the reviewer for pointing out the mistake. We will correct the mistake in the revised manuscript.

      Figure 7

      (18) I like this figure and the concept that you can have additional MG proliferating without destroying the retina or compromising vision. This is reminiscent of the chick MG reprogramming studies in which MG proliferate in large numbers and often do not differentiate into neurons yet still persist de-laminated for long time points.

      General:

      (19) The title should be changed, as I don't believe there is any convincing evidence of regeneration of neurons. Understanding the barriers to MG cell-cycle re-entry are important and I believe the authors did a good job in that respect, however it is an oversell to report regeneration of neurons from this data.

      We thank the reviewer for the suggestion. We will consider changing the title in the revised manuscript.

      (20) This paper uses multiple mouse lines and it is often confusing when the text and figures switch between models. I think it would be helpful to readers if the mouse strain was added to graphical paradigms in each figure when a different mouse line is employed.

      We will label the mouse lines used in each experiment in the figures where appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors re-examine the developmental origin of cortical oligodendrocyte (OL) lineage cells using a combination of strategies, focussing on the question of whether the LGE generates cortical OL cells. The paper is interesting to myelin biologists, the methods used are appropriate and, in general, the study is well-executed, thorough, and persuasive, but not 100% convincing.

      Thank you very much for approving our paper.

      Strengths, weaknesses, and recommendations:

      The first evidence presented that the LGE does not generate OLs for the cortex is that there are no OL precursors 'streaming' from the LGE during embryogenesis, unlike the MGE (Figure 1A). This in itself is not strong evidence, as they might be more dispersed. In fact, in the images shown, there is no obvious 'streaming' from the MGE either. Note that in Figure 1 there is no reference to the star that is shown in the figure.

      We agree. Although the absence of an OPC migration stream is not, by itself, strong evidence that the LGE does not generate OPCs for the cortex, taken together with our additional evidence, the absence of obvious 'streaming' from the LGE to the cortex provides supplementary support for this conclusion. We have also removed the star from the figure.

      The authors then electroporate a reporter into the LGE at E13.5 and examine the fate of the electroporated cells (Figures 1C-E). They find that electroporated cells became neurons in the striatum and in the cortex but no OLs for the cortex. There are two issues with this: first, there is no quantification, which means there might indeed be a small contribution from the LGE that is not immediately obvious from snapshot images. Second, it is unexpected to find labelled neurons in the cortex at all since the LGE does not normally generate neurons for the cortex. Electroporations are quite crude experiments as targeting is imprecise and variable and not always discernible at later stages. For example, in Figure 1D, one can see tdTOM+ cells near the AEP, as well as the striatum. Hence, IUE cannot on its own be taken as proof that there is no contribution of the LGE to the cortical OL population.

      Thank you for your constructive suggestions.

      (1) Following the reviewer's suggestion, we have added these statistics; please see Figure 1F.

      (2) The reviewer raised a good point. We occasionally found a very small number of electroporated cells in the MGE/AEP VZ in our IUE system. We could therefore identify a few electroporated cells in the cortex, most of which expressed the neuronal marker NeuN. We suspect these are MGE-derived cortical interneurons. It is worth noting that these MGE-derived electroporated cells are not glial cells. The probable reason is that the MGE/AEP generates cortical OPCs mainly before E13.5 (in this study we performed IUE at E13.5).

      The authors then use an alternative fate-mapping approach, again with E13.5 electroporations (Figure 2). They find only a few GFP+ cells in the cortex at E18 (Figures 2C-D) and P10 (Figure 2E) and these are mainly neurons, not OL lineage cells. Again, there is no quantification.

      Thank you very much for your suggestions. In this fate-mapping approach, very few electroporated cells were present in the cortex. We analyzed four mice and found that none of the GFP-positive cells (139 GFP+ cells) expressed OLIG2, SOX10, or PDGFRA.

      Figure 3 is more convincing, but the experiments are incomplete. Here the authors generate triple-transgenic mice expressing Cre in the cortex (Emx1-Cre) and the MGE (Nkx2.1-Cre) as well as a strong nuclear reporter (H2B-GFP). They find that at P0 and P10, 97-98% of OL-lineage cells (SOX10+ or PDGFRA+) in the cortex are labelled with GFP (Figure 3). This is a more convincing argument that the LGE/CGE might not contribute significant numbers of OL lineage cells to the cortex, in contrast to the Kessaris et al. (2006) paper, which showed that Gsh2-Cre mice label ~50% of SOX10+ve cells in the motor cortex at P10. The authors of the present paper suggest that the discrepancy between their study and that of Kessaris et al. (2006) is based on the authors' previous observation (Zhang et al 2020) (https://doi.org/10.1016/j.celrep.2020.03.027) that GSH2 is expressed in intermediate precursors of the cortex from E18 onwards. If correct, then Kessaris et al. might have mistakenly attributed Gsh2-Cre+ lineages to the LGE/CGE when they were in fact intrinsic to the cortex. However, the evidence from Zhang et al 2020 that GSH2 is expressed by cortical intermediate precursors seems to rest solely on their location within the developing cortex; a more convincing demonstration would be to show that the GSH2+ putative cortical precursors co-label for EMX1 (by immunohistochemistry or in situ hybridization), or that they co-label with a reporter in Emx1-driven reporter mice. This demonstration should be simple for the authors as they have all the necessary reagents to hand. Without these additional data, the assertion that GSX2+ve cells in the cortex are derived from the cortical VZ relies partly on an act of faith on the part of the reader. Note that Tripathi et al. (2011, "Dorsally- and ventrally-derived oligodendrocytes have similar electrical properties but myelinate preferred tracts." J. Neurosci. 31, 6809-6819) found that the Gsh2-Cre+ OL lineage contributed only ~20% of OLs to the mature cortex, not ~50% as reported by Kessaris et al. (2006). If it is correct that these Gsh2-derived OLs are from the cortical anlagen as the current paper claims, then it would raise the possibility that the ventricular precursors of GSH2+ intermediate progenitors are not uniformly distributed through the cortical VZ but are perhaps localized to some part of it. Then the contribution of Gsh2-derived OLs to the cortical population could depend on precisely where one looks relative to that localized source. It would be a nice addition to the current manuscript if the authors could explore the distribution of their GSH2+ intermediate precursors throughout the developing cortex. In any case, Tripathi et al. (2011) should be cited.

      Thank you for your constructive suggestions.

      (1) We used the Emx1Cre; RosaH2B-GFP mouse and found that nearly all GSX2+ cells in the cortical SVZ are derived from the Emx1+ lineage at P0 (Please see our new Figure 3-supplement 1A-C). 

      (2) According to your suggestion, we have cited this paper (Tripathi et al.) in our revised manuscript.

      (3) The study conducted by Kessaris et al. (2006) revealed that roughly 50% of cortical oligodendrocytes (OLs) originate from the Gsx2 lineage (LGE/CGE-derived). In contrast, Tripathi et al. (2011) observed that Gsx2-derived OLs contribute only around 20% to the corpus callosum (CC). To investigate the reasons behind these disparate findings, we conducted three experiments. First, using Emx1Cre; RosaH2B-GFP mice, we found that approximately 89% of lateral CC (LCC) OLs originate from the Emx1 lineage, with only around 11% derived from the ventral source (see Author response image 1A and B below). Second, using Nkx2-1Cre; RosaH2B-GFP mice, we determined that approximately 11% of LCC OLs originate from the Nkx2.1 lineage (see Author response image 1C and D below). Finally, we found that approximately 98.3% of LCC OLs originate from the Emx1 and Nkx2.1 lineages combined, with only around 1.7% possibly derived from the LGE (see Author response image 1E and F below). Taken together, our results indicate that approximately 89% of LCC OLs originate from the Emx1 lineage, while 11% of LCC OLs are derived from the medial ganglionic eminence (MGE).

      It is worth noting that OLs from Emx1 and Nkx2.1 lineages were equally distributed in the medial CC (mCC) (see Author response image 1G below). This finding suggests that MGE-derived OLs exhibit spatial heterogeneity in their distribution within the CC. These results provide evidence that the contribution of the lateral ganglionic eminence (LGE) and caudal ganglionic eminence (CGE) to CC OLs is minimal.

      Author response image 1.

      Finally, the authors deleted Olig2 in the MGE and found a dramatic reduction of PDGFRA+ and SOX10+ cells in the cortex at E14 and E16 (Figure 4A-F). This further supports their conclusion that, at least at E16, there is no significant contribution of OLs from ventral sources other than the MGE/AEP. This does not exclude the possibility that the LGE/CGE generates OLs for the cortex at later stages. Hence, on its own, this is not completely convincing evidence that the LGE generates no OL lineage cells for the cortex.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. (1) The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex, and we speculate that these OPCs are derived from the dorsal MGE. (2) Considering the possibility of incomplete recombination at the Olig2 gene locus, we surmise that the OPCs (Olig2+) in the lateral cortex are derived from the MGE. Indeed, we found a few OPCs in the MGE/AEP in the Olig2-NCKO mice (Figure 4F). (3) A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal, which further supports our findings. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      Reviewer #2 (Public Review):

      Traditional thinking has been that cortical oligodendrocyte progenitor cells (OPCs) arise in the development of the brain from the medial ganglionic eminence (MGE), lateral/caudal ganglionic eminence (LGE/CGE), and cortical radial glial cells (RGCs). Indeed a landmark study demonstrated some time ago that cortical OPCs are generated in three waves, starting with a ventral wave derived from the medial ganglionic eminence (MGE) or the anterior entopeduncular area (AEP) at embryonic day E12.5 (Nkx2.1+ lineage), followed by a second wave of cortical OLs derived from the lateral/caudal ganglionic eminences (LGE/CGE) at E15.5 (Gsx2+/Nkx2.1- lineage), and then a final wave occurring at P0, when OPCs originate from cortical glial progenitor cells (Emx1+ lineage). However, the authors challenge the idea in this paper that cortical progenitors are produced from the LGE. They have found previously that cortical glial progenitor cells were also found to express Gsx2, suggesting this may not have been the best marker for LGE-derived OPCs. They have used fate mapping experiments and lineage analyses to suggest that cortical OPCs do not derive from the LGE.

      Strengths:

      (1) The data is high quality and very well presented, and experiments are thoughtful and elegant to address the questions being raised.

      (2) The authors use two elegant approaches to lineage trace LGE derived cells, namely fate mapping of LGE-derived OPCs by combining IUE (intrauterine electroporation) with a Cre recombinase-dependent IS reporter, and Lineage tracing of LGE-derived OPCs by combining IUE with the PiggyBac transposon system. Both approaches show convincingly that labelled LGE-derived cells that enter the cortex do not express OPC markers, but that those co-labelling with oligodendrocyte markers remain in the striatum.

      (3) The authors then use further approaches to confirm their findings. Firstly they lineage trace Emx1-Cre; Nkx2.1-Cre; H2B-GFP mice. Emx1-Cre is expressed in cortical RGCs and Nkx2.1-Cre is specifically expressed in MGE/AEP RGCs. They find that close to 98% of OPCs in the cortex co-label with GFP at later times, suggesting the contribution of OPCs from LGE is minimal.

      (4) They use one further approach to strengthen the findings yet further. They cross Nkx2.1-Cre mice with Olig2 F/+ mice to eliminate Olig2 expression in the SVZ/VZ of the MGE/AEP (Figures 4A-B). The generation of MGE/AEP-derived OPCs is inhibited in these Olig2-NCKO conditional mice. They find that the number of cortical progenitors at E16.5 is reduced 10-fold in these mice, suggesting that LGE contribution to cortical OPCs is minimal.

      We thank the reviewer for summarizing the strengths of our manuscript.

      Weaknesses:

      (1) The authors use IUE in experiments mentioned in point 2 of 'Strengths' above (Figures 1 and 2) and claim that the reporter was delivered specifically into LGE VZ at E13.5 using this IUE. It would be nice to see some sort of time course of delivery after IUE to show the reporter is limited to LGE VZ at early times post-IUE.

      Thank you very much for your suggestions. Indeed, when using IUE in our system, we occasionally found a small number of electroporated cells in the MGE/AEP VZ. As a result, we found very few electroporated (MGE/AEP-derived) cells in the cortex, and these electroporated cells are neurons (perhaps interneurons).

      (2) In the experiments mentioned in point 3 of 'Strengths' (Figure 3), statistical analysis showed that only approximately 2% of OPCs were GFP-negative cells. This 2% could possibly be derived from the LGE/CGE so does not totally rule out that LGE contributes some cortical OPCs.

      Thank you for your constructive suggestions. We apologize for any imprecise descriptions. We suspect that this 2% may originate from the MGE (considering the possibility of incomplete recombination at the Olig2 gene locus, we surmise that these Olig2+ OPCs may be MGE-derived; indeed, we found a few OPCs in the MGE/AEP in the Olig2-NCKO mice, Figure 4F) or from the dMGE (the expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions). Nevertheless, we have softened the assertion throughout our revised manuscript.

      (3) In the experiments mentioned in point 4 of 'Strengths' (Figure 4), they do still find cortical OPCs at E16.5 in the Olig2-NCKO conditional mice. It is unclear whether this is due to the recombination efficiency of the CRE enzyme not being 100%, or whether there is some LGE contribution to the cortical OPCs.

      This experiment alone may not provide strong evidence that the LGE does not contribute to the cortical OPCs during development. However, when combining this result with our other results, we can confirm that the contribution of the LGE to cortical OPCs is minimal. Furthermore, a recent study reported that LGE/CGE-derived OLs make minimal contributions to the neocortex and corpus callosum, which further supports the reliability of our conclusion.

      We would like to thank the reviewers and editors for their valuable comments and suggestions again.

      Impact of Study:

      The authors show elegantly and convincingly that the contribution of the LGE to the pool of cortical OPCs is minimal. The title should perhaps be that the LGE contribution is minimal rather than no contribution at all, as they are not able to rule out some small contribution from the LGE. These findings challenge the traditional belief that the LGE contributes to the pool of cortical OPCs. The authors do show that the LGE does produce OPCs, but that they tend to remain in the striatum rather than migrate into the cortex. It is interesting to wonder why their migration patterns may be different from the MGE-derived OPCs which migrate to the cortex. The functional significance of these different sources of OPCs for adult cortex in homeostatic or disease states remains unclear though.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) Change the title to e.g. 'limited contribution of the LGE to cortical oligodendrocytes'. Alternatively, it might be more useful to highlight where they come from, e.g. "Cortical oligodendrocytes originate predominantly or exclusively from the MGE and cortical VZ"

      As suggested, we have changed the old title to the following: The lateral/caudal ganglionic eminence makes a limited contribution to cortical oligodendrocytes

      (2) Demonstrate using lineage tracing that GSH2+ cells in the cortex are derived from the Emx1-lineage, e.g. using immunohistochemistry for GSX2 and a reporter in Emx1-Cre mice crossed to a reporter.

      In our revised manuscript, we have added a new figure (Figure 3-supplement 1A-C) to demonstrate that the GSX2+ cells in the cortex are derived from the Emx1-lineage.

      (3) Make it clear in their discussion that they have not explored the CGE so it is possible that this region generates some OLs.

      The Emx1Cre; Nkx2.1Cre; H2B-GFP mice showed that only ~2% of cortical OLs are derived from the LGE/CGE. Considering the efficiency of Cre recombination and the relatively low Cre activity in the dMGE of Nkx2.1Cre mice, the actual contribution of the LGE/CGE to cortical OLs could be even lower than our current observation. Therefore, our results demonstrate that the LGE/CGE generates very few, possibly even no, OLs for the cortex.

      (4) Soften the assertion that the LGE does not generate any OL lineage cells that reach the cortex by e.g. changing the word 'sole' to 'predominant' (line 88) and, elsewhere in the paper, leaving open the possibility that small numbers of LGE-derived OLs might enter the cortex, depending on where exactly one looks.

      As suggested, we have softened the assertion everywhere in our manuscript.

      (5) Lines 255-260: 'First, the time window during which the MGE generates OLs is very brief, perhaps occurring before MGE neurogenesis. The high level of SHH in the MGE allows for the production of a small population of cortical OPCs around E12.5. Subsequently, multipotent intermediate progenitors begin to express DLX transcription factors resulting in ending the generation of OPCs in the MGE'. What is the evidence that OL genesis precedes neurogenesis? If there is none (as I suspect) then this statement should be removed.

      The editors raised a good point. We have no strong evidence that OL genesis precedes neurogenesis in the MGE; thus, we removed these sentences from our manuscript.

      (6) Figure 1E should show quantification of cells as a % of electroporated cells and as a % of PDGFRA+ or OLIG2+ or SOX10+ cells, so that the reader might have a clear view of the extent of labelling.

      Done.

      (7) Figure 4: This is interesting but incomplete. At E14.5 the authors show the presence of PDGFRA+ cells in the telencephalon. However, at E16.5 they show images only of the dorsal-most region of the cortex. If the LGE/CGE begins to generate OLPs for the early cortex, they would be expected to appear near the cortico-striatal boundary, as shown in Kessaris 2006 Fig1g-h. In the current manuscript, the authors do not show these regions, or the LGE and CGE, in their images. It is essential to show PDGFRA immunolabelling at the cortico-striatal boundary and also in the LGE and CGE at E16.5 in control and Olig2 mutant mice. It is also necessary to extend this analysis to E18.5, perhaps showing PDGFRA+ cells streaming from the cortical VZ/SVZ.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. (1) The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex. We suspect that these OPCs are derived from the dMGE. (2) Considering the possibility of incomplete recombination at the Olig2 gene locus, we suspect that the remaining OPCs (Olig2+) are derived from the MGE. In fact, we found a few OPCs in the MGE/AEP of the Olig2-NCKO mice (Figure 4F). (3) A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      (8) Cite Tripathi et al. (2011) and mention the disparity between the findings of that paper and Kessaris et al. (2006) and possible reasons - see main review above.

      Done.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Shore et al. report important effects of a heterozygous mutation in the KCNT1 potassium channel on ion currents and firing behavior of excitatory and inhibitory neurons in the cortex of KCNT1-Y777H mice. The authors provide solid evidence of physiological differences between this heterozygous mutation and their previous work with homozygotes. The reviewers appreciated the inclusion of recordings in ex vivo slices and dissociated cortical neurons, as well as the additional evidence showing an increase in persistent sodium currents (INaP) in parvalbumin-positive interneurons in heterozygotes. However, they were unclear regarding the likelihood of the increased sodium influx through INaP channels increasing sodium-activated potassium currents in these neurons.

      Regarding the last sentence of the eLife assessment, we’ve added a new paragraph to the Discussion section of the paper to address this concern. Please see the response to comment 1B of Reviewer #1 below for more details. We feel that the question of whether an increase in INaP would further increase KCNT1 activity is a valid discussion point but not a limitation of the importance or rigor of the work itself.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the effects of a heterozygous mutation in the KCNT1 potassium channel on the properties of ion currents and firing behavior of excitatory and inhibitory neurons in the cortex of mice expressing KCNT1-Y777H. In humans, this mutation as well as multiple other heterozygotic mutations produce very severe early-onset seizures and a major disruption of all intellectual function. In contrast, in mice, this heterozygous mutation appears to have no behavioral phenotype or any increased propensity to seizures. A relevant phenotype is, however, evident in mice with the homozygous mutation, and the authors have previously published the results of similar experiments with the homozygotes. As perhaps expected, the neuronal effects of the heterozygous mutation presented in this manuscript are generally similar but markedly smaller than the previously published findings on homozygotes. There are, however, some interesting differences, particularly on PV+ interneurons, which appear to be more excitable than wild type in the heterozygotes but less excitable in the homozygotes. This raises the interesting question, which has been explicitly discussed by the authors in the revised manuscript, as to whether the reported changes represent homeostatic events that suppress the seizure phenotype in the mouse heterozygotes or simply changes in excitability that do not reach the threshold for behavioral outcomes.

      Strengths and Weaknesses:

      (1) The authors find that the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation. They propose that this results from the selective upregulation of a persistent sodium current INaP in the PV+ interneurons. These observations are very interesting ones, and they raised some issues in the original submission:

      A) The protocol for measuring the INaP current could potentially lead to results that could be (mis)interpreted in different ways in different cells. First, neither K currents nor Ca currents are blocked in these experiments. Instead, TTX is applied to the cells relatively rapidly (within 1 second) and the ramp protocol is applied immediately thereafter. It is stated that, at this time, Na currents and INaP are fully blocked but that any effects on Na-activated K currents are minimal. In theory this would allow the pre- to post- difference current to represent a relatively uncontaminated INaP. This would, however, only work if activation of KNa currents following Na entry is very slow, taking many seconds. A good deal of literature has suggested that the kinetics of activation of KNa currents by Na influx vary substantially between cell types, such that single action potentials and single excitatory synaptic events rapidly evoke KNa currents in some cell types. This is, of course, much faster than the time of TTX application. Most importantly, the kinetics of KNa activation may be different in different neuronal types, which would lead to errors that could produce different estimates of INaP in PV+ interneurons vs other cell types.

      In their revised manuscript, the authors have provided good data demonstrating that, at least for the PV and SST neurons, loss of KNa currents after TTX application is slow relative to the time course of loss of INaP, justifying the use of this protocol for these neuronal types.

      B) As the authors recognize, INaP current provides a major source of cytoplasmic sodium ions for the activation. An expected outcome of increased INaP is, therefore, further activation of KNa currents, rather than a compensatory increase in an inward current that counteracts the increase in KNa currents, as is suggested in the discussion.

      The authors comment in the rebuttal that, despite the fact that sodium entry through INaP is known to activate KNa channels, an increase in INaP does not necessarily imply increased KNa current. This issue should be addressed directly somewhere in the text, perhaps most appropriately in the discussion.

      We’ve added the following new paragraph to the Discussion section of the manuscript to address this concern:

      “As the persistent sodium current has been shown to act as a source of cytoplasmic sodium ions for KCNT1 channel activation in some neuron types (Hage & Salkoff, 2012), one might expect that the compensatory increase in INaP in YH-HET PV neurons would further increase, rather than counteract, KNa currents. Unfortunately, there is insufficient information on the relative locations of the INaP and KCNT1 channels, as well as the kinetics of sodium transfer to KCNT1 channels, among cortical neuron subtypes, and even less is known in the context of KCNT1 GOF neurons; thus, it is difficult to predict how alterations in one of these currents may affect the other. One plausible reason that increased INaP would not alter KNa currents in YH-HET PV neurons is that the particular sodium channels that are responsible for the increased INaP are not located within close proximity to the KCNT1 channels. Moreover, homeostatic mechanisms that modify the length and/or location of the sodium channel-enriched axon initial segment (AIS) in neurons in response to altered excitability are well described (Grubb & Burrone, 2010; Kuba et al., 2010); thus, it is possible that in YH-HET PV neurons, the length or location of the AIS is altered, leading to uncoupling of the sodium channels that are responsible for the increased INaP to the KCNT1 channels. Future studies will aim to further investigate potential mechanisms of neuron-type-specific alterations in NaP and KNa currents downstream of KCNT1 GOF.”  

      C) The numerical simulations, in general, provide a very useful way to evaluate the significance of experimental findings. Nevertheless, while the in-silico modeling suggests that increases in INaP can increase firing rate in models of PV+ neurons, there is as yet insufficient information on the relative locations of the INaP channels and the kinetics of sodium transfer to KNa channels to evaluate the validity of this specific model.

      The authors have now put in all of the appropriate caveats on this very nicely in the revised manuscript.

      (2) The effects of the KCNT1 channel blocker VU170 on potassium currents are somewhat larger and different from those of TTX, suggesting that additional sources of sodium may contribute to activating KCNT1, as suggested by the authors. Because VU170 is, however, a novel pharmacological agent, it may be appropriate to make more careful statements on this. While the original published description of this compound reported no effect on a variety of other channels, there are many that were not tested, including Na and cation channels that are known to activate KCNT1, raising the possibility of off-target effects.

      In the revised version, the authors have added more to the manuscript on this issue and have added a very clear discussion of this to the text (in the discussion section).

      This is a very clear and thorough piece of work, and the authors are to be congratulated on this. My one remaining suggestion would be to make an explicit statement about whether increased sodium influx through INaP channels, which is thought to activate KNa channels, would be likely to increase KNa current in these neurons (see comment 1B).

      Please see response to comment 1B.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shore et al. investigate the consequent changes in excitability and synaptic efficacy of diverse neuronal populations in an animal model of juvenile epilepsy. Using electrophysiological patch-clamp recordings from dissociated neuronal cultures, the authors find diverging changes in two major populations of inhibitory cell types, namely somatostatin (SST)- and parvalbumin (PV)-positive interneurons, in mice expressing a variant of the KCNT1 potassium channel. They further suggest that the differential effects are due to a compensatory increase in the persistent sodium current in PV interneurons in pharmacological and in silico experiments. It remains unclear why this current is selectively enhanced in PV-interneurons.

      Strengths:

      (1) Heterozygous KCNT1 gain of function variant was used which more accurately models the human disorder.

      (2) The manuscript is clearly written, and the flow is easy to follow. The authors explicitly state the similarities and differences between the current findings and the previously published results in the homozygous KCNT1 gain of function variant.

      (3) This study uses a variety of approaches including patch clamp recording, in silico modeling and pharmacology that together make the claims stronger.

      (4) Pharmacological experiments are fraught with off-target effects and thus it bolsters the authors' claims when multiple channel blockers (TTX and VU170) are used to reconstruct the sodium-activated potassium current.

      Weaknesses:

      (1) This study mostly relies on recordings in dissociated cortical neurons. Although specific WT interneurons showed intrinsic membrane properties like those reported for acute brain slices, it is unclear whether the same will be true for those cells expressing KCNT1 variants, especially when the excitability changes are thought to arise from homeostatic compensatory mechanisms. The authors do confirm that mutant SST-interneurons are hypoexcitable using an ex vivo slice preparation, which is consistent with work on other KCNT1 gain-of-function variants (e.g. Gertler et al., 2022). However, the key missing evidence is the excitability state of mutant PV-interneurons, given the discrepant result of reduced excitability of PV cells reported by Gertler et al. in acute hippocampal slices.

      Reviewer #3 (Public Review):

      Summary:

      The present manuscript by Shore et al. entitled "Reduced GABAergic Neuron Excitability, Altered Synaptic Connectivity, and Seizures in a KCNT1 Gain-of-Function Mouse Model of Childhood Epilepsy" describes in vitro and in silico results obtained in cortical neurons from mice carrying the KCNT1-Y777H gain-of-function (GOF) variant in the KCNT1 gene encoding for a subunit of the Na+-activated K+ (KNa) channel. This variant corresponds to the human Y796H variant found in a family with Autosomal Dominant Nocturnal Frontal Lobe Epilepsy. The occurrence of GOF variants in potassium channel encoding genes is well known, and among potential pathophysiological mechanisms, impaired inhibition has been documented as responsible for KCNT1-related DEEs. Therefore, building on a previous study by the same group performed in homozygous KI animals, and considering that the large majority of pathogenic KCNT1 variants in humans occur in heterozygosis, the Authors have investigated the effects of heterozygous Kcnt1-Y777H expression on KNa currents and neuronal physiology among cortical glutamatergic neurons and the 3 main classes of GABAergic neurons, namely those expressing vasoactive intestinal polypeptide (VIP), somatostatin (SST), and parvalbumin (PV), crossing KCNT1-Y777H mice with VIP-, SST- and PV-cre mouse lines, and recording from GABAergic neurons identified by their expression of mCherry (but negative for GFP used to mark excitatory neurons).

      The results obtained revealed heterogeneous effects of the variant on KNa and action potential firing rates in distinct neuronal subpopulations, ranging from no change (glutamatergic and VIP GABAergic) to decreased excitability (SST GABAergic) to increased excitability (PV GABAergic). In particular, modelling and in vitro data revealed that an increase in persistent Na current occurring in PV neurons was sufficient to overcome the effects of KCNT1 GOF and cause an overall increase in AP generation.

      Strengths:

      The paper is very well written, the results clearly presented and interpreted, and the discussion focuses on the most relevant points.

      The recordings performed in distinct neuronal subpopulations (both in primary neuronal cultures and, for some subpopulations, in cortical slices) are a clear strength of the paper. The finding that the same variant can cause opposite effects and trigger specific homeostatic mechanisms in distinct neuronal populations is very relevant for the field, as it narrows the existing gap between experimental models and clinical evidence.

      Weaknesses:

      My main concern regarding the epileptic phenotype of the heterozygous mice investigated has been clarified in the revision, where the infrequent occurrence of seizures is more clearly stated. Also, a more detailed statistical analysis of the modeled neurons has been added in the revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very clear and thorough piece of work, and the authors are to be congratulated on this. My one remaining suggestion would be to make an explicit statement about whether increased sodium influx through INaP channels, which is thought to activate KNa channels, would be likely to increase KNa current in these neurons (see comment 1B).

      Please see response to comment 1B.

      Reviewer #2 (Recommendations For The Authors):

      This revised manuscript is significantly improved and addresses most of my concerns. However, I would still recommend including the ex vivo slice recordings in mutant PV-interneurons as the authors proposed in their rebuttal. The I-V recordings using sequential TTX and VU170 blockade in WT SST and PV-interneurons that are provided in the rebuttal are interesting and may point to a preferential expression of persistent sodium currents in PV-interneurons normally. It would be helpful to readers as a supplemental figure.

      As proposed in the rebuttal, we are currently recording PV neurons using ex vivo slice preparations from WT and Kcnt1-YH Het mice. We look forward to including those data in a future manuscript.

      We agree with the reviewer that the differences in INaP between WT PV and SST neurons are notable. The data provided in the rebuttal were only from 5 neurons/group, and they were meant to illustrate a side-by-side comparison of TTX and VU170 subtraction methods to assess KNa currents. However, in Figure 7 of the manuscript, we performed more robust measurements of INaP and observed differences in the current between WT PV and SST neurons. Thus, we’ve added the following sentence to the Results section:

      “Interestingly, the mean peak amplitude of INaP in WT PV neurons was 70% larger than that in WT SST neurons (-1.42 ± 0.16 vs. -0.85 ± 0.07 pA/pF; Fig. 7B and 7D), suggesting there may be differences in sodium channel expression, localization, or regulation inherent to each neuron type that confer their differential response to KCNT1 GOF.”

      References

      Grubb, M. S., & Burrone, J. (2010). Activity-dependent relocation of the axon initial segment fine-tunes neuronal excitability. Nature, 465(7301), 1070-1074. https://doi.org/10.1038/nature09160

      Hage, T. A., & Salkoff, L. (2012). Sodium-activated potassium channels are functionally coupled to persistent sodium currents. J Neurosci, 32(8), 2714-2721. https://doi.org/10.1523/JNEUROSCI.5088-11.2012

      Kuba, H., Oichi, Y., & Ohmori, H. (2010). Presynaptic activity regulates Na(+) channel distribution at the axon initial segment. Nature, 465(7301), 1075-1078. https://doi.org/10.1038/nature09087

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation. 

      Strengths: 

      The strength of this manuscript is that the author shows a similar phenotype in the mice when Gls was knocked out early in rod development or the adult rod. They showed that rapid cell death is through apoptosis, and there is an increase in the expression of genes responsive to oxidative stress. 

      We thank the reviewer for their time reviewing the manuscript and their comments regarding the potential mechanism(s) by which rod photoreceptors rapidly degenerate upon knockout of GLS.

      Weaknesses: 

      In this manuscript, the authors show a "metabolic dependency of photoreceptors on glutamine catabolism in vivo". However, there is a potential bias in their thinking that glutamine metabolism in rods is similar to cancer cells where it feeds into the TCA cycle. They should consider that as in neurons, GLS1 activity provides glutamate for synaptic transmission. The modest rescue shown by providing α-ketoglutarate in the drinking water suggests that glutamine isn't a key metabolic substrate for rods when glucose is plentiful. The ERG studies performed on the iCre-Glsflox/flox mice showed a large decrease in the scotopic b wave at saturating flashes which could indicate a decrease in glutamate at the rod synapse as stated by the authors. While EM micrographs of wt and iCre-Glsflox/flox mice were shown for the outer retina at p14, the synapse of the rods needs to be examined by EM. 

      We agree with the reviewer that in the presence of sufficient glucose, it appears a lack of GLS-driven glutamine (Gln) catabolism does not drastically alter the levels of TCA cycle metabolites or mitochondrial function as we demonstrated in Figure 4, and supplementation with alpha-ketoglutarate improved outer nuclear layer thickness by only a small amount as observed in Figure 5e. Hence, as we stated in the Results and Discussion, at least in the mouse where Gls is selectively deleted from rod photoreceptors by crossing Glsfl/fl mice with Rho-Cre mice (Glsfl/fl; Rho-Cre+, cKO), Gln’s role in supporting the TCA cycle is not the major mechanism by which rod photoreceptors utilize Gln to suppress apoptosis.

      With regard to GLS-driven Gln catabolism providing glutamate (Glu) for synaptic transmission, we again agree with the reviewer that Glu is an important excitatory neurotransmitter, but it is also a key metabolite necessary for the synthesis of glutathione, amino acids, and proteins. As noted and discussed at length in the manuscript, a lack of GLS-driven Gln catabolism in rod photoreceptors leads to reduced levels of oxidized glutathione (Figure 4D), possibly signaling an overall reduction in the biosynthesis of glutathione, as Glu is directly and indirectly responsible for its synthesis. Furthermore, Gln and GLS-derived Glu play a central role in the biosynthesis of several nonessential amino acids and proteins. To this end, we see a reduction in the level of Glu, which is the product of the GLS reaction and further confirms the loss of GLS function. We also noted a significant decrease in aspartate (Asp), which can be constructed from the carbons and nitrogens of Gln as discussed at length in the manuscript (Figure 6A). Finally, we noted a significant decrease in global protein synthesis in the cKO retina as compared to the wild-type animal (Figure 6E). Therefore, the data suggest that GLS-driven Gln catabolism is critical for amino acid metabolism and protein synthesis and, to some degree, redox balance, although the small but statistically significant changes in oxidized glutathione, NADP/NADPH, and redox gene expression may not fully account for the rapid and complete photoreceptor degeneration observed. Future studies are necessary to shed light on the role of redox imbalance in this novel transgenic mouse model.

      Glu also plays a role in synaptic transmission, and we considered this scenario as described in Figure 1 – figure supplement 5. Here, the synaptic connectivity between photoreceptors and the inner retina did not demonstrate significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer nor alterations in the labeling of a key protein (Bassoon) in ribbon synapses. These data suggest that the synaptic connectivity between photoreceptors and second-order neurons was unaltered at P14 in the cKO retina, which is the time just prior to rapid photoreceptor degeneration. We agree, though, that to obtain greater insight into the alterations in the ribbon synapse, EM images can be examined. The EM images shown in Figure 1 – figure supplement 4 are from P21 and will be utilized to assess the ribbon synapse for the revised version of the article.

      With regards to the ERG changes noted in Figure 2, we agree with the reviewer that a large decrease was noted in the scotopic b-wave at P21 and P42 in the cKO. However, an even larger reduction in the scotopic a-wave was noted at these ages as well. In animal models that disrupt photoreceptor synaptic function (Dick et al. Neuron. 2003; Johnson et al. J Neuroscience. 2007; Haeseleer et al. Nature Neuroscience. 2004; Chang et al. Vis Neurosci. 2006), a more negative ERG pattern is typically observed with the b-wave altered to a much larger degree than the a-wave. Additionally, in these models that disrupt photoreceptor synaptic transmission, the overall structure of the retina with respect to thickness is maintained (Dick et al. Neuron. 2003) or noted to have modest changes in the outer plexiform layer within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). In contrast, a rapid decline in the outer nuclear layer thickness was observed in the cKO retina after P14 likely contributing to the ERG changes noted in Figure 2.  Also, Gln is catabolized to Glu primarily by GLS as suggested by the approximately 50% reduction in Glu levels in the cKO retina (Figure 6A), but other enzymes are also capable of catabolizing Gln to Glu, so Glu levels in the rod photoreceptors are unlikely to be zero. Coupling this with the fact that rods are equipped with a self-sufficient Glu recollecting system at their synaptic terminals (Hasegawa et al. Neuron. 2006; Winkler et al. Vis Neurosci. 1999) and that GLS activity is at least two-fold higher in the photoreceptor inner segments, which support energy production and metabolism, than any other layer in the retina (Ross et al. Brain Res. 1987) suggests that altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina.

      The authors note that the outer segments are shorter but they do not address whether there is a decrease in the number of cones. 

      The number of cones will be assessed and provided in the revised version of the article.

      Rod-specific Gls ko mice with an inducible promoter were generated by crossing Pde6g-CreERT2 mice with mice homozygous for either the WT or floxed Gls allele (IND-cKO). In Figure 3 the authors document by western blots and antibody labeling that GLS1 expression is lost in the IND-cKO 10 days post tamoxifen. OCT images show a decrease in the thickness of the outer nuclear layer between 17 and 38 days post-TAM. ERGs should be performed on the animals at 10 and 30 days post TAM, before and after major structural changes in rod photoreceptor cells, to determine if changes in light-stimulated responses are observed. These studies could help to parse out the cause of photoreceptor cell death.

      We agree with the reviewer that the IND-cKO is a useful tool to help parse out the cause of photoreceptor cell death in this model as well as shed light on the role of GLS-driven Gln catabolism in photoreceptor synaptic transmission as discussed at length above. Hence, ERG analyses will be provided for these animals in the revised version of the article.

      The studies in Figure 4 were all performed on iCre-Glsflox/flox and control mice at p14. Why weren't the IND-cKO mice used for these studies, since the findings would not be confounded by development?

      To gain further insight into the role of GLS-driven Gln catabolism in the maintenance of rod photoreceptors as compared to their development/maturation, we will provide ERG and targeted metabolomic analyses of the IND-cKO retina in the revised version of the article.

      In all rescue studies, the endpoint was an ONL thickness, which only addressed rod cell death. The authors should also determine whether there are small improvements in the ERG, which would distinguish the role of GLS in preventing oxidative stress. 

      Optical coherence tomography (OCT) provides a sensitive in vivo method to detect small changes in retinal thickness without potential artifacts incurred through histological processing. Considering the Gls cKO retina demonstrates significant and rapid photoreceptor degeneration, we wanted to assess pathways that may be critical to photoreceptor survival downstream of GLS-driven Gln catabolism using rescue experiments with pharmacologic treatment or metabolite supplementation. That said, disruption of GLS-driven Gln catabolism may also significantly alter rod photoreceptor function beyond that which is secondary to photoreceptor cell death. As such, changes in ERG will be examined and provided in the revised version of the article for certain rescue experiments that demonstrated a robust change in ONL thickness.

      Reviewer #2 (Public Review): 

      Summary: 

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems. 

      Strengths: 

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function, and survival. 

      We greatly appreciate the reviewer’s thoughtful comments and time spent reviewing this manuscript.

      Weaknesses: 

      Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be of interest to test whether the conditional knockout mice have changes in metabolism (via qPCR such as shown in Figure 4 - Supplemental Figure 1) within the retinal pigment epithelium that may be contributing to the authors' findings in the neural retina. Additionally, the authors have very compelling data to show that inhibition of eIF2a or supplementation with asparagine can delay photoreceptor death via OCT measurements in their conditional knockout mouse model (Figure 6G, H). However, does inhibition of eIF2a or asparagine adversely impact the WT retina? It would also be impactful to know whether this has a prolonged effect, or if it is short-term, as this would provide strength to potential therapeutic targeting of these pathways to maintain photoreceptor health. 

      We agree with the reviewer that metabolic communication in the outer retina is crucial to the function and survival of both photoreceptors and RPE. We will perform qRT-PCR on the eyecups of these mice to assess any changes in the expression of metabolic genes. This data will be provided in the revised manuscript.

      We have data demonstrating systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina; this data will be included in the revised manuscript as a supplement to Figure 6. Additionally, we have recent data to suggest that the effect of ISRIB extends beyond P21 in the cKO mouse. This data will be included in the revised manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors. 

      Strengths: 

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knockout the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms. 

      We thank the reviewer for their comments and appreciation of the methods utilized herein to address the role of GLS-driven Gln catabolism in rod photoreceptors.

      Weaknesses: 

      No major weaknesses were identified. The mechanism of GLS-loss-induced rod death seems not fully elucidated by this study but could be followed up in the future, and the same for GLS's role in cones.

      We agree with the reviewer that the downstream metabolic and molecular mechanisms by which Gln catabolism impacts rod photoreceptor health are not fully elucidated. Defining these mechanisms will advance our understanding of photoreceptor metabolism and identify therapeutic targets promoting photoreceptor resistance to stress. Future studies are underway to uncover these mechanisms. Additionally, while outside the scope of the current manuscript, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      A general comment was that this study left several key questions unanswered, in particular the causal mechanism for the reported ribosomal distributions. We have been interested in the evolution of asymmetric bacterial growth and aging for many years. However, a motivational difference is that we are more interested in the evolutionary process, and evolution by natural selection works on the phenotype. Thus, we wanted to start with the phenotype closest to fitness, appropriately defined for the conditions, and work downwards. We examined first the asymmetry of elongation rates in single cells, then gene products, and now ribosomes. As we have pointed out, our demonstration of ribosomal asymmetry shows that the phenomenon was not peculiar and unique to the gene products we examined. Rather, the asymmetry is acting higher up in the metabolic network and likely affecting all genes. We find such conceptual guidance to be important. In the ideal world, of course we would have liked to have worked out the causal mechanisms in one swoop. In a less than ideal situation, it is a subjective decision as to where to stop. We believe that the publication of this manuscript is more than appropriate at this juncture. We work at the interface of evolutionary theory and microbiology. Our results could appeal to both fields. If we attract new researchers, progress could be accelerated. Could the delay caused by publishing only completed stories slow the rate of discovery? These questions are likely as old as science (e.g., https://telliamedrevisited.wordpress.com/2021/01/28/how-not-to-write-a-response-to-reviewers/).

      We present below our response to specific comments by reviewers.  We have not added a new discussion of papers suggested by Reviewer #1 because we feel that the speculations would have been too unfocused.  We were already criticized for speculation in the Discussion about a link between aggregate size and ribosomal density.

      Response to major comments by Reviewer #1.

      a) Fig. 1 only shows 2 divisions (rather than 3 as per Rev1) to avoid an overly elaborate figure.  We have added text to the figure legend that the old and new poles and daughters in the subsequent 3, 4, 5, 6, and 7 generations can be determined by following the same notations and tracking we presented for generations 1 and 2 in Fig. 1.  For example, if we know the old and new poles of any of the four daughters after 2 divisions (as in Fig. 1), and allow that daughter to elongate, become a mother, and divide to produce 2 “grand-daughters”, the polarity of the grand-daughters can also be determined.
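      The pole tracking described in (a) can be sketched in a few lines. This is a minimal illustration, not the notation of Fig. 1: the `Cell` class, the pole-age counters, and the starting cell are hypothetical, and the only rule assumed is that each daughter inherits one maternal pole and gains a fresh pole at the division plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical record of a rod-shaped cell's two poles."""
    old_pole_age: int  # divisions survived by the older pole
    new_pole_age: int  # divisions survived by the younger pole


def divide(mother: Cell) -> tuple[Cell, Cell]:
    """Return (old_daughter, new_daughter) of a dividing mother.

    Each daughter keeps one maternal pole, now one division older,
    and gains a new pole (age 0) formed at the division plane.
    """
    old_daughter = Cell(mother.old_pole_age + 1, 0)
    new_daughter = Cell(mother.new_pole_age + 1, 0)
    return old_daughter, new_daughter


def descendants(founder: Cell, generations: int) -> list[Cell]:
    """Apply the same rule repeatedly to follow polarity over generations."""
    cohort = [founder]
    for _ in range(generations):
        cohort = [daughter for cell in cohort for daughter in divide(cell)]
    return cohort


# Assumed starting point: a cell whose old and new poles are already known.
founder = Cell(old_pole_age=1, new_pole_age=0)
four_daughters = descendants(founder, 2)
```

      Because the rule is identical at every division, the same function gives the polarity of cells in generations 3 to 7 without any additional notation.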

      b) Because division times were normalized and analyzed as quartiles, the raw values were never used.  Rather than annotating unused values, we have provided the mean division times in the Material and Methods section on normalization to provide representative values.

      c) We did not quantify in our study the changes over generations for three reasons. First, the sample sizes for the first generations (cohorts of 1, 2, 4, and 8 cells) are statistically small. Second, and most importantly, cells on an agar pad in a microscope slide, despite being inoculated as fresh exponentially growing cells, experience a growth lag, as do all cells transferred to a new physiological condition. Thus, to be safe, we do not collect data from cohorts 1, 2, 4, and 8 to ensure that our cells are as physiologically uniform as possible. Lastly, as we noted in the Material and Methods, cells also slow down after 7 generations (128 cells). Thus, we have collected ribosome and length measurements primarily from cohorts 16, 32, 64, and 128. Measurable cells from the 128 cohort are actually rare because a colony with that many cells often starts to form double layers, which are not measurable. Most of our measurements came from the 16, 32, and 64 cohorts, in which case a time series would not be meaningful. Some of these details were not included in our manuscript but have been added to the Material and Methods (Microscopy and time-lapse movies). For these reasons we have not added a time series as requested by the reviewer.
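      The cohort arithmetic in (c) follows from one doubling per generation. The short sketch below only restates the numbers given above; the variable names are illustrative and no measurement data are involved.

```python
# A colony founded by one cell doubles at each generation,
# so the cohort size after g generations is 2**g.
cohort_sizes = [2 ** g for g in range(8)]  # generations 0 through 7

# Cohorts excluded because of small sample size and the growth lag.
excluded_cohorts = [size for size in cohort_sizes if size <= 8]

# Cohorts from which ribosome and length measurements were collected.
measured_cohorts = [size for size in cohort_sizes if size >= 16]
```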

      d) We have added the additional figure as requested, but as a supplement rather than in the main article (Supplemental Materials Fig. S1). This figure shows the normalized density of ribosomes along the normalized length of old and new daughters. The density is shown as continuous rather than in quartiles. This figure was included in the original manuscript, but readers recommended that it be removed because all the analyses had been done with quartiles. Readers felt misled and confused.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this they suppose an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with it needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.

      Strengths:

      Strengths of the work are the very careful biochemical analyses and interesting result for wild type LRRK2.

      Weaknesses:

      The conclusions related to involvement of a monomer-dimer transition are, to this reviewer, premature, and an independent method needs to be utilized to bolster this aspect of the story.

      The monomer-dimer transition has been described in detail in our recent preprint Guaitoli et al., 2023 (doi: 10.1101/2023.08.11.549911), where, in addition to mass photometry, we used blue native PAGE. Furthermore, to better elucidate the mechanistic impact of the phosphorylation, we have provided AlphaFold3 models. As the new AlphaFold version allows PTMs as well as small molecules to be considered, we compared the models of the GDP vs the GTP state of pT1343 LRRK2. Interestingly, the AF3 model suggests that the phosphate of pT1343 is oriented inwards, thereby substituting for the gamma phosphate (see Supplementary Figure 5). This finding is in good agreement with MD simulations published recently (Stormer et al., 2023, doi: 10.1042/BCJ20230126). As we are determining GTP hydrolysis in a multiple-turnover situation, the pT1343 might hamper the hydrolysis by competing with GTP re-binding. Final models have been deposited on Zenodo (https://doi.org/10.5281/zenodo.11242230).

      Reviewer #2 (Public Review):

      As discussed in the original review, this manuscript is an important contribution to a mechanistic understanding of LRRK2 kinase. Kinetic parameters for the GTPase activity of the ROC domain have been determined in the absence/presence of kinase activity. A feedback mechanism from the kinase domain to GTP/GDP hydrolysis by the ROC domain is convincingly demonstrated through these kinetic analyses. However, a regulatory mechanism directly linking the T1343 phosphosite and a monomer/dimer equilibrium is not fully supported. The T1343A mutant has reduced catalytic activity and can form similar levels of dimer as WT. The revised manuscript does point out that other regulatory mechanisms can also play a role in kinase activity and GTP/GDP hydrolysis (Discussion section). The environmental context in cells cannot be captured from the kinetic assays performed in this manuscript, and the introduction contains some citations regarding these regulatory factors. This is not a criticism, the detailed kinetics here are rigorous, but it is simply a limitation of the approach. Caveats concerning effects of membrane localization, Rab/14-3-3 proteins, WD40 domain oligomers, etc... should be given more prominence than a brief (and vague) allusion to 'allosteric targeting' near the end of the Discussion.

      We thank the reviewer for the evaluation of the manuscript and the suggestions made. With respect to the mentioned caveats regarding the complex regulation of LRRK2 in its native cellular environment by localization and effector binding, we have revised the discussion accordingly. We nevertheless want to emphasize that the phospho-null mutant T1343A leads to an increase in Rab10 phosphorylation in cells, demonstrating the relevance of this regulatory mechanism under near-physiological conditions (shown in Figure 6). In addition, to further elucidate the molecular mechanisms of the P-loop phosphorylation at T1343, we have performed AlphaFold3 modelling, which allows phosphoresidues to be included (see comment above, Supplemental Figure 5).

      Specific comments

      (1) The revised version is better organized with respect to the significance of monomer/dimer equilibrium and the relevance of the GTP-binding region of ROC domain that encompasses the T1343 phospho-site. The relevance of monomers/dimers of LRRK2 from previous studies is better articulated and readers are able to follow the reasoning for the various mutations.

      We thank the reviewer for the positive feedback. 

      (2) As a suggestion I would change the following on page 6 to clarify for readers: "...would show no change in kcat and KM values upon in vitro ATP treatment" to:

      "...would show no change in kcat and KM values for GTP hydrolysis upon in vitro ATP treatment"

      (3) The levels of dimer in WT (+ATP) and T1343A (+/- ATP) are the same, about 40-45%. These data are cited when the authors state that ATP-induced monomerization is 'abolished' (page 6). My suggestion is to re-phrase this conclusion for consistency with data (Fig 5). For example, one can state that 'ATP incubation does not affect the percentage of dimer for the T1343A variant of LRRK2'. This would be similar to the authors' description of these data on page 8 - 'no difference in dimer formation upon ATP treatment'.

      We thank the reviewer for the suggestions. We revised the manuscript accordingly. Changes have been highlighted in the version provided for reviewing purposes.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor revisions

      -change 'Although functional work on LRRK2 has been made significant progress...' to 'Although there is significant progress toward functional characterization of LRRK2...'

      -change 'exact mechanisms' to 'precise mechanisms', and similarly 'exact interplay' to 'precise interplay'

      -change 'On a contrary' to 'On the contrary' in Discussion

      -change 'remained to be unchanged' to 'remains unchanged', page 8

      We thank the reviewer for having noticed this. We have revised the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in experiment 3 and then tested where the string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on a string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into this study and think it contains a nice, thorough set of experiments. I enjoyed reading the paper and felt that overall it was well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      Weaknesses:

      (1) I think there are 2 key pieces of information that can be taken from the test phase - the bees' first choice and then their behavior across the whole test. I think the first choice is critical in terms of what the bee has learned from the training phase - then their behavior from this point is informed by the feedback they obtain during the test phase. I think both pieces of information are worth considering, but their behavior across the entire test phase is giving different information than their first choice, and this distinction could be made more explicit. In addition, while the bees' first choice is reported, no statistics are presented for their preferences.

      We agree with the reviewer that the first choice is critical in terms of what the bumblebees have learned from the training phase. We report the bees’ first choice in Table 1, and we added the test videos. The connected and disconnected strings were glued to the floor along their entire length, so the bees were unable to move either string, which prevented learning during the tests. We added the data for each bee's individual choices in the Supplementary table.

      (2) It seemed to me that the bees might not only be using visual feedback but also motor feedback. This would not explain their behavior in the first test choice, but could explain some of their subsequent behavior. For example, bees might learn during training that there is some friction/weight associated with pulling the string, but in cases where the string is separated from the flower, this would presumably feel different to the bee in terms of the physical feedback it is receiving. I'd be interested to see some of these test videos (perhaps these could be shared as supplementary material, in addition to the training videos already uploaded), to see what the bees' behavior looks like after they attempt to pull a disconnected string.

      We added supplementary videos of the testing phase. As noted in General Methods, both connected and disconnected strings were glued to the floor to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string during the testing phase. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, a difference in the friction/weight of pulling the two strings cannot be a factor in the test.

      (3) I think the statistics section needs to be made clearer (more in private comments).

      We changed the statistical analysis section as suggested by the reviewer.

      (4) I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.

      Here we respectfully disagree. The solving of a Rubik’s cube by humans could be said to be a version of the finger movements naturally required to open nuts or remove ticks from fur, but this is somewhat beside the point: it’s not the motor sequences that are of interest, but the cognition involved. A general approach in work on animal intelligence and cognition is to deliberately choose paradigms that are outside the animals’ daily routines; this is what we have done here, in asking whether there is means-end comprehension in bee problem solving. Like comparable studies on this question in other animals, the experiments are designed to probe this question, not one of ecological validity.

      Reviewer #2 (Public Review):

      Summary:

      The authors wanted to see if bumblebees could succeed in the string-pulling paradigm with broken strings. They found that bumblebees can learn to pull strings and that they have a preference to pull on intact strings vs broken ones. The authors conclude that bumblebees use image matching to complete the string-pulling task.

      Strengths:

      The study has an excellent experimental design and contributes to our understanding of what information bumblebees use to solve a string-pulling task.

      Weaknesses:

      Overall, I think the manuscript is good, but it is missing some context. Why do bumblebees rely on image matching rather than causal reasoning? Could it have something to do with their ecology? And how is the task relevant for bumblebees in the wild? Does the test translate to any real-life situations? Is pulling a natural behaviour that bees do? Does image matching have adaptive significance?

      We appreciate the valuable comment from the reviewer. Our explanation, which we have now added to the manuscript, is as follows:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      As above, it is worth noting that our work is not designed as an ecological study, but as one about the question of whether causal reasoning can explain how bees solve a string-pulling puzzle. We have a cognitive focus, in line with comparable studies on other animals. We deliberately chose a paradigm that is to some extent outside of the daily challenges of the animal.

      Reviewer #3 (Public Review):

      Summary:

      This paper presents bees with varying levels of experience with a choice task where bees have to choose to pull either a connected or unconnected string, each attached to a yellow flower containing sugar water. Bees without experience of string pulling did not choose the connected string above chance (experiment 1), but with experience of horizontal string pulling (as in the right-hand panel of Figure 4) bees did choose the connected string above chance (experiments 2-3), even when the string colour changed between training and test (experiments 4-5). Bees that were not provided with perceptual-motor feedback (i.e they could not observe that each pull of the string moved the flower) during training still learned to string pull and then chose the connected string option above chance (experiment 6). Bees with normal experience of string pulling then failed to discriminate between connected and unconnected strings when the strings were coiled or looped, rather than presented straight (experiments 7-8).

      Weaknesses:

      The authors have only provided video of some of the conditions where the bees succeeded. In general, I think a video explaining each condition and then showing a clip of a typical performance would make it much easier to follow the study designs for scholars. Videos of the conditions bees failed at would be highly useful in order to compare different hypotheses for how the bees are solving this problem. I also think it is highly important to code the videos for switching behaviours. When solving the connected vs unconnected string tasks, when bees were observed pulling the unconnected string, did they quickly switch to the other string? Or did they continue to pull the wrong string? This would help discriminate the use of perceptual-motor feedback from other hypotheses.

      We added the test videos as suggested by the reviewer, and we added the data for each bee's choice. However, both connected and disconnected strings were glued to the floor, and therefore perceptual-motor feedback was equal and irrelevant between the choices during the test.

      The experiments are also not described well, for my below comments I have assumed that different groups of bees were tested for experiments 1-8, and that experiment 6 was run as described in line 331, where bees were given string-pulling training without perceptual feedback rather than how it is described in Figure 4B, which describes bees as receiving string pulling training with feedback.

      We have now added figures of Experiments 6 and 7 to Figure 1B, and we state that different groups of bees were tested for Experiments 1-9.

      The authors suggest the bees' performance is best explained by what they term 'image matching'. However, experiment 6 does not seem to support this without assuming retroactive image matching after the problem is solved. The logic of experiment 6 is described as "This was to ensure that the bees could not see the familiar "lollipop shape" while pulling strings....If the bees prefer to pull the connected strings, this would indicate that bees memorize the arrangement of strings-connected flowers in this task." I disagree with this second sentence, removing perceptual feedback during training would prevent bees memorising the lollipop shape, because, while solving the task, they don't actually see a string connected to a yellow flower, due to the black barrier. At the end of the task, the string is now behind the bee, so unless the bee is turning around and encoding this object retrospectively as the image to match, it seems hard to imagine how the bee learns the lollipop shape.

      We agree with the reviewer that while solving the task in the last step during training, the bees don't actually see a string connected to a yellow flower, due to the black barrier. The full shape is only visible after the pulling is completed, which requires the bee to “check back” on the entire display after feeding and, in effect, to conclude “this is the shape that I need to be looking for later”.

      Another possibility is that bumblebees might remember the image of the “lollipop shape” from the first step of training, in which the “lollipop shape” was directly presented to the bumblebee.

      We added the experiment suggested by the reviewer, and the result showed that when a green table was placed behind the string to obscure the “lollipop shape” at any point during the training phase, the bees were unable to identify the connected string. The result further supports that bumblebees learn to choose the connected string through image matching.

      Despite this, the authors go on to describe image matching as one of their main findings. For this claim, I would suggest the authors run another experiment, identical to experiment 6 but with a black panel behind the bee, such that the string the bee pulls behind itself disappears from view. There is now no image to match at any point from the bee's perspective so it should now fail the connectivity task.

      Strengths:

      Despite these issues, this is a fascinating dataset. Experiments 1 and 2 show that the bees are not learning to discriminate between connected and unconnected stimuli rapidly in the first trials of the test. Instead, it is clear that experience in string pulling is needed to discriminate between connected and unconnected strings. What aspect of this experience is important? Experiment 6 suggests it is not image matching (when no image is provided during problem-solving, but only afterward, bees still attend to string connectivity) and casts doubt on perceptual-motor feedback (unless from the bee's perspective, they do actually get feedback that pulling the string moves the flower, video is needed here). Experiments 7 and 8 rule out means-end understanding because if the bees are capable of imagining the effect of their actions on the string and then planning out their actions (as hypotheses such as insight, means-end understanding and string connectivity suggest), they should solve these tasks. If the authors can compare the bees' performance in a more detailed way to other species, and run the experiment suggested, this will be a highly exciting paper

      We appreciate the valuable comment from the reviewer. We compared the bees' performance to other species, and conducted the experiment as suggested by the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Smaller comments:

      Line 64: is the word 'simple' needed here? It could also be explained by more complex forms of associative learning, no?

      We deleted “simple”.

      Methods:

      Line 230: was it checked that this was high-contrast for the bees?

      We added the relevant reference in the revised manuscript.

      Line 240: how much sucrose solution was present in the flowers?

      We added 25 microliters of sucrose solution to the flowers. We added this information in the revised manuscript.

      Line 266: check grammar.

      We checked the grammar as follows: “During tests, both strings were glued to the floor of the arena to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string.”

      Statistical analysis:

      - What does it mean that "Bees identity and colony were analyzed with likelihood ratio tests"?

      Bee identity and colony were set as random variables. We changed the analysis methods in the revised manuscript, and the results of all the experiments did not change.

      - Line 359: do you mean proportion rather than percentage?

      We mean the percentage.

      - "the number of total choices as weights" - this should be explained further. This is the number of choices that each bee made? What was the variation and mean of this number? If bees varied a lot in this metric, it might make more sense to analyze their first choice (as I see you've done) and their first 10 choices or something like that - for consistency.

      This refers to the total number of choices made by each bumblebee. We added the mean and standard error of each bee’s number of choices in Table 1. Some bees pulled the string fewer than 10 times; we chose to include all choices made by each bee.

      - More generally I think the first test is more informative than the subsequent choices, since every choice after their first could be affected by feedback they are getting in that test phase. Or rather, they are telling you different things.

      All the bees were tested only once; however, you might be referring to the first choice. We used a chi-square test to analyze the bumblebees’ first choices in the test. It is worth noting that both connected and disconnected strings were glued to the floor. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, the feedback from pulling either the connected or disconnected strings is the same.

      - Line 362: I think I know what you mean, but this should be re-phrased because the "number of" sounds more appropriate for a Poisson distribution. I think what you are testing is whether each individual bee chose the connected or the disconnected string - i.e. a 0 or 1 response for each bee?

      We agree with the reviewer that each bee chose the connected or the disconnected string - i.e. a 0 or 1 response for each bee, but not the number. We clarify this as: “The total number of the choices made by each bee was set as weights.” 

      - Line 364-365: here and elsewhere, every time you mention a model, make it clear what the dependent and independent variables are. i.e. for the mixed model, the 'bee' is the random factor? Or also the colony that the bee came from? Were these nested etc?

      We clarify this in the revised manuscript. Bee identity and colony are the random factors in the mixed model.

      - Line 368: "Latency to the first choice of each bee was recorded" - why? What were the hypotheses/ predictions here?

      The latency to the first choice was recorded to assess whether the bumblebees were familiar with the testing pattern. A shorter latency might indicate that the bumblebees were more familiar with the pattern.

      - Line 371: "Multiple comparisons among experiments were.." - do you mean 'within' experiments? It seems that treatments should not be compared between different experiments.

      We mean multiple comparisons among different experiments; we clarify this in the revised manuscript.

      Results

      Experiment 1: From the methods, it sounded like you both analyzed the bees' first choice and their total no. of choices, but in the results section (and Figure 1) I only see the data for all choices combined here.

      In table 1 and in the text you report the number of bees that chose each option on their first choice, but there are no statistical results associated with these results. At the very least, a chi square or binomial test could be run.

      Line 138: "Interestingly, ten out of fifteen bees pulled the connected string in their first choice" - this is presented like it is a significant majority of bees, but a chi-square test of 10 vs 5 has a p-value = 0.1967

      We used the chi-square test to analyze the bees’ first choices. We also added the results of this analysis to Table 1.
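
      The first-choice comparison discussed above (10 of 15 bees choosing the connected string) can be checked with a short sketch. It assumes a 50:50 null expectation and uses only the Python standard library; the function names are illustrative and are not taken from the manuscript or its analysis scripts. An exact binomial test is included alongside the chi-square test because the reviewer mentions both as options.

```python
import math


def chi_square_two_choice(n_connected, n_disconnected):
    """Chi-square goodness-of-fit test against a 50:50 expectation (df = 1)."""
    total = n_connected + n_disconnected
    expected = total / 2
    statistic = (
        (n_connected - expected) ** 2 + (n_disconnected - expected) ** 2
    ) / expected
    # With one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def exact_binomial_two_sided(successes, trials):
    """Two-sided exact binomial test against p = 0.5.

    The null distribution is symmetric, so the upper tail is doubled.
    """
    k = max(successes, trials - successes)
    upper_tail = sum(math.comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * upper_tail)


# First-choice counts quoted by the reviewer: 10 connected vs 5 disconnected.
statistic, p_chi = chi_square_two_choice(10, 5)
p_binom = exact_binomial_two_sided(10, 15)
print(f"chi-square = {statistic:.3f}, p = {p_chi:.4f}")
print(f"exact binomial p = {p_binom:.4f}")
```

      Under these assumptions the chi-square p-value agrees with the value of about 0.197 quoted by the reviewer, and the exact test is more conservative still, so neither supports treating 10 of 15 as a significant majority.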

      Line 143: "It makes sense because the bees could see the "lollipop shape" once they pulled it out from the table." - this feels more like interpretation (i.e. Discussion) rather than results.

      We moved the sentence to the discussion.

      Line 162: again this feels more like interpretation/ conjecture than results.

      We removed the sentence in the results.

      Line 184: check grammar.

      We checked the grammar. We changed “task” to “tasks”.

      Figures

      I really appreciated the overview in Figure 5 - though I think this should be Figure 1? Even if the methods come later in eLife, I think it would be nice to have that cited earlier on (e.g. at the start of the results) to draw the reader's attention to it quickly, since it's so helpful. It also then makes the images at the bottom of what is currently Figure 1 make more sense. I also think that the authors could make it clearer in Figure 5 which strings are connected vs disconnected in the figure (even if it means exaggerating the distance more than it was in real life). I had to zoom in quite a bit to see which were connected vs. not. Alternatively, you could have an arrow to the string with the words "connected" "disconnected" the first time you draw it - and similar labels for the other string conditions.

      We appreciate the valuable comment from the reviewer. We changed Figure 5 to Figure 2, and Figure 4 to Figure 1. We cited the Figures at the start of the results. We also changed the gap distance between the disconnected strings. Additionally, we added arrows to indicate “connected” and “disconnected” strings in the Figure.

      Figure 1 - I think you could make it clearer that the bars refer to experiments (e.g. have an x-axis with this as a label). Also, check the grammar of the y-axis.

      We added the experiment numbers to the Figures. Additionally, we checked the grammar of the y-axis. We changed “percentages” to “percentage”.

      I also think it's really helpful to see the supplementary videos but I think it would be nice to see some examples of the test phase, and not just the training examples.

      We added Supplementary videos of the testing phase.

      Reviewer #2 (Recommendations For The Authors):

      Below are also some minor comments:

      L40: "approaches".

      We changed “approach” to “approaches”.

      L42: but likely mainly due to sampling bias of mammals and birds.

      We changed the sentence as follows: String pulling is one of the most extensively used approaches in comparative psychology to evaluate the understanding of causal relationships (Jacobs & Osvath, 2015), with most research focused on mammals and birds, where a food item is visible to the animal but accessible only by pulling on a string attached to the reward (Taylor, 2010; Range et al., 2012; Jacobs & Osvath, 2015; Wakonig et al., 2021).

      L64: remove "in this study"

      We removed “in this study”.

      L64: simple associative learning of what? Isn't your image matching associative too?

      We removed “simple”.

      L97: remove "a" before "connected".

      We removed “a” before “connected”.

      L136-138: but maybe they could still feel the weight of the flower when pulling?

      Because both strings were glued to the floor in the test phase, the feedback was the same and therefore irrelevant. This information is noted in the General Methods.

      L161: what are these numbers?

      We removed the latency in the revised manuscript.

      L167/ Table 1: I realise that the authors never tried slanted strings to check if bumblebees used proximity as a cue. Why?

      This was simply because we wanted to focus on whether bumblebees could recognize the connectivity of the string.

      Discussion: Why did you only control for colour of the string? What if you had used strings with different textures or smells? Unclear if the authors controlled for "bumblebee smell" on the strings, i.e., after a bee had used the string, was the string replaced by a new one or was the same one used multiple times?

      We used different colors to investigate featural generalization of the visual display of the string connected to the flower in this task. We controlled for color because it is a feature that bumblebees can easily distinguish.

      Both the flowers and the strings were used only once, to prevent the use of chemosensory cues. We clarify this in the revised manuscript.

      L182: since what?

      We deleted “since” in the revised manuscript.

      L182-188: might be worth mentioning that some crows and parrots known for complex cognition perform poorly on broken strings (e.g., https://doi.org/10.1098/rspb.2012.1998 ; https://doi.org/10.1163/1568539X-00003511 ; https://doi.org/10.1038/s41598-021-94879-x ) and Australian magpies use trial and error (https://doi.org/10.1007/s00265-023-03326-6).

      We added the following sentences as suggested by the reviewer: “It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).”

      L193: maybe expand on this to put the task into a natural context?

      We added the following sentences as suggested by the reviewer:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      L204: is causal understanding the same as means-end understanding?

      Means-end understanding is expressed as goal-directed behavior, which involves the deliberate and planned execution of a sequence of steps to achieve a goal. It includes some understanding of the causal relationship (Jacobs & Osvath, 2015; Ortiz et al., 2019).

      L235: this is a very big span of time. Why not control for motivation? Cognitive performance can vary significantly across the day (at least in humans).

      Bumblebee motivation is understood to be rather consistent, as those that were trained and tested came to the flight arena of their own volition and were foragers looking to fill their crop load each time to return it to the colony.

      L232: what is "(w/w)" ? This occurs throughout the manuscript.

      “w/w” represents the weight-to-weight percentage of sugar.

      L250: this sentence sounds odd. "containing in the central well.." ?? Perhaps rephrase? Unclear what central well refers to? Did the flowers have multiple wells?

      We rephrased the sentence as follows: For each experiment, bumblebees were trained to retrieve a flower with an inverted Eppendorf cap at the center, containing 25 microliters of 50% sucrose solution, from underneath a transparent acrylic table.

      L268: why euthanise?

      The reason for euthanizing the bees is that new foragers will typically only become active after the current ones were removed from the hive.

      L270: chemosensory cues answer my concern above. Maybe make it clear earlier.

      We moved this sentence earlier in the result.

      L273: did different individuals use different pulling strategies? Do you have the data to analyse this? This has been done on birds and would offer a nice comparison.

      We analyzed the string-pulling strategies among different individuals, and provided Supplementary Table 1 to display the performances of each individual in different string-pulling experiments.

      L365: unclear why both models. Would be nice to see a GLM output table.

      The durations of pulling different kinds of strings were first tested with the Shapiro-Wilk test to assess normality. Duration data that conformed to a normal distribution were compared using linear mixed-effects models (LMM), while data that deviated from normality were examined with generalized linear mixed models (GLMM). We added a GLM and GLMM output table in the revised manuscript.

      L377: should be a space between the "." and "This".

      We added a space between the “.” and “This”.

      L383-390: some commas and semicolons are in the wrong places.

      We carefully checked the commas and semicolons in this sentence.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Line 32: seems to be missing a word, suggest "the bumblebees' ability to distinguish".

      We added “the” in the revised manuscript.

      Line 47: it would be good to reference other scholars here, this is the central focus of all work in comparative psychology.

      We added the reference in the revised manuscript.

      Line 50-61: I think the string-pulling literature could be described in more detail here, with mention of perceptual-motor feedback loops as a competing hypothesis to means-end understanding (see Taylor et al 2010, 2012). It seems a stretch to suggest that "String-pulling studies have directly tested means-end comprehension in various species", when perceptual-motor feedback is a competing hypothesis that we have positive evidence for in several species.

      We mentioned perceptual-motor feedback in the introduction as follows:

      “Multiple mechanisms can be involved in the string-pulling task, including the proximity principle, perceptual feedback and means-end understanding (Taylor et al., 2012; Wasserman et al., 2013; Jacobs & Osvath, 2015; Wang et al., 2020). The principle of proximity refers to animals preferring to pull the reward that is closest to them (Jacobs & Osvath, 2015). Taylor et al. (2012) proposed that the success of New Caledonian crows in string-pulling tasks is based on a perceptual-motor feedback loop, where the reward gradually moves closer to the animal as they pull the strings. If the visual signal of the reward approaching is restricted, crows with no prior string-pulling experience are unable to solve the broken string task (Taylor et al., 2012).

      However, when a green table was placed behind the string to obscure the “lollipop” structure during the training, the bees could not see the “lollipop” during the initial training stage or after pulling the string from under the table. In this situation, the bees were unable to identify the connected string, further proving that bumblebees chose the connected string based on image matching.

      Line 68: suggest remove 'meticulously'.

      We removed “meticulously”.

      Line 99: This is an exciting finding, can the authors please provide a video of a bee solving this task on its first trial?

      We added videos in the supplementary materials.

      Line 133: perceptual-motor feedback loops should be introduced in the introduction.

      We introduced perceptual-motor feedback loops in the revised manuscript.

      Line 136: please clarify the prior experience of these bees, it is not clear from the text.

      We clarified the prior experience of these bees as follows: Bumblebees were initially attracted to feed on yellow artificial flowers, and then trained with transparent tables covered by black tape (S7 video) through a four-step process.

      Line 138: from the video it is not possible to see the bee's perspective of this occlusion. Do the authors have a video or image showing the feedback the bees received? I think this is highly important if they wish to argue that this condition prevents the use of both image matching and a perceptual-motor feedback loop.

      We prevented the use of image matching: the bees were unable to see the flower moving towards them above the table during the training phase in this condition. However, the bees may have received a visual image both after pulling the string out from under the table and in the initial stages of training in this condition.

      Line 147: please clarify what experience these bees had before this test.

      We added the prior experience of bumblebees before training as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis. Bumblebees were first trained to feed on yellow artificial flowers, and then trained with the same procedure as Experiment 2, but the connected strings were coiled in the test.

      Line 155: This is a highly similar test to that used in Taylor et al 2012, have the authors seen this study?

      We mentioned the reference in the revised manuscript as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis.

      Line 183: This sentence needs rewriting "Since the vast majority of animals, including dogs 183 (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al.,2016) and azure-winged magpies (Wang et al., 2019) are failing in such tasks spontaneously".

      We changed the sentence as suggested by the reviewer as follows: Some animals, including dogs (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al., 2016) and azure-winged magpies (Wang et al., 2019) fail such tasks spontaneously.

      Line 186: "complete comprehension of the functionality of strings is rare" I am not sure the evidence in the current literature supports any animal showing full understanding, can the authors explain how they reach this conclusion?

      We wished to say that few animal species could distinguish between connected and disconnected strings without trial and error learning. We revised the sentence as follows:

      It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).

      Line 190: the authors need to clarify which part of their study provides positive evidence for this conclusion.

      We added the evidence for this conclusion as follows: Our findings suggest that bumblebees with experience of string pulling prefer the connected strings, but they failed to identify the interrupted strings when the string was coiled in the test.

      Line 265: was the far end of the string glued only?

      The entire string was glued to the floor, not just the far ends of the string.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      In this paper, the authors used target agnostic MBC sorting and activation methods to identify B cells and antibodies against sexual stages of Plasmodium falciparum. While they isolated some mAbs against Pfs48/45 and Pfs230, two well-known candidates for "transmission blocking" vaccines, these antibodies' efficacies, as measured by TRA, did not perform as well as other known antibodies. They also isolated one cross-reactive mAb to proteins containing glutamic acid-rich repetitive elements, that express at different stages of the parasite life cycle. They then determined the structure of the Fab with the highest protein binder they could determine through protein microarray, RESA, and observed homotypic interactions.

      Strengths: 

      -  Target agnostic B cell isolation (although not a novel methodology). 

      -  New cross-reactive antibody with some "efficacy" (TRA) and mechanism (homotypic interactions) as demonstrated by structural data and other biophysical data. 

      Weaknesses: 

      The paper lacks clarity at times and could benefit from more transparency (showing all the data) and explanations. 

      We have added the oocyst count data from the SMFA experiments as Supplementary Table 2, and ELISA binding curves underlying Figure 4B as Supplementary Figure 5.

      In particular: 

      - define SIFA 

      - define TRAbs 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      - it is not possible to read the Figure 6B and C panels. 

      We regret that the labels in Supplementary Figures 6 and 7 were of poor quality and have now included higher resolution images to solve this issue.

      Reviewer #2 (Public Review): 

      This manuscript by Amen, Yoo, Fabra-Garcia et al describes a human monoclonal antibody B1E11K, targeting EENV repeats which are present in parasite antigens such as Pfs230, RESAs, and 11.1. The authors isolated B1E11K using an initial target agnostic approach for antibodies that would bind gamete/gametocyte lysate, from which they made 14 mAbs. Following a suite of highly appropriate characterization methods from Western blotting of recombinant proteins to native parasite material, use of knockout lines to validate specificity, ITC, peptide mapping, SEC-MALS, negative stain EM, and crystallography, the authors have built a compelling case that B1E11K does indeed bind EENV repeats. In addition, using X-ray crystallography they show that two B1E11K Fabs bind to a 16 aa RESA repeat in a head-to-head conformation using homotypic interactions and provide a separate example from CSP, of affinity-matured homotypic interactions.

      There are some minor comments and considerations identified by this reviewer. These include that one of the main conclusions in the paper is the binding of B1E11K to RESAs which are blood stage antigens that are exported to the infected parasite surface. It would have been interesting if immunofluorescence assays with B1E11K mAb were performed with blood-stage parasites to understand its cellular localization in those stages.

      In the current manuscript, we provide multiple lines of evidence that B1E11K binds (with high affinity) to repeats that are present in RESAs, i.e. through micro-array studies, in vitro binding experiments such as Western blot, ELISA and BLI, and through X-ray crystallography studies on B1E11K – repeat peptide complexes. Taken together, we think we provide compelling evidence that B1E11K binds to repeats present in RESA proteins. We do agree that studies on the function of this mAb against other stages of the parasite could be of interest, but as our manuscript focuses on the sexual stage of the parasite, we feel that this is beyond the scope of the current work. However, this line of inquiry will be strongly considered in follow-up studies.

      Reviewer #3 (Public Review): 

      The manuscript from Amen et al reports the isolation and characterization of human antibodies that recognize proteins expressed at different sexual stages of Plasmodium falciparum. The isolation approach was antigen agnostic and based on the sorting, activation, and screening of memory B cells from a donor whose serum displays high transmission-reducing activity. From this effort, 14 antibodies were produced and further characterized. The antibodies displayed a range of transmission-reducing activities and recognized different Pf sexual stage proteins. However, none of these antibodies had substantially lower TRA than previously described antibodies. 

      The authors then performed further characterization of antibody B1E11K, which was unique in that it recognized multiple proteins expressed during sexual and asexual stages. Using protein microarrays, B1E11K was shown to recognize glutamate-rich repeats, following an EE-XX-EE pattern. An impressive set of biophysical experiments was performed to extensively characterize the interactions of B1E11K with various repeat motifs and lengths. Ultimately, the authors succeeded in determining a 2.6 Å resolution crystal structure of B1E11K bound to a 16AA repeat-containing peptide. Excitingly, the structure revealed that two Fabs bound simultaneously to the peptide and made homotypic antibody-antibody contacts. This had only previously been observed with antibodies directed against CSP repeats.

      Overall I found the manuscript to be very well written, although there are some sections that are heavy on field-specific jargon and abbreviations that make reading unnecessarily difficult. For instance, 'SIFA' is never defined. 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      Strengths of the manuscript include the target-agnostic screening approach and the thorough characterization of antibodies. The demonstration that B1E11K is cross-reactive to multiple proteins containing glutamate-rich repeats, and that the antibody recognizes the repeats via homotypic interactions, similar to what has been observed for CSP repeat-directed antibodies, should be of interest to many in the field. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1 - why only gametes ELISA and Spz or others?  

      The volumes of the single B cell supernatants were too small to screen against multiple antigens/parasite stages. As we aimed to isolate antibodies against the sexual stages of the parasite, our assay focused on this stage and supernatants were not tested against other stages. Furthermore, we screened for reactivity against gametes as TRA mAbs likely target gametes rather than other forms of sexual stage parasites.

      Figure 2 A 

      (a) Wild type (WT) and Pfs48/45 knock-out (KO) gametes.

      (b) I am a bit confused about what GMT is vs Pfs48/45 

      We have changed the column titles in Figure 2A to “wild-type gametes” and “Pfs48/45 knockout gametes” to improve clarity.  

      (c) Binding is high % why is it red? 

      We chose to present the results in a heatmap format with a graded color scale, from strong binders in red to weak binders in green. It has now been clarified in the legend of the figure. 

      Please state acronyms clearly 

      TRA - transmission reducing activity 

      SMFA - standard membrane feeding assay 

      We have added the full terms to clarify the acronyms.

      1123- VRC01 (not O1)

      We have corrected this.

      Figure 2 C bottom panels, clarify which ones are TRAbs (Assuming the mAbs with over 80% TRA at 500 µg/mL) (right gel) and the ones that are not (left gel)?

      In the Western blot in Figure 2c, we have marked the antibodies with >80% TRA with an asterisk.

      Furthermore, we have replaced ‘TRAbs’ by ‘mAbs with >80% TRA at 500 µg/mL’ in the figure legend.

      ITC show the same affinity of the Fab to the 2 peptides but not the ELISA, not the BLI/SPR would be more appropriate. Any potential explanation?  

      The way binding affinity is determined across various techniques can result in slight differences in determined values. For instance, ELISAs utilize long incubation times with extensive washing steps and involve a spectroscopic signal, isothermal titration calorimetry (ITC) uses a calorimetric signal at different concentration equilibria to extract a KD, and BLI determines kinetic parameters for KD determination. Discrepancies in binding affinities between orthogonal techniques have indeed been observed previously in the context of peptide-antibody binding (e.g. PMID: 34788599).

      Regardless of technique, the relative relationships in all three sets of data are the same: higher binding affinity is observed to the longer P2 peptide. This is the main takeaway of the section. As the reviewer suggests, BLI is likely the most appropriate readout here and is the only value explicitly mentioned in the main text. We primarily use ITC to support our proposed binding stoichiometry which is important to substantiate the SEC-MALS and nsEM data in Figure 4H-I. We added the following sentences to help reinforce these points: “The determined binding affinity from our ITC experiments (Table 1) differed from our BLI experiments (Fig. 4D and 4E), which can occur when measuring antibody-peptide interactions. Regardless, our data across techniques all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”
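
      As a rough illustration of how a kinetic readout such as BLI yields a KD, the sketch below applies the standard relation KD = k_off / k_on. It assumes a simple 1:1 binding model, which is a simplification given the two-Fab stoichiometry discussed in this response, and the rate constants are hypothetical placeholders rather than values from the manuscript.

```python
def dissociation_constant(k_on, k_off):
    """Return KD in molar units for a 1:1 binding model.

    k_on  : association rate constant, in 1/(M*s)
    k_off : dissociation rate constant, in 1/s
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


# Hypothetical rate constants, chosen only to show the calculation.
kd_shorter_peptide = dissociation_constant(k_on=1.0e5, k_off=1.0e-2)
kd_longer_peptide = dissociation_constant(k_on=1.0e5, k_off=1.0e-4)

# A smaller KD means tighter binding.
print(f"shorter peptide KD = {kd_shorter_peptide:.1e} M")
print(f"longer peptide KD = {kd_longer_peptide:.1e} M")
```

      Because equilibrium methods such as ITC and ELISA estimate affinity without these rate constants, modest disagreement between the resulting values is not unexpected.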

      Figure 5C - would be helpful to have the peptide sequence above referring to what is E1, E2 etc... 

      We added two panels (Figure 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help better orient the reader. 

      Figure S4 - maybe highlight in different colors the EENVV, EEIEE, Etc, etc 

      Repeats found in the sequence of the various proteins in Figure S4 have now been highlighted with different colors.

      Line 163 - why 14 mabs if 11 wells? Isn't it 1 B cell per well? The authors should explain right away that some wells have more than 1 B cell and some have 1 HC, 1LC, and 1 KC. 

      We agree that this was somewhat confusing and have modified the text which now reads: “We obtained and cloned heavy and light chain sequences for 11 out of 84 wells. For three wells we obtained a kappa light chain sequence and for five wells a lambda light chain sequence. For three wells we obtained both a lambda and kappa light chain sequence suggesting that either both chains were present in a single B cell or that two B cells were present in the well. For all 11 wells we retrieved a single heavy chain sequence. Following amplification and cloning, 14 mAbs, from 11 wells, were expressed as full human IgG1s (Table S1) (Dataset S1).”
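
      The well and antibody counts in this passage can be reconciled with a short tally. The sketch assumes that each recovered light chain was paired with the single heavy chain from its well to give one mAb, which is how the passage reads; the variable names are illustrative only.

```python
# Number of wells in each light-chain category, as stated in the response.
wells_by_light_chain = {
    "kappa only": 3,
    "lambda only": 5,
    "kappa and lambda": 3,
}

# Assumption: one mAb per light chain, each paired with the well's single heavy chain.
light_chains_per_well = {
    "kappa only": 1,
    "lambda only": 1,
    "kappa and lambda": 2,
}

total_wells = sum(wells_by_light_chain.values())
total_mabs = sum(
    count * light_chains_per_well[category]
    for category, count in wells_by_light_chain.items()
)

print(f"wells with sequences: {total_wells}")
print(f"mAbs expressed: {total_mabs}")
```

      Under this assumption the three wells that yielded both light chains account for six antibodies, which is why 11 wells give 14 mAbs.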

      Line 166-167 - were they multiple HC (different ones) as well when Lambda and kappa were present?

      This is not clear at first. 

      We clarified this point in the text, see also comment above.

      Line 177 - expressed Pfs48/45 and Pfs230, is it lacking both or just Pfs48/45 (as stated on line 172)? 

      Pfs48/45 binds to the gamete surface via a GPI anchor, while Pfs230 is retained on the surface through binding to Pfs48/45. Hence, the Pfs48/45 knockout parasite also lacks surface-bound Pfs230. We have added a sentence to the Results clarifying this: “The mAbs were also tested for binding to Pfs48/45 knock-out female gametes, which lack surface-bound Pfs48/45 and Pfs230”.

      Show the ELISA data used to calculate EC50 in Figure 3. 

      ELISA binding curves are now shown as Figure S5.

      Line 313-315 - what if you reverse, capture the Fab (peptide too small even if biotinylated?) 

      As anticipated by the Reviewer, immobilizing the Fab and dipping into peptide did not yield appreciable signal for kinetic analysis and thus the experiment from this setup is not reported. 

      Line 341 - add crystal structure 

      This has now been added.

      There is a bit too much speculation in the discussion. For e.g. "The B1C5L and B1C5K mAbs were shown to recognize Domain 2 of Pfs48/45 and exhibited moderate potency, as previously described for Abs with such specificity (27). These 2 mAbs were isolated from the same well and shared the same heavy chain; their three similar characteristics thus suggest that their binding is primarily mediated by the heavy chain". Actual data will reinforce this statement. 

      As B1C5L and B1C5K recognized domain 2 of Pfs48/45 with similar affinity, this strongly suggests that binding is mediated through the heavy chain. Structural analysis could confirm this statement, but this is outside the scope of this study.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 1: This figure provides a description of the workflow. To make it more relevant for the paper, the authors could add relevant numbers as the workflow proceeds. 

      (a) For example, how many memory B cells were sorted, how many supernatants were positive, and then how many mAbs were produced? These numbers can be attached to the relevant images in the workflow. 

      We modified the figure to include the numbers. 

      (b) For the "Supernatant screening via gamete extract ELISA", please change to "Supernatant screening via gamete/gametocyte extract ELISA". 

      We modified the statement as suggested. 

      Line 155: The manuscript states that 84 wells reacted with gamete/gametocyte lysate. The following sentence states that "Out of the 21 supernatants that were positive...". Can the authors provide the summary of data for all 84 wells or why focus on only 21 supernatants? 

      We screened all supernatants against gamete lysate, and only a subset against gametocyte lysate. In total, we found 84 positive supernatants that were reactive to at least one of the two lysates. 21 of those 84 positive were screened against both lysates. We have modified the text to clarify the numbers:

      “After activation, single cell culture supernatants potentially containing secreted IgGs were screened in a high-throughput 384-well ELISA for their reactivity against a crude Pf gamete lysate (Fig. S1B). A subset of supernatants was also screened against gametocyte lysate (S1C). In total, supernatants from 84 wells reacted with gamete and/or gametocyte lysate proteins, representing 5.6% of the total memory B cells. Of the 21 supernatants that were screened against both gamete and gametocyte lysates, six recognized both, while nine appeared to recognize exclusively gamete proteins, and six exclusively gametocyte proteins.”

      Please note that all 84 positive wells were taken forward for B cell sequencing and cloning. 

      Line 171: SIFA is introduced for the first time and should be completely spelled out.

      We have corrected this. 

      Figure 2: 

      (a) In Figure 2A, can you change the column title from "% pos KO GMT" to "% pos Pfs48/45 KO GMT"?

      We have changed the column titles.  

      (b) In Figure 2B, the SMFA results have been converted to %TRA. Can the authors please provide the raw data for the oocyst counts and number of mosquitoes infected in Supplementary Materials? 

      We have added oocyst count data in Table S2, to which we refer in the figure legend. 
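
      For readers working from the raw oocyst counts, the conversion to %TRA can be sketched as follows. This assumes the commonly used definition, in which transmission-reducing activity is the percentage reduction in mean oocyst count relative to the control feed; the response does not state the exact calculation used, and the counts below are hypothetical, not values from Table S2.

```python
from statistics import mean


def transmission_reducing_activity(test_counts, control_counts):
    """Percent reduction in mean oocysts per mosquito relative to control.

    Assumed definition: TRA = 100 * (1 - mean(test) / mean(control)).
    """
    control_mean = mean(control_counts)
    if control_mean == 0:
        raise ValueError("control feed has no oocysts; TRA is undefined")
    return 100.0 * (1.0 - mean(test_counts) / control_mean)


# Hypothetical oocyst counts per dissected mosquito.
control = [10, 20, 30]
test = [0, 2, 4]
print(transmission_reducing_activity(test, control))
```

      Model-based estimates that account for variation between mosquitoes and feeds may differ from this simple ratio of means.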

      (c) For Figure 2F, the authors do have other domains to Pfs230 as described in Inklaar et al, NPJ Vaccines 2023. An ELISA/Western to the other domains could identify the binding site for B2C10L, though we appreciate this is not the central result of this manuscript. 

      We thank the reviewer for this suggestion. We are indeed planning to identify the target domain of B2C10L using the previously described fragments, but agree with the reviewer that this is not the focus of the current manuscript and have therefore decided not to include it in the current report.

      Line 116: The word sporozoites appears in subscript and should be corrected to be normal text. 

      We have corrected this.

      Line 216: Typo "B1E11K" 

      We have corrected this.

      Materials and Methods: 

      (a) PBMC sampling: Please add the ethics approval codes in this section. 

      Donor A visited the hospital with a clinical malaria infection and provided informed consent for collection of PBMCs. We have modified the method section to clarify this. 

      “Donor A had lived in Central Africa for approximately 30 years and reported multiple malaria infections during that period. At the time of sampling PBMCs, Donor A had recently returned to the Netherlands and visited the hospital with a clinical malaria infection. After providing informed consent, PBMCs were collected, but gametocyte prevalence and density were not recorded.”

      (b) Gamete/Gametocyte extract ELISA: Can the authors please provide the concentration of antibodies used for the positive and negative controls (TB31F, 2544, and 399) 

      We have added the concentrations for these mAbs in the methods section.

      Recombinant Pfs48/45 and Pfs230 ELISA: Please state the concentration or molarity used for the coating of recombinant Pfs48/45 and Pfs230CMB. 

      We have added the concentrations, i.e. 0.5 µg/mL, to the methods section.

      Western Blotting: The protocol states that DTT was added to gametocyte extracts (Line 594), but Western Blots in Figures 2 and 3 were performed in non-reducing conditions. Please confirm whether DTT was added or not. 

      Thank you for noting this. We did not use DTT for the western blots and have removed this line from the methods section.

      Reviewer #3 (Recommendations For The Authors): 

      Below are a few minor comments to help improve the manuscript. 

      (1) In Figure 4E, are the BLI data fit to a 1:1 binding model? The fits seem a bit off, and from ITC and X-ray studies it is known that 2 Fabs bind 1 peptide. The second Fab should presumably have higher affinity than the first Fab since the second Fab will make interactions with both the peptide and the first Fab. It may be better to fit the BLI data to a 2:1 binding model. 

      The 2:1 (heterogeneous ligand) model assumes that there are two different independent binding sites. However, the second binding event described is dependent on the first binding event and thus this model also does not accurately reflect the system. Given that there is not an ideal model to fit, we instead are careful about the language used in the main text to describe these results. Additionally, we also include a sentence to the results section to ensure that the proper findings/interpretations are highlighted: “…our data all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”
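
      The distinction between the two models can be made concrete with a short sketch of their association-phase equations. This uses the standard 1:1 (Langmuir) expression and writes the heterogeneous ligand model as the sum of two independent 1:1 terms; the function names and all parameter values are hypothetical and are not fitted values from the BLI data.

```python
import math


def one_to_one_association(t_s, conc_m, ka, kd, r_max):
    """Association-phase response of a 1:1 binding model.

    t_s: time in seconds; conc_m: analyte concentration in M;
    ka: association rate (1/(M*s)); kd: dissociation rate (1/s).
    """
    k_obs = ka * conc_m + kd
    r_eq = r_max * ka * conc_m / k_obs  # equals r_max * C / (C + KD)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


def heterogeneous_ligand_association(t_s, conc_m, site_a, site_b):
    """Sum of two independent 1:1 terms; each site is (ka, kd, r_max).

    Neither term depends on the occupancy of the other site, so a binding
    event that requires a prior binding event is not represented here.
    """
    return (
        one_to_one_association(t_s, conc_m, *site_a)
        + one_to_one_association(t_s, conc_m, *site_b)
    )


# Hypothetical parameters: KD = kd / ka = 10 nM, analyte at 10 nM.
print(one_to_one_association(1e5, 1e-8, 1e5, 1e-3, 1.0))
```

      Because the two terms are simply added, the second site contributes the same signal whether or not the first site is occupied, which is why this model does not describe a sequential binding mechanism.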

      (2) The sidechain interactions shown in Figures 5C and D could probably be improved. The individual residues are just 'floating' in space, causing them to lack context and orientation. 

      We added two panels (Fig. 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help orient the reader.  

      (3) The percentage of Ramachandran outliers should be listed in Table 2. Presumably, the value is 0.2%, but this is omitted in the current table. 

      Table 2 has been modified to include the requested information explicitly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript introduced a new behavioral apparatus to regulate the animal's behavioral state naturally. It is a thermal maze where different sectors of the maze can be set to different temperatures; once the rest area of the animal is cooled down, it will start searching for a warmer alternative region to settle down again. They recorded with silicon probes from the hippocampus in the maze and found that the incidence of SWRs was higher at the rest areas and place cells representing a rest area were preferentially active during rest-SWRs as well but not during non-REM sleep.

      We thank the reviewer for carefully reading our manuscript and providing useful and constructive comments.

      Strengths:

      The maze can have many future applications, e.g., see how the duration of waking immobility can influence learning, future memory recall, or sleep reactivation. It represents out-of-the-box thinking to study and control less-studied aspects of the animals' behavior.

      Weaknesses:

      The impact is only within behavioral research and hippocampal electrophysiology.

      We agree with this assessment but would like to add that the field of electrophysiological recordings in behaving animals is very large. Behavioral thermoregulation is also a hotly researched area among investigators using molecular tools. The ThermoMaze can be used for juxtacellular/intracellular recordings in behaving animals. Restricting the animal’s movement during these recordings can improve the length of recording time and the yield of recorded single units in these experiments.

      Moreover, the fact that animals can sleep within the task can open up new possibilities to compare the role of sleep in learning without having to move the animal from a maze back into its home cage. The cooling procedure can be easily adapted to head-fixed virtual reality experiments as well.

      I have only a few questions and suggestions for future analysis if data is available.

      Comment-1: Could you observe a relationship between the duration of immobility and the preferred SWR activation of place cells coding for the current (SWR) location of the animal? In the cited O'Neill et al. paper, they found that the 'spatial selectivity' of SWR activity gradually diminished within a 2-5min period, and after about 5min, SWR activity was no longer influenced by the current location of the animal. Of course, I can imagine that overall, animals are more alert here, so even over more extended immobility periods, SWRs may recruit place cells coding for the current location of the animal.

      We thank the reviewer for raising this question, which is a fundamental issue that we attempted to address using the ThermoMaze. First, we indeed observed persistent place-specific firing of CA1 neurons for up to around 5 minutes, which was the maximal duration of each warm spot epoch, as shown by the decoding analysis (based on firing rate map templates constructed during SPW-Rs) in Figure 5C and D. However, we did not observe above-chance-level decoding of the current position of the animal during sharp-wave ripples using templates constructed during theta, which aligns with the previous observation that CA1 neurons during “iSWRs” (15–30 s time windows surrounding theta oscillations) did not show significant differences in their peak firing rate inside versus outside the place field (O’Neill et al., 2006). We reasoned that this could potentially be explained by a different (although correlated, see Figure 5E) neuronal representation of space during theta and during awake SPW-Rs.

      Comment-2: Following the logic above, if possible, it would be interesting to compare immobility periods on the thermal maze and the home cage beyond SWRs, as it could give further insights into differences in rest states associated with different alertness levels. E.g., power spectra may show a stronger theta band or reduced delta band compared to the home cage.

      If we understand correctly, the Reviewer would like to know whether the brain state of the animal was similar in the ThermoMaze (warm spot location) and in the home cage during immobility. A comparison of the time-evolved power spectra shows similar changes from walking to immobility in both situations without notable differences. This analysis was performed on a subset of animals (n = 17 sessions in 7 mice) that were equipped with an accelerometer (home cage behavior was not monitored by video). We detected rest epochs that lasted at least 2 seconds during wakefulness in both the home cage and ThermoMaze. Using these time points, we calculated the event-triggered power spectra for the delta and theta bands (±2 s around the transition time) and found no difference between the home cage and ThermoMaze (Suppl. Fig. 4D).
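
      The rest-epoch criterion described above can be sketched as a run-length search on an accelerometer-derived motion signal. This is an illustration only: the motion threshold, the sampling rate, and the function name are assumptions, and the response specifies only the 2-second minimum duration.

```python
def detect_rest_epochs(motion, sampling_rate_hz, motion_threshold, min_duration_s=2.0):
    """Return (start_s, end_s) for runs where motion stays below threshold.

    Only runs lasting at least min_duration_s are kept. The threshold is a
    hypothetical parameter; the text does not state how rest was scored.
    """
    min_samples = int(round(min_duration_s * sampling_rate_hz))
    epochs = []
    start = None
    for i, value in enumerate(motion):
        if value < motion_threshold:
            if start is None:
                start = i  # a candidate rest run begins here
        else:
            if start is not None and i - start >= min_samples:
                epochs.append((start / sampling_rate_hz, i / sampling_rate_hz))
            start = None
    # Close a run that extends to the end of the recording.
    if start is not None and len(motion) - start >= min_samples:
        epochs.append((start / sampling_rate_hz, len(motion) / sampling_rate_hz))
    return epochs


# Hypothetical 10 Hz signal: 2.5 s of rest (kept) and 1 s of rest (discarded).
signal = [1.0] * 10 + [0.0] * 25 + [1.0] * 5 + [0.0] * 10 + [1.0] * 5
print(detect_rest_epochs(signal, 10.0, 0.5))
```

      The epoch start times returned by such a search could then serve as the transition times around which spectra are aligned.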

      Prompted by the Reviewer’s question, we further quantified the changes in LFP in the two environments. We did not find any significant change in the frequencies between 1-40 Hz during Awake periods, but we did find higher delta power (1-4 Hz) in some animals in the ThermoMaze (Suppl. Fig. 4A, B). 

      We have also quantified the delta and theta power spectra in the few cases in which the warm spot was maintained and the animal fell asleep. The time-resolved spectra classified the brain state as NREM, similar to sleeping in the home cage. Both delta and theta power were higher in the ThermoMaze following Awake-NREM transitions (±30 seconds around the transition, Suppl. Fig. 4C). Immobility or sleep outside the mouse’s nest may well involve some minor (but important) differences, but our experiments, with only a single camera recording, do not have the resolution needed to reveal minor differences in posture.

      We added these results to the revised Supplementary material (Suppl. Fig. 4).

      Comment-3: Was there any behavioral tracking performed on naïve animals that were placed the first time in the thermal maze? I would expect some degree of learning to take place as the animal realizes that it can find another warm zone and that it is worth settling down in that area for a while. Perhaps such a learning effect could be quantified.

      Unfortunately, we did not record videos during the first few sessions in the ThermoMaze. Typically, we transferred a naïve animal into the ThermoMaze for an hour on the first day to acclimatize it to the environment. This was performed without video analysis. In addition, because the current version of the maze is relatively small (20 x 20 cm), the animal usually walked around the edges of the maze before settling down at a heated warm spot. It appeared to us that there was only a very weak drive to learn the sequence and location of the warm spot, and therefore we did not quantify learning in the current experiment. We agree with the reviewer that in future studies, it will be interesting to explore whether the ThermoMaze could be adapted to a land version of the Morris water maze by increasing the size of the maze and performing more controlled behavioral training and testing.

      Comment-4: There may be a mislabeling in Figure 6g because the figure does not agree with the result text - the figure compares the population vector similarity of waking SWR vs sleep SWRs to exploration vs waking SWR and exploration vs sleep SWRs.

      We thank the reviewer for raising this point; we have updated the labels accordingly.

      Reviewer #2 (Public Review):

      In this manuscript, Vöröslakos and colleagues describe a new behavioural testing apparatus called ThermoMaze, which should facilitate controlling when a mouse is exploring the environment vs. remaining immobile. The floor of the apparatus is tiled with 25 plates, which can be individually heated, whereas the rest of the environment is cooled. The mouse avoids cooled areas and stays immobile on a heated tile. The authors systematically changed the location of the heated tile to trigger the mouse's exploratory behaviours. The authors showed that if the same plate stays heated longer, the mouse falls into an NREM sleep state. The authors conclude their apparatus allows easy control of triggering behaviours such as running/exploration, immobility and NREM sleep. The authors also carried out single-unit recordings of CA1 hippocampal cells using various silicon probes. They show that the location of a mouse can be decoded with above-chance accuracy from cell activity during sharp wave ripples, which tend to occur when the mouse is immobile or asleep. The authors suggest that, consistent with some previous results, SPW-Rs encode the mouse's current location and any other information they may encode (such as past and future locations, usually associated with them).

      We thank the reviewer for carefully reading our manuscript and providing useful and constructive comments.

      Strengths:

      Overall, the apparatus may open fruitful avenues for future research to uncover the physiology of transitions from different behavioural states such as locomotion, immobility, and sleep. The setup is compatible with neural recordings. No training is required.

      Weaknesses:

      I have a few concerns related to the authors' methodology and some limitations of the apparatus's current form. Although the authors suggest that switching between the plates forces animal behaviour into an exploratory mode, leading to a better sampling of the enclosure, their example position heat maps and trajectories suggest that the behaviour is still very stereotypical, restricted mostly to the trajectories along the walls or the diagonal ones (between two opposite corners). This may not be ideal for studying spatial responses known to be affected by the stereotypicity of the animal's trajectories. Moreover, given such stereotypicity of the trajectories mice take before and after reaching a specific plate, it may be that the stable SPW-R activity used for decoding different quadrants represents future and/or past trajectories rather than the current locations suggested by the authors. If this is the case, it may be confusing/misleading to call such activity 'place-selective firing', since it does not necessarily encode a given place per se (line 281).

      We agree with the reviewer that the current version of the ThermoMaze does not necessarily motivate the mice to sample the entire maze during warm spot transitions. However, we did show correlational evidence that neuronal firing during awake sharp-wave ripples is place-selective. Both firing rate ratios and population vectors of CA1 neurons showed a reliable correlation between movement and awake sharp-wave ripples (Figure 5 E and F), indicating that spatial coding during movement persists into the awake SPW-R state. This finding rejects the hypothesis that neuronal firing during ripples throughout the Cooling sub-session encodes past/future trajectories, which could be explained by a lack of goal-directed behavior in order to perform the task. We hope to test whether such place-specific firing during ripples can be causally involved in maintaining an egocentric representation of space in a future study.

      Besides, we have attempted to motivate the animal to visit the center of the maze during the Cooling sub-session. Moving the location of warm spots from the corners can shape the animals’ behavior and promote more exploration of the environment as we show in Suppl. Fig. 5. We agree with the Reviewer that the current size of the ThermoMaze poses these limitations. However, an example future application could be to warm the floor of a radial-arm maze by heating Peltier elements at the ends of maze arms and center in an otherwise cold room, allowing the experimenter to induce ambulation in the 1-dimensional arms, followed by extended immobility and sleep at designated areas.

      Another main study limitation is the reported instability of the location cells in the ThermoMaze. This may be related to the heating procedure, differences in stereotypical sampling of the enclosure, or the enclosure size (too small to properly reveal the place code). It would be helpful if the authors separate pyramidal cells into place and non-place cells to better understand how stable place cell activity is. This information may also help to disambiguate the SPW-R-related limitations outlined above and may help to solve the poor decoding problem reported by the authors (lines 218-221).

      The ThermoMaze is a relatively small enclosure (20 x 20 cm) compared to typical 2D arenas (60 x 60 cm) used in hippocampal spatial studies. Due to the small environment, one possibility is that CA1 neurons encode less spatial information and only a small number of place cells could be found. Therefore, we identified place cells in each sub-session. We found 40.90%, 45.32%, and 41.26% of pyramidal cells to be place cells in the Pre-cooling, Cooling, and Post-cooling sub-sessions, respectively. Furthermore, we found on average 17.36% of pyramidal neurons pass the place cell criteria in all three sub-sessions in a daily session. Therefore, the strong decorrelation of spatial firing maps across sub-sessions cannot be explained by poor recording quality or weak neuronal encoding of spatial information but is potentially due to changes in environmental conditions.

      Some additional points/queries:

      Comment-1: Since the authors managed to induce sleeping on the warm pads during the prolonged stays, can they check their hypothesis that the difference in the mean ripple peak frequency (Fig. 4D) between the home cage and ThermoMaze was due to the sleep vs. non-sleep states?

      In response to the reviewer’s comment, we compared the ripple peak frequency that occurred during wakefulness and NREM epochs in the home cage and ThermoMaze (n = 7 sessions in 4 mice). We found that the peak frequency of the awake ripples was higher compared to both home cage and ThermoMaze NREM sleep (one-way ANOVA with Tukey’s posthoc test, ripple frequencies were: 171.63 ± 11.69, 172.21 ± 11.86, 168.19 ± 11.10 and 168.26 ± 11.08 Hz mean±SD for home cage awake, ThermoMaze awake, home cage NREM and ThermoMaze NREM conditions, p < 0.001 between awake and NREM states). We added this quantification to the revised manuscript.

      Author response image 1.

      NREM sleep either in home cage or in ThermoMaze affects ripple mean peak frequency similarly.

      Comment-2: How many cells per mouse were recorded? How many of them were place cells? How many place cells at the same time on average? What are the place field size, peak, and mean firing rate distributions in these various conditions? It would be helpful if they could report this.

      For each animal on a given day, the average number of cells recorded was 57.5, which depended on the electrodes and duration after implantation. We first applied peak firing rate and spatial information thresholds to identify place cells in each sub-session (see more details in the revised Methods section for place cell definition). We found 40.90%, 45.32%, and 41.26% of pyramidal cells to be place cells in the Pre-cooling, Cooling, and Post-cooling sub-sessions respectively. Furthermore, we found on average 17.36% of pyramidal neurons pass the place cell criteria in all three sub-sessions in a daily session.
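
      The spatial information criterion mentioned above is commonly computed with the Skaggs et al. (1993) measure listed in the references. The following is a minimal sketch of that measure in bits per spike; the function name and the example values are hypothetical, and the thresholds actually applied are given only in the revised Methods, not here.

```python
import math


def spatial_information_bits_per_spike(occupancy_s, rate_hz):
    """Skaggs spatial information: sum_i p_i * (r_i / r) * log2(r_i / r).

    occupancy_s: time spent in each spatial bin (s);
    rate_hz: firing rate in each spatial bin (Hz).
    """
    total_time = sum(occupancy_s)
    if total_time <= 0:
        raise ValueError("total occupancy must be positive")
    probs = [t / total_time for t in occupancy_s]
    mean_rate = sum(p * r for p, r in zip(probs, rate_hz))
    if mean_rate <= 0:
        return 0.0  # a silent cell carries no spatial information
    info = 0.0
    for p, r in zip(probs, rate_hz):
        if p > 0 and r > 0:  # bins with zero rate contribute nothing
            ratio = r / mean_rate
            info += p * ratio * math.log2(ratio)
    return info


# Hypothetical cell firing in one of four equally visited bins.
print(spatial_information_bits_per_spike([1, 1, 1, 1], [8, 0, 0, 0]))
```

      A cell that fires uniformly across bins scores zero on this measure, whereas a cell confined to a single bin scores the logarithm of the number of equally visited bins.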

      For place cells identified in each sub-session, the average place field size was 61.03, 79.86, and 57.51 cm² (standard deviation = 60.13, 69.98, and 49.64 cm²; Pre-cooling, Cooling, and Post-cooling, respectively). A place field was defined as a contiguous region of at least 20 cm² (20 spatial bins) in which the firing rate was above 60% of the peak firing rate of the cell in the maze (Roux et al., 2017). A place field also had to contain at least one bin above 80% of the peak firing rate in the maze. With this definition, the average place field peak firing rate was 5.84, 5.22, and 6.48 Hz (standard deviation = 5.11, 4.65, and 5.83 Hz) and the average mean firing rate within the place fields was 4.54, 4.05, and 5.07 Hz (standard deviation = 4.00, 3.60, and 4.60 Hz).
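
      The place field definition above translates into a connected-region search on the firing rate map. The sketch below assumes 1 cm² bins, treats bins as contiguous when they share an edge (4-connectivity), and uses a hypothetical function name; neither the connectivity rule nor any smoothing step is specified in the response.

```python
from collections import deque


def find_place_fields(rate_map, min_bins=20, field_frac=0.6, core_frac=0.8):
    """Return place fields as lists of (row, col) bins.

    A field is a contiguous region of at least min_bins bins with rate above
    field_frac * peak that contains at least one bin above core_frac * peak.
    """
    peak = max(max(row) for row in rate_map)
    if peak <= 0:
        return []
    n_rows, n_cols = len(rate_map), len(rate_map[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    fields = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or rate_map[r][c] <= field_frac * peak:
                continue
            # Flood fill over edge-sharing neighbours (assumed connectivity).
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and not seen[ny][nx]
                            and rate_map[ny][nx] > field_frac * peak):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            has_core = any(rate_map[y][x] > core_frac * peak for y, x in region)
            if len(region) >= min_bins and has_core:
                fields.append(region)
    return fields


# Hypothetical 20 x 20 map: a 5 x 5 field (kept) and a 3 x 3 patch (too small).
demo_map = [[0.0] * 20 for _ in range(20)]
for y in range(2, 7):
    for x in range(2, 7):
        demo_map[y][x] = 10.0
for y in range(15, 18):
    for x in range(15, 18):
        demo_map[y][x] = 9.0
print([len(field) for field in find_place_fields(demo_map)])
```

      Changing the thresholds or the connectivity rule changes the number and size of detected fields, which is one reason reported values vary across studies.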

      We would like to point out that these values depend strongly on the definition of place fields, which vary widely across studies. We reason that the ThermoMaze paradigm induced place field remapping which has been reported to occur upon changes in the environment such as visual cues (Leutgeb et al., 2009). We hypothesize that temperature gradient is an important aspect among the environmental cues, thus remapping is expected. Overall, we did not aim for biological discoveries in the first presentation of the ThermoMaze. Instead, our limited goal was the detailed description of the method and its validation for behavioral and physiological experiments.

      References

      (1) Mizuseki K, Royer S, Diba K, Buzsáki G. Activity dynamics and behavioral correlates of CA3 and CA1 hippocampal pyramidal neurons. Hippocampus. 2012 Aug;22(8):1659-80. doi: 10.1002/hipo.22002. Epub 2012 Feb 27. PMID: 22367959; PMCID: PMC3718552.

      (2) Skaggs WE, McNaughton BL, Gothard KM, Markus EJ. 1993. An information-theoretic approach to deciphering the hippocampal code. In: SJ Hanson, JD Cowan, CL Giles, editors. Advances in Neural Information Processing Systems, Vol. 5. San Francisco, CA: Morgan Kaufmann. pp 1030–1037.

      (3) Roux L, Hu B, Eichler R, Stark E, Buzsáki G. Sharp wave ripples during learning stabilize the hippocampal spatial map. Nat Neurosci. 2017 Jun;20(6):845-853. doi: 10.1038/nn.4543. Epub 2017 Apr 10. PMID: 28394323; PMCID: PMC5446786.

      (4) Markus, E.J., Barnes, C.A., McNaughton, B.L., Gladden, V.L. & Skaggs, W.E. Spatial information content and reliability of hippocampal CA1 neurons: effects of visual input. Hippocampus 4, 410–421 (1994).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The work is well performed and thoroughly convincing. 

      However, a few points could be improved, by adjusting the manuscript: 

      (1) The wording of the abstract is confusing for the casual reader. The initial impression is that the 2-copy complexes contain the majority of the PSD95 copies. This is not the case, as shown in panel cii. It would be important for the authors to explain in the abstract the exact percentage of molecules found within 2-copy complexes. 

      We have now amended the abstract, making it clear that it’s not most of the complexes.  

      (2) Did the authors find a sizeable population of 2-copy complexes by investigating wild-type proteins, using nanobody labeling (Figure S2)? It would be important to quantify and discuss these data. 

      It was not possible to perform this analysis on the wild-type proteins. The quantification would rely on all the PSD95 molecules being bound by the antibody, which we cannot guarantee. Furthermore, the nanobody labeling would need to be stoichiometric. 

      (3) The authors quote the separation value of 12.7 nm throughout their text, including the abstract. This may be somewhat misleading since the authors investigate the PSD95-GFP molecules, labeled using anti-GFP nanobodies. The large size of the two GFP molecules (~3 nm), and that of the nanobodies, will influence the readout. Two groups have already reported a separation of ~7-8 nm between neighboring PSD95 molecules in synapses, using PSD95 nanobodies, to minimize the linkage error: https://doi.org/10.1101/2022.08.03.502284 and https://doi.org/10.1101/2023.10.18.562700

      The difference observed here is consistent with an effect of the additional GFP moieties; the authors should cite these works (albeit they are now only provided as bioRxiv pre-prints) and should mention this discrepancy, and its potential tagging-related explanation.

      We have now referenced the work and referred to this in the discussion.

      (4) The authors may want to re-check the manuscript; some minor problems should be corrected, such as the mislabeling of Figure 2 and "Figure 5". 

      This has now been corrected.  

      Reviewer #2 (Recommendations For The Authors): 

      The authors suggest that the stability of the PSD95 dimeric complex correlates with memory formation. However, the turnover experiments were conducted only on three-month-old animals, which can be considered to be at a stage of lower synaptic functionality turnover. It would be appropriate to study dimer turnover during the memory formation period at three to four weeks of age, for example in comparison to the oldest mice. 

      Alternatively, it might be interesting to study the turnover in the hippocampus following exposure to a memory test. 

      Whilst potentially useful, these experiments are outside of the scope of this manuscript.   

      It is not clear whether the different turnover identified in various brain areas is statistically significant, as apparently no statistical analysis has been conducted. 

      The findings were significant, and the SI table containing the p-values has been emphasized further in the manuscript.  

      Reviewer #3 (Recommendations For The Authors): 

      (1) In the last paragraph of the Results section, it could be made clearer what the nature is of the correlation between PSD95 half-life and mixed supercomplexes to understand how to interpret this correlation. In the discussion, it is concluded that stable synapses have long protein lifetimes and slow replacement of scaffolding proteins. However, this is based on the correlation of protein lifetime and mixed supercomplexes in the cortex, which does not provide any evidence that this relation is true in single synapses or is specific for stable synapses. To make this statement, the authors could for instance directly correlate the stoichiometry of supercomplexes with the protein lifetime and size of individual synapses. 

      Unfortunately, we can’t directly measure the lifetime of each complex, and so it’s only possible to compare region-to-region. In doing so, we found that there was a correlation between the protein lifetime and the “mixed” population.  

      (2) Some essential parts seem missing: the materials and methods and Figure 2 are not included. Also, the numbering of figures is incorrect. Both in the figure legends and the text. 

      This has been added. 

      (3) Figure 1a could contain more details of the experimental procedures. For example, it could be made clearer how PSD95 supercomplexes are isolated from brain homogenate. 

      This is now presented in the Methods.

      (4) In Figure 1c, single molecules of PSD95 are identified using PALM with a resolution of 30 nm. However, in Figure 1d it is shown that PSD95 molecules reside on average 13 nm apart, indicating that a resolution of 30 nm is not sufficient to resolve single PSD95 molecules. In addition, it would be of interest to show the distribution of fluorophore separation (assessed with MINFLUX) of only the supercomplexes with two PSD95 molecules, since only these were used to calculate the average distance. 

      The 13 nm distance was measured using MINFLUX, as stated in the text. The fluorophore separation distances are shown in Figure 1dii.

      (5) In the introduction, the authors could be more explicit in their explanation of memory formation and storage and how the presented study contributes to that field. 

      We thank the reviewer for the suggestion, but feel that such a discussion in the introduction would detract from the main points of the manuscript.  

      (6) Throughout the manuscript the authors prominently cite their own work, but relevant literature on synaptic plasticity and synapse nanostructure (EM and super-resolution studies) is lacking. 

      Further references have now been added.  

      (7) The results depicted in Figure 4b would be easier to interpret if a stacked histogram (including error bars) was used. 

      We agree that the data could be presented in such a way, but that would prevent the results from the biological repeats, along with the variation, being presented.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Intro. 

      47-48 rewrite sentence

      This sentence has been rewritten as: Photoreceptor synapses are specialized with a vesicle-associated ribbon organelle and postsynaptic neurites of horizontal and bipolar cells that invaginate deep within the terminal.

      Results 

      Major comment. Lines 100-103 

      The new rod data presented here looks like an n = 1. Neither the Results section nor Supp Fig S1 describes the number of cells used. Nor do the authors offer a statistical description with averages, etc. In addition, the single traces are much improved over their previous study (Maddox et al eLife 2020), but the authors have not described any new approach or trick that improved their rod Ica. Neither the Methods section nor the Supp section describes the procedure for patching rods (solutions, or Vh which is critical for assessing T-type currents). 

      Suggestion, if more data exists, then present it. Otherwise, drop the argument. 

      The methodology for recording from rods was similar to that used for cones, and this has been clarified in the Methods section (lines 725-752). Averaged data (n = at least 5 per group) and statistical analyses have been added to Fig. S1 (renamed Figure 2-Figure Supplement 1), and clearly show that no Ca2+ currents are present in the KI rods.

      Supp Fig S2. The legend needs to be fixed. Conversion to PDF file may have created these formatting errors. 

      This has been corrected (renamed Figure 3-Figure Supplement 2).

      Fig 8 a. The position of the light stimulus bar in the KO panel appears to be out of place, shifted too far to the left. 

      This has been corrected.

      Major comments. 219-221 

      The use of Fluo3-AM is not properly stated here. The text reads "cone pedicles filled with the Ca2+ indicator Fluo3". The wording used could be wrongly interpreted as: whole-cell filling of the cones via patch electrode. However, the Methods section describes bathing the retina in Fluo3-AM, which presumably fills PRs, HCs tips, Mueller glia and bpc dendrites. The Results section should acknowledge that the retina was loaded with Fluo3-AM. 

      The cell types, and their processes (Muellers, HCs, bpc, PRs), present in a cone pedicle ROI will likely contribute to the Fluo3 readout of Ca2+ in the OPL, because 1) the EM images in Fig 7 highlight how interdigitated the processes are with the presynapse, 2) all express Cav channels, and many if not all express L-Type Cavs in their processes (glia, HC, on-bcs and PRs), and 3) all are depolarized with the addition of high extracellular KCl. The inclusion of Isradipine will inhibit L-type Cavs on pre- and post-synaptic targets, failing to specifically isolate PR Ca2+. Furthermore, Glu Receptor blockers are used here, which would be a great idea if the cones were stimulated with light; however, KCl bypasses the excitatory synaptic pathway and depolarizes all processes within the ROI. Hence, all cellular parts in the ROI will potentially contribute to Fluo3-Ca2+ signals. 

      Suggestions for presentation of these findings. Ultimately your conclusion is suitable " 233 to 234...... Taken together, our results suggest that Cav3 channels nominally support Ca2+ signals and synaptic transmission in cones of G369i KI mice". The dramatic reduction in Fluo3-Ca2+ signals in the OPL G369i retinas (Fig 9) is a valuable finding for the following reasons: 1) the results do not show a clear compensation from intracellular stores that could potentially supersede the T-type currents in the G369i (which is an argument you make), and 2) there is a massive loss of Ca2+ influx in the OPL of G369i retinas. Since G369i is specific to the PRs, and only cones are present in the mutant G369i, the loss of Fluo3-Ca2+ signal in the mutant ROI reflects in large part loss of cone Fluo3-Ca2+ signals. Your findings illustrate the severity of the mutation, which has also been addressed in the various electro-physio sections of the MS. 

      Figure 9 also needs to be more clear about 1) the loading of the cells with AM-dye, and 2) the presence of glia, HCs and bc dendrites in the PNA demarcated ROIs. 

      We regret that we did not make this clearer, but our Fluo3 loading protocol of whole retina followed by vertical slicing allowed for loading primarily of photoreceptors in the portion of the outer retina that we imaged. We clarified this with the following edit to the text (lines 220-226):

      “To test if the diminished HC light responses correlated with lower presynaptic Ca2+ signals in G369i KI cones, we performed 2-photon imaging of vertical slices prepared from whole retina that was incubated with the Ca2+ indicator Fluo3-AM and Alexa-568-conjugated peanut agglutinin (PNA) to demarcate regions of interest (ROIs) corresponding to cone pedicles. With this approach, Fluo3 fluorescence was detected only in photoreceptors and ganglion cells and not inner retinal cell-types (e.g., horizontal cells, bipolar cells, Mueller cell soma). Thus, Ca2+ signals reported by Fluo3 fluorescence near PNA-labeling originated primarily from cones.”

      We also note that given the considerably larger volume of the cone pedicle relative to the postsynaptic neurites of horizontal and bipolar cells, as well as neighboring glia, it seems unlikely that the latter would contribute significantly to the isradipine-sensitive Ca2+ signal measured in the ROI above the PNA labeling. Moreover, to our knowledge the contribution of Cav1 L-type channels to postsynaptic Ca2+ signals in the dendritic tips of horizontal cells and bipolar cells has not been demonstrated.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Major shortcomings include the unusual normalization strategies used for many experiments and the lack of quantification/statistical analyses for several experiments. Because of these omissions, it is difficult to conclude that the data justify the conclusions. The significance of the data presented is overstated, as many of the experiments presented confirm/support previously published work. The study provides a modest advance in the understanding of the complex issue of SHH membrane extraction.

      Major shortcomings include the unusual normalization strategies used for many experiments and the lack of quantification/statistical analysis for several experiments.

      This statement is not correct for the revised manuscript: The normalization strategies used are clearly described in the manuscript and are not unusual. Each experiment is now statistically analyzed.

      The significance of the data presented is overstated, as many of the experiments presented confirm/support previously published work.

      As reviewer 2 correctly points out, there are many competing models for Hedgehog release. Our study cannot possibly support them all, so the reviewer's statement is misleading. In fact, our careful biochemical analysis of the mechanism of Dispatched-mediated Shh export supports only two of them: the model of proteolytic processing of Shh lipid anchors (shedding) and the model of lipoprotein-mediated Shh transport. In contrast, our study does not support the predominant model of Dispatched-mediated extraction of dual-lipidated Shh and delivery to Scube2, which is currently thought to act as a soluble Shh chaperone. Our data also do not support Dispatched function in Shh endocytic recycling and cytoneme loading, or any of the other models such as exosome-mediated or micelle Shh transport.

      Reviewer #2 (Public Review):

      A novel and surprising finding of the present study is the differential removal of Shh N- or C-terminal lipid anchors depending on the presence of HDL and/or Disp. In particular, the identification of a non-palmitoylated but cholesterol-modified Shh variant that associates with lipoproteins is potentially important. The authors use RP-HPLC and defined controls to assess the properties of processed forms of Shh, but their precise molecular identity remains to be defined. One caveat is the heavy reliance on overexpression of Shh in a single cell line. The authors detect Shh variants that are released independently of Disp and Scube2 in secretion assays, but these are excluded from interpretation as experimental artifacts. Therefore, it would be important to demonstrate key findings in cells that endogenously secrete Shh.

      We would like to respond as follows:

      The authors use RP-HPLC and defined controls to assess the properties of processed forms of Shh, but their precise molecular identity remains to be defined.

      This is the reviewer's original statement regarding our original manuscript submission. We believe that the biochemical and functional data presented in the VOR clearly describe the molecular identity of solubilized Shh: it is monolipidated, lipoprotein-associated, and highly biologically active in two established Shh bioassays.

      One caveat is the heavy reliance on overexpression of Shh in a single cell line.

      As stated by reviewer 1, the strength of our work is the use of a bicistronic SHH-Hhat system to consistently generate doubly lipidated ligand to determine the amount and lipidation status of SHH released into cell culture media. This unique system therefore eliminates the artifacts of protein overexpression. We have also added two other cell lines to our VOR that produce the same results (including Panc1 cells that endogenously produce Shh, Supplementary Figure 1).

      The authors detect Shh variants that are released independently of Disp and Scube2 in secretion assays, but these are excluded from interpretation as experimental artifacts.

      As the reviewer correctly points out, these variants are released independently of Disp and Scube2, both of which are known to be essential release factors in vivo. These variants are therefore by definition experimental artifacts. The forms we have included in our analysis are the alternative forms that clearly depend on Dispatched and Scube2 for their release, as shown in the first figure of the manuscript and in nearly every subsequent figure.


      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Key shortcomings include the unusual normalization strategies used for many experiments and the lack of quantification/statistical analyses for several experiments.

      In the updated version of the paper, we have addressed all of this reviewer's criticisms. Most importantly, we have performed several additional experiments to address the concern that unusual normalization strategies were used in our paper and that quantification and statistical analyses were lacking for several experiments. We have now analyzed the full set of release conditions for Shh and engineered proteins from Disp-expressing n.t. control cells and Disp-/- cells both in the presence and absence of Scube2 (Figure 1A'-D', Figure 2E added to the paper, Figure 3B'-D', Figure 5C and Figure S2F-H). Previously, we had only quantified protein release from n.t. controls and Disp-/- cells in the presence but not in the absence of Scube2 under serum-depleted conditions. Quantifications of serum-free protein release and Shh release under conditions ranging from 0.05% FCS to 10% FCS were completely missing from the earlier versions of the manuscript, but have now been added to our paper. In addition, we have reanalyzed all of the data sets in the above figures, as well as Figures 2C and S1B, to address the issue of "unusual normalization strategies": unlike previous assays in which the highest amount of protein detected in the media was set to 100% and all other proteins in that experiment were expressed relative to that value, we now directly compare the relative amounts of cellular and corresponding solubilized proteins as a method to quantify release without the need for data normalization (Figs. 1A'-D', 2C,E, 3B'-D', E, 5C, Fig. S1B, S2F-H).
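      The difference between the two quantification approaches described above can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the condition names and band intensities are hypothetical placeholders, not data from the study, and the two helper functions are our own naming rather than the authors' analysis code.

```python
# Hypothetical densitometry values (arbitrary units); NOT data from the study.
media_signal = {"condition_a": 50.0, "condition_b": 40.0}   # solubilized protein
cell_signal = {"condition_a": 200.0, "condition_b": 50.0}   # cellular protein


def normalize_to_max(media):
    """Earlier approach: set the strongest media signal to 100% and
    express every other media signal relative to it."""
    top = max(media.values())
    return {name: 100.0 * value / top for name, value in media.items()}


def release_ratio(media, cells):
    """Revised approach: compare each solubilized signal directly with the
    corresponding cellular signal, so no reference sample is needed."""
    return {name: media[name] / cells[name] for name in media}


percent_of_max = normalize_to_max(media_signal)
ratios = release_ratio(media_signal, cell_signal)

for name in media_signal:
    print(name, percent_of_max[name], ratios[name])
```

      With these placeholder numbers, normalization to the maximum ranks condition_a highest, whereas the direct media-to-cell comparison ranks condition_b highest, because condition_a also has far more cellular protein available for release. The sketch only shows why the two read-outs can diverge; it says nothing about the actual experimental values.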

      We have also repeated the qPCR analyses in C3H10T1/2 cells and now show that the same Shh/C25AShh activities can be observed when using another Shh responsive cell line, NIH3T3 cells (Fig. 4B, 6B, Fig. S5B).

      We would like to point out that if the criticism refers to the presentation of our RP-HPLC and SEC data, the normalization of the strongest eluted protein signal to 100% for all proteins tested is necessary to put their behavior in a clearer relationship. This is because only the relative positions of protein elution, and not their amounts, are important in these experiments.

      The significance of the data provided is overstated because many of the presented experiments confirm/support previously published work.

      To mitigate the first reviewer's comment that the significance of the data presented is overstated, we now clearly distinguish between our novel results and the known aspect of Hh release on lipoproteins throughout our paper. We now clearly describe what is new and important in our paper: First, contrary to the general perception in the field, Disp and Scube2 are not sufficient to solubilize Shh, casting doubt on the currently accepted model that Scube2 accepts dual-lipidated Shh from Disp and transports it to the receptor Ptch. Second, lipoproteins shift dual Shh processing to N-terminal peptide processing only to generate different soluble Hh forms with different activities (as shown in Figure 4C). Third, and again contrary to popular belief, this new release mode does not inactivate Shh, as we now show in two established cellular assays for Hh biofunction (Figures 4A-C, 5B'', 6B and S5C-G). Fourth, and most importantly, we show that spatiotemporally controlled, Disp-, Scube2- and HDL-mediated Shh release absolutely requires dual lipidation of the membrane-associated Shh precursor prior to its release. This finding (as shown in Figures 1 and S2) changes the interpretation of previously published in vivo data that have long been interpreted as evidence for the requirement of dual Shh lipidation for full receptor binding and activation.

      The study provides a modest advance in our understanding of the complex issue of Shh membrane extraction.

      Although we agree that our results integrate our novel observations into previously established concepts of Hh release and trafficking, we also hope that our data cast well-founded doubt on the current view that the issue of Hh release and trafficking is largely resolved by the model of Disp-mediated Shh hand-over to Scube2 and then to Ptch, which requires interactions with both Shh lipids. Our data show that this is clearly not the case in the presence of lipoproteins. Thus, the significance of our data is that models of Shh lipid-regulated signaling to Ptch obtained using the dual-lipidated Shh precursor prior to its Disp- and Scube2-mediated conversion into a delipidated or monolipidated, HDL-associated soluble ligand are likely to describe a non-physiological interaction. Instead, our work describes a highly bioactive soluble ligand with only one lipid still attached, which has not been described before in the literature. The in vivo endpoint analyses presented in Fig. S8 suggest that this new protein variant is likely to play an important role during development.

      Reviewer #2 (Public Review):

      The precise molecular identity (of the released Shh) remains to be defined.

      We would like to respond that the direct comparison of soluble proteins and their well-defined double-lipidated precursors side-by-side in the same experiment, as shown in our paper, determines all relevant molecular changes in the Shh release process. Most importantly, we show by SDS-PAGE and RP-HPLC that HDL restricts Shh processing to the N-terminus and that the absence of HDL results in double processing of Shh during its release. We also show by SEC that the C-terminus binds the protein to HDL. In addition, the fly experiments confirm the requirement for N-terminal Hh processing, but not for processing of the C-terminal peptide, and suggest that the N-terminal Cardin-Weintraub sequence replaced by the functionally blocking tag represents the physiological cleavage site.

      It would be important to demonstrate key findings in cells that secrete Shh endogenously.

      We now confirm the key findings of our study in Panc1 cells that endogenously produce and secrete Shh: As shown in Fig. S1D, we find that soluble proteins are processed but retain the C-cholesterol, which we now directly confirm by RP-HPLC (Fig. S4F-H). The in vivo analyses shown in Fig. S8 suggest that the key finding - that N-terminal but not C-terminal Hh shedding is required for release - can be supported, at least in the fly: here, Hh variants impaired in their ability to be processed N-terminally strongly repress the endogenous protein, whereas the same protein impaired in its ability to be processed C-terminally does not.

      The authors detect Shh variants that are expressed independently of Disp and Scube2 in secretion assays, but are excluded from interpretation as experimental artifacts.

      We agree with the reviewer's criticism that the amounts of Shh released independently of Disp and Scube2 in secretion assays were not quantified and analyzed statistically to justify their proposed status as not physiologically relevant. We now show that these forms are indeed secretion artifacts (Fig. 3E and Fig. S2F-H show quantification of the lower electrophoretic mobility protein fraction (i.e., the "top" band representing the double-lipidated soluble protein fraction)) because this fraction is released independently of Disp and Scube2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This interesting study explores the mechanism behind an increased susceptibility of daf-18/PTEN mutant nematodes to paralyzing drugs that exacerbate cholinergic transmission. The authors use state-of-the-art genetics and neurogenetics coupled with locomotor behavior monitoring and neuroanatomical observations using gene expression reporters to show that the susceptibility occurs due to low levels of DAF-18/PTEN in developing inhibitory GABAergic neurons early during larval development (specifically, during the larval L1 stage). DAF-18/PTEN is convincingly shown to act cell-autonomously in these cells upstream of the PI3K-PDK-1-AKT-DAF-16/FOXO pathway, consistent with its well-known role as an antagonist of this conserved signaling pathway. The authors exclude a role for the TOR pathway in this process and present evidence implicating selectivity towards developing GABAergic neurons. Finally, the authors show that a diet supplemented with a ketogenic body, β-hydroxybutyrate, which also counteracts the PI3K-PDK-1-AKT pathway, promoting DAF-16/FOXO activity, partially rescues the proper development (morphology and function) of GABAergic neurons in daf-18/PTEN mutants, but only if the diet is provided early during larval development. This strongly suggests that the critical function of DAF-18/PTEN in developing inhibitory GABAergic neurons is to prevent excessive PI3K-PDK-1-AKT activity during this critical and particularly sensitive period of their development in juvenile L1 stage worms. Whether or not the sensitivity of GABAergic neurons to DAF-18/PTEN function is a defining and widespread characteristic of this class of neurons in C. elegans and other animals, or rather a particularity of the unique early-stage GABAergic neurons investigated remains to be determined.

      Strengths:

      The study reports interesting and important findings, advancing the knowledge of how daf-18/PTEN and the PI3K-PDK-1-AKT pathway can influence neurodevelopment, and providing a valuable paradigm to study the selectivity of gene activities towards certain neurons. It also defines a solid paradigm to study the potential of dietary interventions (such as ketogenic diets) or other drug treatments to counteract (prevent or revert?) neurodevelopment defects and stimulate DAF-16/FOXO activity.

      Weaknesses:

      (1) Insufficiently detailed methods and some inconsistencies between Figure 4 and the text undermine the full understanding of the work and its implications.

      The incomplete methods presented, the imprecise display of Figure 4, and the inconsistency between this figure and the text, make it presently unclear what are the precise timings of observations and treatments around the L1 stage. What exactly do E-L1 and L1-L2 mean in the figure? The timing information is critical for the understanding of the implications of the findings because important changes take place with the whole inhibitory GABAergic neuronal system during the L1 stage into the L2 stage. The precise timing of the events such as neuronal births and remodelling events are well-described (e.g., Figure 2 in Hallam and Jin, Nature 1998; Fig 7 in Mulcahy et al., Curr Biol, 2022). Likewise, for proper interpretation of the implication of the findings, it is important to describe the nature of the defects observed in L1 larvae reported in Figure 1E - at present, a representative figure is shown of a branched commissure. What other types of defects, if any, are observed in early L1 larvae? The nature of the defects will be informative. Are they similar or not to the defects observed in older larvae?

      We thank the reviewer for highlighting these areas for improvement. We have updated and clarified the timing of observation in the text, figures, and methodology section accordingly.

      All experiments were conducted using age-synchronized animals. Gravid worms were placed on NGM plates and removed after two hours. The assays were then carried out on animals that hatched from the eggs laid during this specific timeframe.

      Regarding the detailed timings outlined in the original Figure 4 (now Figure 5 in the revised version), we provided the following information in the revised version: For experiments involving continuous exposure to βHB throughout development, the gravid worms were placed on NGM plates containing the ketone body and removed after two hours. Therefore, this exposure covered the ex-utero embryonic development period up to the L4-Young adult stage when the experiments were conducted.

      In experiments involving exposure at different developmental stages, such as those depicted in Figure 4 of the original version (now Figure 5, revised version), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A, revised version). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage. The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage. The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage.

      All this information has been included in Figure 5, the text (Page 13, lines 259-276), and the methodology (Page 4, lines 85-90, Revised Methods and Supplementary information) of the revised manuscript.

      In response to the reviewer's suggestion, we have also included photos of daf-18 worms at the L1 stage (30 min/1h post-hatching). Defects are already present at this early stage, such as handedness and abnormal branching commissures, which are also observed in adult worm neurons (see Supplementary Figure 4, revised version). 

      These defects manifest in DD neurons shortly after larval birth. The prevalence of animals with errors is higher in L4 worms (when both VDs and DDs are formed) compared to early L1s (Figures 3 C-E and Supplementary Figure 4, revised version). This suggests that defects in VD neurons also occur in daf-18 mutants. Indeed, when we analyzed the neuronal morphology of several wild-type and daf-18 mutant animals, we found defects in the commissures corresponding to both DD and VD neurons (Supplementary Figure 3, revised version). 

      These data are now included in the revised version (Results (Page 10, lines 177-196), Discussion (Pages 14-16), Main Figure 3, and Supplementary Figures 3, 4 and 7, revised version).

      (2) The claim of proof of concept for a reversal of neurodevelopment defects is not fully substantiated by data.

      The authors state that the work "constitutes a proof of concept of the ability to revert a neurodevelopmental defect with a dietary intervention" (Abstract, Line 56), however, the authors do not present sufficient evidence to distinguish between a "reversal" or prevention of the neurodevelopment defect by the dietary intervention. This clarification is critical for therapeutic purposes and claims of proof-of-concept. From the best of my understanding, reversal formally means the defect was present at the time of therapy, which is then reverted to a "normal" state with the therapy. On the other hand, prevention would imply an intervention that does not allow the defect to develop to begin with, i.e., the altered or defective state never arises. In the context of this study, the authors do not convincingly show reversal. This would require showing "embryonic" GABAergic neuron defects or showing convincing data in newly hatched L1 (0-1h), which is unclear if they do so or not, as I have failed to find this information in the manuscript. Again, the method description needs to be improved and the implications can be very different if the data presented in Figure 2D-E regard newly born L1 animals (0-1h) or L1 animals at say 5-7h after hatching. This is critical because the development of the embryonically-born GABAergic DD neurons, for instance, is not finalized embryonically. Their neurites still undergo outgrowth (albeit limited) upon L1 birth (see DataS2 in Mulcahy et al., Curr Biol 2022), hence they are susceptible to both committing developmental errors and to responding to nutritional interventions to prevent them. In contrast to embryonic GABAergic neurons, embryonic cholinergic neurons (DA/DB) do not undergo neurite outgrowth post-embryonically (Mulcahy et al., Curr Biol 2022), a fact which could provide some mechanistic insight considering the data presented. 
      However, neurites from other post-embryonically-born neurons also undergo outgrowth post-embryonically, but mostly during the second half of the L1 stage following their birth up to mid-L2, with significant growth occurring during the L1-L2 transition. These are the cholinergic (VA/VB and AS neurons) and GABAergic (VD) neurons. The fact that AS neurons undergo a similar amount of outgrowth as VD neurons is informative if VD neurons are or are not susceptible to daf-18/PTEN activity. Independently, DD neurons are still quite unique on other aspects (see below), which could also bring insight into their selective response.

      Finally, even adjusting the claim to "constitutes a proof-of-concept of the ability of preventing a neurodevelopmental defect with a dietary intervention" would not be completely precise, because it is unclear how much this work "constitutes a proof of concept". This is because, unless I misunderstood something, dietary interventions are already applied to prevent neurodevelopment defects, such as when folic acid supplementation is recommended to pregnant women to prevent neural tube defects in newborns.

      Thank you very much for pointing out this issue and highlighting the need to further investigate the ameliorative capacity of βHB on GABAergic defects in daf-18 mutants. In the revised version, we have included experiments to address this point.

      Our microscopy analyses strongly indicate that the development of DD neurons is affected, with errors observed as early as one hour post-hatching (Main Figure 3, and Supplementary Figures 4 and 7, revised version). Additionally, based on the position of the commissures in L4s, our results strongly suggest that VD neurons are also affected (Supplementary Figure 3, revised version). Both the frequency of animals with errors and the number of errors per animal are higher in L4s compared to L1 larvae (Main Figure 3, and Supplementary Figures 4 and 7, revised version). It is very likely that the errors in VD neurons, which are born in the late L1 stage, are responsible for the higher frequency of defects observed in L4 animals. 

      As the reviewer noted, GABAergic DD neurons, which are born embryonically, do not complete their development during the embryonic stages. Some defects in DD neurons may arise during the postembryonic period. Following the reviewer's suggestion, we analyzed L1 larvae at different times before the appearance of VDs (1 hour post-hatching and 6 hours post-hatching). We did not observe an increase in error prevalence, suggesting that DD defects in daf-18 mutants are mostly embryonic (Supplementary Fig 4B, Revised Version). 

      Our findings suggest that βHB's enhancement is not due to preventive effects in DDs, as defects persist in newly hatched larvae regardless of βHB presence (Supplementary Figure 7, revised version), and postembryonic DD growth does not introduce new errors (Supplementary Figure 4, revised version). This lack of preventive effect could be due to βHB's limited penetration into the embryonic environment. Unlike in early L1s, significant improvement occurs in L4s upon early exposure to βHB (Supplementary Figure 7, revised version). This could be explained by a reversing effect on malformed DD neurons and/or a protective influence on VD neuron development. While we cannot rule out the first option, even if all errors in DDs in L1 were repaired (which is very unlikely), it would not explain the level of improvement in L4 (Supplementary Figure 7, revised version). Therefore, we speculate that VDs may be targeted by βHB. The notion that exposure to βHB during early L1 can ameliorate defects in neurons primarily emerging in late L1s (VDs) is intriguing. We hypothesize that residual βHB or a metabolite from prior exposure could forestall these defects in VD neurons. Notably, βHB has demonstrated a capacity for long-lasting effects through epigenetic modifications (reviewed in He et al., 2023, https://doi.org/10.1016%2Fj.heliyon.2023.e21098). More work is needed to elucidate the fundamental mechanisms underlying the ameliorating effects of βHB supplementation. We now discuss these possibilities in the Discussion (Page 17, lines 369-383, revised version).

      We agree with the reviewer that the term "reversal" is not accurate, and we have avoided using this terminology throughout the text. Furthermore, in the title, we have decided to change the word "rescue" to "ameliorate," as our experiments support the latter term but not the former. Additionally, the reviewer is correct that folic acid administration to pregnant women is already a metabolic intervention to prevent neural tube defects. In light of this, we have avoided claiming this as proof of concept in the revised manuscript.

      (3) The data presented do not warrant the dismissal of DD remodeling as a contributing factor to the daf-18/PTEN defects.

      Inhibitory GABAergic DD neurons are quite unique cells. They are well-known for their very particular property of remodeling their synaptic polarity (DD neurons switch the nature of their pre- and postsynaptic targets without changing their wiring). This process is called DD remodeling. It starts in the second half of the L1 stage and finishes during the L2 stage. Unfortunately, the fact that the authors find a specific defect in early GABAergic neurons (which are very likely these unique DD neurons) is not explored in sufficient detail and depth. The facts that these neurons are not fully developed at L1, that they still undergo limited neurite growth, and that they are poised for striking synaptic plasticity in a few hours set them apart from the other explored neurons, such as early cholinergic neurons, which show a more stable dynamics and connectivity at L1 (see Mulcahy et al., Curr Biol 2022).

      The authors use their observation that daf-18/PTEN mutants present morphological defects in GABAergic neurons prior to DD remodeling to dismiss the possibility that the DAF-18/PTEN-dependent effects are "not a consequence of deficient rearrangement during the early larval stages". However, DD remodeling is just another cell-fate-determined process and as such, its timing, for instance, can be affected by mutations in genes that affect cell fates and developmental decisions, such as daf-18 and daf-16, which affect developmental fates such as those related with the dauer fate. Specifically, the authors do not exclude the possibility that the defects observed in the absence of either gene could be explained by precocious DD remodeling. Precocious DD remodeling can occur when certain pathways, such as the lin-14 heterochronic pathway, are affected. Interestingly, lin-14 has been linked with daf-16/FOXO in at least two ways: during lifespan determination (Boehm and Slack, Science 2005) and in the L1/L2 stages via the direct negative regulation of an insulin-like peptide gene ins-33 (Hristova et al., Mol Cell Bio 2005). It is likely that the prevention of DD dysfunction requires keeping insulin signaling in check (downregulated) in DD neurons in early larval stages, which seems to coincide with the critical timing and function of daf-18/PTEN. Hence, it will be interesting to test the involvement of these genes in the daf-18/daf-16 effects observed by the authors.

      This is another interesting point raised by the reviewer. We have demonstrated that defects manifest in early L1 (30 min-1 hour post-hatching) which corresponds to a pre-remodeling time in wild-type worms.

      We acknowledge the possibility of early remodeling in specific mutants as pointed out by the reviewer.

      However, the following points suggest that the effects of these mutations may extend beyond the particularity of DD remodeling: i) Our experiments also show defects in VD neurons in daf-18 mutants (Supplementary Figure 3, revised version), as discussed in our previous response. These neurons do not undergo significant remodeling during their development. ii) DAF-18 and DAF-16 deficiencies produce neurodevelopmental alterations in other non-remodeling neurons: severe neurite defects in neurons that are nearly fully formed at larval hatching, such as AIY, have been previously reported in daf-18 and daf-16 mutants (Christensen et al., 2011). Additionally, the migration of another neuron, HSN, is severely affected in these mutants (Kennedy et al., 2013). iii) To the best of our knowledge, DD remodeling only alters synaptic polarity without forming new commissures or significantly altering the trajectory of the existing ones. Thus, it is unlikely (though not impossible) for remodeling defects to cause the observed commissural branching and handedness abnormalities in DD neurons. Therefore, we think that the impact of daf-18 mutations on GABAergic neurons is not primarily linked to DD remodeling but extends to various neuron types. The apparent resilience of cholinergic motor neurons to these mutations is intriguing and requires further exploration. This resilience is not limited to daf-18/PTEN animals, since mutations in certain genes expressed in both neuron types (such as the neuronal integrin ina-1 or eel-1, the C. elegans ortholog of HUWE1) alter the function or morphology of GABAergic neurons but not cholinergic motor neurons (Kowalski, J. R. et al. Mol Cell Neurosci 2014; Oliver, D. et al. J Dev Biol 2019; Opperman, K. J. et al. Cell Rep 2017). These points are discussed in the manuscript (Discussion, page 15, lines 311-322, revised version) and point to the existence of compensatory or redundant mechanisms in these excitatory neurons, rendering them much more resistant to both morphological and functional abnormalities.

      Discussion on the impact of the work on the field and beyond:

      The authors significantly advance the field by bringing insight into how DAF-18/PTEN affects neurodevelopment, but fall short of understanding the mechanism of selectivity towards GABAergic neurons, and most importantly, of properly contextualizing their findings within the state-of-the-art C. elegans biology.

      For instance, the authors do not pinpoint which type of GABAergic neuron is affected, despite the fact that there are two very well-described populations of ventral nerve cord inhibitory GABAergic neurons with clear temporal and cell fate differences: the embryonically-born DD neurons and the postembryonically-born VD neurons. The time point of the critical period apparently defined by the authors (pending clarifications of methods, presentation of all data, and confirmation of inconsistencies between the text and figures in the submitted manuscript) could suggest that DAF-18/PTEN is required in either or both populations, which would have important and different implications. An effect on DD neurons seems more likely because an image is presented (Figure 2D) of a defect in an L1 daf-18/PTEN mutant larva with 6 neurons (which means the larva was processed at a time when VD neurons were not yet born or expressing pUnc-47, so supposedly it is an image of a larva in the first half of the L1 stage (0-~7h?)). DD neurons are also likely the critical cells here because the neurodevelopment errors are partially suppressed when the ketogenic diet is provided at an "early" L1 stage, but not later (e.g., from L2-L3, according to the text, L2-L4 according to the figure? ).

      Thank you for this insightful input. As previously mentioned, we conducted experiments in this revision to clarify the specificity of GABAergic errors in daf-18/PTEN mutants, in particular, whether they affect DDs, VDs, or both. Our results suggest that commissural defects are not limited to DD neurons but also occur in VD neurons (Supplementary Figure 3). Regarding the effect of βHB, our findings suggest that VD neurons are targets of βHB action. As mentioned in the previous response and the discussion section (Page 17, lines 369-383, revised version), we might speculate that lingering βHB or a metabolite from prior exposure could mitigate these defects in VD neurons that are born in Late L1s-Early L2s. Additionally, βHB has been noted for its capacity to induce long-term epigenetic changes. Therefore, it could act on precursor cells of VD neurons, with the resulting changes manifesting during VD development independently of whether exposure has ceased. All these possibilities are now discussed in the manuscript.

      Acknowledging that our work raises several questions that we aim to address in the future, we believe our manuscript provides valuable information regarding how the PI3K pathway modulates neuronal development and how dietary interventions can influence this process.

      This study brings important contributions to the understanding of GABAergic neuron development in C. elegans, but unfortunately, it is justified and contextualized mostly in distantly-related fields, where the study has a dubious impact at this stage, rather than in the central field of the work (post-embryonic development of C. elegans inhibitory circuits), where the study has stronger impact. This study is fundamentally about a cell fate determination event that occurs in a nutritionally-sensitive developmental stage (post-embryonic L1 larval stage), yet the introduction and discussion are focused on more distantly related problems such as excitatory/inhibitory (E/I) balance, pathophysiology of human diseases, and treatments for them. Whereas speculation is warranted in the discussion, the reduced in-depth consideration of the known biology of these neurons and organisms weakens the impact of the study as written. For instance, the critical role of DAF-18/PTEN seems to occur at the early L1 larval stage, a stage that is particularly sensitive to nutritional conditions. The developmental progression of L1 larvae is well-known to be sensitive to nutrition - e.g., L1 larvae arrest development in the absence of food, something that is exploited in nematode labs to synchronize animals at the L1 stage by allowing embryos to hatch into starvation conditions (water). Development resumes when they are exposed to food. Hence, the extensive postembryonic developmental trajectory that GABAergic neurons need to complete is expected to be highly susceptible to nutrition. Is it? The sensitivity towards the ketogenic diet intervention seems to favor this. In this sense, the attribution of the findings to issues with the nutrition-sensitive insulin-like signaling pathway seems quite plausible, yet this possibility seems insufficiently considered and discussed.

      We greatly appreciate the reviewer's emphasis on the sensitivity of the L1 stage to nutritional status. As the reviewer points out, C. elegans adjusts its development based on food availability, potentially arresting development in L1 in the absence of food. It is therefore reasonable that both the completion of DD neuron trajectories and the initial development steps of VD neurons are particularly sensitive to dietary modulation of the insulin pathway, in which both DAF-18 and DAF-16 play roles. This important point has also been included in the discussion (Page 18, lines 384-407, revised version).

      Finally, the fact that imbalances in excitatory/inhibitory (E/I) inputs are linked to Autism Spectrum Disorders (ASD) is used to justify the relevance of the study and its findings. Maybe at this stage, the speculation would be more appropriate if restricted to the discussion. In order to be relevant to ASD, for instance, the selectivity of PTEN towards inhibitory neurons should occur in humans too. However, at present, the E/I balance alteration caused by the absence of daf-18/PTEN in C. elegans could simply be a coincidence due to the uniqueness of the post-embryonic developmental program of GABAergic neurons in C. elegans. To be relevant, human GABAergic neurons should also pass through a unique developmental stage that is critically susceptible to the PI3K-PDK1-AKT pathway in order for DAF-18/PTEN to have any role in determining their function. Is this the case? Hence, even in the discussion, where the authors state that "this study provides universally relevant information on.... the mechanisms underlying the positive effects of ketogenic diets on neuronal disorders characterized by GABA dysfunction and altered E/I ratios", this claim seems unsubstantiated as written, particularly without acknowledging/mentioning the criteria that would have to be fulfilled and demonstrated for this claim to be true.

      Our results suggest that defects in GABAergic neurons are not limited to DDs, which, as the reviewer rightly notes, are quite unique in their post-embryonic development primarily due to the synaptic remodeling process they undergo. These defects also extend to VD neurons, which do not exhibit significant developmental peculiarities once they are born. Therefore, we think that the defects are not specific to the developmental program of DD neurons but are more related to all GABAergic motoneurons. Additionally, the observation of defects in non-GABAergic neurons in C. elegans daf-18 mutants supports the hypothesis that the role of daf-18 is not limited to DD neurons (Christensen et al., 2011; Kennedy et al., 2013).

      In mammals, Pten conditional knockout (cKO) animals have been extensively studied for synaptic connectivity and plasticity, revealing an imbalance between synaptic excitation and inhibition (E/I balance) (Reviewed in Rademacher and Eickholt, 2019, Cold Spring Harbor Perspect Med, https://doi.org/10.1101%2Fcshperspect.a036780). This imbalance is now widely accepted as a key pathological mechanism linked to the development of ASD-related behavior (Lee et al, 2017, Biological Psychiatry, https://doi.org/10.1016/j.biopsych.2016.05.011). The importance of PTEN in the development of GABAergic neurons in mammals is well-documented. For instance, embryonic PTEN deletion from inhibitory neurons impacts the establishment of appropriate numbers of parvalbumin and somatostatin-expressing interneurons, indicating a central role for PTEN in inhibitory cell development (Vogt et al, 2015, Cell Rep, https://doi.org/10.1016%2Fj.celrep.2015.04.019). Additionally, conditional PTEN knockout in GABAergic neurons is sufficient to generate mice with seizures and autism-related behavioral phenotypes (Shin et al, 2021, Molecular Brain, https://doi.org/10.1186%2Fs13041-021-00731-8). Moreover, while mice in which PV GABAergic neurons lacked both copies of Pten experienced seizures and died, heterozygous animals (PV-Pten+/−) showed impaired formation of perisomatic inhibition (Baohan et al, 2016, Nature Comm, DOI: 10.1038/ncomms12829). Therefore, there is substantial evidence in mammals linking PTEN mutations to neurodevelopmental disorders in general and affecting GABAergic neurons in particular. Hence, we believe that the role of daf-18/PTEN in GABAergic development could be a more widespread phenomenon across the animal kingdom rather than a specific process unique to C. elegans.

      Beyond the points discussed, we have addressed the reviewer's comment regarding the last sentence of the abstract. We have revised it to more cautiously frame the relationship between our findings, ASD, and mammalian neurodevelopmental disorders.

      Reviewer #2 (Public Review):

      Summary:

      Disruption of the excitatory/inhibitory (E/I) balance has been reported in Autism Spectrum Disorders (ASD), with which PTEN mutations have been associated. Giunti et al choose to explore the impact of PTEN mutations on the balance between E/I signaling using as a platform the C. elegans neuromuscular system where both cholinergic (E) and GABAergic (I) motor neurons regulate muscle contraction and relaxation. Mutations in daf-18/PTEN specifically affect morphologically and functionally the GABAergic (I) system, while leaving the cholinergic (E) system unaffected. The study further reveals that the observed defects in the GABAergic system in daf-18/PTEN mutants are attributed to reduced activity of DAF-16/FOXO during development.

      Moreover, ketogenic diets (KGDs), known for their effectiveness in disorders associated with E/I imbalances such as epilepsy and ASD, are found to induce DAF-16/FOXO during early development. Supplementation with β-hydroxybutyrate in the nematode at early developmental stages proves to be both necessary and sufficient to correct the effects on GABAergic signaling in daf-18/PTEN mutants.

      Strengths:

      The authors combined pharmacological, behavioral, and optogenetic experiments to show the GABAergic signaling impairment at the C. elegans neuromuscular junction in DAF-18/PTEN and DAF-16/FOXO mutants. Moreover, by studying the neuron morphology, they point towards neurodevelopmental defects in the GABAergic motoneurons involved in locomotion. Using the same set of experiments, they demonstrate that a ketogenic diet can rescue the inhibitory defect in the daf-18/PTEN mutant at an early stage.

      Weaknesses:

      The morphological experiments hint towards a pre-synaptic defect to explain the GABAergic signaling impairment, but it would have also been interesting to check the post-synaptic part of the inhibitory neuromuscular junctions such as the GABA receptor clusters to assess if the impairment is only presynaptic or both post and presynaptic.

      Moreover, all observations done at the L4 stage and/or adult stage don't discriminate between the different GABAergic neurons of the ventral nerve cord, i.e., the DDs, which are born embryonically and undergo remodeling at the late L1 stage, and the VDs, which are born post-embryonically at the end of the L1 stage. Those additional elements would provide information on the mechanism of action of the FOXO pathway and the ketone bodies.

      Thank you for your insightful suggestions. 

      This is an initial study that serves as a cornerstone, demonstrating the sensitivity of GABAergic neuron development to alterations in the PI3K pathway and how these alterations can be mitigated by a dietary intervention with a ketone body. While we have determined that the transcription factor DAF-16/FOXO is essential in the neurodevelopmental process and is the target of ketone bodies to alleviate defects, there are still underlying mechanisms to be elucidated. This is only the first step that opens many avenues for further investigation, including the study of post-synaptic partners.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in future research. This includes examining GABAergic receptors as well as cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre- and post-synaptic components (such as rab-3, unc-49, unc-29, or acr-16 fusions to GFP or mCherry). Unfortunately, most of these strains are not currently available in our laboratory. Unlike in the US or Europe, acquiring these strains from the C. elegans CGC repository is challenging in Argentina due to common customs delays, which require significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet. Additionally, to analyze post-synaptic functionality in depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments are very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal developmental defects leading to functional alterations in daf-18/PTEN mutants and the novel finding that these can be mitigated by supplementing food with β-hydroxybutyrate. We will study the structure and functionality of the post-synapse in our future projects and also plan to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      We also agree that discriminating between DD and VD neurons provides significant insights into the neurodevelopmental phenomena dependent on the FOXO pathway and the action of βHB. In this revised version, we present evidence that not only DD neurons are affected but also VD neurons (see Supplementary Figure 3, revised version). This allows us to suggest that daf-18 affects the development of GABAergic neurons regardless of whether they are born embryonically (DDs) or post-embryonically (VDs) (see also our response to the previous reviewer). We hope to distinguish the defects observed in each type of neuron in future studies. For this, we would need to use strains specifically marked in one neuronal type or another, which, for the same reasons mentioned earlier, would take a considerable amount of time under current conditions.

      Conclusion:

      Giunti et al provide fundamental insights into the connection between PTEN mutations and neurodevelopmental defects through DAF-16/FOXO and shed light on the mechanisms through which ketogenic diets positively impact neuronal disorders characterized by E/I imbalances.  

      Reviewer #3 (Public Review):

      Summary:

      This is a conceptually appealing study by Giunti et al in which the authors identify a role for PTEN/daf-18 and daf-16/FOXO in the development of inhibitory GABA neurons, and then demonstrate that a diet rich in ketone body β-hydroxybutyrate partially suppresses the PTEN mutant phenotypes. The authors use three assays to assess their phenotypes: (1) pharmacological assays (with levamisole and aldicarb); (2) locomotory assays and (3) cell morphological assays. These assays are carefully performed and the article is clearly written. While neurodevelopmental phenotypes had been previously demonstrated for PTEN/daf-18 and daf-16/FOXO (in other neurons), and while KB β-hydroxybutyrate had been previously shown to increase daf-16/FOXO activity (in the context of aging), this study is significant because it demonstrates the importance of KB β-hydroxybutyrate and DAF-16 in the context of neurodevelopment. Conceptually, and to my knowledge, this is the first evidence I have seen of a rescue of a developmental defect with dietary metabolic intervention, linking, in an elegant way, the underpinning genetic mechanisms with novel metabolic pathways that could be used to circumvent the defects.

      Strengths:

      What their data clearly demonstrate, is conceptually appealing, and in my opinion, the biggest contribution of the study is the ability of reverting a neurodevelopmental defect with a dietary intervention that acts upstream or in parallel to DAF-16/FOXO.

      Weaknesses:

      The model shows AKT-1 as an inhibitor of DAF-16, yet their studies show no differences from wildtype in akt-1 and akt-2 mutants. AKT is not a major protein studied in this paper, and it can be removed from the model to avoid confusion, or the result can be discussed in the context of the model to clarify interpretation.

      Thank you very much for the suggestion. We agree with the reviewer's assessment that the analysis of AKT's action in this study is too limited to draw conclusions that would justify its inclusion in the proposed model. Therefore, following the reviewer's suggestion, we have removed this protein from our model.

      When testing additional genes in the DAF-18/FOXO pathway, there were no significant differences from wild-type in most cases. This should be discussed. Could there be an alternate pathway via DAF-18/DAF-16, excluding the PI3K pathway, or are there variations in activity of PI3K genes during a ketogenic diet that are hard to detect with current assays?

      Thank you for bringing up this point. Our pharmacological experiments indeed demonstrate that all mutants associated with an exacerbation of the PI3K pathway, which typically inhibits nuclear translocation and activity of the transcription factor DAF-16, lead to imbalances in E/I (excitation/inhibition) that manifest as hypersensitivity to cholinergic drugs. This includes the gain of function of pdk-1 and the loss of function of daf-18 and daf-16 itself. In our subsequent experiments, we demonstrate that this exacerbation of the PI3K pathway leads to errors in the neurodevelopment of GABAergic neurons, which explains the hypersensitivity to aldicarb and levamisole.

      As the reviewer remarks, it is intriguing that mutants inhibiting this pathway do not show differences in their sensitivity to cholinergic drugs compared to wild-type animals. We can speculate, for instance, that during neurodevelopment, there is a critical period where the PI3K pathway must remain at very low activity (or even deactivated) for proper development of GABAergic neurons. This could explain why there are no differences in sensitivity to cholinergic drugs between mutants that inhibit the PI3K pathway and the wild type. The PI3K pathway depends on insulin-like signals, which are in turn positively modulated by molecules associated with the presence of food. Interestingly, larval stage 1 is particularly sensitive to nutritional status, being able to completely arrest development in the absence of food. Therefore, dietary intervention with βHB may generate a signal of dietary restriction (as seen in mammals) and, as a consequence of this dietary restriction, the PI3K pathway is inhibited, resulting in increased DAF-16 activity. This could restore the proper neurodevelopment of GABAergic neurons. However, this is mere speculation, and further, more in-depth experiments (beyond the pharmacological ones performed here) with mutants in different genes within the PI3K pathway may shed light on this point.

      Following the reviewer's suggestion, this point has been discussed in the revised version of the manuscript. (Discussion Page 18, Lines 384-407).

      The consequence of SOD-3 expression in the broader context of GABA neurons was not discussed. SOD-3 was also measured in the pharynx but measuring it in neurons would bolster the claims.

      SOD-3 is a known target of DAF-16. Previous studies have shown that βHB induces SOD-3 expression through the induction of DAF-16 (Edwards et al, 2014, Aging, https://doi.org/10.18632%2Faging.100683). The highest levels of SOD-3 expression are typically observed in the pharynx or intestine (DeRosa et al, 2019, https://doi.org/10.1038/s41586-019-1524-5; Zheng et al., 2021, PNAS, https://doi.org/10.1073/pnas.2021063118), and it is often used as a measure of general upregulation of DAF-16. Therefore, we used this parameter as a measure of βHB upregulating systemic DAF-16 activity. While we agree with the reviewer that observing variations in SOD-3 expression in neurons would further support our conclusions, unfortunately, we did not detect measurable signals of SOD-3 in motor neurons in either the control condition or the daf-18 background, even upon stress or βHB exposure. This may be because SOD-3 is a minor target of DAF-16 in these neurons, or its modulation may not correspond to the timing of fluorescence measurements (L4-adults).

      Despite this, our genetic experiments and neuron-specific rescue experiments lead us to conclude that DAF-16 must act autonomously in GABAergic neurons to ensure proper neurodevelopment.

      If they want to include AKT-1, seeing its effect on SOD-3 expression could be meaningful to the model.

      Thank you for this suggestion. We believe that even measuring SOD-3 levels in akt mutant backgrounds would still provide limited information to give it a predominant value in our work. Additionally, to have a complete understanding of the total role of AKT, it would be necessary to measure it in a double mutant background of akt-1; akt-2, and these double mutants generate 100% dauers even at 15°C (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671), greatly complicating the execution of these experiments. Therefore, following the first advice of this reviewer, we have decided to modify our model by excluding AKT.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      ⁃ Please include earlier in the main text the rationale for using unc-25 as a control/reference already when mentioning Figure 1A.

      Thank you for pointing out the need to reference this control earlier. We have included the following paragraph in the description of Figure 1 (Page 5, line 71, revised version):

      “Hypersensitivity to cholinergic drugs is typical of animals with an increased E/I ratio in the neuromuscular system, such as mutants in unc-25 (the C. elegans orthologue for glutamic acid decarboxylase, an essential enzyme for synthesizing GABA). While daf-18/PTEN mutants become paralyzed earlier than wild-type animals, their hypersensitivity to cholinergic drugs is not as severe as that observed in animals completely deficient in GABA synthesis, such as unc-25 null mutants (Figures 1B and 1C), indicating a less pronounced imbalance between excitatory and inhibitory signals.”

      ⁃ Please discuss the greater sensitivity of pdk-1(gf) animals to levamisole than to aldicarb.

      Thank you for bringing up this subtle point.  We understand that the reviewer is referring to the paralysis curve in response to aldicarb in pdk-1(gf), which is closer to unc-25 than the curve for levamisole (in both cases, they are more sensitive than the wild type). Therefore, pdk-1(gf) animals seem to be more sensitive to aldicarb than to levamisole. These results are now shown in Figure 1D (revised version).

      The PI3K pathway does not only act in neurons but also in muscles. Gain of function in pdk-1 has been shown to modulate muscle protein degradation (Szewczyk et al, EMBO Journal, 2008. https://doi.org/10.1038/sj.emboj.7601540). In contrast,  no effect on protein degradation has been reported for null mutants in this gene. Several studies have demonstrated that protein degradation levels can differentially affect receptor subunits, particularly acetylcholine receptors (Reviewed in Crespi et al, Br J Pharmacol, 2018). C. elegans is characterized by a wide repertoire of AChR subunits, and there are at least two subtypes of ACh receptors in muscles (one multimeric sensitive to levamisole and one homomeric (ACR-16) insensitive to levamisole) (Richmond et al, 1999 Nature Neuroscience http://dx.doi.org/10.1038/12160; Touroutine D, JBC 2005 https://doi.org/10.1074/jbc.M502818200).

      Interestingly, acr-16 null mutants are hypersensitive to aldicarb (Zeng et al, JCB, 2023, https://doi.org/10.1083/jcb.202301117) while the electrophysiological response to levamisole in this mutant remains similar to that of wild-type (Touroutine et al, 2005). Therefore, it may be that the gain of function in pdk-1 induces a change in the expression of AChR subtypes in muscle that differentially affects sensitivity to levamisole and ACh. This is purely speculative, and there may be many other explanations. While it would be interesting to explore this difference further, it goes far beyond the scope of this study. The cholinergic drug sensitivity assay is purely exploratory and allowed us to delve into the GABAergic and cholinergic signals in daf-18 mutants. In this sense, the hypersensitivity of pdk-1(gf) to both drugs supports the idea that an increase in PI3K signaling leads to an increased E/I ratio.

      ⁃ Please explain the rationale for performing the akt-1 and akt-2 assays separately. Why not test double mutants? Has their lack of redundancy been determined?

      Our pharmacological assays are conducted at the L4 larval stage, making it impossible to analyze the potential redundancy of akt-1 and akt-2 in sensitivity to levamisole and aldicarb. This impossibility arises because the akt-1;akt-2 double mutant exhibits nearly 100% arrest as dauer even at 15°C, as reported in several prior studies (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671). While the increased dauer arrest in the double mutant compared to the single mutants might suggest redundant functions in dauer entry, there are also reports indicating the absence of redundancy in other processes, such as vulval development (Nakdimon et al., PLOS Genetics 2012, https://doi.org/10.1371%2Fjournal.pgen.1002881).

      The complete dauer arrest likely explains why other studies focusing on the role of the PI3K pathway in neurodevelopment utilize both mutants separately (Christensen et al, Development 2011, https://doi.org/10.1242/dev.069062). While determining the potential redundancy of these genes is not feasible for this assay, we utilized various mutants of the pathway (age-1, pdk-1, daf-18, daf-16 and daf-16;daf-18, in addition to the akt mutants) that support the conclusion that exacerbating PI3K pathway activity makes animals hypersensitive to cholinergic drugs.

      In response to the reviewer's concern, we have added a sentence in the text explaining the impossibility of performing the assay in the akt-1;akt-2 double mutant (Page 6, lines 90-92).

      Figure 1C and D (This applies to all similarly presented bar figures). Please show data points and dispersion (preferably data, median+- 25-75% or average+-SD). 

      Thank you. Done

      ⁃ Line 112 -maybe "and resumes"? 

      Thank you. Done (Line 126, revised version)

      ⁃ Figure 1E and F. Please present mean +-SD (not SEM) of fluctuations. Please change slightly the tones so that the dispersion is easier to distinguish on the "blue light on" box.

      Thank you for the suggestion. We have adjusted the tones as recommended to enhance the visualization of the "blue light on" box. For visualization purposes, we present the shading of the standard error of the mean (SEM), as is usual in these types of optogenetic experiments where traces of animal length variations are measured (Liewald et al, Nature Methods, 2008, doi: 10.1038/nmeth.1252; Schultheis et al, J. Neurophysiology, 2011, doi: 10.1152/jn.00578.2010; Koopman et al, BMC Biology 2021, https://doi.org/10.1186/s12915-021-01085-2; Seidenthal et al, microPublication Biology, 2022, https://doi.org/10.17912%2Fmicropub.biology.000607).

      For the revised version, we have also included bar graphs for each optogenetic experiment, representing the mean of the length average of each worm measured from the first second after the blue light was turned on until the second before the light was turned off (in the graph, this corresponds to the period between seconds 6 and 9 of the traces). These graphs include the standard deviation and the corresponding significance levels. All of this has been included in the new legend (Figure 2D, 2E, 4E-J).
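
      The per-worm averaging described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used for the manuscript: the function names, the one-sample-per-second timing, and the two example traces are all hypothetical, and lengths are assumed to be normalized to the pre-illumination baseline.

```python
from statistics import mean, stdev


def window_mean(times, lengths, start=6.0, end=9.0):
    """Mean of the length samples whose timestamps fall within [start, end] seconds."""
    in_window = [length for t, length in zip(times, lengths) if start <= t <= end]
    if not in_window:
        raise ValueError("no samples inside the analysis window")
    return mean(in_window)


def summarize(traces, start=6.0, end=9.0):
    """Return (mean, SD) across worms of each worm's window-averaged length."""
    per_worm = [window_mean(t, l, start, end) for t, l in traces]
    return mean(per_worm), stdev(per_worm)


# Hypothetical data: one sample per second from 0 to 14 s, light on from 5 to 10 s.
times = [float(s) for s in range(15)]
worm_a = [1.0] * 5 + [1.04] * 5 + [1.0] * 5  # elongates by 4% under blue light
worm_b = [1.0] * 5 + [1.02] * 5 + [1.0] * 5  # elongates by 2% under blue light

group_mean, group_sd = summarize([(times, worm_a), (times, worm_b)])
print(f"mean = {group_mean:.3f}, SD = {group_sd:.3f}")
```

      Averaging within each worm first, and only then across worms, keeps each animal as a single observation, so the reported SD reflects animal-to-animal variability rather than frame-to-frame noise.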

      ⁃ Figure 1A&1B & Supplementary Figure 1D x Supplementary Figure 1E&1F. What is the difference between these experiments? Whereas the unc-25 mutants paralyze in the same amount of time, the WT animals paralyze ~1 h later in Supplementary Figure 1E-1F in response to either drug. Please revise experimental conditions to see if anything can be learned eg, maybe this is a nutritional response from experiments done at different timepoints? Maybe different food recipes affected sensitivity to paralysis?

      Thank you for pointing this out. While the experiments with daf-18 (in both alleles) and daf-16 were conducted at the beginning of this project (2019-2020), the assays with the other mutants in the PI3K and mTOR pathways were performed years later. Changes in the reagents used (agar, peptone, cholesterol, etc.) to grow the worms have occurred, potentially altering the animals' response directly or through the nutritional quality of the bacteria they grow on. In addition, the difference may be attributed to the fact that experiments at the project's outset were conducted by one author, while more recent experiments were carried out by another. The goal is to quantify paralysis in non-responsive worms after touch stimulation. The force of this probing or the thickness of the hair used for touching can be slightly operator-dependent and can lead to variable responses. For this reason, wild-type and unc-25 strains are always included as internal controls in every experiment. Moreover, despite this user-dependent variation, the experiments were always conducted blindly (except for unc-25, whose uncoordinated phenotype is easily identifiable), and we therefore trust the outcomes.

      ⁃ Supplementary Figure 1G - Length and Width appear to be switched in both left and right panels - please revise and include a description of N and of statistics depicted. 

      Unfortunately, we don't see the switching error that the reviewer mentioned. In the left panel, we demonstrate that optogenetic activation of GABAergic neurons leads to an increase in length without modifying the width of the animal. Therefore, we conclude that the increase in area, as observed in our Fiji macro for optogenetic response analysis, is due to an increase in the animal's length. In the cholinergic activation shown in the right panel, the animal shortens (decreasing length) without modifying the width, resulting in the reduction of the total body area. 

      We have included information about N (sample size) and the statistical test used in the legends as suggested. These graphs are now shown as Figures 2F and G, revised version.

      ⁃ Supplementary Figure 1G legend lines 779-780. Please describe the post-hoc test applied following ANOVA to obtain the denoted p values. This applies to all datasets where ANOVA or Kruskal-Wallis tests were applied.

      Following the reviewer's suggestion, all the post-hoc tests applied after ANOVA or Kruskal-Wallis analysis have been included in the legend of each figure and in Materials and Methods (statistical analysis section).

      ⁃ Line 174 maybe "arises *from* the hyperactivation" instead of *for*?.

      Corrected. Thank you. Line 190, revised version.

      ⁃ Supplementary Figure 4. On line 816 it says n=40-90, but please check the n of the daf-18, daf-16 samples, which seem to have less than 40 animals.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). We have now included the number of observations below each data point cloud to clearly indicate the sample size for each condition.

      ⁃ Supplementary Figure 4 - please state what the bars on the graphs represent. Please state which post-hoc test was performed after Kruskal-Wallis and present at least the p values obtained between treated controls and each genotype. Alternatively, present the whole truth table in supplementary data.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). There was an error in the original legend (thank you for bringing this to our attention): the statistics in this case were not performed using Kruskal-Wallis; rather, each treated condition was compared to its own untreated control using the Mann-Whitney test. We have now added the p-values to the graph. All raw data for this figure, as well as for all other figures, are available in Open Science Framework (https://osf.io/mdpgc/?view_only=3edb6edf2298421e94982268d9802050).
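
      For readers unfamiliar with the treated-versus-own-control comparison, the following is a minimal sketch of a Mann-Whitney U test. It is an illustration under stated assumptions, not the software used for the manuscript: the function names and the example values are hypothetical, and the p-value uses the normal approximation without tie or continuity correction, which is only reasonable for moderately large samples.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y, counting ties as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores for one genotype: untreated control versus treated animals.
untreated = [1, 2, 3, 4, 5]
treated = [6, 7, 8, 9, 10]

u = mann_whitney_u(treated, untreated)
p = two_sided_p_normal(u, len(treated), len(untreated))
print(f"U = {u}, approximate two-sided p = {p:.4f}")
```

      Each genotype is tested only against its own untreated control, so no multi-group omnibus test (and hence no post-hoc test) is involved in this design.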

      ⁃ Please cite the figure panels in order: eg, Figure 3E is mentioned in the text after panels Figure 3F-K.

      Done. We have rearranged the figures to match the order in which they are cited in the text (Figure 4, revised version).

      ⁃ Figure 4 - line 610 please revise "(n=20-30 (n: 20-25 animals per genotype/trial)."

      Thank you. Corrected.

      ⁃ Figure 4 - there appears to be an inconsistency in the figure with the text (lines 223-225). In figures it says E-L1, but in the text, it says "solely in L1". Does E-L1 include the whole L1 stage? If not- E-L1 can be interpreted only as during the embryonic stage, hence, no exposure to betaHB due to the impermeable chitin eggshell. Then there is L1-L2, which should cover the L1 stage and the L2 or something else. Please revise. The text mentions L2-L3 or L3-L4 and these categories are not in the figures. This clarification is key for the interpretation of the results. The precise developmental time of the exposures is not defined either in the methods or in the figures. Please provide precise times relative to hours and/or molts and revise the text/figure for consistency.

      The reviewer is entirely correct in pointing out the lack of relevant data regarding the exposure time to βHB. We have now clarified this information. For the revised version, we have adjusted the nomenclature of each exposure period to precisely reflect the developmental stages involved.

      For the experiments involving continuous exposure to βHB throughout development, the NGM plate contained the ketone body. Therefore, the exposure encompassed, in principle, the ex-utero embryonic development period up to the L4/young adult stage (E-L4/YA, in Figure 5A), when the experiments were conducted. Since the chitin shell of the eggs may restrict drug penetration (see Supplementary Figure 7), we can ensure βHB exposure from hatching onward.

      In experiments involving exposure at different developmental stages, such as those depicted in Figure 4 of the original version (now Figure 5), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage (this period is called E-L1 in Figure 5 of the revised version). The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage (L1-L3). The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage (L3-YA).

      All this information has been included in Figure 5 (and its legend), the text (Page 13, lines 259-276), and the Materials and Methods of the revised manuscript.

      ⁃ Some methods are not sufficiently well described. Specifically, how the animals were exposed to treatments and how stages were obtained for each experiment. Was synchronization involved? If so, in which experiments and how exactly was it performed?

      As mentioned in previous responses, all the experiments were performed on age-synchronized animals. We have included the following sentences in Materials and Methods (C. elegans culture and maintenance section): “All experiments were conducted on age-synchronized animals. This was achieved by placing gravid worms on NGM plates and removing them after two hours. The assays were performed on the animals hatched from the eggs laid in these two hours”.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      (1) To complete the study on the GABAergic signaling at the NMJs, it would be interesting to assess the status of the post-synaptic part of the synapse such as the GABAR clustering. It would also tell if the impairment is only presynaptic or both post and presynaptic.

      Thank you for your insightful suggestion. We agree that exploring post-synaptic elements can shed light on whether the impairment is solely presynaptic or involves both pre and post-synaptic components.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in the future. This includes not only examining GABAergic receptors but also exploring cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre and post-synaptic components (rab-3, unc-49, unc-29, acr-16 driving GFP or mCherry). However, most of these strains are not currently available in our laboratory. Unlike the US or Europe, acquiring these strains from the C. elegans CGC repository in Argentina is challenging due to common customs delays, requiring significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet. 

      Additionally, to analyze post-synaptic functionality in-depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments are very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal morphological defects leading to functional alterations in daf-18/PTEN mutants.

      We will include these experiments in our future projects, also planning to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      (2) The author always referred to unc-47 promoter or unc-17 promoter, never specifying where those promoters are driving the expression (and in the Materials & Methods, no information on the corresponding sequence). Depending on the promoters they may not only be expressed in the motoneurons involved in locomotion (VA, VB, DA, DB, VD, and DD), but they could also be expressed in other neurons which could be of importance for the conclusions of the optogenetic assays but also the daf-18 expression in GABAergic neurons.

      We appreciate the reviewer's insight regarding the broader expression patterns of the unc-17 and unc-47 promoters in all cholinergic and GABAergic neurons, respectively. The strains expressing constructs with these promoters were obtained from the CGC or other labs and have been widely used in previous papers (Liewald et al, Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Byrne, A. B. et al. Neuron 81, 561-573, doi:10.1016/j.neuron.2013.11.019 (2014)).

      Regarding the optogenetic assays, the readout utilized (body length elongation or contraction) is primarily associated with the activity of cholinergic and GABAergic motor neurons and has been used in numerous studies to measure motor neuron functionality (Liewald et al, Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Hwang, H. et al. Sci Rep 6, 19900, doi:10.1038/srep19900 (2016); Schultheis et al, J Neurophysiol 106, 817-827, doi:10.1152/jn.00578.2010 (2011); Koopman, M., Janssen, L. & Nollen, E. A. BMC Biol 19, 170, doi:10.1186/s12915-021-01085-2 (2021)). It has previously been established that the shortening observed after optogenetic activation driven by the unc-17 promoter, although this promoter is also active in various interneurons, depends on the activity of cholinergic motor neurons (Liewald et al., Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008)). This was demonstrated by examining transgenic worms expressing ChR2-YFP from another cholinergic, motoneuron-specific but weaker promoter, Punc-4. They observed contraction and coiling upon illumination, albeit to a milder degree.

      In terms of GABAergic neurons, only 3 do not directly synapse to body wall muscles (AVL, DVB, and RIS) and are primarily involved in defecation. Of the 23 GABAergic motor neurons, 19 are D-type motoneurons, while the remaining 4 innervate head muscles (Pereira et al, eLife 2015, https://doi.org/10.7554/eLife.12432). It is therefore expected that while there may be some contribution from these latter neurons to the elongation after optogenetic activation in animals containing punc-47::ChR2, the main contribution should be from the D-type neurons. Likewise, while there may be some influence on D-type neuron development due to daf-18 rescue in neurons like RME, DVB or AVL, the most direct explanation for the rescue is that daf-18 acts autonomously in D-type cells. Finally, we have pharmacological and behavioral assays that support the findings of optogenetics and enable us to reach final conclusions.

      (3) DD neurons are born during embryogenesis and newborn L1s have neurites even though less than at a later stage. If possible, it would be interesting to take a look at them to see if βHB has an effect or not. It will corroborate the hypothesis that βHB action is prevented by the impermeable eggshell on a system that can respond at a later stage. Moreover, using a specific DD, DA, and DB promoter, it would be possible to check if there is a difference in the morphological defects between embryonic and post-embryonic neurons.

      This is a very interesting point raised by the reviewer. We conducted experiments to analyze the morphology of GABAergic neurons in animals exposed to βHB only during ex-utero embryonic development (in their laid egg state). We observed that this incubation was not sufficient to rescue the defects in GABAergic neurons (Supplementary Figure 7, revised version). As reported by other authors and discussed in our paper, the chitinous eggshell might act as an impermeable barrier to most drugs. However, we cannot rule out that incubation during this period is necessary but not sufficient to mitigate the defects. We have included these experiments in Supplementary Figure 7 and in the text (Page 13, lines 272-276).

      Additionally, we analyzed confocal images where, based on their position, we could identify and assess errors in DD (embryonic) and VD (Post-embryonic) neurons (Supplementary Figure 3, revised version). These experiments show that the effects are observed in both types of neurons, and we did not observe any differential alterations in neuronal morphology between the two types of neurons.

      Minor points

      (1)   Expression of daf-18/PTEN in muscle or hypodermis, could it ensure a proper development? It could give insights into the action mechanism of βHB.

      The reviewer's observation is indeed very intriguing. Previous studies from the Grishok lab (Kennedy et al, 2013) have demonstrated that the expression of daf-18 or daf-16 in extraneuronal tissues, specifically in the hypodermis, can rescue migratory defects in the serotoninergic neuron HSN in daf-18 or daf-16 null mutants of C. elegans. Clearly, this could also be an option for rescuing the morphological and functional defects of GABAergic motoneurons.

      However, the fact that the expression of daf-18 in GABAergic neurons rescues these defects strongly suggests an autonomous effect. In this regard, autonomous effects of DAF-18 or DAF-16 on neurodevelopmental defects have also been reported in interneurons in C. elegans (Christensen et al, 2011). This is included in the discussion (Page 15, lines 330-335).

      (2) Re-organise the introduction. The paragraph on ketogenic diets (lines 35-38) is not logically linked.

      Following the reviewer's suggestion, we have reorganized the introduction and changed the order in which the significance of ketogenic diets is explained, linking it with their proven effectiveness in alleviating symptoms of diseases with E/I imbalance (Lines 23-60, revised version).

      (3) Incorporate titles in the result section to guide the reader.

      Done. Thank you

      (4) Systematically add PTEN or FOXO when daf-18 or daf-16 are mentioned (for example lines 69, 84, 85).

      Done. Thank you  

      (5) Strain lists: lines 646 to 653: some information is missing on the different transgenes used in this study (integrated (Is) or extrachromosomal (Ex) with their numbers).

      Thank you for bringing this to our attention. We have now included all the information regarding the different transgenes used in this study, including whether they are integrated (Is) or extrachromosomal (Ex) and their respective numbers. This information can be found in the revised version of the manuscript (Materials and Methods, C. elegans culture and maintenance section highlighted in yellow).

      Reviewer #3 (Recommendations For The Authors):

      In Figure 1, some experiments were done with the unc-25 control while others, such as the optogenetic experiments, were done without those controls.

      Thank you for pointing this out. In the optogenetic experiments, we waited for the worm to move forward for 5 seconds at a sustained speed before exposing it to blue light to standardize the experiment, as the response can vary if the animal is in reverse, going forward, or stationary. Due to the severity of the uncoordinated movement in unc-25 mutants, achieving this forward movement before exposure is very difficult. Additionally, this lack of coordination prevents these animals from performing the escape response tests, as they barely move. Therefore, we limited the use of this severe GABAergic-deficient control to pharmacological or post-prodding shortening experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments to characterize what this novel cell type becomes in older animals would be ideal to strengthen the manuscript, but the authors should at least address this in the Discussion.

      The manuscript could be significantly improved if the authors included, for example, a timeline and/or cartoon contextualizing these cells relative to the formation of other CN neurons and their locations, perhaps as a summary figure at the end. Furthermore, the logic of each figure could be enhanced if the authors graphically show - again, perhaps with a schematic/cartoon - the question being tested for each figure. Furthermore, making the figure titles less descriptive and more explanatory would also help a reader follow the logic of the experiments.

      These are indeed valid and important questions for our research, and understanding the distribution, fate, and connectivity of this new cell type in the cerebellar nuclei postnatally is a focus of ongoing investigation in our lab. To address these questions, we are currently utilizing SNCA-GFP mice, a project led by a PhD student in my lab. While this work will be the subject of a full-length research paper, we have added a sentence on page 8 of our manuscript (highlighted in yellow) citing a recent report of SNCA neurons in the adult CN: “In adult mice, postnatal expression of SNCA has been reported in medial CN neurons. PMID: 32639229”. In addition, we have included a cartoon as a summary figure (Fig. 9) illustrating the origin of cerebellar nuclei from the caudal and rostral ends in both Atoh1+/+ and Atoh1-/- mice. We have also revised and improved the figure titles accordingly. Thank you once again.

      Reviewer #2 (Recommendations For The Authors):

      Figure 3:

      (1) If most SNCA+ cells are OTX2+ based on the IHCs, why are there so many SNCA+ Otx2- cells in the sort?

      In each group, 350,000 cells were sorted. Due to the relatively small population size of this subset of cerebellar nuclei neurons, the sorting procedure could not perfectly mirror our immunohistochemistry results. However, it is noteworthy that a portion of sorted cells expressed SNCA or Otx2 while a smaller population co-expressed both Otx2 and SNCA in the cerebellar primordium.

      (2) Panel 3F: FACS graphs - the resolution of the figures is too poor on the PDF to read any of the text of these graphs. What are the axes?

      We thank the reviewer for this comment. In the revision, a high-resolution version of the FACS graph has replaced the lower-quality graph in panel 3F. The axes and text for this panel are now clearly legible.

      Figure 4:

      (1) Arrowheads are marking a subset of + cerebellar cells - Why? Not defined in the legend.

      The population of cells indicated by the arrowheads are now defined in the legend. We have added the statement “Examples of Otx2 expressing cells are indicated by arrowheads in panels B, D, E, and F.”

      (2) The orientation of panels E and F is unclear - please provide low mag panel insets.

      An orientation marker (r-c and d-v; rostral-caudal and dorsal-ventral, respectively) has been added to panel A, which applies to all panels, including panels E and F. Furthermore, the isthmus is noted with an “i” to provide further orientation.

      (3) G - and throughout the paper - whisker plots (not simple box plots) are required. Also, it is unclear from the methods how Otx2+ cells were counted - how many embryos/age? The description of 10 sections across 3 slides is incomplete. Are these cells distributed equally across the mediolateral axis of the anlage? Where are comparable M/L sections compared across ages? Is the increase in # across time because these cells are proliferative or are more migrating into the anlage?

      The plot has been replaced with whisker plots. A more detailed description of the method used has been added on page 15: “To assess the number of OTX2-positive cells, we conducted immunohistochemistry (IHC) labeling on slides containing serial sections from embryonic days 12, 13, 14, and 15 (n=3 at each timepoint). Under the microscope, we systematically counted OTX2-positive cells within the cerebellar primordium. This analysis encompassed a minimum of 10 sections, spread across at least 3 slides, ensuring comprehensive coverage of OTX2 expression along the mediolateral axis of the cerebellar primordium. For each slide, the counts of OTX2-positive cells from all sections were cumulatively calculated to determine the total number of positive cells per slide. Subsequently, statistical analysis was employed to compare the results obtained at different developmental time points.”

      Figure 5:

      The use of confocal microscopy creates clear data re Otx2-GFP expression, but I cannot understand the origin of the panels. How do they relate to E/F and H/I? Different sections?

      In Figure 5, panels A-D display Otx2 expressing cells in the cerebellar primordium of Otx2-GFP transgenic mice, whereas panels E-J depict RNAscope fluorescence in situ hybridization (FISH) for the Otx2 probe in wild type mice. These represent complementary approaches to map Otx2+ cells in the developing cerebellum. This is made clear in a revised legend in Fig 5.

      Figure 6:

      The justification for the in-culture experiments, particularly the long (4 and 21DIV) times is unclear and needs to be strengthened or the in vitro data should be removed.

      Thank you for this comment. Panels E-H show the co-expression of SNCA and p75NTR, highlighting a significant role in the differentiation of specific neuronal populations during development. These findings validate our previous results (PMID: 31509576) and are consistent with the results of our current study. Therefore, we have chosen to keep these panels. However, in line with the suggestion from the reviewer, we have removed panels I-L from Fig. 6.

      Figure 7:

      SNCA expression in panels A and G is not specific nor is the Otx2 staining in panel B making the data in panels C and I uninterpretable and these panels need to be replaced. The Meis2 data however is much better and I agree this data shows that the dorsal RL-derived cells are deleted in Atoh1-/- while the SNCA+ cells remain. This is strong data supporting the dual origins of NTZ.

      Thank you for these points. Panels A and G have been replaced with high-resolution images. In addition, panels A-C have been carefully cropped to focus on the NTZ area and to improve the quality and visibility of the panels. We have also included a summary figure (Fig. 9) for clarification.

      Figure 8:

      The diI experiments are a key addition to this paper and clearly show the direct movement of some cells from the mesencephalon into the developing cerebellum, but data presentation must be considerably strengthened.

      (1) What is the inset in panel A? Low mag of embryo? Perhaps conversion of image to PDF degraded resolution - add a description in the legend. Arrowhead and arrow identities are reversed in the legend. The arrow points to the isthmus.

      Thank you for the comment. For clarification, we have included this information in the Fig. legend (highlighted in yellow). In addition, the issues with the arrows have been addressed and corrected.

      (2) Panels B and C are also shown in Supplementary Figure 2 with arrows indicating rostral and caudal movement - these arrows need to be added here. There is no need to replicate these same panels in the supplement.

      Thanks, arrows have been added in panels B, C of Fig. 8.

      (3) The text states that "almost all DiI cells migrated caudally into the cerebellum" and refers to Figure 8E and Supplemental 3 but there is no evidence/support shown for this, just a few + cells in 8E and some very difficult-to-see positive cells in sections in Supplement E-F. Given the importance of this data, I am surprised that the authors chose bright field/phase microscopy to show this. The data in this section are not convincing at all. I find it very difficult to see specific staining. These panels must be improved. This is key data for paper conclusions.

      These are valid points, and we acknowledge that this experiment alone may not provide conclusive evidence regarding the subset of CN originating from mesencephalon. At this stage of the study, we do not claim definitively that the SNCA/OTX2/MEIS2 positive cells originate from the mesencephalon. As stated in our manuscript, "In conclusion, our study indicates that the SNCA+/ OTX2+/ MEIS2+/ p75NTR+/ LMX1A- rostroventral subset of CN neurons do not originate from the well-known distinct germinative zones of the cerebellar primordium. Instead, our findings suggest the existence of a previously unidentified extrinsic germinal zone, potentially the mesencephalon."  We have also discussed embryonic culture approaches in the manuscript, which could involve the use of other agents such as plasmid/viral vectors, hinting at the possibility of origin from the mesencephalon. While tracing the origin from the mesencephalon in vivo and in vitro is promising and on our to-do list, the data will not be available for this manuscript. To prevent confusion, we have eliminated redundant panels of Fig. 8 with Supplementary Fig. 2 and 3. However, if the reviewer deems it necessary to remove these panels, we are prepared to do so.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      The revised manuscript addressed my minor concerns adequately, and the manuscript is now further improved. I have no remaining criticisms.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      line 45 The abbreviation "SytI" should perhaps be introduced above.

      done

      Results:

      line 139 "RRP kinetics" should perhaps read "RRP depletion kinetics" or "secretion kinetics".

      We replaced “RRP kinetics” with “RRP secretion kinetics”.

      line 325ff and Figure 8

      As far as I understand, SytI R233Q ki cells shown in violet express wt CplxII. Perhaps this should be explicitly stated?

      To accommodate this suggestion, we now state on page 13, line 302: “Overexpression of the CpxII DN mutant in SytI R233Q ki cells, which is expected to outcompete the function of endogenous CpxII in these cells (Dhara et al., 2014), further slowed down the rate of synchronized release and restored the EB size to the wt level (Figure 7C, D).”

      line 332ff and Figure 8

      What is plotted in Figure 8B bottom and in Figure 8D is not a "rate" but rather a "unitary rate", more commonly referred to as a "rate constant".

      The y-axis label of Figures 8B and 8D should therefore better be changed to "rate constant". See also line 528 of the Discussion.

      Figure (y-axis label) and text were changed accordingly

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Glaser et al present ExA-SPIM, a light-sheet microscope platform with large volumetric coverage (Field of view 85mm^2, working distance 35mm), designed to image expanded mouse brains in their entirety. The authors also present an expansion method optimized for whole mouse brains and an acquisition software suite. The microscope is employed in imaging an expanded mouse brain, the macaque motor cortex, and human brain slices of white matter. 

      This is impressive work and represents a leap over existing light-sheet microscopes. As an example, it offers a fivefold higher resolution than mesoSPIM (https://mesospim.org/), a popular platform for imaging large cleared samples. Thus while this work is rooted in optical engineering, it manifests a huge step forward and has the potential to become an important tool in the neurosciences. 

      Strengths: 

      - ExA-SPIM features an exceptional combination of field of view, working distance, resolution, and throughput. 

      - An expanded mouse brain can be acquired with only 15 tiles, lowering the burden on computational stitching. That the brain does not need to be mechanically sectioned is also seen as an important capability. 

      - The image data is compelling, and tracing of neurons has been performed. This demonstrates the potential of the microscope platform. 

      Weaknesses: 

      - There is a general question about the scaling laws of lenses, and expansion microscopy, which in my opinion remained unanswered: In the context of whole brain imaging, a larger expansion factor requires a microscope system with larger volumetric coverage, which in turn will have lower resolution (Figure 1B). So what is optimal? Could one alternatively image a cleared (non-expanded) brain with a high-resolution ASLM system (Chakraborty, Tonmoy, Nature Methods 2019, potentially upgraded with custom objectives) and get a similar effective resolution as the authors get with expansion? This is not meant to diminish the achievement, but it was unclear if the gains in resolution from the expansion factor are traded off by the scaling laws of current optical systems. 

      Paraphrasing the reviewer: Expanding the tissue requires imaging larger volumes and allows lower optical resolution. What has been gained?

      The answer to the reviewer’s question is nuanced and contains four parts. 

      First, optical engineering requirements are more forgiving for lenses with lower resolution. Lower resolution lenses can have much larger fields of view (in real terms: the number of resolvable elements, proportional to ‘etendue’) and much longer working distances. In other words, it is currently more feasible to engineer lower resolution lenses with larger volumetric coverage, even when accounting for the expansion factor. 

      Second, these lenses are also much better corrected compared to higher resolution (NA) lenses. They have a flat field of view, negligible pincushion distortions, and constant resolution across the field of view. We are not aware of comparable performance for high NA objectives, even when correcting for expansion.

      Third, although clearing and expansion render tissues ‘transparent’, there still exist refractive index inhomogeneities which deteriorate image quality, especially at larger imaging depths. These effects are more severe for higher optical resolutions (NA), because the rays entering the objective at higher angles have longer paths in the tissue and will see more aberrations. For lower NA systems, such as ExA-SPIM, the differences in paths between the extreme and axial rays are relatively small and image formation is less sensitive to aberrations. 

      Fourth, aberrations are proportional to the index of refraction inhomogeneities (dn/dx). Since the index of refraction is roughly proportional to density, scattering and aberration of light decrease as M^3, where M is the expansion factor. In contrast, the imaging path length through the tissue only increases as M. This produces a huge win for imaging larger samples with lower resolutions. 
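      The scaling argument in the fourth point can be sketched numerically. The snippet below is a simplified illustration, assuming that the accumulated aberration is simply the product of a per-length inhomogeneity term (falling as 1/M^3) and the path length (growing as M); the function name and this product model are assumptions made for illustration, not something stated in the manuscript.

```python
def relative_accumulated_aberration(expansion_factor: float) -> float:
    """Accumulated aberration relative to unexpanded tissue (M = 1).

    Assumed toy model: inhomogeneity per unit length scales as 1/M^3,
    path length scales as M, and the two simply multiply.
    """
    inhomogeneity_per_length = expansion_factor ** -3  # density dilution, ~1/M^3
    path_length = expansion_factor                     # linear growth, ~M
    return inhomogeneity_per_length * path_length      # net ~1/M^2


for m in (1.0, 2.0, 3.0, 4.0):
    value = relative_accumulated_aberration(m)
    print(f"M = {m:.0f}: relative accumulated aberration = {value:.3f}")
```

      Under these assumptions the net effect falls roughly as 1/M^2, which is one way to read the "huge win" described above.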

      To our knowledge there are no convincing demonstrations in the literature of diffraction-limited ASLM imaging at a depth of 1 cm in cleared mouse brain tissue, which would be equivalent to the ExA-SPIM imaging results presented in this manuscript.  

      In the discussion of the revised manuscript we discuss these factors in more depth. 

      - It was unclear if 300 nm lateral and 800 nm axial resolution is enough for many questions in neuroscience. Segmenting spines, distinguishing pre- and postsynaptic densities, or tracing densely labeled neurons might be challenging. A discussion about the necessary resolution levels in neuroscience would be appreciated. 

      We have previously shown good results in tracing the thinnest (100 nm thick) axons over cm scales with 1.5 um axial resolution. It is the contrast (SNR) that matters, and the ExA-SPIM contrast exceeds the block-face 2-photon contrast, not to mention imaging speed (> 10x).  

      Indeed, for some questions, like distinguishing fluorescence in pre- and postsynaptic structures, higher resolutions will be required (0.2 um isotropic; Rah et al Frontiers Neurosci, 2013). This could be achieved with higher expansion factors.

      This is not within the intended scope of the current manuscript. As mentioned in the discussion section, we are working towards ExA-SPIM-based concepts to achieve better resolution through the design and fabrication of a customized imaging lens that maintains a high volumetric coverage with increased numerical aperture.  

      - Would it be possible to characterize the aberrations that might be still present after whole brain expansion? One approach could be to image small fluorescent nanospheres behind the expanded brain and recover the pupil function via phase retrieval. But even full width half maximum (FWHM) measurements of the nanospheres' images would give some idea of the magnitude of the aberrations. 

      We have now included a supplementary figure highlighting images of small axon segments within distal regions of the brain.  

      Reviewer #2 (Public Review)

      Summary: 

      In this manuscript, Glaser et al. describe a new selective plane illumination microscope designed to image a large field of view that is optimized for expanded and cleared tissue samples. For the most part, the microscope design follows a standard formula that is common among many systems (e.g. Keller PJ et al Science 2008, Pitrone PG et al. Nature Methods 2013, Dean KM et al. Biophys J 2015, and Voigt FF et al. Nature Methods 2019). The primary conceptual and technical novelty is to use a detection objective from the metrology industry that has a large field of view and a large area camera. The authors characterize the system resolution, field curvature, and chromatic focal shift by measuring fluorescent beads in a hydrogel and then show example images of expanded samples from mouse, macaque, and human brain tissue. 

      Strengths: 

      I commend the authors for making all of the documentation, models, and acquisition software openly accessible and believe that this will help assist others who would like to replicate the instrument. I anticipate that the protocols for imaging large expanded tissues (such as an entire mouse brain) will also be useful to the community. 

      Weaknesses: 

      The characterization of the instrument needs to be improved to validate the claims. If the manuscript claims that the instrument allows for robust automated neuronal tracing, then this should be included in the data. 

      The reviewer raises a valid concern. Our assertion that the resolution and contrast are sufficient for robust automated neuronal tracing is overstated based on the data in the paper. We are hard at work on automated tracing of datasets from the ExA-SPIM microscope. We have demonstrated full reconstruction of axonal arbors encompassing >20 cm of axonal length. However, including these methods and results is outside the scope of the current manuscript. 

      The claims of robust automated neuronal tracing have been appropriately modified.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Smaller questions to the authors: 

      - Would a multi-directional illumination and detection architecture help? Was there a particular reason the authors did not go that route?

      Despite the clarity of the expanded tissue, and the lower numerical aperture of the ExA-SPIM microscope, image quality still degrades slightly towards the distal regions of the brain relative to both the excitation and detection objective. Therefore, multi-directional illumination and detection would be advantageous. Since the initial submission of the manuscript, we have undertaken re-designing the optics and mechanics of the system. This includes provisions for multi-directional illumination and detection. However, this new design is beyond the scope of this manuscript. We now mention this in L254-255 of the Discussion section.

      - Why did the authors not use the same objective for illumination and detection, which would allow isotropic resolution in ASLM? 

      The current implementation of ASLM requires an infinity corrected objective (i.e. conjugating the axial sweeping mechanism to the back focal plane). This is not possible due to the finite conjugate design of the ExA-SPIM detection lens.

      More fundamentally, pushing the excitation NA higher would result in a shorter light sheet Rayleigh length, which would require a smaller detection slit (shorter exposure time, lower signal to noise ratio). For our purposes an excitation NA of 0.1 is an excellent compromise between axial resolution, signal to noise ratio, and imaging speed. 

      For other potentially brighter biological structures, it may be possible to design a custom infinity corrected objective that enables ASLM with NA > 0.1.

      - Have the authors made any attempt to characterize distortions of the brain tissue that can occur due to expansion? 

      We have not systematically characterized the distortions of the brain tissue pre and post expansion. Imaged mouse brain volumes are registered to the Allen CCF regardless of whether or not the tissue was expanded. It is beyond the scope of this manuscript to include these results and processing methods, but we have confirmed that the ExA-SPIM mouse brain volumes contain only modest deformation that is easily accounted for during registration to the Allen CCF. 

      - The authors state that a custom lens with NA 0.5-0.6 lens can be designed, featuring similar specifications. Is there a practical design? Wouldn't such a lens be more prone to Field curvature? 

      This custom lens has already been designed and is currently being fabricated. The lens maintains a similar space bandwidth product as the current lens (increased numerical aperture but over a proportionally smaller field of view). Over the designed field of view, field curvature is <1 µm. However, including additional discussion or results of this customized lens is beyond the scope of this manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      • System characterization: 

      - Please state what wavelength was used for the resolution measurements in Figure 2.

      An excitation wavelength of 561 nm was used. This has been added to the manuscript text.

      - The manuscript highlights that a key advance for the microscope is the ability to image over a very large 13 mm diameter field of view. Can the authors clarify why they chose to characterize resolution over an 8 mm diameter field rather than the full area? 

      The 13 mm diameter field of view refers to the diagonal of the 10.6 x 8.0 mm field of view. The results presented in Figure 1c are with respect to the horizontal x direction and vertical y direction. A note indicating that the 13 mm is with respect to the diagonal of the rectangular imaging field has been added to the manuscript text. The results were presented in this way to present the axial and lateral resolution as a function of y (the axial sweeping direction).
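      The relationship between the rectangular field and the quoted 13 mm figure can be checked with a short calculation. This is a minimal sketch using only the 10.6 x 8.0 mm dimensions given above; the function names are chosen here for illustration.

```python
import math


def field_diagonal_mm(width_mm: float, height_mm: float) -> float:
    """Diagonal of a rectangular imaging field."""
    return math.hypot(width_mm, height_mm)


def field_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular imaging field."""
    return width_mm * height_mm


diagonal = field_diagonal_mm(10.6, 8.0)
area = field_area_mm2(10.6, 8.0)
print(f"diagonal = {diagonal:.1f} mm, area = {area:.1f} mm^2")
```

      The diagonal comes out at roughly 13 mm and the area at roughly 85 mm^2, consistent with the field of view quoted in the public review.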

      - The resolution estimates seem lower than I would expect for a 0.30 NA lens (which should be closer to ~850 nm for 515 nm emission). Could the authors clarify the discrepancy? Is this predicted by the Zemax model and due to using the lens in immersion media, related to sampling size on the camera, or something else? It would be helpful if the authors could overlay the expected diffraction-limited performance together with the plots in Figure 2C. 

      As mentioned previously, the resolution measurements were performed with 561 nm excitation and an emission bandpass of ~573 – 616 nm (595 nm average). Based on this we would expect the full width half maximum resolution to be ~975 nm. The resolution is in fact limited by sampling on the camera. The 3.76 µm pixel size, combined with the 5.0X magnification, results in a sampling of 752 nm. Based on the Nyquist criterion, the resolution is limited to ~1.5 µm. We have added clarifying statements to the text.
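      The sampling argument can be reproduced in a few lines. The sketch below takes the ~975 nm optical estimate as given in the response rather than rederiving it, and assumes the usual rule that the smallest resolvable feature spans two sample pixels; the function names are illustrative.

```python
def sample_pixel_nm(camera_pixel_um: float, magnification: float) -> float:
    """Pixel size projected into sample space, in nanometres."""
    return camera_pixel_um / magnification * 1000.0


def nyquist_limit_nm(pixel_nm: float) -> float:
    """Smallest resolvable feature under Nyquist sampling (two pixels)."""
    return 2.0 * pixel_nm


pixel = sample_pixel_nm(3.76, 5.0)      # camera pixel size and magnification from the text
sampling_limit = nyquist_limit_nm(pixel)
optical_limit = 975.0                   # optical FWHM estimate quoted in the response
effective = max(optical_limit, sampling_limit)
print(f"sample pixel = {pixel:.0f} nm, Nyquist limit = {sampling_limit:.0f} nm")
print(f"effective resolution = {effective:.0f} nm")
```

      Because the sampling limit exceeds the optical estimate, the camera sampling rather than diffraction sets the measured resolution in this configuration.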

      - I'm confused about the characterization of light sheet thickness and how it relates to the measured detection field curvature. The authors state that they "deliver a light sheet with NA = 0.10 which has a width of 12.5 mm (FWHM)." If we estimate that light fills the 0.10 NA, it should have a beam waist (2wo) of ~3 microns (assuming Gaussian beam approximations). Although field curvature is described as "minimal" in the text, it is still ~10-15 microns at the edge of the field for the emission bands for GFP and RFP proteins. Given that this is 5X larger than the light sheet thickness, how do the authors deal with this? 

      The generated light sheet is flat, with a thickness of ~ 3 µm. This flat light sheet will be captured in focus over the depth of focus of the detection objective. The stated field curvature is within 2.5X the depth of focus of the detection lens, which is equivalent to the “Plan” specification of standard microscope objectives.
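      The ~3 µm light sheet thickness is consistent with a paraxial Gaussian beam estimate. The sketch below assumes a wavelength of 488 nm, which is not stated in the exchange, and ignores the refractive index of the immersion medium; both are simplifications made for illustration.

```python
import math


def gaussian_waist_diameter_um(wavelength_um: float, numerical_aperture: float) -> float:
    """Full waist diameter 2*w0 of a Gaussian beam, with w0 = lambda / (pi * NA)."""
    w0 = wavelength_um / (math.pi * numerical_aperture)
    return 2.0 * w0


# Assumed wavelength of 0.488 um; NA = 0.10 is the excitation NA given in the text.
thickness = gaussian_waist_diameter_um(0.488, 0.10)
print(f"estimated light sheet thickness = {thickness:.1f} um")
```

      A longer assumed wavelength would give a proportionally thicker sheet, so the result should be read as an order-of-magnitude check rather than a measured value.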

      - In Figure 2E, it would be helpful if the authors could list the exposure times as well as the total voxels/second for the two-camera comparison. It's also worth noting that the Sony chip used in the VP151MX camera was released last year whereas the Orca Flash V3 chosen for comparison is over a decade old now. I'm confused as to why the authors chose this camera for comparison when they appear to have a more recent Orca BT-Fusion that they show in a picture in the supplement (indicated as Figure S2 in the text, but I believe this is a typo and should be Figure S3). 

      This is a useful addition, and we have added exposure times to the plot. We have also added a note that the Orca Flash V3 is an older generation sCMOS camera and that newer variants exist, including the Orca BT-Fusion. The BT-Fusion has a read noise of 1.0 e- rms versus 1.6 e- rms, and a peak quantum efficiency of ~95% vs. 85%. Based on the discussion in Supplementary Note S1, we do not expect that these differences in specifications would dramatically change the data presented in the plot. In addition, the typo in Figure S2 has been corrected to Figure S3.

      - In Table S1, the authors note that they only compare their work to prior modalities that are capable of providing <= 1 micron resolution. I'm a bit confused by this choice given that Figure 2 seems to show the resolution of ExA-SPIM as ~1.5 microns at 4 mm off center (1/2 their stated radial field of view). It also excludes a comparison with the mesoSPIM project which at least to me seems to be the most relevant prior to this manuscript. This system is designed for imaging large cleared tissues like the ones shown here. While the original publication in 2019 had a substantially lower lateral resolution, a newer variant, Nikita et al bioRxiv (which is cited in general terms in this manuscript, but not explicitly discussed) also provides 1.5-micron lateral resolution over a comparable field of view. 

      We have updated the table to include the benchtop mesoSPIM from Nikita et al., Nature Communications, 2024. Based on this published version of the manuscript, the lateral resolution is 1.5 µm and axial resolution is 3.3 µm. Assuming the Iris 15 camera sensor, with the stated 2.5 fps, the volumetric rate (megavoxels/sec) is 37.41.
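      The quoted volumetric rate can be reproduced from the frame rate and the sensor size. The sketch below assumes a sensor format of 5056 x 2960 pixels for the Iris 15; this format is not given in the response and is included here as an assumption.

```python
def volumetric_rate_megavoxels_per_s(width_px: int, height_px: int, frames_per_s: float) -> float:
    """Voxels acquired per second, in megavoxels, for full-frame acquisition."""
    return width_px * height_px * frames_per_s / 1e6


# Assumed sensor format (5056 x 2960 pixels); 2.5 fps is the frame rate stated in the text.
rate = volumetric_rate_megavoxels_per_s(5056, 2960, 2.5)
print(f"volumetric rate = {rate:.2f} megavoxels/sec")
```

      With these inputs the rate comes out close to the 37.41 megavoxels/sec reported above.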

      - The authors state that, "We systematically evaluated dehydration agents, including methanol, ethanol, and tetrahydrofuran (THF), followed by delipidation with commonly used protocols on 1 mm thick brain slices. Slices were expanded and examined for clarity under a macroscope." It would be useful to include some data from this evaluation in the manuscript to make it clear how the authors arrived at their final protocol. 

      Additional details on the expansion protocol may be included in another manuscript.

      General comments: 

      • There is a tendency in the manuscript to use negative qualitative terms when describing prior work and positive qualitative terms when describing the work here. Examples include: 

      - "Throughput is limited in part by cumbersome and error-prone microscopy methods". While I agree that performing single neuron reconstructions at a large scale is a difficult challenge, the terms cumbersome and error-prone are qualitative and lacking objective metrics.

      We have revised this statement to be more precise, stating that throughput is limited in part by the speed and image quality of existing microscopy methods.

      - The resolution of the system is described in several places as "near-isotropic" whereas prior methods were described as "highly anisotropic". I agree that the ~1:3 lateral to axial ratio here is more isotropic than the 1:6 ratio of the other cited publications. However, I'm not sure I'd consider 3-fold worse axial resolution than lateral to be considered "near" isotropic.

      We agree that the term near-isotropic is ambiguous. We have modified the text accordingly, removing the term near-isotropic and where appropriate stating that the resolution is more isotropic than that of other cited publications.

      - exposures (which in the caption is described as "modest"). I'd suggest removing these qualitative terms and just stating the values.

      We agree and have changed the text accordingly.

      • The results section for Figure 5 is titled "Tracing axons in human neocortex and white matter". Although this section states "larger axons (>1 um) are well separated... allowing for robust automated and manual tracing" there is no data for any tracing in the manuscript. Although I agree that the images are visually impressive, I'm not sure that this claim is backed by data.

      We have now removed the text in this section referring to automated and manual tracing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The paper investigates a potential cause of a type of severe epilepsy that develops in early life because of a defect in a gene called KCNQ2. The significance is fundamental because it substantially advances our understanding of a major research question. The strength of the evidence is convincing because appropriate methods are used that are in line with the state-of-the art, although there are some revisions/corrections that would strengthen the evidence further.

      Thank you for the expert, thorough, and helpful review.  We believe that addressing the reviewers’ points has improved our paper greatly.   

      Public Reviews:

      Reviewer #1 (Public Review):

      Abreo et al. performed a detailed multidisciplinary analysis of a pathogenic variant of the KCNQ2 ion channel subunit identified in a child with neonatal-onset epilepsy and neurodevelopmental disorders. These analyses revealed multiple molecular and cellular mechanisms associated with this variant and provided important insights into what distinguishes distinct pathogenic variants of KCNQ2 associated with self-limited familial neonatal epilepsy versus those leading to developmental and epileptic encephalopathy, and how they may mechanistically differ, to result in different extents of developmental impairment.

      The authors first provide a detailed clinical description of the patient heterozygous for a novel pathogenic variant encoding KCNQ2 G256W. They then model the structure of the G256W variant based on recent cryo-EM structures of KCNQ2 and other ion channel subunits and find that while the affected position is quite distinct from the channel pore, it participates in a novel, evolutionarily conserved set of amino acids that form a network of hydrogen bonds that stabilize the structure of the pore domain.

      They then undertake a series of rigorous and quantitative laboratory experiments in which the KCNQ2 G256W variant is coexpressed exogenously with WT KCNQ2 and KCNQ3 subunits in heterologous cells, and endogenously in novel gene-edited mice generated for this study. This includes detailed electrophysiological analyses in the transfected heterologous cells revealing the dominant-negative phenotype of KCNQ2 G256W. They found altered firing properties in hippocampal CA1 neurons in brain slices from the heterozygous KCNQ2 G256W mice.

      They next showed that the expression and localization of KCNQ channels are altered in brain neurons from heterozygous KCNQ2 G256W mice, suggesting that this variant impacts KCNQ2 trafficking and stability.

      Together, these laboratory studies reveal that the molecular and cellular mechanisms shaping KCNQ channel expression, localization, and function are impacted at multiple levels by the variant encoding KCNQ2 G256W, likely contributing to the clinical features of the child heterozygous for this variant relative to patients harboring distinct KCNQ2 pathogenic variants.

      Thank you for the thorough summary and estimation of the initial submission. We are very glad that our approach, analytical methods, and conclusions were convincing.   

      Reviewer #2 (Public Review):

      Summary:

      The paper entitled "Plural molecular and cellular mechanisms of pore domain KCNQ2 encephalopathy" by Abreo et al. is a complex and integrated paper that is well-written with a focus on a single gene variant that causes a severe developmental encephalopathy. The paper collates clinical outcomes from 4 individuals and investigates a variant causing KCNQ2-DEE using a wide range of experimental techniques including structural biology, in vitro electrophysiology, generation of genetically modified animal models, immunofluorescence, and brain slice recordings. The overall results provide a plausible explanation of the pathophysiology of the G256W variant and provide important findings to the KCNQ2-DEE field as well as beginning to separate the understanding between seizures and encephalopathies.

      Strengths:

      (1) The authors describe in detail how the structural biology of the channel with a mutation changes the movement of the protein and adds insights into how one variant can change the function of the M-current. The proposed model linking this change to pathogenic consequences should help pave the way for additional studies to further support this type of approach.

      (2) The multiple co-expression ratio experiments drill down to the complex nature of the assembly of channels in over-expression systems and help to move toward an understanding of heterozygosity. It might have been interesting if TEA was tested as a blocker to better understand the assembly of the transfected subunits or possibly use vectors to force desired configurations.

      (3) The immunofluorescent approach to understanding re-distribution is another component of understanding the function of this critical current. The demonstration that Q2 and Q3 are diminished at the AIS is an important finding and a strength to the totality of the data presented in the paper.

      (4) Brain slice work is an important component of studying genetically modified animals as it brings in the systems approach, and helps to explain seizure generation and EEG recordings. The finding that G256W/+ neurons were more sensitive to current injections is a critical component of the paper.

      (5) The strength of this body of work is how the authors integrated different scientific approaches to knitting together a compelling set of experiments to better explain how a single variant, and likely extrapolation to other variants, can cause a severe neonatal developmental encephalopathy with a poor clinical outcome.

      Thank you for the thorough and encouraging reading of our work and its strengths. We are very glad that, apart from the issues mentioned, which we have addressed, our approach and conclusions were convincing.

      Weaknesses:

      (1) Minor comment: Under the clinical history it is unclear whether the mother was on Levetiracetam for suspected in-utero seizures or if Levetiracetam was given to individual 1. The latter seems more likely, and if so this should be reworded.

      We revised the results text to clarify that the drug was begun postnatally, after epilepsy was diagnosed in the child.   

      (2) As described in the clinical history of patient 1, treatment with ezogabine was encouraging with rapid onset by a parental global impression with difficulty in weaning off the drug. When studying the genetically modified mice, it would have been beneficial to the paper to talk about any ezogabine effects on the genetically modified mice.

      We agree this is of great interest, but sampling and metrics are challenging due to the very low frequency of seizures and delayed mortality in the heterozygous G256W mice.  Accordingly, we have not performed ezogabine treatment experiments in the mice described in this study, which model a human variant associated with a brief neonatal window of frequent seizures.  We hope to return to this issue using other transgenic mice with higher seizure frequency, but such results are outside the current scope.

      (3) It is a bit surprising that CA1 pyramidal neurons from the heterozygous G256W mice have no difference in resting membrane potential. The discussion section might explore this in a bit more detail.

      Thank you for raising this issue. This combination of outcomes has been seen previously and is interpreted as an outcome of low somatodendritic surface expression of the channels.  Relatively higher expression within the AIS membrane, with its relatively small surface area and electrical isolation from the soma, allows the KCNQ2/3 channels to influence AIS excitability with little or (in this instance) undetectable influence on the RMP (see e.g., Otto et al. 2006, PMID: 16481438; Singh et al. 2008, PMID 16481438  for KCNQ2 mutant mice.  See Hu and Bean, 2018, figure 2; PMID: 29526554 for explicit testing via focal AIS vs. somatic blocker perfusion).  Additionally, in previous work, we did not find any changes to the RMP of CA1 pyramidal neurons in either Kcnq2 knockout mice (PMID: 24719109) or mice expressing a Kcnq2 GOF variant (PMID: 37607817).  We modified the discussion, including adding references to prior studies combining experimental and multicompartmental computational models.

      (4) It was mentioned in the paper about a direct comparison between SLFNE and G256W. However, in the slice recordings, there was no comparison. Having these data comparing SLFNE to G256W would have been a more fulsome story and would have added to the concept around susceptibility to action potential firing.

      Thank you for this point. We agree that such side-by-side recordings would be interesting.  However, slice recordings were not performed on the SLFNE mice. The study design was based on the fact that extensive prior studies of both haploinsufficient and missense human SLFNE variant mice have been published (Otto et al. 2006 J Neuroscience, PMID: 16481438; Singh et al. 2008, PMID 16481438; Kim et al 2020 PMID: 31283873) and show good agreement, but DEE missense variants have not been previously studied. We revised the discussion to place the current DEE model results in the context of the prior SLFNE model slice work. We contrast the similarity of the CA1 cellular hyperexcitability phenotype ex vivo (at least in CA1 pyramidal cells) across models to the differences in electrographic and behavioral seizures (i.e., network level physiology).  

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes the symptoms of patients harboring KCNQ2 mutation G256W, functional changes of the mutant channel in exogenous expression, and phenotypes of G256W/+ mice. The patients presented seizures, the mutation reduced currents of the channel, and the G256W/+ mice showed seizures, increased firing frequency in neurons, reduced KCNQ2 expression, and altered subcellular distribution.

      Strengths:

      This is a large amount of work and all results corroborated the pathogenicity of the mutation in KCNQ2, providing an interesting example of KCNQ2-associated neurological disorder's impact on functions at all levels including molecular, cellular, tissue, animal model, and patients.

      Weaknesses:

      The manuscript described observations of changes in association with the mutation at molecular cellular functions and animal phenotype, but the results in some aspects are not as strong as in others. Nevertheless, the manuscript made overarching conclusions even when the evidence was not sufficiently strong.

      Thank you for your review.  In our revision (as listed in the recommendations to authors section) we have attempted to better justify the conclusions you mention there.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      Page 7: the authors' statement that G256 could be intolerant to substitution would be strengthened by a straightforward analysis of available genome- and exome-wide sequencing data to determine the level of genic intolerance at this position in the human population, as has been used previously to highlight critical residues including those impacted by pathogenic variants in many other proteins including ion channels (e.g., Genome Biology 17:9, 2016; Am J Hum Genet 99:1261, 2016; Biochim Biophys Acta Biomemb 1862:183058, 2020).

      Thank you for this suggestion. We have revised the opening of this section to point out the low ratio of benign to pathogenic variants in the region surrounding G256 shown by prior work. We have added citations to the papers describing the MTR and gnomAD tools that highlight these data and calculations.   

      The overall interpretation of the CHO cell results would be enhanced by the authors including in their discussion an explicit statement that they did not attempt to evaluate the overall and plasma membrane expression levels of the exogenously expressed WT and mutant KCNQ2 subunits, nor that of KCNQ3, in the transfected CHO cells. They could also highlight that this is an important future experiment to determine whether the dominant negative effects are due to impaired expression/trafficking or impaired function of plasma membrane channels, as this may be an important consideration for designing therapeutic strategies.

      We agree.  We revised the discussion to explicitly mention this additional direction.  We agree this topic has therapeutic implications, especially given our in vivo protein localization results.  We added a mention that combinations of molecules enhancing surface localization with channel openers could be a therapeutic strategy, analogous to approved therapies for cystic fibrosis.  

      The authors conclude that the impact of ezogabine treatment is reduced in the cells expressing G256+/W versus those expressing WT KCNQ2. However, the delta pA/pF graph in panel 3G expresses the effects of ezogabine as absolute increases in current density. Determining the relative increase (i.e., fold change) in current density in ezogabine-treated versus control conditions is a more valid way to analyze these data. This provides a better reflection of the impact of ezogabine as the control currents already have a much larger amplitude than the G256+/W currents. By eye the impact of ezogabine looks comparable or even larger for the G256+/W condition than for WT, fundamentally changing the interpretation of these results.

      Thank you for this helpful comment. The reviewer calls attention to the fact that, although mean whole-cell currents from G256W/+ cells are smaller than those from WT cells both before and after application of ezogabine, it appeared from Fig. 3G that ezogabine enhanced currents to a “proportionally equivalent extent” in G256W/+ and WT cells. We revised panel 3G to make this clearer. It now shows WT currents +/- ezogabine normalized to (WT, no ezogabine at +40 mV), along with G256W/+ currents +/- ezogabine normalized to (G256W/+, no ezogabine at +40 mV). This normalization shows that the mixed population of channels expressed by G256W/+ cells is augmented as much as in controls (with a trend toward greater augmentation). This is a striking result given that channels lacking WT KCNQ2 subunits (i.e., the “homozygous heteromer” condition, Fig. 3F) do not respond to ezogabine. Although the underlying data are unchanged, we agree with the reviewer’s conclusion about emphasizing the effect “per channel”. This reframing is mechanistically and clinically important. We have made changes to the results text and discussion to highlight related issues.
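
      The difference between the two ways of expressing the ezogabine effect can be illustrated with a minimal sketch. The current densities below are hypothetical placeholders chosen only to show the arithmetic; they are not values from the study, and the variable names are ours.

```python
# Hypothetical current densities (pA/pF) at +40 mV.
# These numbers are illustrative only and are NOT data from the study.
baseline = {"WT": 100.0, "G256W/+": 30.0}
with_ezogabine = {"WT": 200.0, "G256W/+": 66.0}


def absolute_increase(before, after):
    """Absolute change in current density (the original delta pA/pF view)."""
    return after - before


def fold_change(before, after):
    """Relative change: drug-treated current normalized to its own baseline."""
    return after / before


summary = {
    genotype: {
        "delta": absolute_increase(baseline[genotype], with_ezogabine[genotype]),
        "fold": fold_change(baseline[genotype], with_ezogabine[genotype]),
    }
    for genotype in baseline
}

for genotype, values in summary.items():
    print(f"{genotype}: delta = {values['delta']:.1f} pA/pF, fold = {values['fold']:.2f}")
```

      With numbers of this kind, the genotype with the smaller baseline shows the smaller absolute increase but an equal or larger fold change, which is why normalizing each genotype to its own untreated current can change the interpretation.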

      Figure 7: it is not clear from the information presented whether the qPCR would only measure WT KCNQ2 mRNA levels or detect levels of both WT and E254fs transcripts. The authors assume nonsense-mediated decay, but they did [not] determine experimentally that this occurred. The sequencing in the supplemental figure shows the presence of E254fs transcripts but does not allow for insights into their abundance. It should be straightforward to develop primer sets that could then be used to selectively amplify WT and E254fs transcripts for quantitation. 

      Thank you for this helpful suggestion. The assay used in the initial submission measures total Kcnq2 mRNA. We developed and performed a new assay in which the probe binding site is the WT sequence, centered on the mutations. New Figure 7-Figure supplement 1, panel A is a cartoon showing the differences between the assays. Using the WT allele-selective RT-qPCR assay, both G256W/+ and E254fs/+ samples showed a 50% loss of WT Kcnq2. We can now conclude that NMD is absent for G256W and incomplete for E254fs mRNA. Neither mutant heterozygous line shows a compensatory increase in WT Kcnq2 expression. These conclusions are much more specific than previously, and documenting incomplete NMD of KCNQ2 is novel and of potential clinical significance. The KCNQ2 protein (western blot) and WT mRNA (qPCR) results now agree, both showing ~50% loss.

      For reporting transparency, the authors should provide the sequences of each of the primers used. Perhaps this is in the "key reagents" section, but this was missing from the manuscript. I note the authors use NMD in this section without defining it.

      We have added the assay catalogue numbers to the key reagents table. We eliminated the use of the NMD abbreviation. We added citations to the “incomplete NMD” literature, including an excellent recent review and a directly relevant primary paper. These show how NMD efficiency may differ between genes, transcripts, cells, tissues and, remarkably, between human individuals (see doi: 10.1093/hmg/ddz028, cited in the review: caffeine inhibits NMD!). The revised discussion mentions this and its relevance to future studies of novel KCNQ2 variant pathogenicity and severity prediction.

      Recommendations for improving the writing and presentation.

      I found the presentation of the IHC images deficient in terms of accessibility and transparency. While the movies provided are also useful, it is important the authors also provide conventional static merged images of each of their multiplex labeling images in the body of the paper. This allows a reader to see the labeling with the different antibodies in the context of each other (one of the major advantages of multiplex labeling), instead of trying to remember the pattern each label gave in prior sections of the movie.

      [We queried the reviewer via the eLife editorial staff]: To clarify my suggestion to improve Figure 8, the authors should generate from their movies static images that are basically what they already did in Fig8S3 for the G256W Het panel of the Fig8 movie. This involves revising Fig8S3 to include WT panels, and adding two new supplemental figures that show WT/Het panels with the separate antibodies and then a merged image from Fig8S1 and Fig8S2, just like they did in Fig8S3 for the mutant part of the Fig8 movie.

      Thank you for this comment. As suggested by the reviewer, for each IHC movie (Fig. 8, Fig. 8-figure supplement 1 and Fig. 8-figure supplement 2), we added a new supplementary  figure showing WT and mutant animal static images corresponding to the movies.  For main Figure 8 (CA1, G256W/+ comparison), the new static images enable evaluating the patterns of colocalization by providing selected portions of the images at the highest useful magnification.  These show  each individual antibody in greyscale (best for comparing) and 4 different green-red merged images to show overlap (yellow) vs non-overlap.  The merged images demonstrate colocalization of KCNQ2 and KCNQ3 at the distal portions of AnkG-labelled CA1 pyramidal cell AISs, in agreement with prior publications.  In G256W/+ but not E254fs/+ images, KCNQ2 and KCNQ3 show reduced relative labeling of AISs and increased relative labeling of somata in the pyramidal cell layer.   For CA3, the merged views show the redistributed relative labeling of KCNQ2 and KCNQ3 between stratum lucidum and stratum pyramidale.  

      We also revised Fig. 8 supplement 3 (CA1) to include WT panels. On reexamination, all WT interneurons in the small sample lacked somatic KCNQ2 and KCNQ3 labeling. Some s. oriens and radiatum AISs of both WT and G256W/+ sections showed KCNQ2 and KCNQ3 labeling, as shown in the revised figure. Counting statistics are included in the supporting data. Importantly, our belief that the images shown are representative is supported by the blinded analysis of a much larger sample (Figure 9, unchanged in revision).

      Dragging the movie viewer “slider” allows the viewer to move  back and forth between color channels.  It works well in eLife if used in that way.   This is a way of seeing the “representativeness” of the merges shown in the CA1 conventional static images, which necessarily include a smaller x-y area and include only a few AISs.   We also added a KCNQ2/KCNQ3 merge to the movies. 

      Western blot results in Figure 9 - Supplement 1: for transparency, the authors need to show the entire blot, as they did in Figure 4 - Supplement 2. This is required in many journals, and in the case of KCNQ2 it provides crucial information as to the different forms of KCNQ2 present on SDS gels in these samples that contain different KCNQ2 isoforms. Given the surprising decrease in levels of KCNQ2 monomer in the G256+/W mice, it is important to present and analyze the levels of the monomer, dimer, and higher oligomeric forms of KCNQ in these samples, to determine whether protein "missing" in the monomeric form is not present in the dimeric or higher oligomeric form. This is especially important as the G256W mutant could lead to misfolding and aggregation leading to a higher proportion of both WT and G256W subunits being present in a higher-order oligomeric form. I note that it is odd that the figure legend states "Images of entire filter used for western blot of lysates, probed for KCNQ2 and KCNQ3.", even though only selected portions are shown.

      Thank you for this suggestion. We agree that the wording of the legend needed improvement.  

      In revision, the western blots are renumbered as Figure 10 and Figure 10-Figure supplement 1. In the main figure, monomer bands and densitometry are shown, as previously. In the new Figure 10-Figure supplement 1, we show (1) the ECL image of the entire filter probed with rabbit anti-KCNQ2, (2) the same blot, stripped and reprobed with guinea pig anti-KCNQ3, and (3) the lower portion, probed with mouse anti-tubulin. The revised Fig. 10-fig supplement 1 shows 3 genotypes x 3 individual (male) p21 mice, with all steps performed in parallel from homogenization to ECL detection. As suggested, we performed a new analysis of the immunoreactive bands corresponding to (apparent) monomer, dimer, and higher oligomeric forms of KCNQ2. Analysis of the sum of those bands showed loss of KCNQ2 protein in both mutant lines.

      The methods are sufficiently detailed with the exception that there is inconsistent inclusion of catalog numbers and RRIDs. Having these would improve transparency as to specific reagents used and would allow for enhanced reproducibility of the lab research performed here.

      The revised submission includes the key resources table, which we understood was not requested from eLife at initial submission. 

      Minor corrections to the text and figures.

      Typos/mistakes as to antibodies used in the IHC methods section: "anti-AnkG36 N106/36" should be "anti-AnkG N106/36", and "mouse anti-PanNav IgG1 supernatant" should be "mouse anti-PanNav IgG1 purified antibody".

      Thank you, corrections made.

      It would facilitate a reader's interpretation of the IHC results if the authors explicitly stated in the IHC results section that the KCNQ2 antibody used is against the N-terminus and therefore should recognize both mutant isoforms as the mutations are downstream of this.

      We added this point to the results section in relation to Figure 4-figure supplement 2 (western), and in IHC methods.

      PV is not defined when used in the discussion, nor is why knowing that somatic KCNQ2 immunolabeling is present in both PV and non- PV interneurons of WT mice of value to the reader.

      We revised these sentences for clarity.

      The IHC methods state that "mice were transcardially perfused with....ice cold 2% paraformaldehyde in PBS, freshly prepared from a 20% stock (Electron Microscopy Sciences).". The authors presumably mean "formaldehyde" as paraformaldehyde is the inert polymeric storage form of active depolymerized monomeric formaldehyde that is a fixative.

      The reviewer is correct regarding the chemistry; the manufacturer’s product name is “Paraformaldehyde 20% aqueous solution”.  We revised accordingly.

      Reviewer #3 (Recommendations For The Authors):

      Some comments regarding the presentation are as follows.

      (1) The section "G256W lies atop a dome-shaped hydrogen bond network linking helix S5 to the turret and selectivity filter" is entirely based on structural observations without functional validation. This may be more appropriate in Discussion. The emphasis on the "turret arch" bonding should be tuned down due to the lack of functional support.

      We understand and agree with this concern about the distinction between structural analysis and implied function.  However, we believe that the structural model reinterpretation and phylogenetic sequence analysis in our submission are results.  Structures as complex as those of KCNQ channels necessarily cannot be fully shown or analyzed in an initial publication. To our knowledge, the word “turret” has not appeared in a KCNQ channel cryoEM paper to date.  Bringing clinical motivation to prioritize study of an overlooked spot on the channel is creditworthy. The comprehensive heterologous patch clamp results in our study (including absence of effects on voltage-dependence, evidence of partial functional activity of channels containing one mutant subunit per channel shown for KCNQ2 homomers, KCNQ2/3 heteromers, and via acute ezogabine rescue experiments in the biologically most relevant heteromers) are functional evidence consistent with G256W acting through disruption of the SF.  

      However, we agree that more support is needed. The words “dome” and “arch”, though accurate for describing shape, tend to imply a mechanical “load bearing and distributing” function; our study does not prove this. Accordingly, we have toned down the emphasis by removing the words “keystone”, “turret dome bonding”, and “as a structural novelty” from the abstract. The revised discussion section replaces “arch” with “arch-shaped”, calls the idea that the turret functions as a stabilizing arch a “novel hypothesis”, and proposes next experiments (with relevant citations).

      Section title "Heterozygous G256W mice have neonatal seizures" does not seem to match the results since there was only one mouse that showed neonatal seizures.

      Thank you, we have revised the section title.  The text is transparent regarding sample size. The discussion highlights that these seizures are rare (indeed, not previously shown for any heterozygous missense model, to our knowledge).

      (2) It will be nice for the non-expert readers if the observations of "discrete seizures", "clusters", "diffuse bilateral onset", "unilateral onset" etc. are marked in Figure 1.

      Thank you for making this point. Figure 1 shows key excerpts of one bilateral onset seizure; a unilateral onset example isn’t shown since previous KCNQ2 DEE papers we cite have emphasized and illustrated focal onset seizures (Weckhuysen et al., 2013; Numis et al., 2014).    We revised the results section (p. 4) and Figure 1 and supplement captions to improve clarity for all readers including non-specialists.  

      (3) Figure 5 and page 10 first paragraph. Please specify the number of cells and the number of mice that were studied.

      Thank you, this information has been added to the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      (1) The authors claim that the negative frequency dependence that maintains polymorphism in their model results from a non-linear relationship between the display trait and sexual success [...] Maybe I missed something, but the authors do not provide support for their claim about the negative frequency-dependence of sexual selection in their simulations. To do so they could (1) extract the relationship between the relative mating success of the two male types from the simulations and (2) demonstrate that polymorphism is not maintained if the relationship between male display trait and mating success is linear.

      We believe that there is a confusion of terminology here. We agree that for the two alleles at a locus impacting male display in our model, the allele conferring inferior display quality will have a fitness that increases as its frequency increases, so this allele displays positive frequency dependent fitness. And, the alternate, display-favoring allele at the locus does display negative frequency dependence. Our use of the terminology ‘negative frequency dependence’ was meant to refer to the negative dependence of the fitness of the display-favoring allele with respect to its own frequency. However, a significant body of literature instead discusses models in which both an allele and its alternate(s) are beneficial when at low frequency and deleterious when at high frequency under the same selective challenge, entailing negative frequency dependence of fitness for all alleles involved. This benefit-when-rare model of a single trait is often described simply as negative frequency dependence, and generates balancing selection at the locus, but is not the model we are presenting here, and does not encompass all models involving negative frequency dependent fitness. This lexical expectation may make the interpretation of our work more difficult, and we have amended the manuscript to make our model clearer (lines 227-231). In this model, we have a negative frequency dependence for the fitness of the display-favoring allele in mate competition, but the net selective disadvantage of this allele at high frequency is due to a cost in another, pleiotropic, fitness challenge: the constant survival effect. So, the alleles are under balancing selection where alternate alleles are favored by selection when rare, but not due solely to selection during mate competition. Instead, our model relies on pleiotropy for an emergent form of frequency-dependent balancing selection (in the sense that each allele is predicted to be beneficial on balance when rare).

      In the reviewer’s model of the success of two alleles at one locus, the ratio of success is vaguely linear with allele frequency for n=3, though it starts quite convex and has an inflection point between convex and concave segments (for the disfavored allele) at p≈0.532. This is visualized easily by plotting the function and its derivatives in Wolfram-Alpha. For n>=4, the fitness function with respect to the display-favoring/disfavoring allele becomes increasingly concave/convex respectively, and this specific nonlinearity is needed to act along with the antagonistic pleiotropy to maintain balancing selection, rather than being maintained by a model that favors any rare allele on the basis of its rarity in some manner. In an attempt to make the importance of the encounter number parameter clearer, we’ve generated new panels for Figure S1 which simulate encounter numbers 2, 3, and 4, and we have updated corresponding text and figure references in lines 335-338.
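
      The curvature argument above can be checked numerically with a minimal sketch. The functional form used here is our assumption, since the reviewer's exact expression is not reproduced in this response: each female is assumed to sample n males at random and to mate with a display-favoring male whenever at least one is present. In this sketch q denotes the frequency of the display-disfavoring allele, and all function names are hypothetical.

```python
def relative_success(q, n):
    """Per-capita mating success of display-disfavored males relative to
    display-favored males, when a fraction q of males is disfavored.

    Assumed rule (not taken from the manuscript): each female encounters n
    random males and mates with a favored male if at least one is present.
    """
    disfavored_per_capita = q ** (n - 1)             # q**n matings shared among q males
    favored_per_capita = (1.0 - q ** n) / (1.0 - q)  # remaining matings shared among 1 - q males
    return disfavored_per_capita / favored_per_capita


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def find_inflection(n, lo=0.05, hi=0.95, tol=1e-7):
    """Bisect on the sign of the curvature; return None if it does not change sign."""
    def curvature(q):
        return second_derivative(lambda x: relative_success(x, n), q)

    if curvature(lo) * curvature(hi) > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curvature(lo) * curvature(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


inflection_n3 = find_inflection(3)
print(f"n = 3: curvature changes sign near q = {inflection_n3:.3f}")
```

      Under this assumed form, the n=3 curve is convex at low frequency of the disfavored allele and concave at high frequency, with the sign change close to the value of roughly 0.532 quoted above; for n=2 the same function shows no sign change in curvature.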

      For (1-2), it is not clear how to modify the simulation such that the relationship between the trait value and mating success can be perfectly linear - either linear with respect to allele frequency in a one locus model or linear with respect to trait value at a specific population composition, without removing the simulation of mate competition altogether. While it may be of interest to explore a more comprehensive range of biological trade-offs in future studies, we are not able to meaningfully do so within the context of the present manuscript.

      (2) The authors only explore versions of the model where the survival costs are paid by females or by both sexes. We do not know if polymorphism would be maintained or not if the survival cost only affected males, and thus if sexual antagonism is crucial.

      We now present simulations with male costs only as added panels to Figure S1 and mention these results in the main text (lines 334-335). Maintenance of the polymorphism is significantly reduced or completely absent in such simulations.

      (3) The authors assume no cost to aneuploidy, with no justification. Biologically, investment in aneuploid eggs would not be recoverable by Drosophila females and thus would potentially act against inversions when they are rare.

      We did offer some discussion and justification of our decision to model no inherent fitness of the inversion mutation itself, specifically aneuploidy, in lines 36-39 and 78-80 of the original reviewed preprint. Previous research suggests that D. melanogaster females may not actually invest in aneuploid eggs generated from crossover within paracentric inversions. While surprising, and potentially limited to a subset of clades, many ‘r-selected’ taxa or those in which maternal investment is spread out over time may have some degree of reproductive compensation for non-viable offspring, which can reduce the costs of generating aneuploids significantly (for example, t-haplotypes in mice). We have added this example and citation to lines 34ff in the current draft.

      (4) The authors appear to define balanced polymorphism as a situation in which the average allele frequency from multiple simulation runs is intermediate between zero and one (e.g., Figure 3). However, a situation where 50% of simulation runs end up with the fixation of allele A and the rest with the fixation of allele B (average frequency of 0.5) is not a balanced polymorphism. The conditions for balanced polymorphism require that selection favors either variant when it is rare.

      We originally chose mean final frequency for presenting the single locus simulations based on the ease of generating a visual plot that included information on fixation vs loss and equilibrium frequency. Figure 3 and related supplemental images have been changed to now also represent the proportion of simulations retaining polymorphism at the locus in the final generation.
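
      The distinction raised by the reviewer can be made concrete with a minimal sketch. The replicate outcomes below are invented for illustration and are not simulation results from the manuscript; `summarize` is a hypothetical helper.

```python
def summarize(final_frequencies):
    """Return (mean final allele frequency, proportion of runs still polymorphic)."""
    runs = len(final_frequencies)
    mean_frequency = sum(final_frequencies) / runs
    polymorphic = sum(1 for f in final_frequencies if 0.0 < f < 1.0) / runs
    return mean_frequency, polymorphic


# Illustrative replicate outcomes only (not output of the authors' simulations).
fixation_or_loss = [0.0] * 50 + [1.0] * 50   # every run lost or fixed the allele
balanced = [0.45, 0.55] * 50                 # every run retained both alleles

for label, runs in (("fixation/loss", fixation_or_loss), ("balanced", balanced)):
    mean_frequency, polymorphic = summarize(runs)
    print(f"{label}: mean frequency = {mean_frequency:.2f}, polymorphic runs = {polymorphic:.0%}")
```

      Both sets of runs have a mean final frequency of about 0.5, but only the second retains polymorphism, which is why reporting the proportion of polymorphic runs alongside the mean is informative.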

      (5) Possibly the most striking result of the experiment is the fact that for 14 out of 16 combinations of inversion x maternal background, the changes in allele frequencies between embryo and adult appear greater in magnitude in females than in males irrespective of the direction of change, being the same in the remaining two combinations. The authors interpret this as consistent with sexually antagonistic pleiotropy in the case of In(3L)Ok and In(3R)K. The frequencies of adult inversion frequencies were, however, measured at the age of 2 months, at which point 80% of flies had died. For all we know, this may have been 90% of females and 70% of males that died at this point. If so, it might well be that the effects of inversion on longevity do not systematically differ between the ages and the difference in Figure 9B results from the fact that the sample includes 30% longest-lived males and 10% longest-lived females.

      This critique deserves some consideration. The adults were separated by sex during aging, but while we recorded the number of survivors, we did not record the numbers and sexes of the eclosed adults initially collected, in the interest of maintaining high-throughput collection. We therefore cannot directly calculate the associated survival proportions, but we can estimate them. We collected 1960 surviving females and 3156 surviving males, and we can very roughly estimate survival if we assume that equal numbers of each sex eclosed and that the survivors represent 20% of the original population. That gives 12790 individuals per sex, or 84.7% female mortality and 75.3% male mortality.

      So, we have added a qualification discussing the possibility of stronger selection on females and its influence on observed sex-specific frequency changes, on lines 602-605.
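
      The rough mortality estimate above can be reproduced with a short sketch. The survivor counts are those given in the response; the 20% overall survival and the equal sex ratio at eclosion are the assumptions stated there, and the variable names are ours.

```python
surviving_females = 1960
surviving_males = 3156
assumed_overall_survival = 0.20  # assumption: survivors are 20% of eclosed adults

total_survivors = surviving_females + surviving_males
initial_total = total_survivors / assumed_overall_survival
initial_per_sex = initial_total / 2  # assumption: equal numbers of each sex eclosed

female_mortality = 1 - surviving_females / initial_per_sex
male_mortality = 1 - surviving_males / initial_per_sex

print(f"initial adults per sex: {initial_per_sex:.0f}")
print(f"female mortality: {female_mortality:.1%}")
print(f"male mortality: {male_mortality:.1%}")
```

      Changing `assumed_overall_survival` or relaxing the equal-sex-ratio assumption shifts both estimates, so these figures should be read as approximate.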

      (6) Irrespective of the above problem, survival until the age of 2 months is arguably irrelevant from the viewpoint of fitness consequences and thus maintenance of inversion polymorphism in nature. It would seem that trade-offs in egg-to-adult survival (as assumed in the model), female fecundity, and possibly traits such as female resistance to male harm would be much more relevant to the maintenance of inversion polymorphisms.

      Adult Drosophila will continue to reproduce in good conditions until mortality, and the estimated age of a mean reproductive event for a Drosophila melanogaster individual is 24 days (Pool 2015), and likewise for D. simulans (Turelli and Hoffmann 1995). Given that reproduction is centered around 24 days, we expect sampling at 2 months of age to still be relevant to fitness. In seasonally varying climates, either temperate or with a long dry season, survival through challenging conditions is expected to require several months. In many such cases, females are in reproductive diapause, and so longevity is the main selective pressure. See lines 931-936 in the revised manuscript.

      As we agreed above, it would be of interest to investigate a wider range of trade-offs in future studies. We focused here on the balance between survival and male reproductive success because the latter trait generates negative frequency dependence for display-favoring alleles and a disproportionate skew towards higher quality competitors, whereas many other fitness-relevant traits lack that property.

      (7) The experiment is rather minimalistic in size, with four cages in total; given that each cage contains a different female strain, it essentially means N=1. The lack of replication makes statements like "In(2L)t and In(2R)NS each showed elevated survival with all maternal strains except ZI418N" (l. 493) unsubstantiated because the claimed special effect of ZI418N is based on a single cage subject to genetic drift and sampling error. The same applies to statements on inversion x female background interaction (e.g., l. 550), as this is inseparable from residual variation. It is fortunate that the most interesting effects appear largely consistent across the cages/female backgrounds. Still, I am wondering why more replicates had not been included.

      Our experimental approach might be described as “diversity replication”. Essentially, the four maternal genetic backgrounds are serving dual purposes: both to assess experimental consistency and to ensure that our conclusions are not solely driven by a single non-representative genotype (which, in so many published studies, cannot be ruled out). It would indeed be interesting if we could have quadrupled the size of our experiment by having four replicates per maternal background. However, we suspect the reviewer may not recognize the substantial effort involved in our four existing experiments. Each of these involved collecting 500+ virgin females, hand-picking thousands of embryos during the duration of egg-laying, and repeatedly transferring offspring to maintain conditions during aging, such that cages had to be staggered by more than a month. These four cages took a year of benchwork just to collect frozen samples, before any preparation and quality control of the associated amplicon libraries for sequencing. Adding a further multiplier would take it well beyond the scope of a single PhD thesis. Fortunately, we were able to obtain the key results of interest without that additional effort, even if clearer insights into the role of maternal background would also be of strong interest.

      We do agree that no firm conclusions about maternal background can be reached without further replication, and so we have qualified or removed relevant statements accordingly (lines 568ff, 620-622).

      Reviewer #1 (Recommendations For The Authors):

      The description of the model is confusing and incomplete, e.g., the values of several parameters used to obtain the numerical results are not given. It is first stated (l. 223) that the model is haploid, but text elsewhere talks about homozygotes and heterozygotes. If the model is diploid (this in itself is not clear), what is assumed about dominance?

      We are not presenting results for a mathematical model estimated numerically. We have now clarified our transition from a conceptual depiction of our model, in which we use haploid representations for simplified presentation, to our forward population genetic simulations, which are entirely diploid. More broadly, we have improved our communication of the assumptions and parameters used in our simulations. The scenarios we investigate involve purely additive trait effects within and between loci (except that survival probabilities are multiplicative to avoid negative values). We think that considering other dominance scenarios would be a worthy subject for a follow-up study, whereas the present manuscript is already covering a great deal of ground.   

      Similarly, it is hard to understand the design (l.442ff). I was confused as to whether a population was set up for each inversion or for all of them and what the unit or replication was. I found the description in Methods (l. 763-771) much clearer and only slightly longer; I suggest the authors transfer it to the Results. Also, Figure 8 should contain the entire crossing scheme; the current version is misleading in that it implies males with only two genotypes.

      All four tested inversions were segregating within the same karyotypically diverse population of males, and were assayed from the same experiments. We have attempted to improve the relevant description. For Figure 8, we had trouble conceiving a graphic update that contained a more complete cross scheme without seeming much more confused and cluttered. We have tried to clarify in the relevant text and the figure caption instead.

      There are a number of small issues that should be addressed:

      - No epistasis for viability assumed - what would be the consequence?

      We explored a model in which we intentionally included no terms for epistatic effects on phenotype. All epistasis with regard to fitness is emergent from competition between individuals with phenotypes composed of non-epistatic, non-dominant genetic effects. So, the simplest model of antagonism would have no epistasis for viability whatsoever. One could explore a model that has emergent viability epistasis in a similar way, by implementing stabilizing selection on a quantitative trait with a gaussian or similar non-linear phenotype-to-fitness map, but that might be better served as a topic for a future study. We have, however, tried to make this intent clearer in the text.

      l. 750 implies that aneuploidy generated by the inversion has no cost (aneuploid gametes are resampled)

      Yes, as addressed in public review item (3). Alternatively, see lines 34ff, 293, 369, 392 for in-text edits.

      l. 24-25: unclear; is this to mean that there is haplotype x sex interaction for survival?

      l. 25: success in what? (I assume this will be explained in the paper, but the abstract should stand on its own).

      l. 193-4: "producing among most competitive males": something missing or a word too much?

      Figure 1B,C: a tiny detail, but the plots would be more intuitive if the blue (average) bars were after (i.e., to the right of) the male and female ones, given that the average is derived from the two sex-specific values.

      Each of the above has been edited or implemented as suggested.

      l. 205. It is a convex function, but I do not understand what the authors mean by "convex distribution".

      Hopefully the updated text is clearer: “yielding a distribution of male reproductive output that follows a relatively convex trend”.

      l. 223ff: some references to Fig 1 panels in this paragraph seem off by one letter (i.e., A should be B, etc.).

      l. 231 "fitness...are equally fit": rephrase 

      l. 260: maybe "thrown out" is not the most fortunate term, maybe "eliminated" would be better?

      Each of the above has been edited or implemented as suggested.

      Figure 3: I do not understand the meaning of "additive" and "multiplicative" in the case of a single locus haploid model

      All presented simulations are diploid, and these refer to the interactions between the two alleles at the locus. Hopefully the language is overall clearer in this draft.

      l. 274: "Mutation of new nucleotide" meaning what? Or is it mutation _to_ a new nucleotide?

      Hopefully the revised text is clearer.

      Figure 5. The right panel of figure 5A implies that, with the inversion, the population evolves to an extreme display trait that is so costly that it kills 95% of all individuals (or of all females? What is assumed about this here?). Apart from the biological realism of this result, what does it say about the accumulation of polymorphism and maintenance of the inversion? The graphs in fig 5B do plot a divergence between haplotypes, but it is not clear how they relate to those in panel A - the parameter values used to generate these plots are again not listed. Furthermore, from the viewpoint of the polymorphism, it would be good to report the frequencies at the steady-state.

      We have now clarified the figure description, including the parameter values used. The distribution of frequencies at the end of the simulation is represented in Figure 6. Given that we set up the simulation with assumptions that are otherwise common to population models, what biological process would prevent this extreme? Why isn’t this extreme observed in natural populations? One possible explanation is that such inversions become sex chromosomes, with increasing likelihood as the cost increases. Or other compensatory changes may occur that we don’t simulate, like regulatory evolution giving a complementary phenotype. Maybe genetic constraints in natural populations prevent the origin of the kind of pleiotropic mutations that drive this dynamic. The simulated populations still survive, but only because fitness is parameterized in relative terms. Under an absolute-fitness formulation, would such a population go extinct or not? It would be of interest to explore a wider range of models, but it is the purpose of this paper to establish that this is a viable model for the maintenance of sexually antagonistic polymorphism and association with inversions. We have added a paragraph motivated by this comment to the Discussion starting on line 765.

      l. 401-2: Z-like, W-like : please specify you are talking about patterns resembling sex chromosomes. 

      l. 738: "population calculates"?

      l. 743-4 and 746-7: is this the same thing said twice, or are there two components of noise?

      l. 357: there is no figure 5C.

      Each of the above have been addressed with text edits.

      L. 473-5: Yes, the offspring did not contain inversion homozygotes, but the sire pool did, didn't it? So homozygous inversions may have affected male reproductive success. Anyway, most of this paragraph (from line 473) seems to belong in Discussion rather than Results.

      We have revised this sentence to focus on offspring survival. 

      We can understand the reviewer’s suggestion about Results vs. Discussion text. While this can often be a challenging balance, we find that papers are often clearer if some initial interpretation is offered within the Results text. However, we moved the portion of this paragraph relating our findings to the published literature to the Discussion.

      l. 516: " In(3L)Ok favored male survival": this is misleading/confusing given the data, " In(3L)Ok reduced female survival more strongly than male survival..."

      Hopefully the phrasing is clearer now.

      l. 663ff: I did not have an impression that this section added anything new and could safely be cut.

      We have done some editing to make this more concise and emphasize what we think is essential, but we believe that the model of an autosomal, sexually antagonistic inversion differentiating before contributing to the origin of a sex chromosome is novel and interesting, and that this additional emphasis is worthwhile to encourage consideration of this idea in future research.

      l. 751: "flat probability per locus": do the authors mean a constant probability?

      Edited.

      Reviewer #2 (Public Review):

      The manuscript lacks clarity of writing. It is impossible to fully grasp what the authors did in this study and how they reached their conclusions. Therefore, I will highlight some cases that I found problematic.

      Hopefully the revised manuscript improves writing clarity. 

      Although this is an interesting idea, it clearly cannot explain the apparent influence of seasonal and clinal variation on inversion frequencies.

      We do not believe that our model predicts a non-existence of temporal and spatial dependence of the fitness of inverted haplotypes, nor do we seek to identify the manner in which seasonal and clinal differences affect fitness of inverted haplotypes. Rather, we argued that the influence of seasonal and clinal selection on inversions does not on its own predict the observed maintenance of inversions at low to intermediate frequencies across such a diverse geographic range, along with the higher frequencies of many derived inversions in more ancestral environments. 

      We might imagine that trade-offs between life history traits such as mate competition and survival should be universal across the range of an organism. But in practice, the fitness benefits and costs of a pleiotropic variant (or haplotype) may be heavily dependent on the environment. A harsh environment such as a temperate winter may both reduce the number of females that a male encounters (decreasing the benefit of display-enhancing variants) and also increase the likelihood that survival-costly variants lead to mortality (thus increasing their survival penalty). In light of such dynamics, our model would predict that equilibrium inversion frequencies should be spatially and temporally variable, in agreement with a number of empirical observations regarding D. melanogaster inversions.

      We have edited the introduction to emphasize that inversion frequencies vary temporally as well as spatially (lines 144ff). We also note relevant discussion of the potential interplay between the environment and trade-offs such as those we investigate, on lines 153-155.

      The simulations are highly specific and make very strong assumptions, which are not well-justified.

      We respond to all specific concerns expressed in the Recommendations For The Authors section below. We also note that we have made further clarifications throughout the text regarding the assumptions made in our analysis and their justification.  

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript would greatly benefit from a major rewrite and probably also a reanalysis of the empirical data.

      In particular, a genome-wide analysis of differences in SNP frequencies between sexes and developmental stages would help the reader to appreciate that inversions are special.

      [moved up within this section for clarity] We are lacking a genomic null model: how often do the authors see similar allele frequency differences when looking at the entire genome? This could be easily done with whole genome Pool-Seq and would tell us whether inversions are really different from the genomic background. I think that this information would be essential given the many uncertainties about the statistical tests performed.

      We expect that autosome-wide SNP frequencies will be heavily influenced by the frequencies of inversions, which occur on all four major autosomal chromosome arms. These inversions often show moderate disequilibrium with distant variants (e.g. Corbett-Detig & Hartl 2012).

      Furthermore, the limited number of haplotypes present, given that the paternal population was founded from 10 inbred lines, would further enhance associations between inversions and distant variants. Therefore, we do not expect that whole-genome Pool-Seq data would provide an appropriate empirical null distribution for frequency changes. Instead, we have generated appropriate null predictions by accounting for both sampling effects and experimental variance, and we have aimed to make this methodology clearer in the current draft. 

      Some basic questions:

      why start at a frequency of 50% (line 287)?

      Isn't it obvious that in this scenario strong alleles with sexually antagonistic effects can survive?

      The initial goal of the associated Figure 4 was not to show that a strongly antagonistic variant could persist. Instead, we wanted to test the linkage conditions in which a second, relatively weaker antagonistic variant survived – which did not occur in the absence of strong linkage. 

      We have now added simulations with relatively lower initial frequencies, in which the weaker variant and the inversion both start at 0.05 frequency, while the stronger variant is still initialized at 0.5 to reflect the initial presence of one balanced locus with a strongly antagonistic variant. Here, the weaker antagonistic variant is still usually maintained when it is close to the stronger variant, and while the inversion-mediated maintenance of the weaker variant at greater distance from the stronger variant becomes less frequent than in the originally investigated case, it still happens often enough to hypothetically allow for such outcomes over evolutionary time-scales.

      Still, we should also emphasize that the goals of this proof-of-concept analysis are to establish and convey some basic elements of our model. Subsequently, analyses such as those presented in Figures 5 and 6 provide clearer evidence that the hypothesized dynamics of inversions facilitating the accumulation of sexual antagonism actually occur in our simulations.

      The experiments seem to be conducted in replicate (which is of course essential), but I could not find a clear statement of how many replicates were done for each maternal line cross.

      How did the authors arrive at 16 binomial trials (line 473)? 4 inversions, 4 maternal genotypes?

      How were replicates dealt with?

      In Figure 9, it would be important to visualize the variation among replicates.

      Unfortunately, we did not have the bandwidth to perform replicates of each maternal line. Instead, we use four maternal backgrounds to simultaneously establish consistency across independent experiments and genetic backgrounds (see our response to Reviewer 1, point 7). We’ve edited the draft to make this clearer and more clearly delineate what is supported and not supported by our data. Replicate variation for the control replicates of the extraction and sequencing process, and the exact read counts of the experiment, are available in Supplemental Tables S5, S6, and S7.

      The statistical analysis of trade-off is not clear: which null model was tested? No frequency change? In my opinion, two significances are needed: a significant difference between parental and embryo and then embryo and adult offspring. The issue with this is, however, that the embryo data are used twice and an error in estimating the frequency of the embryos could be easily mistaken as antagonistic selection.

      Hopefully the description of our null model is clearer in the text, now starting around line 967 in the Methods. We are aware of the positive dependence when performing tests comparing the paternal to embryo and then embryo to offspring frequencies, and this is accounted for by our analysis strategy - see lines 1009-1012.

      It was not clear how the authors adjusted their chi-squared test expectations. Were they reinventing the wheel? There is an improved version of the chi-squared test, which accounts for sampling variation.

      We did not actually perform chi-square tests. Instead, we used the chi statistic from the chi-squared test as a quantitative summary of the differences in read counts between samples. We compared an observed value of chi to values for this statistic obtained from simulated replicates of the experiment. Sampling from this simulation generated our ‘expected’ distribution of read counts, sampled to match sources of variance introduced in the experimental procedure, but without any effect of natural selection, per lines 825ff in the original submission. Hence, we are approximating the likelihood of observing an empirical chi statistic by generating random draws from a model of the experiment and comparing values calculated from each draw to the experimental value: a Monte Carlo method of approximating a p-value for our data. We have attempted to make the structure of these simulations and their use as a null-model clearer in this draft.
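      The general logic of a Monte Carlo p-value of this kind can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the 2x2 Pearson form of the statistic, and the simple pooled-frequency binomial null are all assumptions made here for illustration, whereas the null model described above also includes further sources of experimental variance.

```python
import random


def chi_statistic(inv_a, std_a, inv_b, std_b):
    """Pearson chi-squared statistic for a 2x2 table of read counts
    (inverted vs. standard arrangement in samples a and b)."""
    table = [[inv_a, std_a], [inv_b, std_b]]
    row_totals = [inv_a + std_a, inv_b + std_b]
    col_totals = [inv_a + inv_b, std_a + std_b]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def binomial_draw(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def monte_carlo_p_value(inv_a, std_a, inv_b, std_b, n_sims=2000, seed=1):
    """Share of simulated null replicates whose statistic is at least as
    large as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = chi_statistic(inv_a, std_a, inv_b, std_b)
    n_a, n_b = inv_a + std_a, inv_b + std_b
    # Hypothetical null: both samples share one underlying frequency.
    pooled = (inv_a + inv_b) / (n_a + n_b)
    hits = 0
    for _ in range(n_sims):
        sim_a = binomial_draw(rng, n_a, pooled)
        sim_b = binomial_draw(rng, n_b, pooled)
        if chi_statistic(sim_a, n_a - sim_a, sim_b, n_b - sim_b) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)
```

      The statistic is used here only as a summary of distance between two samples; its significance comes from the simulated distribution, not from the theoretical chi-squared distribution.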

      It is not sufficiently motivated why the authors model differences in the extraction procedure with a binomial distribution.

      Adding a source of variance here seemed necessary as running control sequencing replicates revealed that there was residual variance not fully recapitulated by sample-size-dependent resampling. Given that we were still sampling a number of draws from a binomial outcome (the read being from the inverted or standard arrangement), a binomial distribution seemed a reasonable model, and we fit the level of this additional noise source to an experiment-wide constant, read-count or genome-count independent parameter that best fit the variance observed in the controls (lines 830ff in the original draft). Clarification is made in this manuscript draft, lines 979-989.

      How many reads were obtained from each amplicon? It looks like the authors tried to mimic differences between technical replicates by a binomial distribution, which matches the noise for a given sample size, but this depends on the sequence coverage of the technical replicates.

      We provide read counts in Supplemental Tables S6 and S7. The relevant paragraph in the methods has been edited for clarity, lines 972ff. Accounting for sampling differences between replicates used a hypergeometric distribution for paternal samples to account for paternal mortality before collection, and the rest were resampled with a binomial distribution. There were two additional binomial samplings, to account for resampling the read counts and to capture further residual variance in the library prep that did not seem to depend on either allele or read counts.
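      A rough sketch of such a layered sampling scheme is given below. The order of the steps, the parameter names (for example `noise_size` for the constant that sets the residual library-prep variance), and the helper functions are hypothetical and chosen only for illustration; the Methods remain the authoritative description of the actual procedure.

```python
import random


def binomial_draw(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def hypergeometric_draw(rng, successes, population, draws):
    """Sample without replacement from a finite pool of genomes."""
    urn = [1] * successes + [0] * (population - successes)
    return sum(rng.sample(urn, draws))


def simulate_null_reads(rng, n_inverted, n_genomes, n_collected,
                        n_reads, noise_size, paternal):
    """One simulated inverted-arrangement read count with no selection.

    paternal=True  -> genomes sampled without replacement (hypergeometric),
                      reflecting a finite pool reduced before collection.
    paternal=False -> genomes sampled binomially.
    """
    if paternal:
        sampled = hypergeometric_draw(rng, n_inverted, n_genomes, n_collected)
    else:
        sampled = binomial_draw(rng, n_collected, n_inverted / n_genomes)
    freq = sampled / n_collected
    # Residual library-prep noise: a binomial whose size is a single
    # experiment-wide constant, independent of read or genome counts.
    freq = binomial_draw(rng, noise_size, freq) / noise_size
    # Final step: sampling of sequencing reads.
    return binomial_draw(rng, n_reads, freq)
```

      Repeating such a draw many times for each sample gives a null distribution of read counts against which an observed difference can be compared.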

      It would be good to see an estimate for the strength of selection: 10% difference in a single generation appears rather high to me.

      Estimates of selection strength based on solving for a Wright-Fisher selection coefficient for each tested comparison can now be found in Table S8, mentioned in text on lines 589-590. The mean magnitude of selection coefficients for all paternal to embryo comparisons was 0.322, and for embryo to all adult offspring it was 0.648. For In(3L)Ok the mean selection coefficients were 0.479 and -0.53, and for In(3R)K they were -0.189 and 1.28, respectively. Some are of quite large magnitude, but we emphasize that the coefficients for embryo to adult are based on survival to old age, rather than developmental viability. That factor, in addition to the laboratory environment, makes these estimates distinct from selection coefficients that might be experienced in natural populations.
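      For readers who want to see how a coefficient of this kind follows from a pair of frequencies, a minimal sketch is given below. It assumes a simple genic selection model in which the inverted arrangement has relative fitness 1 + s, so that the frequency after selection is p' = p(1 + s) / (1 + ps), which rearranges to s = (p' - p) / (p(1 - p')). This parameterization and the function names are assumptions made for illustration and may differ from the exact formulation behind Table S8.

```python
def next_frequency(p, s):
    """Frequency after one round of selection when the focal
    arrangement has relative fitness 1 + s (assumed genic model)."""
    return p * (1 + s) / (1 + p * s)


def selection_coefficient(p_before, p_after):
    """Solve p_after = p_before * (1 + s) / (1 + p_before * s) for s."""
    if not (0 < p_before < 1 and 0 < p_after < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (p_after - p_before) / (p_before * (1 - p_after))
```

      Under this assumed model, a rise from a frequency of 0.5 to 0.6 corresponds to s = 0.5, which illustrates how a frequency change of about 10% in a single comparison can translate into a coefficient of large magnitude.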

      Reviewer #3 (Public Review):

      Strengths:

      (1) …the authors developed and used a new simulator (although it was not 100% clear as to why SLiM could not have been used as SLiM has been used to study inversions).

      Before SLiM 3.7 or so (and including when we did the bulk of our simulation work), we do not think it would have been feasible to use SLiM to model the mutation of inversions with random breakpoints and recombination between them without altering the SLiM internals. Separately, needing to script custom selection, mutation, and recombination functions in Eidos would have slowed SLiM down significantly. Given our greater familiarity with Python and NumPy, and the ability to implement a similarly efficient simulator more quickly than through learning C++ and Eidos, we chose to write our own.

      It should be a fair bit easier to implement comparable simulations in SLiM now, but it will still require scripting custom mutation, selection, and recombination functions and would still result in a similarly slow runtime. The current script recipe recommended by SLiM for simulating inversions uses constants to specify the breakpoints of a single inversion, without the ability to draw multiple inversions from a mutational distribution, or model recombination between more complicated karyotypes. Hence, our simulator still seems to be a more versatile and functional option for the purposes of this study.

      Weaknesses:

      [Comments 1 through 4 on Weaknesses included numerous citation suggestions, and some discussion recommendations as well. In our revised manuscript, we have substantially implemented these suggestions. In particular, we have deepened our introduction of mechanisms of balancing selection and prior work on inversion polymorphism, integrating many suggested references. While especially helpful, these suggestions are too extensive to completely quote and respond to in this already-copious document. Therefore, we focus our response on two select topics from these comments, and then proceed to comment 5 thereafter.]

      (2) The general reduction principle and inversion polymorphism. In Section 1.2., the authors state that "there has not been a proposed mechanism whereby alleles at multiple linked loci would directly benefit from linkage and thereby maintain an associated inversion polymorphism under indirect selection." Perhaps I am misunderstanding something, but in my reading, this statement is factually incorrect. In fact, the simplest version of Dobzhansky's epistatic coadaptation model (see Charlesworth 1974; also see Charlesworth and Charlesworth 1973 and discussion in Charlesworth & Flatt 2021; Berdan et al. 2023) seems to be an example of exactly what the authors seem to have in mind here: two loci experiencing overdominance, with the double heterozygote possessing the highest fitness (i.e., 2 loci under epistatic selection, inducing some degree of LD between these loci), with subsequent capture by an inversion; in such a situation, a new inversion might capture a haplotype that is present in excess of random expectation (and which is thus fitter than average)…

      We agree that the quoted statement could be misleading and have rewritten it. We intended to point out that we are presenting a model in which all loci contribute additively (with respect to display) or multiplicatively (with respect to survival probability), without any dominance relationships or genetic interaction terms. And yet, the model generates epistatic balancing selection in a panmictic population under a constant environment. This represents a novel mechanism by which (the life-history characteristics of) a population would generate epistatic balancing selection as an emergent property, instead of assuming a priori that there is some balancing mechanism and representing frequency dependence, dominance effects, or epistatic interactions directly using model parameters. We have therefore refined the scope of the statement in question (lines 155-158). 
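      The absence of dominance and interaction terms can be illustrated with a short sketch in which each allele copy contributes independently to both traits. The function names, the encoding of genotypes as allele copy numbers (0, 1, or 2 per locus), and the example effect sizes are hypothetical and are not taken from our simulator.

```python
import math


def display_score(copies, display_effects):
    """Additive display trait: each allele copy adds its effect."""
    return sum(c * d for c, d in zip(copies, display_effects))


def survival_probability(copies, survival_effects):
    """Multiplicative survival: each allele copy multiplies the
    survival probability by its per-copy factor."""
    return math.prod(s ** c for c, s in zip(copies, survival_effects))


# Two loci; the first is homozygous and the second heterozygous
# for the antagonistic allele.
copies = [2, 1]
display = display_score(copies, [0.2, 0.1])
survival = survival_probability(copies, [0.9, 0.95])
```

      In this sketch a heterozygote always lies exactly between the two homozygotes on the additive display scale, and no term depends on a combination of loci. Any epistasis for fitness therefore has to arise from how display scores are converted into mating success through competition among males, rather than from the genotype-to-phenotype map itself.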

      (4) Hearn et al. 2022 on Littorina saxatilis snails. 

      A good reference. There is considerable work on ecotype-associated inversions in L. saxatilis, but we previously cut some discussion of this and of other populations with high gene flow but identifiable spatial structure for inversion-associated phenotypes (e.g. butterfly mimicry polymorphisms, Mimulus, etc.). Due to the spatially discrete environmental preferences and sampled ranges of the inversions in these populations, we considered these examples to be somewhat distinct from explaining inversion polymorphism in a potentially homogeneous and panmictic environment.

      (4) cont. A very interesting paper that may be worth discussing is Connallon & Chenoweth (2019) about dominance reversals of antagonistically selected alleles (even though C&C do not discuss inversions): AP alleles (with dominance reversals) affecting two or more life-history traits provide one example of such antagonistically selected alleles (also see Rose 1982, 1985; Curtsinger et al. 1994) and sexually antagonistically selected alleles provide another. The two are of course not necessarily mutually exclusive, thus making a conceptual connection to what the authors model here.

      We had removed a previously drafted discussion of dominance reversal for brevity’s sake, but this topic is once again represented in the updated draft of the manuscript with a short reference in the introduction, lines 76-80. We also mention ‘segregation lift’ (Wittmann et al. 2017) involving a similar reversal of dominance for fitness between temporally fluctuating conditions, as opposed to between sexes or life history stages. 

      (5) The model. In general, the description of the model and of the simulation results was somewhat hard to follow and vague. There are several aspects that could be improved:  [5](1) it would help the reader if the terminology and distinction of inverted vs. standard arrangements and of the three karyotypes would be used throughout, wherever appropriate.

      We have attempted to do so, using the suggested heterokaryotypic/homokaryotypic terminology.

      [5](2) The mention of haploid populations/situations and haploid loci (e.g., legend to Figure 1) is somewhat confusing: the mechanism modelled here, of course, requires suppressed recombination in the inversion/standard heterokaryotype; and thus, while it may make sense to speak of haplotypes, we're dealing with an inherently diploid situation. 

      While eukaryotes with haploid-dominant life history may still experience similar dynamics, we do expect that most male display competition is in diploid animals, and we are only simulating diploid fitnesses and experimenting with diploid Drosophila. We have tried to minimize the discussion of haploids in this draft.

      [5](3) The authors have a situation in mind where the 2 karyotypes (INV vs. STD) in the heterokaryotype carry distinct sets of loci in LD with each other, with one karyotype/haplotype carrying antagonistic variants favoring high male display success and with the other karyotype/haplotype carrying non-antagonistic alternative alleles at these loci and which favor survival. Thus, at each of the linked loci, we have antagonistic alleles and non-antagonistic alleles - however, the authors don't mention or discuss the degree of dominance of these alleles. The degree of dominance of the alleles could be an important consideration, and I found it curious that this was not mentioned (or, for that matter, examined). 

      In this study, our goal was to show that the investigated model could produce balanced and increasing antagonism without the need to invoke dominance. We think there would be a strong case for a follow-up study that more thoroughly investigates how dominance and other variables impact the parameter space of balanced antagonism, but this goal is beyond our capacity to pursue in this initial study. We’ve added several lines clarifying the absence of dominance from our investigated models, and pointing out that dominance could modulate the predictions of these models (lines 211-213, 278-282).

      [5](4) In many cases, the authors do not provide sufficient detail (in the main text and the main figures) about which parameter values they used for simulations; the same is true for the Materials & Methods section that describes the simulations. Conversely, when the text does mention specific values (e.g., 20N generations, 0.22-0.25M, etc.), little or no clear context or justification is being provided. 

      We have sought to clarify in this draft that 20N was chosen as an ample time frame to establish equilibrium levels and frequencies of genetic variation under neutrality. We present a time sequence in Figure 5, and these results indicate that antagonism has stabilized in models without inversions or with higher recombination rates, whereas its rate of increase has slowed in a model with inversions and lower levels of crossing over.

      The inversion breakpoints and the position of the locus with stronger antagonistic effects in Figure 4 were chosen arbitrarily for this simple proof of concept demonstration, with the intent that this locus was close to one breakpoint. Hopefully these and other parameters are clearer in the revised manuscript.

      [5](5) The authors sometimes refer to "inversion mutation(s)" - the meaning of this terminology is rather ambiguous.

      Edited, hopefully the wording is clearer now. The quoted phrase had uniformly referred to the origin of new inversions by a mutagenic process. 

      (6) Throughout the manuscript, especially in the description and the discussion of the model and simulations, a clearer conceptual distinction between initial "capture" and subsequent accumulation / "gain" of variants by an inversion should be made. This distinction is important in terms of understanding the initial establishment of an inversion polymorphism and its subsequent short- as well as long-term fate. For example, it is clear from the model/simulations that an inversion accumulates (sexually) antagonistic variants over time - but barely anything is said about the initial capture of such loci by a new inversion.

      We do not have a good method of assessing a transition between these two phases for the simulations in which both antagonistic alleles and inversions arise stochastically by a mutagenic process. However, we have tried to be clearer on the distinction in this draft: we have included simulations in Figure 4 with variants starting at lower frequencies, and we have tried to better contextualize the temporal trajectories in Figure 5 as (in part) modeling the accumulation of variants after such an origin.

      Reviewer #3 (Recommendations For The Authors):

      - In general: the whole paper is quite long, and I felt that many parts could be written more clearly and succinctly - the whole manuscript would benefit from shortening, polishing, and making the wording maximally precise. Especially the Introduction (> 8 pages) and Discussion (7.5 pages) sections are quite long, and the description of the model and model results was quite hard to follow.

      We have attempted to condense some portions of the manuscript, but inevitably added to others based on important reviewer suggestions. Regarding the length of the Introduction and Discussion, we are covering a lot of intellectual territory in this study, and we aim to make it accessible to readers with less prior familiarity. At this point, we have well over 100 citations (far more than a typical primary research paper), in part thanks to the relevant sources provided by this reviewer. We are therefore optimistic that our text will provide a valuable reference point for future studies. We have also made significant efforts to clarify the Results and Methods text in this draft without notably expanding these sections.

      - In general: the conceptual parts of the paper (introduction, discussion) could be better connected to previous work - this concerns e.g. the theoretical mechanisms of balancing selection that might be involved in maintaining inversions; the general, theoretical role of antagonistic pleiotropy (AP) and trade-offs in maintaining polymorphisms; previously made empirical connections between inversions and AP/trade-offs; previously made empirical connections between inversions and sexual antagonism.

      In the revised manuscript, we have improved the connection of these topics to prior work.

      - L3: "accumulate". A clearer distinction could be made, throughout, between initial capture of alleles/haplotypes by an inversion vs. subsequent gain.

      Please see point 6 in the response to the Public Review, above.

      - L29: I basically agree about the enigma, however, there are quite many empirical examples in D. melanogaster / D. pseudoobscura and other species where we do know something about the nature of selection involved, e.g., cases of NFDS, spatially and temporally varying selection, fitness trade-offs, etc.

      At least for our focal species, we have emphasized that geographic (and now temporal) associations have been found for some inversions. For the sake of length and focus, we probably should not go down the road of documenting each phenotypic association that has been reported for these inversions, or say too much about specific inversions found in other species. As indicated in our response to reviewer 2, some previously documented inversion-associated trade-offs may be compatible with the model presented here. However, we did locate and add to our Discussion one report of frequency-dependent selection on a D. melanogaster inversion (Nassar et al. 1973).

      - L43: it is actually rather unlikely, though not impossible, that new inversions are ever completely neutral (see the review by Berdan et al. 2023).

      This line was intended to convey that, in line with Said et al. 2018’s results, the structural alterations involved in common segregating inversions are not expected to contribute significantly to the phenotype and fitness (as indicated by lack of strong regulatory effects), and that their phenotypic consequences are instead due to linked variation. We have rewritten this passage to better communicate this point, now lines 44-52. Interpreting Section 2 and Figure 1 of Berdan et al. 2023, the linked variation may be what is in mind when saying that inversions are almost never neutral. We have also added a line referencing the expected linked variation of a new inversion (lines 49-52).

      - L51-73: I felt this overview should be more comprehensive. The model by Kirkpatrick & Barton (2016 ) is in many ways less generic than the one of Charlesworth (1974) which essentially represents one way of modeling Dobzhansky's epistatic coadaptation. Also, the AOD mechanism is perhaps given too much weight here as this mechanism is very unlikely to be able to explain the establishment of a balanced inversion polymorphism (see Charlesworth 2023 preprint on bioRxiv). NFDS, spatially varying selection and temporally varying selection (for all of which there is quite good empirical evidence) should all be mentioned here, including the classical study of Wright and Dobzhansky (1946) which found evidence for NFDS (also see Chevin et al. 2021 in Evol. Lett.)

      On reflection, we agree that we put too much emphasis on AOD and have edited the section to be more representative.

      - L57. Two earlier Dobzhansky references, about epistatic coadaptation, would be: Dobzhansky, T. (1949). Observations and experiments on natural selection in Drosophila. Hereditas, 35(S1), 210-224. https://doi.org/10.1111/j.1601-5223.1949.tb03334.x; Dobzhansky, T. (1950). Genetics of natural populations. XIX. Origin of heterosis through natural selection in populations of Drosophila pseudoobscura. Genetics, 35, 288-302. https://doi.org/10.1093/genetics/35.3.288

      - In general, in the introduction, the classical chapter by Lemeunier and Aulard (1992) should be cited as the primary reference and most comprehensive review of D. melanogaster inversion polymorphisms.

      - L101: this is of course true, though there are some exceptions, such as In(3R)Mo.

      - L110: the papers by Knibb, the chapter by Lemeunier and Aulard (1992), and the meta-analysis of INV frequencies by Kapun & Flatt (2019) could be cited here as well.

      Citation suggestions integrated.

      - L123 and elsewhere: the common D. melanogaster inversions are old but perhaps not THAT old - if we take the Corbett-Detig & Hartl (2012) estimates, then most of them do not really exceed an age of Ne generations, or at least not by much. I mean: yes, they are somewhat old but not super-old (cf. discussion in Andolfatto et al. 2001).

      Edited to curb any hyperbole. We agree that there are much more ancient polymorphisms in populations.

      - L133-135. This needs to be rewritten: this claim is incorrect, to my mind (Charlesworth 1974; also see Charlesworth and Charlesworth 1973; discussion in Charlesworth & Flatt 2021).

      Edited. See public review response (2).

      - L154: the example of inversion polymorphism is actually explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Edited to mention this. Inversions are also mentioned in Feldman et al. 1980, Feldman and Balkau 1973, Feldman 1972, and have been in discussion since the origins of the idea.

      - L162ff: see Connallon & Chenoweth (2019).

      Citation suggestion integrated, along with Cox & Calsbeek 2009 which seems more directly applicable, now line 185ff.

      - L169: why? There is much evidence for other important trade-offs in this system.

      Reworded.

      - L178-179: other studies have found that trade-offs/AP contribute to the maintenance of inversion polymorphisms, e.g. Mérot et al. 2020 and Betrán et al. 1998, etc.

      Added Betrán et al. 1998 - a good reference. Moved up mention of Mérot et al. 2020 from later in the text and directed readers to the Discussion, lines 202-205.

      - L198. "alternate inversion karyotypes" - you mean INV vs. STD? It would be good to adopt a maximally clear, uniform terminology throughout.

      Edited to communicate this better.

      - L215-217: this is a theoretically well-known result due to Hazel (1943); Dickerson (1955); Robertson (1955); e.g., see the discussion in the quantitative genetics book by Roff (1997) or in the review of Flatt (2020).

      Citations integrated, now lines 232ff.

      - L223 and L245: "haploid" - somewhat confusing (see public review). 

      - L259-260: This may need some explanation. 

      - L261-262: simply state that there is no recombination in D. melanogaster males.

      Edited for increased clarity.

      - L274 (and elsewhere): the meaning of "mutation...of new..inversion polymorphisms" is ambiguous - do you mean a polymorphic inversion and hence a new inversion polymorphism or do you mean polymorphisms/variants accumulating in an inversion?

      - L275: maybe better heterokaryotypic instead of heterozygous? (note that INV homokaryotypes or STD homokaryotypes can be homo- or heterozygous, so when referring to chromosomal heterozygotes instead of heterozygous chromosomes it may be best to refer to heterokaryotypes).

      Per [5](1) and [5](5) in the public review, we have edited our terminology.

      - L276: referral to M&M - I found the description of the model/simulation details there to be somewhat vague, e.g. in terms of parameter settings, etc.

      Further described.

      - L281-282: would SLiM not have worked?

      See public review response.

      - L286-287: why these parameters?

      Further described.

      - L296ff: it is not immediately clear that the loci under consideration are polymorphic for antagonistic alleles vs. non-antagonistic alternative alleles - maybe this could be made clear very explicitly.

      Edited to be explicit as suggested.

      - L341, 343: "inversion mutation" - meaning ambiguous.

      - L348, 352: "specified rate" - vague.

      - L354-357: initial capture and/or accumulation/gain? 

      - L401, 402, 404: Z-, W- and Y- are brought up here without sufficient context/explanation.

      The above have been addressed by edits in the text.

      - L523, 557, 639, 646, and elsewhere: not the first evidence - see the paper by Mérot et al. (2020) (and e.g. also by Yifan Pei et al. (2023)). 

      Citations integrated in the introduction and discussion. Mérot et al. (2020) was cited (L486 in original) but discussion was curtailed in the previous draft. 

      - L558-559. I agree but it is clear that there are many mechanisms of balancing selection that can achieve this, at least in principle; for some of them (NFDS, etc.) we have pretty good evidence. 

      - L576-577. This is correct but for In(3R)C that study did find a differential hot vs. cold selection response.

      Addressed with text edit. 

      - L584-L586: cf. Betrán et al. (1998), Mérot et al. (2020), Pei et al. (2023), etc.

      - L591. "other forms of balancing selection": yes! This should be stressed throughout. Multiple forms of balancing selection exist and they are not mutually exclusive. 

      - L593: consider adding Dobzhansky (1943), Machado et al. (2021) 

      - L596-597: this is rather unlikely, at least in terms of inversion establishment (see Charlesworth 2023; https://www.biorxiv.org/content/10.1101/2023.10.16.562579v1).

      - L608: consider adding Kapun & Flatt (2019). 

      - L611-612: see studies by Mukai & Yamaguchi, 1974; and Watanabe et al., 1976. 

      - L639, 646: AP - see general literature on AP as a factor in maintaining polymorphism (Rose 1982, 1985; Curtsinger et al. 1994; Charlesworth & Hughes 2000 chapter in Lewontin Festschrift; Connallon & Chenoweth 2019 - this latter paper is particularly relevant in terms of AP effects in the context of sexual antagonism) 

      Citation suggestions integrated.

      - L657: inversion polymorphism is explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Hopefully this is better communicated.

      - L724-755: I felt that this section generally lacks sufficient details, especially in terms of parameter choices and settings for the simulations. 

      - L732L: why not state these rates?

      Parameter values are now given a fuller description in figure legends and in the methods.  

      - L746: but we know that mutational effect sizes are not uniformly distributed (?).

      We made this choice for simplicity and to avoid invoking a seemingly arbitrary distribution, but one could instead simulate trait effects with some gamma distribution. Display values would still have variable fitness effects that fluctuate with population composition, but we agree that a distribution shifted toward small effects would be more realistic.
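
      To make the contrast concrete, the following is a minimal sketch, not the authors' simulation code, that draws effect sizes from a uniform distribution and from a gamma distribution with the same mean. The bound `max_effect`, the `shape` and `scale` values, and the 0.1 cutoff are all hypothetical choices made only for illustration.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is reproducible
n_mutations = 10_000     # hypothetical number of sampled effects
max_effect = 1.0         # hypothetical upper bound for the uniform draw

# Uniform effects: every magnitude in [0, max_effect] is equally likely.
uniform_effects = [rng.uniform(0.0, max_effect) for _ in range(n_mutations)]

# Gamma effects: gammavariate(alpha, beta) takes shape alpha and scale beta,
# so the mean is alpha * beta (0.5 here, matching the uniform mean).
shape, scale = 0.5, 1.0  # hypothetical; shape < 1 piles mass near zero
gamma_effects = [rng.gammavariate(shape, scale) for _ in range(n_mutations)]


def fraction_below(values, cutoff):
    """Return the fraction of values strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)


small_uniform = fraction_below(uniform_effects, 0.1)
small_gamma = fraction_below(gamma_effects, 0.1)
print(f"share of effects below 0.1: uniform={small_uniform:.3f}, gamma={small_gamma:.3f}")
```

      With a shape parameter below one, a much larger share of the gamma draws falls near zero than under the uniform draw, which is the sense in which such a distribution is "shifted toward small effects".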

      - L765: In(3R)P is not mentioned elsewhere - is this really correct?

      That was incorrect, fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Malaria parasites detoxify free heme molecules released from digested host hemoglobins by biomineralizing them into inert hemozoin. Thus, why malaria parasites retain PfHO, a dead enzyme that loses the capacity of catabolizing heme, is an outstanding question that has puzzled researchers for more than a decade. In the current manuscript, the authors addressed this question by first solving the crystal structure of PfHO and aligning it with structures of other heme oxygenase (HO) proteins. They found that the N-terminal 95 residues of PfHO, which failed to crystalize due to their disordered nature, may serve as signal and transit peptides for PfHO subcellular localization. This was confirmed by subsequent microscopic analysis with episomally expressed PfHO-GFP and a GFP reporter fused to the first 83 residues of PfHO (PfHO N-term-GFP). To investigate the functional importance of PfHO, the authors generated an anhydrotetracycline (aTC) controlled PfHO knockdown strain. Strikingly, the parasites lacking PfHO failed to grow and lost their apicoplast. Finally, by chromatin immunoprecipitation (ChIP), quantitative PCR/RT-PCR, and growth assays, the authors showed that both the cognate N-terminus and HO-like domain were required for PfHO function as an apicoplast DNA interacting protein.

      The authors systematically performed multidisciplinary approaches to address this difficult question: what is the function of this enzymatically dead PfHO? I enjoyed reading this manuscript and its thoughtful discussion. This study is not only of clinical importance for antimalarial treatments but also deepens our understanding of protein function evolution. While I understand these experiments are challenging to conduct in malaria parasites, the data quality of some of the experiments could be improved. For example, most of the Western blots and Southern blots are not of high quality.

      We thank the reviewer for the positive comments but are a bit puzzled by the final statement about western and Southern blot quality. We agree that the two anti-PfHO western blots probed with custom antibody (Fig. 3- source data 2 and 8) have substantial background signal in the higher molecular mass region >75 kDa. However, we note that the critical region <50 kDa is clear in both cases and readily enables target band visualization. All other western blots probing GFP or HA epitopes are of high quality with minimal off-target background. We present two Southern blot images. We agree that the signal is somewhat faint for the Southern blot demonstrating on-target integration of the aptamer/TetR-DOZI plasmid (Fig. 3- fig. supplement 4), although we note that the correct band pattern for integration is visible. We also note that the accompanying genomic PCR data is unambiguous. The Southern blot for GFP-DHFRDD incorporation into the PfHO locus (Fig. 3- fig. supplement 1) has clear signal and strongly supports on-target integration. The minor background signal in the lower left region of the image does not extend into nor impact interpretation of correct clonal integration.

      Reviewer #2 (Public Review):

      Summary:

      Blackwell et al. investigated the structure, localization, and physiological function of Plasmodium falciparum (Pf) heme oxygenase (HO). Pf and other malaria parasites scavenge and digest large amounts of hemoglobin from red cells for sustenance. To counter the potentially cytotoxic effects of heme, it is biomineralized into hemozoin and stored in the food vacuole. Another mechanism to counteract heme toxicity is through its enzymatic degradation via heme oxygenases. However, it was previously found by the authors that PfHO lacks the ability to catalyze heme degradation, raising the intriguing question of what the physiological function of PfHO is. In the current contribution, the authors determine that PfHO localizes to the apicoplast, determine its targeting sequence, establish the essentiality of PfHO for parasite viability, and determine that PfHO is required for proper maintenance of apicoplasts and apicoplast gene expression. In sum, the authors establish an essential physiological function for PfHO, thereby providing new insights into the role of PfHO in plasmodium metabolism.

      Strengths:

      The studies are rigorously conducted and the results of the experiments unambiguously support a role for PfHO as being an apicoplast-targeted protein required for parasite viability and maintenance of apicoplasts.

      Weaknesses:

      While the studies conducted are rigorous and support the primary conclusions, the lack of experiments probing the molecular function of PfHO limits the impact of the work. Nevertheless, the knowledge that PfHO is required for parasite viability and plays a role in the maintenance of apicoplasts is still an important advance.

      We appreciate the positive assessment. We agree that further mechanistic understanding of PfHO function remains a key future challenge. Indeed, we made extensive efforts to unravel PfHO interactions that underpin its critical function. We elucidated key interactions with the apicoplast genome, reliance on the electropositive N-terminus, association with DNA-binding proteins, and a specific defect in apicoplast mRNA levels. The major limitation we faced in further defining PfHO function is the general lack of understanding of apicoplast transcription and broader gene expression. That limitation and the challenges to overcome it go well beyond our study and will require concerted efforts across several manuscripts (likely by multiple groups) to define the mechanistic features of apicoplast gene expression. We look forward to contributing further molecular understanding of PfHO function as broader understanding of apicoplast transcription emerges.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (eg. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and, in principle, using a different measure could have led to varying results in our study. In addition, we believe there are numerous additional factors relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the type of variation that the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation. We will make a note of this argument in the revised manuscript.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey question earlier and thereby letting them achieve explicitness or influence their mindset otherwise through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we  could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that given the cumulative nature of statistical learning, we expect that the effect of using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.  

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants due to the process of how we obtain the members of the two groups. However, these sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collecting process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroups, bolstering the achieved power. This is also confirmed by the reported Bayes Factors, which support the “effect” or the “no effect” conclusions in the various tests and range in value from substantial to very strong. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      However, we do agree that the unbalanced nature of the sample and effect sizes can be problematic for the between-group comparisons. We aim to replace the Student’s t-tests that directly compare explicit and implicit participants with Welch’s t-tests, which are better suited for unequal sample sizes and variances.
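
      As an illustration of why Welch's test suits this situation, the sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom for two groups of unequal size. It is a minimal example under stated assumptions: the function name `welch_t` and both score lists are hypothetical and are not taken from the study's data or analysis code.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean, using sample variances.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical proportion-correct scores for a small and a larger group.
explicit = [0.80, 0.90, 0.70, 0.85]
implicit = [0.50, 0.55, 0.60, 0.45, 0.50, 0.65, 0.40, 0.55]

t, df = welch_t(explicit, implicit)
print(f"t = {t:.3f}, df = {df:.2f}")
```

      Unlike Student's test, no pooled variance is formed, so a small, high-variance group is not forced to share a variance estimate with a large, low-variance one; the degrees of freedom shrink accordingly.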

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we will explicitly discuss these potential pitfalls in the revised version of the manuscript.  

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
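
      For readers unfamiliar with the Holm-Bonferroni step-down procedure named above, a minimal sketch follows. The function name `holm_bonferroni` and the p-values are hypothetical and serve only to show the mechanics; they do not reproduce any analysis from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - k + 1).
        if p_values[i] > alpha / (m - rank):
            break  # step-down rule: stop at the first non-rejection
        reject[i] = True
    return reject


# Hypothetical p-values from four one-sample tests within one experiment.
p_values = [0.001, 0.04, 0.03, 0.20]
decisions = holm_bonferroni(p_values)
print(decisions)
```

      In this example only the smallest p-value survives: 0.001 is below 0.05/4, but the next smallest, 0.03, exceeds 0.05/3, so the procedure stops there and retains the remaining hypotheses.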

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, we will add the requested information on achieved power by experiment to the revised version of the manuscript.  

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.  

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we will revamp and expand how this issue was handled and also add a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we will expand on our description of the matching and decision process and add additional descriptive plots showing what our data looks like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively roughly identical results and picking one of them over the other does not alter the conclusions of the test. The plots will give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.  

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference. We will clarify these ideas in the revised manuscript.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to insufficiency of the available sample sizes. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be completely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundreds) of new participants.  

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we can expand on in the revised manuscript.

    1. Author response:

      eLife assessment

      This study presents valuable findings on the role of a well-studied signal transduction pathway, the Slit/Robo system, in the context of the assembly of the hematopoietic niche in the Drosophila embryo. The evidence supporting the claims of the authors is solid. However, one aspect that needs attention is whether the cells are migrating and not being pushed to a more dorsal position through dorsal closure and/or other similar large-scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis and will interest developmental biologists working on molecular mechanisms of tissue morphogenesis.

      We appreciate the thoughtful and quite useful comments provided by each of the referees. Our responses are noted below each referee’s comment.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Nelson et al. is focused on the formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large-scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      Since each referee asked for clarification concerning collective cell migration, we present a combined response further below, placed after the comments from Reviewer #3.

      Strengths:

      (1) Using the expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      (2) Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      (3) Although in the analysis of robo mutants, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2) with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G).

      (4) New insight into dorsal vessel formation by VM is presented in Figure 4A, B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      (1) The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since the loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Figure 4G).

      We have reexamined our wording in the relevant Results section and, given that this referee agrees that we, “make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G)”, it was not clear how we might temper our conclusions more. Given that PSC cells express Robo1 and Robo2, and that the Vm does not contact the PSC, our ‘reasonable argument’ appears fair and parsimonious. Since we agree with the referee that a reader should be made as aware as possible of alternatives, we will add a comment to the Discussion, reminding the reader of the possibility of a secondary defect.

      (2) If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Figure 4D.

      While we agree that PSC-specific knockdown of Robo1 and Robo2 simultaneously would be ideal, this is not possible. First, the most-effective UAS-RNAi transgenes (that is, those in a Valium 20 backbone) are both integrated at the same chromosomal position; these cannot be simultaneously crossed with a GAL4 transgenic line to attempt double knock down. Additionally, as with all RNAi approaches that must rely on efficient knockdown over the rapid embryonic period, even having facile access to the above does not ensure the RNAi approach will cause as effective depletion as the genetic null condition that we use. Second, as the referee concedes, there is no embryonic PSC-specific GAL4. The proposed use of Antp-GAL4 would cause knockdown in many tissues (PSC, CB, Vm, epidermis and amnioserosa). This would lead to a reservation similar to that caused by our use of the straight genetic double mutant, as regards potential indirect requirement for Robo function.

      (3) Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      First, the Vm does not directly contact the PSC, so it cannot be pushing the PSC dorsally. We will re-examine our text to be certain to make this clear. Second, in our analysis of bin mutants, which lack Vm, LGs and PSCs are able to reach the dorsal midline region in the absence of Vm. Finally, please see our response to Reviewer #3, point 2, for why we maintain that PSC cells are “migrating” even though some PSC cells are attached to CBs.

      Reviewer #2 (Public Review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence, and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs), and cardioblasts (CBs) are in proximity to the PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs, in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if the PSC is not properly positioned? This would be particularly important with robo-slit manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like the specification of their cell types, structure, and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We agree that the functional consequence of PSC mis-positioning is important and a relevant question to eventually address. However, virtually all markers and reagents used to assess the effect of the PSC on progenitor cells and their differentiated descendants are restricted to analyses carried out on the third larval instar - some three days after the experiments reported here. Most of the manipulated conditions in our work are no longer viable at this phase and, thus, addressing the functional consequences of a malformed PSC will require the field to develop new tools. 

      As we noted in the Introduction, the consistency with which the wildtype PSC forms as a coalesced collective at the posterior of the LG strongly suggests importance of its specific positioning and shape, as has now been found for other niches (citations in manuscript). Additionally, in the Discussion we mention the existence of a gap junction-dependent calcium signaling network in the PSC that is important for progenitor maintenance. Without continuity of this network amongst all PSC cells (under conditions of PSC mis-positioning), we strongly anticipate that the balance of progenitors to differentiated hemocytes will be mis-managed, either constitutively, and / or under immune challenge conditions. 

      Finally, to our knowledge, the tools do not exist to build a “split Gal4 system using Antp and Odd promoters”. The expression pattern observed using the genomic Antp-GAL4 line must be driven by endogenous enhancers–none of which have been defined by the field, and thus cannot be used in constructing second order drivers. Similarly, for odd skipped, in the embryo the extant Odd-GAL4 driver expresses only in the epidermis, with no expression in the embryonic LG. Thus, the cis regulatory element controlling Odd expression in the embryonic LG is unknown. In the future, the discovery of an embryonic PSC-specific driver will aid in addressing the specific functional consequences of PSC mis-positioning.

      (2) The dense, parallel-aligned fibers in part of Figure 1J seem to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      See response to Reviewer #3.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We appreciate the Reviewer’s concern and acknowledge that the phenotypes we observed were indeed variable, and, at times subtle. As we show and discuss in the paper, our results revealed that the signaling requirements for proper PSC positioning are complex; this was favorably commented upon by Reviewer #1 (“...highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development.…”). We suspect the phenotypic variability is attributable to any number of biological differences such as heterogeneity of PSC cells and an accompanying difference in the timing of their competence to receive and respond to Slit-Robo signaling, the timing of release of Slit from CBs and Vm, number of cells in a given PSC, which PSC cells in the cluster respond to too little or too much signaling, and/or typical variability between organisms. Furthermore, PSC positioning analyses were conducted by two of the authors, who independently came to the same conclusions. For many of the manipulations double blinding was not possible since the genotype of the embryo was discernible due to the obvious phenotype of the manipulated tissue.

      (4) The Discussion is very lengthy and should [be] shortened.

      We will re-examine the prose and emphasize more conciseness, while maintaining clarity for the reader.

      Reviewer #3 (Public Review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessels. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented, well documented, and fully support the conclusions drawn from the different experiments. The manuscript differs in character from the mainstay of "big data" papers (for example, no sets of single-cell RNAseq data of, for instance, PSC cells with more or less Slit input, are offered), but what it lacks in this regard, it makes up in carefully planned and executed visualizations and genetic manipulations.

      Weaknesses:

      A few suggestions concerning improvement of the way the story is told and contextualized.

      (1) The minute cluster of PSC progenitors (5 or so cells per side) is embedded (as known before and shown nicely in this study) in other "migrating" cell pools, like the cardioblasts, pericardial cells, lymph gland progenitors, alary muscle progenitors. These all appear to move more or less synchronously. What should also be mentioned is another tissue, the dorsal epidermis, which also "moves" (better: stretches?) towards the dorsal midline during dorsal closure. Would it be reasonable to speculate (based on previously published data) that without the force of dorsal closure, operating in the epidermis, at least the lateral>medial component of the "migration" of the PSC (and neighboring tissues) would be missing? If dorsal closure is blocked, do essential components of PSC and lymph gland morphogenesis (except for the coming-together of the left and right halves) still occur? Are there any published data on this?

      Each of the Reviewers is interested in our response to this very relevant question, and, thus, we will address the issue en bloc here. First, we will add a Supplementary Figure showing that LG and CBs are still able to progress medially towards the dorsal midline when dorsal closure stalls. This rules out any major effect for the most prominent “large-scale embryo cell sheet movement” in positioning the PSC. Second, published work by Haack et al. and Balaghi et al. shows that CBs and leading edge epidermal cells are independently migratory, and we will add this context to the manuscript for the reader.

      (2) Along similar lines: the process of PSC formation is characterized as "migration". To be fair: the authors bring up the possibility that some of the phenotypes they observe could be "passive"/secondary: "Thus, it became important to test whether all PSC phenotypes might be 'passive', explained by PSC attachment to a malforming dorsal vessel. Alternatively, the PSC defects could reflect a requirement for Robo activation directly in PSC cells." And the issue is resolved satisfactorily. But more generally, "cell migration" implies active displacement (by cytoskeletal forces) of cells relative to a substrate or to their neighbors (like for example migration of hemocytes). This to me doesn't seem really clearly to happen here for the dorsal mesodermal structures. Couldn't one rather characterize the assembly of PSC, lymph gland, pericardial cells, and dorsal vessel in terms of differential adhesion, on top of a more general adhesion of cells to each other and the epidermis, and then dorsal closure as a driving force for cell displacement? The authors should bring in the published literature to provide a background that does (or does not) justify the term "migration".

      Before addressing this specifically, we remind readers of our response above that states the rationale ruling out large, embryo-scale movements, such as epidermal dorsal closure, in driving PSC positioning. So, how are PSC cells arriving at their reproducible position? This manuscript reports the first live-imaging of the PSC as it comes to be positioned in the embryo. We interpret these movies to suggest strongly that these cells are a ‘collective’ that migrates. Neither the data, nor we, are asserting that each PSC cell is ‘individually’ migrating to its final position. Rather, our data suggest that the PSC migrates as a collective. The most paradigmatic example of directed, collective cell migration, is of Drosophila ovarian border cells. That cell cluster is surrounded at all times by other cells (nurse cells, in that case), and for the collective to traverse through the tissue, the process requires constant remodeling of associations amongst the migrating cells in the collective (the border cells), as well as between cells in the collective and those outside of it (the nurse cells). In fact, the nurse cells are considered the substrate upon which border cells migrate. Note also that in collective border cell migration cells within the collective can switch neighbors, suggesting dynamic changes to cell associations and adhesions. 

      In our analysis, the PSC cells exhibit qualities reminiscent of the border cells, and thus we infer that the PSC constitutes a migratory cell collective.  We also show in Figure 1H that PSC cells exhibit cellular extensions, and thus have a very active, intrinsic actin-based cytoskeleton. In fact, in Figure 1I, we point out that PSC cells shift position within the collective, which is not only a direct feature of migration, but also occurs within the border cell collective as that collective migrates. Additionally, the fact that the lateral-most PSC cells shift position in the collective while remaining a part of the collective–and they do this while executing net directional movement–makes a strong argument that the PSC is migratory, as no cell types other than PSCs are contacting the surfaces of those shifting PSC cells. Lastly, the Reviewer’s supposition that, rather than migration, dorsal mesoderm structures form via “differential adhesion, on top of a more general adhesion of cells to each other” is, actually, precisely an inherent aspect of collective cell migration as summarized above for the ovarian border collective.

      In our resubmission we will adjust text citing the existing literature to better put into context the reasoning for why PSC formation based on our data is an example of collective cell migration.

      (3) That brings up the mechanistic centerpiece of this story, the Slit/Robo system. First: I suggest adding more detailed data from the study by Morin-Poulard et al 2016, in the Introduction, since these authors had already implicated Slit-Robo in PSC function and offered a concrete molecular mechanism: "vascular cells produce Slit that activates Robo receptors in the PSC. Robo activation controls proliferation and clustering of PSC cells by regulating Myc, and small GTPase and DE-cadherin activity, respectively". As stated in the Discussion: the mechanism of Slit/Robo action on the PSC in the embryo is likely different, since DE-cadherin is not expressed in the embryonic PSC; however, it may not be THAT different: it could also act on adhesion between PSC cells themselves and their neighbors. What are other adhesion proteins that appear in the late lateral mesodermal structures? Could DN-cadherin or Fasciclins be involved?

      We agree with the Reviewer that Slit-Robo signaling likely acts in part on the PSC by affecting PSC cell adhesion to each other and/or to CBs (lines 428-435). As stated in the Discussion, we do not observe Fasciclin III expression in the PSC until late stages when the PSC has already been positioned, suggesting that Fasciclin III is not an active player in PSC formation. Assessing whether the PSC expresses any other of the suite of potential cell adhesion molecules such as DN-Cadherin or other Fasciclins, and then studying their potential involvement in the Slit-Robo pathway in PSC cells, would be part of a follow-up study.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To make sure our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analysis workflow. First, from our RNA-seq dataset, we retained only the UTRs with at least 20 reads in a polysome profile across all three repeated experiments. Second, in the subsequent main analysis using a negative binomial generalized linear model (GLM), we further excluded the UTRs that displayed a batch effect, i.e. those whose batch-related main effect and interaction were significant. We believe these measures safeguard the retained observations (UTRs) against the potentially high variation of our massively parallel translation assays and thus lend high confidence to our results.
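      The read-count filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the data layout (one dictionary of fraction counts per replicate), the reading of "at least 20 reads" as the per-replicate total across fractions, and all names such as `filter_utrs` and `UTR_A` are hypothetical. The batch-effect exclusion from the GLM step is not shown.

```python
MIN_READS = 20  # threshold quoted in the response


def passes_read_filter(replicates, min_reads=MIN_READS):
    """Return True if every replicate has at least `min_reads` reads in total.

    `replicates` is a list of dicts mapping a fraction name
    (e.g. "Mono", "Light", "Heavy") to a read count. Summing over
    fractions is an assumption about how the threshold is applied.
    """
    return all(sum(fractions.values()) >= min_reads for fractions in replicates)


def filter_utrs(counts, min_reads=MIN_READS, n_replicates=3):
    """Keep only UTRs observed in all replicates that pass the read filter."""
    return {
        utr: replicates
        for utr, replicates in counts.items()
        if len(replicates) == n_replicates
        and passes_read_filter(replicates, min_reads)
    }


# Hypothetical toy data: two UTRs, three replicates each.
example_counts = {
    "UTR_A": [
        {"Mono": 10, "Light": 8, "Heavy": 5},
        {"Mono": 12, "Light": 9, "Heavy": 4},
        {"Mono": 11, "Light": 7, "Heavy": 6},
    ],
    "UTR_B": [
        {"Mono": 3, "Light": 2, "Heavy": 1},  # only 6 reads: fails the filter
        {"Mono": 15, "Light": 9, "Heavy": 4},
        {"Mono": 14, "Light": 8, "Heavy": 6},
    ],
}
kept = filter_utrs(example_counts)
```

      Applying the threshold to every replicate separately, rather than to pooled counts, means that a single poorly covered replicate removes the UTR, which is one way such a filter could shrink a library considerably.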

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), we can see that read counts of WT samples congruously decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions – the trend is different between WT and mutant. Ergo, the distinct distribution patterns of two genotypes across three fractions in Figure 2B offer the readers a convincing visual supplement to our statistics from GLM.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Figure: Examples of non-significant UTR pairs in massively parallel polysome profiling assays.

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167. The P-values and odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.
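      For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table can be sketched in a few lines. The counts in the usage example are purely hypothetical and are not the values from Author response table 1; the table orientation (rows: significant vs. nonsignificant UTRs, columns: motif altered vs. not altered) is likewise an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalised probability of seeing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Integer comparison avoids floating-point ties.
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, extreme / total


# Hypothetical counts, for illustration only.
odds_ratio, p_value = fisher_exact_two_sided(12, 30, 40, 400)
```

      Because the test is exact, it remains valid when one cell of the table holds only a handful of UTRs, which is the situation the reviewer is concerned about.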

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozens of translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features p is much larger than the number of samples N. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
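      The shrinkage behaviour referred to here can be illustrated with a minimal coordinate-descent LASSO for a linear model. This is a generic textbook sketch, not the authors' model: the squared-error objective, the penalty value `alpha`, and the toy data are all assumptions, and the authors' actual features, response variable, and software are not specified in the response.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent.

    X is a list of n rows with p features each; y is a list of n responses.
    Works for p > n because each coordinate update is a 1-D problem.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature keeps a zero coefficient
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)
            ) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


# Toy problem with more features (3) than samples (2).
X_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y_toy = [2.0, 0.1]
weights = lasso_coordinate_descent(X_toy, y_toy, alpha=0.2)
```

      In this toy case the weakly associated second feature is set exactly to zero while the first is kept but shrunk, which is the feature-selection property the response relies on. Whether the selected features generalize still depends on how `alpha` is chosen, for example by cross-validation.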

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      We plan to assess the endogenous effect by generating CRISPR knock-in clones carrying the UTR variant.

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease‐Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM model is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend in significant hits within 5’ UTRs: they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors develop a self-returning self-avoiding polymer model of chromosome organization and show that their framework can recapitulate at the same time local density and large-scale contact structural properties observed experimentally by various technologies. The presented theoretical framework and the results are valuable for the community of modelers working on 3D genomics. The work provides solid evidence that such a framework can be used, is reliable in describing chromatin organization at multiple scales, and could represent an interesting alternative to standard molecular dynamics simulations of chromatin polymer models.

      We appreciate the editor for an accurate description of the scope of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Carignano et al propose an extension of the self-returning random walk (SRRW) model for chromatin to include excluded volume aspects and use it to investigate generic local and global properties of the chromosome 3D organization inside eukaryotic nuclei. In particular, they focus on chromatin volumic density, contact probability, and domain size and suggest that their framework can recapitulate several experimental observations and predict the effect of some perturbations.

      We thank the reviewer for the attention paid to the manuscript and all the relevant comments.

      Strengths:

      - The developed methodology is convincing and may offer an alternative - less computationally demanding - framework to investigate the single-cell and population structural properties of 3D genome organization at multiple scales.

      - Compared to the previous SRRW model, it allows for investigation of the role of excluded volume locally.

      Excluded volume is accounted for everywhere, not locally. We emphasized this on page 3, line 182:

      “The method that we employ to remove overlaps is a low-temperature-controlled molecular dynamics simulation using a soft repulsive interaction potential between initially overlapping beads, that is terminated as soon as all overlaps have been resolved, as described in the Appendix 3.”


      - They perform some experiments to compare with model predictions and show consistency between the two.

      Weaknesses:

      - The model is a homopolymer model and currently cannot fully account for specific mechanisms that may shape the heterogeneous, complex organization of chromosomes (TAD at specific positions, A/B compartmentalization, promoter-enhancer loops, etc.).

      The SR-EV model is definitely not a homo-polymer, as it is not a regular concatenation of a single monomeric unit.

      The model includes loops, which may happen in two ways: 1) As in the SRRW, branching structures emerging from the configuration backbone can be interpreted as nested loops and 2) A relatively long forward step followed by a return is a single loop. The model induces the formation of packing domains, which are not TADs, and are quantitatively in agreement with ChromSTEM experiments.

      We considered it useful to add a new figure that further clarifies the structures obtained with the SR-EV model. The following paragraph and figure have been added on page 5:

      “The density heterogeneity displayed by the SR-EV configurations can be analyzed in terms of accessibility. One way to reveal this accessibility is by calculating the coordination number (CN) for each nucleosome along the SR-EV configuration, using a coordination radius of 11.5 nm. CN values range from 0 for an isolated nucleosome to 12 for a nucleosome immersed in a packing domain. In Figure 3 we show the SR-EV configuration shown in Figure 2, colored according to CN. CN can also be considered a measure to discriminate heterochromatin (red) from euchromatin (blue). Figure 3-A shows how the density inhomogeneity is coupled to different CN values, with high CN represented in red and low CN in blue. Figure 3-B shows a 50 nm thick slab obtained from the same configuration, which clearly shows that the nucleosomes at the center of each packing domain are almost completely inaccessible, while those outside are open and accessible. It is also clear that the surfaces of the packing domains are characterized by nearly white nucleosomes, i.e. coordinated towards the center of the domain and open in the opposite direction.”
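
      As a rough illustration of the coordination number described in this paragraph, the sketch below counts, for each bead, how many other beads lie within the coordination radius. It assumes that the count uses centre-to-centre distances and that a bead does not count itself; the function name and the toy coordinates are hypothetical and not taken from the manuscript.

```python
import math


def coordination_numbers(positions, radius=11.5):
    """Return, for each bead, the number of other beads within `radius` (nm).

    Assumption: neighbours are counted by centre-to-centre distance and a
    bead is not its own neighbour.
    """
    n = len(positions)
    cn = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                cn[i] += 1
                cn[j] += 1
    return cn


# Toy example: three beads 10 nm apart on a line, plus one isolated bead.
toy = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
toy_cn = coordination_numbers(toy)  # -> [1, 2, 1, 0]
```

      The double loop is quadratic in the number of beads; for configurations with many nucleosomes a cell list or k-d tree would be the usual replacement, which does not change the definition.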

      - By construction of their framework, the effect of excluded volume is only local and larger-scale properties for which excluded volume could be a main actor (formation of chromosome territories [Rosa & Everaers, PLoS CB 2009], bottle-brush effects due to loop extrusion [Polovnikov et al, PRX 2023], etc.) cannot be captured.

      Excluded volume is considered for all nucleosomes, including overlapping beads that are distant along the polymer chain. Chromosome territories can be treated with the model, but they are not treated here because we look at a single model chromosome.

      - Apart from being a computationally interesting approach to generating realistic 3D chromosome organization, the method offers fewer possibilities than standard polymer models (eg, MD simulations) of chromatin (no dynamics, no specific mechanisms, etc.) with likely the same predictive power under the same hypotheses. In particular, authors often claim the superiority of their approach to describing the local chromatin compaction compared to previous polymer models without showing it or citing any relevant references that would show it.

      We apologize if the text conveyed an idea of superiority over other methods; that was not intended. SR-EV is an alternative tool that may give a different, even complementary, point of view to standard polymer models.

      - Comparisons with experiments are solid but are not quantified.

      The comparisons that we have presented are quantitative. So far, we do not have a way to determine alpha or phi a priori for a particular system.

      Impact:

      Building on the presented framework in the future to incorporate TAD and compartments may offer an interesting model to study the single-cell heterogeneity of chromatin organization. But currently, in this reviewer's opinion, standard polymer modeling frameworks may offer more possibilities.

      We thank the reviewer for the positive opinion on the potential of the presented method. The incorporation of TADs and compartments is left for a future evolution of the model as its complexity will make this work extremely long.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a simple Self Returning Excluded Volume (SR-EV) model to investigate the 3D organization of chromatin. This is a random walk with a probability to self-return accounting for the excluded volume effects. The authors use this method to study the statistical properties of chromatin organization in 3D. They compute contact probabilities, 3D distances, and packing properties of chromatin and compare them with a set of experimental data.

      We thank the reviewer for the attention paid to our manuscript.

      Strengths:

      (1) Typically, to generate a polymer with excluded volume interactions, one needs to run long simulations with computationally expensive repulsive potentials like the Weeks-Chandler-Andersen potential. However, here, instead of performing long simulations, the authors have devised a method where they can grow the polymer, enabling quick generation of configurations.

      (2) Authors show that the chromatin configurations generated from their models do satisfy many of the experimentally known statistical properties of chromatin. Contact probability scalings and packing properties are comparable with Chromatin Scanning Transmission Electron Microscopy (ChromSTEM)  experimental data from some of the cell types.

      Weaknesses:

      This can only generate broad statistical distributions. This method cannot generate sequence-dependent effects, specific TAD structures, or compartments without a prior model for the folding parameter alpha. It cannot generate a 3D distance between specific sets of genes. This is an interesting soft-matter physics study. However, the output is only as good as the alpha value one provides as input.

      We proposed a model to create realistic chromatin configurations, which we have compared with specific single-cell experiments and which also reproduces ensemble-average properties. 3D distances between genes can be calculated after mapping the genome to the SR-EV configuration. The future incorporation of the genome sequence will also allow us to describe TADs and A/B compartments. See the paragraph added to the Discussion section:

      “The incorporation of genomic character into the SR-EV model will allow us to study the properties of all individual chromosomes, and also topologically associating domains and A/B compartmentalization from ensembles of configurations, as in Hi-C experiments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major:

      - In the introduction and along the text, the authors are often making strong criticisms of previous works (mostly polymer simulation-based) to emphasize the need for an alternative approach or to emphasize the outcomes of their model. Most of these statements (see below) are incomplete if not wrong. I would suggest tuning down or completely removing them unless they are explicitly demonstrated (eg, by explicit quantitative comparisons). There is no need to claim any - fake - superiority over other approaches to demonstrate the usefulness of an approach. Complementarity or redundance in the approaches could also be beneficial.

      We regret if we unintentionally transmitted a claim of superiority. We have made several small edits to change that.

      - Line 42-43: at least there exist many works towards that direction (including polymer modeling, but also statistical modeling). For eg, see the recent review of Franck Alber.

      Line removed. Citation to Franck Alber included below in the text.

      - Line 54-57: Point 1 is correct but is it a fair limitation? These models can predict TADs & compartments while SR-EV no. Point 2 is wrong, it depends on the resolution of the model and computer capacity but it is not an intrinsic limitation. Point 3 is wrong, such models can predict very well single-cell properties, and again it is not an intrinsic limitation of the model. Point 4 is incorrect. The space-filling/fractal organization was an (unfortunate) picture to emphasize the typical organization of chromosomes in the early times (2009), but crumpled polymers which are a more realistic description are not space-filling (see Halverson et al, 2013).

      Text involving points 1 to 4 removed. It was unnecessary and does not change the line of the paper.

      - L400-402 + 409-411: in such a model, the biphasic structure may emerge from loop extrusion but also naturally from the crumpled polymer organization. Simple crumpled polymer without loop extrusion and phase separation would also produce biphasic structures.

      Yes, we agree. SR-EV also leads to biphasic structures.

      - L 448-449: any data to show that existing polymer modeling would predict a strong dependency of C_p(n) on the volumic fraction (in the range studied here)?

      No, we are not aware of any work predicting that.

      - Fig. 4:

      - Large-scale structural properties (R^2(n) and C_p(n)) are not dependent on phi. Is it surprising that by construction, SR-EV only relaxes the system locally after SRRW application?

      Excluded volume is considered at all length scales. However, as the decreasing C_p curves observed in theories and experiments imply, the fraction of overlaps (or contacts) is larger at small (local) separations than at large separations. Still, we were surprised to observe a negligible effect of phi.

      - Why not make a quantitative comparison between predicted and measured C_p(n)? Or at least plotting them on the same panel.

      Panels B and C are on the same scale and show good agreement between SR-EV and experiments. However, the agreement is not perfectly quantitative. SR-EV represents the generic structure of chromatin, and perfect agreement should not be expected.

      - Comparison with an average C_p(n) over all the chromosomes would be better.

      Possibly, but we don’t think it adds anything to the paper.

      - In Figure 5,6,7 (and related text): authors often describe some parameter values that are 'closest to experiment findings'. Can the authors quantify/justify this? The various 'closest' parameters are different. Can the authors comment?

      The folding parameter and average volume fraction are chosen so that the agreement is best with the displayed experimental system, which is a different cell in each case.

      - Figure 5: why not show the experimental distribution from Ou et al?

      - Figure 6 & 7: experimental results. Can the authors show images from their own experiments? Can they show that cohesion/RAD21 is really depleted after auxin treatment?

      It is currently under review in a different journal.

      - In the Discussion, a fair discussion on the limitations of the methods (dynamics, etc) is missing.

      Minor

      - Line 34-36: the logical relationship between this sentence and the ones before and after is very unclear.

      - Along the text, authors use the term 'connectivity' to describe 3D (Hi-C) contacts between different regions of the same chromosome/polymer. This is misleading as connectivity in polymer physics describes the connection along the polymer and not in the 3D space.

      No, we do not think we used connectivity in that sense. We agree with your statement on the use of connectivity in polymer physics, and that is what we always had in mind for this model.

      - Line 92: typo.

      - On the SR-EV method: does the relaxation process create local knots in the structure?

      We have not checked for knots.

      - Table 1: the good correspondence with linker length is remarkable but likely 'fortunate', other chosen resolutions would have led to other results. Moreover, the model cannot account for the fine structure of chromatin fiber. Can the authors comment on that?

      It is fortunate only to the extent that we sampled the model parameters so as to capture the overall structure of chromatin.

      - Line 211: 'without the need of imposing any parameter': alpha is a parameter, no?

      Correct. Phrase deleted.

      - L267-269 & 450-451: actually in Liu & Dekker, they do observe an effect on Hi-C map (C_p(n)), weak but significant and not negligible.

      Our statements read ‘minimal’ and ‘relatively insensitive’. It is observed, but very small.

      - L283-286: This is a perspective statement that should be in the discussion.

      Moved to the Discussion, as suggested.

      - L239-241: The authors seem to emphasize some contradictions with recent results on phase separation. This is unclear and should be relocated to discussion.

      We just pointed out recent experiments, as stated. No intention to generate a discussion with any of them.

      - L311-313: Unclear statement.

      - L316-325: This is not results but discussion/speculation.

      Moved to Discussion

      - Along the text: 'promotor'-> 'promoter'. 

      Corrected.

      - L364: explain more in detail PWS microscopy.

      Reviewer #2 (Recommendations For The Authors):

      Even though there are claims about nucleosome-resolution chromatin polymer, it is not clear that this work can generate structures with known nucleosome-resolution features. Nucleosome-level structure is much beyond a random walk with excluded volume and is driven by specific interactions. The authors should clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang, Hu et al. examined the molecular mechanisms underlying astrocyte activation and its implications for multiple sclerosis. This study shows that the glycolytic enzyme PKM2 relocates to astrocyte nuclei upon activation in EAE mice. Inhibiting PKM2's nuclear import reduces astrocyte activation, as evidenced by decreased proliferation, glycolysis, and inflammatory cytokine release. Crucially, the study identifies TRIM21 as pivotal in regulating PKM2 nuclear import via ubiquitination. TRIM21 interacts with PKM2, promoting its nuclear translocation and enhancing its activity, affecting multiple signaling pathways. Confirmatory analyses using single-cell RNA sequencing and immunofluorescence demonstrate TRIM21 upregulation in EAE astrocytes. Modulating TRIM21 expression in primary astrocytes impacts PKM2-dependent glycolysis and proliferation. In vivo experiments targeting this mechanism effectively mitigate disease severity, CNS inflammation, and demyelination in EAE.

      The authors supported their claims with various experimental approaches, however, some results should be supported with higher-quality images clearly depicting the conclusions and additional quantitative analyses of Western blots.

      We thank the reviewer for these comments. We agree and have added higher-magnification images, for example Fig. 2A to better visualize the localization of PKM2 in DASA-treated conditions, and Fig. 3A and Fig. 3B to better visualize pSTAT3 and pp65. Moreover, we have added quantitative analyses of Western blots for some key experiments: quantitative results for Fig. 2D are added in Fig. S3 to show the change of PKM2 and p-c-myc in DASA-58-treated conditions, and quantitative results for Fig. 3D are added in Fig. S4B and S4C to show the change of nuclear and cytoplasmic PKM2, STAT3 and NF-κB in different conditions.

      Strength:

      This study presents a comprehensive investigation into the function and molecular mechanism of metabolic reprogramming in the activation of astrocytes, a critical aspect of various neurological diseases, especially multiple sclerosis. The study uses the EAE mouse model, which closely resembles MS. This makes the results relevant and potentially translational. The research clarifies how TRIM21 regulates the nuclear import of PKM2 through ubiquitination by integrating advanced techniques. Targeting this axis may have therapeutic benefits since lentiviral vector-mediated knockdown of TRIM21 in vivo significantly reduces disease severity, CNS inflammation, and demyelination in EAE animals.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The authors reported that PKM2 levels are elevated in the nucleus of astrocytes at different EAE phases compared to cytoplasmic localization. However, Figure 1 also shows elevated cytoplasmic expression of PKM2. The authors should clarify the nuclear localization of PKM2 by providing zoomed-in images. An explanation for the increased cytoplasmic PKM2 expression should be provided. Similarly, while PKM2 translocation is inhibited by DASA-58, a decrease in the cytoplasmic localization of PKM2 is observed in addition to the decrease in its nuclear localization. This raises the possibility that a degradation mechanism is involved when the nuclear translocation of PKM2 is inhibited.

      Immunofluorescence staining of PKM2 in the spinal cord of EAE mice and in cultured primary astrocytes showed not only nuclear translocation of PKM2 under EAE conditions but also elevated PKM2 expression in astrocytes, in both the cytoplasm and the nucleus. Studies of other neurological diseases report consistent results; for example, following spinal cord injury (SCI), both upregulated expression and nuclear translocation of PKM2 were observed in astrocytes (Zhang et al., 2015). Under EAE conditions, CNS inflammation is elevated, and several proinflammatory cytokines and chemokines might contribute to the upregulated expression of PKM2 in astrocytes. We tested TNFα and IL-1β, which are recognized to play important roles in EAE and MS (Lin and Edelson, 2017; Wheeler et al., 2020), and western blots showed increased expression of PKM2 upon stimulation with TNFα and IL-1β (Author response image 1). Moreover, following the reviewer’s suggestion, we have added zoomed-in images for Figure 2A.

      Additionally, the reviewer has noted the decrease in the cytoplasmic PKM2 level; a degradation-related mechanism or other mechanisms might be involved in this process.

      Author response image 1.

      Upregulated expression of PKM2 in astrocytes following stimulation with TNF-α and IL-1β. Primary astrocytes were stimulated with TNF-α and IL-1β (50 ng/mL) for 48 h and western blotting analysis was performed.

      In Figure 3D, the authors claim that PKM2 expression causes nuclear retention of STAT3, p65, and p50, and inhibiting PKM2 localization with DASA-58 suppresses this retention. The western blot results for the MOG-stimulated group show high levels of STAT3, p50, and p65 in nuclear localization. However, in the MOG and DASA-58 treated group, one would expect high levels of p50, p65, and STAT3 proteins in the cytoplasm, while their levels decrease in the nucleus. These western blot results could be expanded. Additionally, intensity quantification for these results would be beneficial to see the statistical difference in their expressions, especially to observe the nuclear localization of PKM2.

      We agree with the reviewer’s comments and have added quantification of STAT3, p50 and p65 for Fig. 3D in Fig. S4B and Fig. S4C. However, because DASA-58 did not trigger a notable increase in the cytoplasmic level of PKM2, we did not detect an upregulation of STAT3, p50, or p65 in the cytoplasm of the MOG and DASA-58-treated groups. The quantification makes the changes of these proteins across conditions easier to see.

      The discrepancy between Figure 7A and its explaining text is confusing. The expectation from the knocking down of TRIM21 is the amelioration of activated astrocytes, leading to a decrease in inflammation and the disease state. The presented results support these expectations, while the images showing demyelination in EAE animals are not highly supportive. Clearly labeling demyelinated areas would enhance readers' understanding of the important impact of TRIM21 knockdown on reducing the disease severity.

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have made the corrections in the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      Additionally, we have added the whole image of the spinal cord for MBP in Author Response image 2. Moreover, we have labelled the demyelinated areas to facilitate readers’ understanding.

      Author response image 2.

      MBP staining of the whole spinal cord in EAE mice from shVec and shTRIM21 group. Scale bar: 100 μm. Demyelinated areas are marked with dashed lines.

      Reviewer #2 (Public Review):

      This study significantly advances our understanding of the metabolic reprogramming underlying astrocyte activation in neurological diseases such as multiple sclerosis. By employing an experimental autoimmune encephalomyelitis (EAE) mouse model, the authors discovered a notable nuclear translocation of PKM2, a key enzyme in glycolysis, within astrocytes.

      Preventing this nuclear import via DASA-58 substantially attenuated primary astrocyte activation, characterized by reduced proliferation, glycolysis, and inflammatory cytokine secretion. Moreover, the authors uncovered a novel regulatory mechanism involving the ubiquitin ligase TRIM21, which mediates PKM2 nuclear import. TRIM21 interaction with PKM2 facilitated its nuclear translocation, enhancing its activity in phosphorylating STAT3, NFκB, and c-myc. Single-cell RNA sequencing and immunofluorescence staining further supported the upregulation of TRIM21 expression in astrocytes during EAE.

      Manipulating this pathway, either through TRIM21 overexpression in primary astrocytes or knockdown of TRIM21 in vivo, had profound effects on disease severity, CNS inflammation, and demyelination in EAE mice. This comprehensive study provides invaluable insights into the pathological role of nuclear PKM2 and the ubiquitination-mediated regulatory mechanism driving astrocyte activation.

      The authors' use of diverse techniques, including single-cell RNA sequencing, immunofluorescence staining, and lentiviral vector knockdown, underscores the robustness of their findings and interpretations. Ultimately, targeting this PKM2-TRIM21 axis emerges as a promising therapeutic strategy for neurological diseases involving astrocyte dysfunction.

      While the strengths of this piece of work are undeniable, some concerns could be addressed to refine its impact and clarity further; as outlined in the recommendations for the authors.

      We thank the reviewer for the comments and the positive evaluation of our work. We have answered each question in the recommendations section.

      Reviewer #3 (Public Review):

      Summary:

      Pyruvate kinase M2 (PKM2) is a rate-limiting enzyme in glycolysis and its translocation to the nucleus in astrocytes in various nervous system pathologies has been associated with a metabolic switch to glycolysis which is a sign of reactive astrogliosis. The authors investigated whether this occurs in experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS). They show that in EAE, PKM2 is ubiquitinated by TRIM21 and transferred to the nucleus in astrocytes. Inhibition of the TRIM21-PKM2 axis efficiently blocks reactive gliosis and partially alleviates symptoms of EAE. The authors conclude that this axis can be a potential new therapeutic target in the treatment of MS.

      Strengths:

      The study is well-designed, controls are appropriate and a comprehensive battery of experiments has been successfully performed. Results of in vitro assays, single-cell RNA sequencing, immunoprecipitation, RNA interference, molecular docking, and in vivo modeling etc. complement and support each other.

      Weaknesses:

      Though EAE is a valid model of MS, a proposed new therapeutic strategy based on this study needs to have support from human studies.

      We agree that, although we have clarified the therapeutic potential of targeting TRIM21 or PKM2 in EAE, a mouse model of MS, application in humans warrants further study. Before TRIM21 can be considered as a target for treating multiple sclerosis in clinical trials, several issues need to be addressed to ensure safety, efficacy and feasibility. One such aspect is the development of drugs that specifically target TRIM21 in the brain, are capable of crossing the blood-brain barrier and have minimal off-target effects. The translation of preclinical findings into clinical trials poses a significant challenge. To provide evidence for the similarities between the EAE model and multiple sclerosis, we screened GEO datasets (Author response image 3). GSE214334 analyzed transcriptional profiles of normal-appearing white matter from non-MS controls and different subtypes of the disease (RRMS, SPMS and PPMS). Although no statistical difference was observed among the groups, TRIM21 expression tends to increase in SPMS (secondary progressive MS) and PPMS (primary progressive MS) patients. In GSE83670, astrocytes from 3 control white matter samples and 4 multiple sclerosis normal-appearing white matter (NAWM) samples were analyzed. TRIM21 mRNA expression is higher in the MS group (78.73 ± 10.44) than in the control group (46.67 ± 24.15). Although neither GEO dataset yielded a statistically significant difference, TRIM21 expression appears to be elevated in the white matter of MS patients compared to controls.

      To address this limitation, we have incorporated the following statement in the discussion section: “However, whether TRIM21-PKM2 could potentially serve as therapeutic targets in multiple sclerosis warrants further studies.”

      Author response image 3.

      TRIM21 expression in control and MS patients based on published GEO datasets. (A) The expression of TRIM21 in normal-appearing white matter in non-MS (Ctl) and different clinical subtypes of MS (RRMS, SPMS, PPMS) based on GSE214334 (one-way ANOVA). (B) The expression of TRIM21 in multiple sclerosis normal-appearing white matter (NAWM) and control WM based on GSE83670 (unpaired Student's t test). RRMS, relapsing-remitting MS; SPMS, secondary progressive MS; PPMS, primary progressive MS. Data are represented as the means ± SEM.
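
      For readers who want to check the GSE83670 comparison quoted above, the following sketch computes an unpaired (pooled-variance) Student's t statistic from the reported summary values. It assumes that the ± values in the response text are SEM, as stated in this legend, and that the group sizes are 4 (MS NAWM) and 3 (control); the function name is hypothetical, and the underlying per-sample data are not reproduced here.

```python
import math


def pooled_t_from_sem(mean_a, sem_a, n_a, mean_b, sem_b, n_b):
    """Unpaired Student's t statistic (equal variances) from means and SEMs."""
    # Recover sample variances from SEM = SD / sqrt(n).
    var_a = (sem_a * math.sqrt(n_a)) ** 2
    var_b = (sem_b * math.sqrt(n_b)) ** 2
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se_diff, df


# Values quoted in the response; treating "±" as SEM is an assumption.
t_stat, dof = pooled_t_from_sem(78.73, 10.44, 4, 46.67, 24.15, 3)

# Two-sided 5% critical value of Student's t for 5 degrees of freedom.
T_CRIT_DF5 = 2.571
significant = abs(t_stat) > T_CRIT_DF5
```

      Under these assumptions t is about 1.35 with 5 degrees of freedom, below the critical value of 2.571, which is consistent with the statement that the difference is not statistically significant.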

      Reviewer #4 (Public Review):

      Summary:

      The authors report the role of the Pyruvate Kinase M2 (PKM2) enzyme nuclear translocation as fundamental in the activation of astrocytes in a model of autoimmune encephalitis (EAE). They show that astrocytes, activated through culturing in EAE splenocytes medium, increase their nuclear PKM2 with consequent activation of NFkB and STAT3 pathways. Prevention of PKM2 nuclear translocation counteracts this activation. The authors found that the E3 ubiquitin ligase TRIM21 interacts with PKM2 and promotes its nuclear translocation. In vivo, either silencing of TRIM21 or inhibition of PKM2 nuclear translocation ameliorates the severity of the disease in the EAE model.

      Strengths:

      This work contributes to the knowledge of the complex action of the PKM2 enzyme in the context of an autoimmune-neurological disease, highlighting its nuclear role and a novel partner, TRIM21, and thus adding a novel rationale for therapeutic targeting.

      Weaknesses:

      Despite the relevance of the work and its goals, some of the conclusions drawn would require more thorough proof:

      I believe that the major weakness is the fact that TRIM21 is known to have per se many roles in autoimmune and immune pathways and some of the effects observed might be due to a PKM2-independent action. Some of the experiments to link the two proteins, besides their interaction, do not completely clarify the issue. On top of that, the in vivo experiments address the role of TRIM21 and the nuclear localisation of PKM2 independently, thus leaving the matter unsolved.

      We agree that TRIM21 has multiple functions and that some of its effects may be due to PKM2-independent actions. TRIM21 functions as a ubiquitin ligase with various substrates. Here we identify PKM2 as one of its interacting proteins. Our focus is the relationship between TRIM21 and the nuclear translocation of PKM2, and we used diverse experiments to clarify it, for example immunoprecipitation, western blotting, immunofluorescence and cyto-nuclear protein extraction. These experiments are key points of our study. The in vitro results suggest that either TRIM21 or PKM2 might be a potential target for EAE treatment. As expected, in the in vivo experiments, targeting either TRIM21 or PKM2 nuclear transport ameliorated EAE. To test the relationship between TRIM21 and PKM2 nuclear transport in vivo, we stained PKM2 in shVec- and shTRIM21-treated mice. As expected, knocking down TRIM21 led to a decrease in the nuclear staining of PKM2 in spinal cord astrocytes in EAE models (Figure S7A). This observation suggests that the therapeutic effect of inhibiting TRIM21 in astrocytes in vivo might be partially due to reduced nuclear translocation of PKM2.

      Some experimental settings are not described to a level that is necessary to fully understand the data, especially for a non-expert audience: e.g. the EAE model and MOG treatment; action and reference of the different nuclear import inhibitors; use of splenocyte culture medium and the possible effect of non-EAE splenocytes.

      Following the reviewer’s suggestions, we have added more detailed descriptions to the Materials and Methods section; for example, the use of splenocyte culture medium, mass spectrometry, and HE and LFB staining have been added. More details are given in the section “EAE induction and isolation and culture of primary astrocytes”. Moreover, references for DASA-58 in vitro and TEPP-46 in vivo as inhibitors of PKM2 nuclear transport were added.

      The statement that PKM2 is a substrate of TRIM21 ubiquitin ligase activity is an overinterpretation. There is no evidence that this interaction results in ubiquitin modification of PKM2; the ubiquitination experiment is minimal and is not performed in conditions that would allow us to see ubiquitination of PKM2 (e.g. denaturing conditions, reciprocal pull-down, catalytically inactive TRIM21, etc.).

      To prevent misunderstanding, we have revised certain statements in the manuscript. In the updated version, the description is as follows: Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General recommendations:

      - The whole manuscript needs language editing.

      We appreciate the comments of the reviewers. We have improved the writing of the manuscript. All modifications are underlined.

      - Details of many experiments are not given in the materials and methods.

      According to the reviewer’s suggestions, we have added more details for experiments in the materials and methods. For example, “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes”, “mass spectrometry”, “Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining” were added in the section of Materials and Methods. More detailed information is given for EAE induction and isolation and culture of primary astrocytes.

      - Line properties in graphics should be corrected, some lines in box plots and error bars are very weak and hardly visible. Statistical tests should be included in figure legends as well. Statistical differences should be mentioned for control vs DASA-58 (alone) in all related figures.

      We have revised the figures to enhance their visibility by thickening the lines and error bars. In accordance with the reviewer’s suggestions, we have incorporated the statistical tests in the figure legends. Moreover, statistical analysis was performed among all groups; if no asterisk is indicated in the figure legend and figure panels, there is no statistical difference between the control and DASA-58 groups. For most of the experiments conducted in our study, including lactate production, glucose consumption, the EdU and CCK8 analyses, and the changes in the STAT3 and NF-κB pathways, no statistical difference was observed between the control and DASA-58 groups. A possible reason is that in unstimulated astrocytes the expression of PKM2 is low and little PKM2 translocates to the nucleus, which may explain why DASA-58 did not exert the anticipated effect. Thus, in our experiments, we used MOGsup to stimulate astrocytes, enabling us to observe the impact of DASA-58 on astrocyte proliferation and glycolysis under this condition.

      - Scale bars, arrows, and labeling in the images are not visible.

      We have improved the images according to the reviewer’s suggestions. The scale bars and arrows have been made thicker and the labels larger, so that they are now clearly visible in the updated figures.

      - Quantitative analysis of all western blot results and their statistics could be provided in every image and for every protein.

      For western blot results that were further processed by quantitative analysis (for example, Fig. 2D, Fig. 5G, Fig. 6A and 6B, and Fig. S4), we have added their statistics in the raw data sections. The other western blot results, for example the IP analyses used to assess protein-protein binding, were not further processed by quantitative analysis.

      - Proteins that are used for normalizations in western blots should be stated in the text.

      We have added a description of the proteins used for normalization in western blots to the figure legends. Moreover, the proteins used for normalization are indicated in the figure panels. Whole-cell protein levels are normalized to β-actin. For the nuclear and cytoplasmic fractions, nuclear protein is normalized to lamin and cytoplasmic protein to tubulin.
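
      The normalization scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the function name and every band-intensity value are hypothetical, and the densitometry readings are assumed to come from a tool such as ImageJ.

```python
def normalize_band(target_intensity, loading_control_intensity):
    """Return the target band intensity relative to its loading control."""
    if loading_control_intensity <= 0:
        raise ValueError("loading control intensity must be positive")
    return target_intensity / loading_control_intensity


# Hypothetical densitometry readings (arbitrary units), for illustration only.
# The loading control follows the response: lamin for the nuclear fraction
# (beta-actin would be used for whole-cell lysate, tubulin for cytoplasm).
lanes = {
    "control": {"nuclear_PKM2": 100.0, "lamin": 1000.0},
    "MOGsup": {"nuclear_PKM2": 300.0, "lamin": 1200.0},
}

normalized = {
    name: normalize_band(values["nuclear_PKM2"], values["lamin"])
    for name, values in lanes.items()
}

# Express each lane as a fold change over the control lane.
fold_change = {
    name: value / normalized["control"] for name, value in normalized.items()
}
```

      Dividing by the fraction-specific loading control before computing fold changes keeps differences in the amount of protein loaded from being read as changes in PKM2.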

      - The manuscript investigates the role of TRIM21 in the nuclear localization of PKM2 in astrocytes in EAE mice, however almost no information is given about TRIM21 in the introduction. Extra information is given for PKM2, yet can be concisely explained.

      We have added a paragraph describing TRIM21 to the Introduction. The description is as follows: “TRIM21 belongs to the TRIM protein family which possess the E3 ubiquitin ligase activity. In addition to its well-recognized function in antiviral responses, emerging evidences have documented the multifaceted role of TRIM21 in cell cycle regulation, inflammation and metabolism (Chen et al., 2022). Nevertheless, the precise mechanisms underlying the involvement of TRIM21 in CNS diseases remain largely unexplored.”

      - "As such, deciphering glycolysis-dominant metabolic switch in astrocytes is the basis for understanding astrogliosis and the development of neurological diseases such as multiple sclerosis." The sentence could be supported by references.

      To support this sentence, we have added the following references:

      (1) Xiong XY, Tang Y, Yang QW. Metabolic changes favor the activity and heterogeneity of reactive astrocytes. Trends in endocrinology and metabolism: TEM 2022;33(6):390-400.

      (2) das Neves SP, Sousa JC, Magalhães R, Gao F, Coppola G, Mériaux S, et al. Astrocytes Undergo Metabolic Reprogramming in the Multiple Sclerosis Animal Model. Cells 2023;12(20):2484.

      Figure 1/Result 1:

      - Figure 1A-B: Quality of the images should be improved.

      According to the reviewer’s suggestion, we have improved the quality of the images; higher-resolution images were added in Figure 1A and Figure 1B.

      - Control images of Figure 1B are not satisfying. GFAP staining is very dim. Images from control cells should be renewed.

      As suggested by the reviewer, we have renewed the control images and added the DAPI staining images for all groups. Compared with MOGsup-stimulated astrocytes, the control cells are not in an activated state and GFAP expression is relatively low.

      - Labelings on the images are not sufficient, arrows and scale bars are not visible.

      We have improved the images including labels, arrows and scale bars in all figures.

      - How splenocytes were obtained from MOG induced mice were not given in the material and methods section. Thus, it should be clearly stated how splenocyte supernatant is generated (treatment details).

      We have added the detailed information relating to splenocyte isolation and splenocyte supernatant entitled “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes” in the section of Materials and methods. “Splenocytes were isolated from EAE mice 15 d (disease onset) after MOG35-55 immunization. Briefly, spleen cells were suspended in RPMI-1640 medium containing 10% FBS. Splenocytes were plated in 12-well plates at 1 × 10⁶ cells/well containing 50 μg/mL MOG35-55 and cultured at 37°C in 5% CO2. After stimulation for 60 h, cell suspension was centrifuged at 3000 rpm for 5 min and supernatants were collected. For the culture of MOGsup-stimulated astrocytes, astrocytes were grown in medium containing 70% DMEM supplemented with 10% FBS and 30% supernatant from MOG35-55-stimulated-splenocytes.”

      - For general astrocyte morphology: authors showed the cells are GFAP+ astrocytes. It is surprising that these cells do not bear classical astrocyte morphology in cell culture. How long do you culture astrocytes before treatment? How do you explain their morphological difference?

      Astrocytes were cultured for 2 to 3 weeks, which corresponds to 2-3 passages, before treatment. There are several possible reasons why the cultured GFAP+ astrocytes differ from the classical astrocyte morphology. The first is cell density: in low-density cultures, as shown in Figure 1B, we have observed that astrocytes adopt a more flattened morphology, whereas in high-density cultures they adopt a stellate shape. Moreover, variations in culture conditions, such as the use of different fetal bovine serum, can also influence the morphology of astrocytes. In addition, the mechanical injury induced by the astrocyte isolation procedure might contribute to variations in their morphology during in vitro cultivation. In summary, the morphological differences observed in GFAP+ astrocytes in cell culture likely result from a combination of culture conditions, cell density, and mechanical injury that occurred during astrocyte isolation.

      - Additional verification of reactive astrocytes could be performed by different reactive astrocyte markers, such as GLAST, Sox9, S100ß. Thus, quantitative analysis of activated astrocytes can be done by counting DAPI vs GLAST, Sox9 or S100ß positive cells.

      We agree with the reviewer that there are other markers of reactive astrocytes, such as GLAST, Sox9 and S100β. However, substantial evidence supports GFAP as the most commonly used marker of reactive astrocytes. In most cases, reactive astrocytes overexpress GFAP. GFAP is one of the most consistently induced genes in transcriptomic datasets of reactive astrocytes, confirming its usefulness as a reactive marker (Escartin et al., 2019). Thus, we have used GFAP as the marker of astrocyte activation in our study.

      - How you performed quantifications for Figures 1C and 1D should be clearly explained, details are not given.

      Details of the quantification for Figures 1C and 1D were added to the figure legend. In brief, the mean fluorescence intensity of PKM2 in the different groups in (B) was calculated with ImageJ. The number of cells with nuclear PKM2 was counted manually in Image-Pro Plus software (i.e., PKM2 was scored as nuclear or cytoplasmic based on the blue DAPI staining). The proportion of nuclear PKM2 was determined by normalizing the count of nuclear PKM2 to the count of DAPI-stained nuclei, which represents the number of cells.
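
      The ratio described above can be written out as a short sketch. The function name and the per-field counts below are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def nuclear_pkm2_proportion(nuclear_pkm2_count, dapi_nuclei_count):
    """Fraction of DAPI-stained nuclei that were scored as PKM2-positive."""
    if dapi_nuclei_count <= 0:
        raise ValueError("at least one DAPI-stained nucleus is required")
    if not 0 <= nuclear_pkm2_count <= dapi_nuclei_count:
        raise ValueError("nuclear PKM2 count must lie between 0 and the DAPI count")
    return nuclear_pkm2_count / dapi_nuclei_count


# Hypothetical manual counts per imaging field:
# (cells with nuclear PKM2, DAPI-stained nuclei).
fields = [(12, 40), (18, 45), (9, 30)]

# Proportion for each field separately.
per_field = [nuclear_pkm2_proportion(pkm2, dapi) for pkm2, dapi in fields]

# Proportion pooled over all fields, which weights fields by their cell number.
pooled = sum(pkm2 for pkm2, _ in fields) / sum(dapi for _, dapi in fields)
```

      Whether per-field proportions are averaged or the counts are pooled is an analysis choice that the response does not specify; the sketch shows both.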

      - "Together, these data demonstrated the nuclear translocation of PKM2 in astrocytes from EAE mice." Here the usage of "suggests" instead of "demonstrated".

      Based on the reviewer's suggestion, we have changed "demonstrated" to "suggest" in this sentence.

      Result 2 and 3:

      - In the literature, DASA-58 is shown to be the activator of PKM2 (https://www.nature.com/articles/nchembio.1060; https://doi.org/10.1016/j.cmet.2019.10.015).

      - Providing references for the inhibitory use of DASA-58 for PKM2 would be appreciated.

      DASA-58 is referred to as a “PKM2 activator” due to its ability to enforce the tetramerization of PKM2, enhancing the enzymatic ability of PKM2 to catalyze the conversion of PEP to pyruvate. However, enforced tetramerization reduces the dimeric form of PKM2, thereby inhibiting its nuclear translocation. For this reason, DASA-58 is also used as an inhibitor of PKM2 nuclear translocation. In primary BMDMs, LPS induced nuclear PKM2, whereas driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references to the manuscript.

      - Western blot results and statistics for PKM2 should be quantitatively given for all groups.

      According to the reviewer’s suggestions, we have added the quantification of PKM2 for western blots in figure 2 and figure 3. Quantification of PKM2 in figure 2D is added in Fig S3. Quantification of PKM2 in figure 3D is added in Fig.S4B and Fig. S4C.

      - Figure 3A-B: staining method/details are not mentioned in materials and methods.

      The staining methods are described in the paragraph entitled “Immunofluorescence” in the Materials and Methods section. The description is as follows:

      For cell immunochemistry, cells cultured on glass coverslips were fixed with 4% PFA for 10 min at RT, followed by permeabilization with 0.3% Triton X-100. Non-specific binding was blocked with buffer containing 3% BSA for 30 min at RT. Briefly, samples were then incubated with primary antibodies and secondary antibodies. DAPI was used to stain the nuclei. Tissues and cells were observed and images were acquired using an EVOS FL Auto 2 Cell image system (Invitrogen). The fluorescence intensity was measured by ImageJ.

      - In Figure 3A, in only DASA-58 treated cells, it looks like GFAP staining is decreased. It would be better to include MFI analysis for GFAP in the supplementary information.

      We have added the MFI analysis for GFAP in Figure 3A as Fig. S4A. GFAP expression is decreased after DASA-58 treatment (in both the control and MOGsup conditions). A possible reason is that DASA-58 inhibits PKM2 nuclear transport, which subsequently suppresses the activation of astrocytes and leads to decreased GFAP expression.

      Result 4

      - Detailed explanation of the mass spectrometry and IP experiments should be given in materials and methods. What are the conditions of the cells? Which groups were analyzed? Are they only MOG stimulated, MOG-DASA-58 treated, or only primary astrocytes without any treatment? The results should be interpreted according to the experimental group that has been analyzed.

      We have added detailed information on mass spectrometry and immunoprecipitation to the Materials and Methods. In brief, two groups of cells were subjected to mass spectrometry analysis: primary astrocytes without any treatment and MOGsup-stimulated primary astrocytes. Both groups were immunoprecipitated with anti-PKM2 antibody. Moreover, we have revised the sentence in the manuscript that describes the mass spectrometry. The description is as follows: “To illustrate underlying mechanism accounting for nuclear translocation of PKM2 in astrocytes, we sought to identify PKM2-interacting proteins. Here, unstimulated and MOGsup-stimulated primary astrocytes were subjected to PKM2 immunoprecipitation, followed by mass spectrometry”. Furthermore, a description of these two groups of cells was added to the legend of Fig. 4.

      Result 5:

      - For the reader, it would be better to start this part by explaining the role of TRIM21 in cells by referring to the literature.

      We agreed with the reviewer that beginning this part by explaining the role of TRIM21 would be better. Accordingly, we have added the following descriptions at the beginning of this part: “TRIM21 is a multifunctional E3 ubiquitin ligase that plays a crucial role in orchestrating diverse biological processes, including cell proliferation, antiviral responses, cell metabolism and inflammatory processes (Chen X. et al., 2022).” The relevant literature has been included: Chen X, Cao M, Wang P, Chu S, Li M, Hou P, et al. The emerging roles of TRIM21 in coordinating cancer metabolism, immunity and cancer treatment. Front Immunol 2022;13:968755.

      - The source and the state of the cells (control vs MOG induced) should be stated (Figure 5A).

      In Figures 5A to 5D, single-cell RNA-seq was performed on CNS tissues from naive mice and from EAE mice at different phases (peak and chronic). We have added this information to the legend of Figure 5.

      - Figure 5D can be placed after 5A. Data in Figure 5A is probably from naive animals, if so, it should be stated in the legend where A is explained. The group details of the data shown in Figure 5 should be clearly stated.

      According to the reviewer’s suggestions, we have placed 5D after 5A. Single-cell RNA-seq analysis was performed on CNS tissues from naive mice and EAE mice. This information is stated in the legend of Figure 5A-D: “Single-cell RNA-seq profiles from naive and EAE mice (peak and chronic phase) CNS tissues. Naive (n=2); peak (dpi 14–24, n=3); chronic (dpi 21–26, n=2).”

      - Immunofluorescence images should be replaced with better quality images, in control images, stainings are not visible.

      We have replaced the images in Figure 5H with better-quality images; the staining in the control images is now visible.

      Result 6:

      - Experimental procedures should be given in detail in materials and methods.

      We have revised the Materials and Methods section and added more details. Detailed information was added for astrocyte isolation and immunoprecipitation. Moreover, descriptions of mass spectrometry, Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining, and “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes” were added to the Materials and Methods.

      Result 7:

      - In Figure 7A, the mean clinical score seems significantly reduced in the shTRIM21-treated group, although it is explained in the result text that it is not significant. Explain to us the difference between Figure 7A and the explaining text?

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have corrected the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      - The staining methods for Luxol fast blue and HE are not given in materials and methods.

      According to the reviewer’s comments, we have added the staining methods for HE and LFB in materials and methods.

      - In Figure 7E, authors claim that MBP staining is low in an image, however the image covers approximately 500 um area. One would like to see the demyelinated areas in dashed lines, and also the whole area of the spinal cord sections.

      In Author response image 2, we have added the images for MBP staining of the whole area of spinal cord sections. Demyelinated areas are marked with dashed lines.

      - "TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization." should be supported by references.

      We have added two references for this sentence. Anastasiou D et al. showed that TEPP-46 acts as an activator by stabilizing subunit interactions and promoting tetramer formation of PKM2. Angiari S et al. showed that TEPP-46 prevented the nuclear transport of PKM2 by promoting its tetramerization in T cells.

      These two references are added:

      Angiari S, Runtsch MC, Sutton CE, Palsson-McDermott EM, Kelly B, Rana N, et al. Pharmacological Activation of Pyruvate Kinase M2 Inhibits CD4(+) T Cell Pathogenicity and Suppresses Autoimmunity. Cell metabolism 2020;31(2):391-405.e8.

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      - Could you explain what the prevention stage is?

      The term “prevention stage” was used to describe the administration of TEPP-46 before disease onset. To be more accurate, we have revised the phrase from “prevention stage” to “preventive treatment” as described in other references. For example, Ferrara et al. (Ferrara et al., 2020) used “preventive” and “preventive treatment” to mean administration before disease onset.

      The revised sentences are as follows: “To test the effect of TEPP-46 on the development of EAE, the “preventive treatment” (i.e, administration before disease onset) was administered. Intraperitoneal treatment with TEPP-46 at a dosage of 50 mg/kg every other day from day 0 to day 8 post-immunization with MOG35-55 resulted in decreased disease severity (Fig. S8A).”
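
      The dosing schedule in the revised sentence can be made concrete with a short sketch. The 50 mg/kg dose and the day 0 to day 8, every-other-day schedule come from the text above; the function names and the 20 g body weight are assumptions chosen only to illustrate the arithmetic.

```python
def dosing_days(first_day, last_day, interval_days):
    """Days post-immunization on which the compound is administered."""
    return list(range(first_day, last_day + 1, interval_days))


def dose_mg(body_weight_g, dose_mg_per_kg):
    """Amount of compound per injection for one animal, in mg."""
    return dose_mg_per_kg * body_weight_g / 1000.0


# Every other day from day 0 to day 8 post-immunization (from the text).
days = dosing_days(first_day=0, last_day=8, interval_days=2)

# Hypothetical 20 g mouse dosed at 50 mg/kg.
per_injection_mg = dose_mg(body_weight_g=20.0, dose_mg_per_kg=50.0)
total_mg = per_injection_mg * len(days)
```

      Under these assumptions the schedule comprises five injections before disease onset, which is what distinguishes the preventive treatment from a therapeutic one started after onset.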

      - In in vitro experiments, authors used DASA-58, and in vivo they used TEPP-46. What might be the reason that DASA-58 is not applied in vivo?

      The effects of DASA-58 and TEPP-46 in promoting PKM2 tetramerization have been tested in vitro and have been documented. Based on in vitro absorption, distribution, metabolism and excretion profiling studies, Anastasiou et al. predicted that TEPP-46 had better in vivo drug exposure than DASA-58. Moreover, TEPP-46, but not DASA-58, has been pharmacokinetically validated in vivo (Anastasiou et al., 2012). Thus, we used TEPP-46 for the in vivo studies.

      - Authors claim that TEPP-46 activates PKM2 and leads to its nuclear translocation, however, they did not verify PKM2 expression in the nucleus.

      To support that TEPP-46 inhibits PKM2 nuclear translocation both in vivo and in vitro, we have performed western blotting analysis and immunofluorescence staining. In vitro, TEPP-46 administration inhibited the MOGsup-induced PKM2 nuclear translocation, an effect similar to that of DASA-58 (Author response image 4). The in vivo effect of TEPP-46 was analyzed by co-immunostaining of PKM2 and GFAP. The results showed reduced nuclear staining of PKM2 in spinal cord astrocytes in TEPP-46-treated EAE mice compared with control EAE mice (Figure S7B).

      Author response image 4.

      TEPP-46 inhibited the nuclear transport of PKM2 in primary astrocytes. Nuclear-cytoplasmic protein extraction analysis showed the nuclear and cytoplasmic changes of PKM2 in TEPP-46 treated astrocytes and MOGsup-stimulated astrocytes. Primary astrocytes were pretreated with 50 μM TEPP-46 for 30 min and stimulated with MOGsup for 24 h.

      Supplementary Figure 3:

      - In Figure 3D, merge should be stated on top of the merged images, it is confusing to the reader.

      According to the reviewer’s comments, we have added merge on top of the merged images.

      Discussion:

      All results should be discussed in detail by interpreting them according to the literature.

      We have further discussed the results in the Discussion section. Firstly, we added a paragraph describing the role of nuclear translocation of PKM2 in diverse CNS diseases. Moreover, a paragraph discussing the nuclear function of PKM2 as a protein kinase or transcriptional co-activator was added. The Discussion is now more comprehensive and interprets nearly all of the results in detail in light of the literature.

      Reviewer #2 (Recommendations For The Authors):

      The authors could address the following points:

      (1) In Figure 1A, the authors present immunofluorescence staining of PKM2 in both control mice and MOG35-55-induced EAE mice across different stages of disease progression: onset, peak, and chronic stages. Observing the representative images suggests a notable increase in PKM2 levels, particularly within the nucleus of MOG35-55-induced EAE mice. However, to provide a more comprehensive analysis, it would be beneficial for the authors to include statistical data, such as average intensities ± standard deviation (SD), along with the nuclear PKM2 ratio, akin to the presentation for cultured primary astrocytes in vitro in panels B-D. Additionally, the authors should clearly specify the number of technical repeats and the total number of animals utilized for these data sets to ensure transparency and reproducibility of the findings.

      Thanks for the reviewer’s suggestion. Accordingly, for Figure 1A, we have added the nuclear PKM2 ratio in astrocytes from control mice and from EAE mice at different stages in Supplementary Figure S1A. The quantification of mean fluorescence intensity (MFI) for PKM2 was added in Figure S1B. In addition, we have added the number of animals used in each group to the figure legend.

      (2) The blue hue observed in the merged images of Figure 1B (lower panel) presents a challenge for interpretation. The source of this coloration remains unclear from the provided information. Did the authors also include a co-stain for the nucleus in their imaging? To enhance clarity, especially for individuals with color vision deficiency, the authors might consider utilizing different color combinations, such as presenting PKM2 in green and GFAP in magenta, which would aid in distinguishing the two components. Furthermore, for in vitro cell analysis, incorporating a nuclear stain could provide valuable insights into estimating the cytosolic-to-nuclear ratio of PKM2.

      For the question relating to the merged images in figure 1B, PKM2 was presented in green, GFAP was presented in red and blue represents the nuclear staining by DAPI. “Merge” represents the merged images of these three colors. To enhance the clarity, we have added the images for the nuclear staining of DAPI.

      (3) To substantiate the conclusion of the authors regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes, employing supplementary methodologies such as high-resolution respirometry and metabolomics could offer valuable insights. These techniques would provide a more comprehensive understanding of metabolic alterations and further validate the observed changes in glycolytic activity.

      While we recognize the merits of techniques such as high-resolution respirometry and metabolomics, we believe that the conclusions regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes are sufficiently supported by the current experimental evidence. Our study relied on a robust set of experiments, including lactate production, glucose consumption, cyto-nuclear localization analysis and western blotting analysis of key glycolytic enzymes. These results, in conjunction with the literature on the role of PKM2 in various cancer cells, keratinocytes and immune cells, provide a strong foundation for our conclusions. Although metabolomics could offer a global view of the changes in metabolic states in astrocytes, lactate is the end product of aerobic glycolysis, so our analysis of lactate levels under different experimental conditions might be more direct. However, we fully acknowledge that future studies employing these advanced methodologies could provide further insights into the precise mechanisms underlying PKM2's effects on aerobic glycolysis.

      (4) Minor: Why is the style of the columns different in Fig 2 panel D compared to those shown in panels B, C, and G of Figure 2?

      To maintain consistency in the column style across Figure 2, we have updated the columns in Figure 2D. We now use the same style of columns in Fig 2B, C, D and G.

      (5) The effect of stimulating astrocytes with MOGsup on cell proliferation, as shown in Figure 2E, is very moderate. Does DASA-58 reduce the proliferation of control cells in this assay?

      In response to the reviewer’s questions, we conducted a CCK8 analysis in astrocytes subjected to DASA-58 treatment. As depicted in Author response image 5, administration of DASA-58 did not reduce the proliferation of control cells. This result aligns with our other findings in the glycolysis assays and EdU analysis, where there is no statistical difference between control group and DASA-58-treated group. One plausible explanation for this is that in their steady state, astrocytes in the control group are not in a hyperproliferative state. Under such conditions, inhibiting the translocation of PKM2 via DASA-58 or other inhibitors did not significantly affect the proliferation of astrocytes.

      Author response image 5.

      CCK8 analysis of astrocyte proliferation. Primary astrocytes were pretreated with 50 μM DASA-58 for 30 min before stimulation with MOGsup. Data are represented as mean ± SEM. ***P<0.001. SEM, standard error of the mean.
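
      The "mean ± SEM" summary used in the legend above can be sketched as follows. The absorbance readings are hypothetical (CCK8 is assumed here to be read as absorbance at 450 nm), and the sample standard deviation with n - 1 in the denominator is assumed, since the response does not state which estimator was used.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    mean = statistics.mean(values)
    # SEM = sample standard deviation / sqrt(number of replicates).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical replicate absorbance readings for one group.
readings = [0.50, 0.54, 0.46, 0.50]
mean, sem = mean_sem(readings)
summary = f"{mean:.3f} ± {sem:.3f}"
```

      Because the SEM shrinks with the number of replicates while the SD does not, the legend should state n alongside the error measure so that the bars can be interpreted.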

      (6) The tables and lists in Figure 4, panels A-D, are notably small, hindering readability and comprehension. Consider relocating these components to the supplementary materials as larger versions.

      We have updated the tables and lists and made the lines thicker. As suggested by the reviewer, we have relocated these components to Supplementary Figure S5.

      Reviewer #3 (Recommendations For The Authors):

      Higher magnification images that more clearly show nuclear translocation of PKM2 and pp65 and pSTAT3 immunoreactivity should be added to the figure panels, for example as insets.

      Thank you for pointing out this issue in the manuscript. According to the reviewer’s comments, we have included higher magnification images as insets for Figure 3A, Figure 3B and Figure 2A. These enlarged images now provide a clearer visualization of the nuclear translocation state of PKM2, pp65, and pSTAT3.

      There are seldom wording errors like features => feathers at line 364.

      We are very sorry for our incorrect writing. We have corrected this spelling mistake in the manuscript.

      Reviewer #4 (Recommendations For The Authors):

      Here below are major and minor concerns on the data presented:

      (1) It is not clear from the Methods section what are the culture conditions defined as 'control' in Figure 1B-D. I believe the control should be culturing with the conditioned medium of normal (non-EAE) mice splenocytes to be sure the effect is not from cytokines naturally secreted by these cells.

      Thanks for the reviewer’s comments; we fully understand the reviewer's concern. The control refers to non-treated primary astrocytes cultured in standard DMEM medium supplemented with 10% FBS. In fact, we have performed experiments to exclude the possibility that the observed effect of MOGsup on the activation of astrocytes is due to cytokines naturally secreted by splenocytes. Splenocytes from normal (non-EAE) mice were isolated and cultured in RPMI-1640 medium containing 10% FBS for 60 hours, and the supernatant was collected. Immunofluorescence staining of PKM2 and GFAP was performed in non-treated primary astrocytes and in astrocytes stimulated with supernatant from control splenocytes. As shown in Figure S1C, no difference in PKM2 expression or localization was observed between the two groups; PKM2 was located mainly in the cytoplasm in these conditions. These results indicate that the observed effect on PKM2 in the MOGsup-stimulated condition is not due to cytokines secreted by splenocytes. Thus, we used non-treated primary astrocytes as controls in our study. To clarify the control group, we have revised the description in the figure legend. The revised text is as follows: “Immunofluorescence staining of PKM2 (green) with GFAP (red) in non-treated primary astrocytes (control) or primary astrocytes cultured with splenocytes supernatants of MOG35–55-induced EAE mice (MOGsup) for different time points (6 h, 12 h and 24 h).”

      (2) Figure 3D: the presence of PKM2 in the nuclear fraction upon MOGSUP together with the DASA-58 (last lane of Figure 3D) is not supporting the hypothesis proposed and further may indicate that the reduction of pSTAT3, pp65, etc. observed is independent of PKM2 nuclear translocation/astrocyte activation being observed even in absence of MOGSUP.

      Thank you for pointing out this problem in the manuscript. The representative image in Figure 3D did not clearly show the change in the nuclear level of PKM2, which understandably raised doubts. To strengthen our conclusion that the reduction in STAT3 and p65 pathway activation is related to the DASA-58-induced decrease in nuclear PKM2, the nuclear PKM2 level was quantified and added in Figure S4B. The quantification shows that DASA-58 administration decreased the nuclear level of PKM2 in MOGsup-stimulated astrocytes. To address this concern, we have updated the immunoblot image for PKM2 in Figure 3D and incorporated the quantification results in Supplementary Figure S4.

      (3) Molecular docking indication and deletion co-immunoprecipitation reported in Figure 4 data are not concordant on TRIM21: N-terminal Phe23 and Thr87 (Figure 4E) predicted by MD to bind PKM2 are not in the PRY-SPRY domain suggested by the co-IP experiment (Figure 4I).

      The discrepancy between the molecular docking prediction and the co-immunoprecipitation can be explained as follows:

      Firstly, molecular docking is a computational method that predicts protein-protein interactions based on the 3D structures of the proteins. However, the accuracy of this prediction can be influenced by the different 3D structural models of TRIM21 and PKM2, as well as by factors such as post-translational modifications and the flexibility of the proteins. Proteins in vivo are subject to post-translational modifications that can affect their interactions, and these modifications are not fully captured in molecular docking analysis. For example, in our analysis, the predicted N-terminal residues Phe23 and Thr87 in TRIM21 have the potential to interact with PKM2 through hydrogen bonds. However, such binding can be influenced by the biological environment, such as different cell types and pathological conditions. Molecular docking may suggest specific residues and a binding pocket within the protein complex, but the prediction should be verified by experimental techniques such as immunoprecipitation. To reflect that the molecular docking results are predictions, the description has been revised as follows: “TRIM21 is predicted to bound to PKM2 via hydrogen bonds between the amino acids of the two molecules.”

      Co-immunoprecipitation using truncated domains of TRIM21 and PKM2 is an experimental technique that relies on the specific interaction between an antibody and its target protein. This technique can provide insight into the binding domains of TRIM21 and PKM2. As demonstrated in our study, the PRY-SPRY domain of TRIM21 is involved in this binding. In summary, while molecular docking and co-IP are valuable tools for studying protein-protein interactions, their differing focus and limitations may result in discrepancies between the predicted interaction sites and the experimentally identified interaction domains.

      (4) The Authors state that PMK2 is a substrate of TRIM21 E3 ligase activity, however, this is not proved: i) interaction does not imply a ligase-substrate relationship; ii) the ubiquitination shown in Figure 6C is not performed in denaturing conditions thus the K63-Ub antibody can detect also interacting FLAG-IPed proteins (besides, only a single strong band is seen, not a chain; molecular weights in immunoblot should be indicated); iii) use of a catalytically inactive TRIM21 would be required as well.

      We appreciate the reviewer’s comments regarding the limitations of the immunoprecipitation and K63-antibody test, which do not support the conclusion that PKM2 is a substrate of TRIM21. To avoid any misunderstanding, we have revised the relevant sentence from “Hereby, we recognized PKM2 as a substrate of TRIM21” to “Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21”. We have also revised the title of the relevant part of the Results section: the previous title, “TRIM21 ubiquitylates and promotes the nuclear translocation of PKM2”, has been replaced with “TRIM21 promotes ubiquitylation and the nuclear translocation of PKM2”. In addition, molecular weights are now indicated for all proteins in the western blots.

      (5) As above, molecular weights should always be indicated in immunoblot.

      Thanks for pointing out this problem in the figures. Accordingly, we have added the molecular weights for every protein tested in immunoblot.

      (6) The authors should describe the EAE mouse model in the text and in the material and methods as it may not be so well known to the entire reader audience, and the basic principle of MOG35-55 stimulation, in order to understand the experimental plan meaning.

      We appreciate the reviewer’s comments highlighting the importance of explaining the EAE model for a broader readership. In response, we have described the EAE model both in the text and in the materials and methods section. In the text, a description of the EAE model was added at the beginning of the first paragraph of the Results section. The description is as follows: “EAE is widely used as a mouse model of multiple sclerosis, which is typically induced by active immunization with different myelin-derived antigens along with adjuvants such as pertussis toxin (PTX). One widely used antigen is the myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide (Nitsch et al., 2021), which was adopted in our current studies.”

      We have also added the detailed experimental procedures for EAE induction in the materials and methods section.

      (7) The authors should better explain and give the rationale for the use of splenocytes and why directly activated astrocytes (isolated from the EAE model) cannot be employed to confirm/prove some of the presented data.

      Firstly, splenocytes offer a heterogeneous cell population, encompassing T cells and antigen-presenting cells (APCs), which may better mimic the microenvironment and complex immune responses observed in vivo.

      Myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide is a widely used antigen for EAE induction. MOG35-55 elicits strong T-cell responses and is highly encephalitogenic. Moreover, MOG35-55 induces a T cell-mediated phenotype of multiple sclerosis in animal models. Thus, by isolating splenocytes, which contain APCs and effector T cells, from mice at the onset stage of EAE and stimulating them with the antigen MOG35-55 in vitro for 60 hours, the T-cell response in the acute stage of EAE can be mimicked in vitro. The supernatant from MOG35-55-stimulated splenocytes has high levels of IFN-γ and IL-17A, which in part mimics the pathological process and environment in EAE, and this technique has been documented in the literature (Chen et al., 2009, Kozela et al., 2015).

      Correspondingly, we have revised the sentence on the use of MOG35-55-stimulated splenocytes from EAE mice and added the relevant references: “Supernatant of MOG35-55-stimulated splenocytes isolated from EAE mice were previously shown to elicit a T-cell response in the acute stage of EAE and are frequently used as an in vitro autoimmune model to investigate MS and EAE pathophysiology (Chen et al., 2009, Du et al., 2019, Kozela et al., 2015).”

      Secondly, activated astrocytes (isolated from the EAE model) cannot be employed for in vitro culture for the following reasons:

      (1) Low cell viability. Compared to embryonic or neonatal mice, adult mice yield a limited number of viable cells. This is mainly because adult tissues possess less proliferative capacity.

      (2) Disease-related changes. Astrocytes in EAE mice are exposed to a microenvironment that includes inflammatory cytokines, antigens and other pathological factors. Once removed from this environment, astrocytes undergo changes in function and morphology, which makes in vitro results difficult to interpret.

      For these reasons, the primary astrocytes cultured in vitro were isolated from neonatal mice.

      (8) The authors should indicate the phosphorylation sites they are referring to when analysing p-c-myc, pSTAT3, pp65, etc...

      According to the reviewer’s suggestions, we have added the phosphorylation sites for pSTAT3 (Y705), pp65 (S536), p-c-myc (S62) and pIKK (S176+S180) in the figure panels.

      (9) Reference of DASA-58 and TEPP-46 inhibitors and their specificity should be given.

      Following the reviewer’s comments, we have added the relevant references for the use of DASA-58 and TEPP-46 as inhibitors of PKM2 nuclear transport. In primary BMDMs, LPS induced nuclear PKM2. However, driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). These references have been added to the manuscript.

      To address the selectivity of TEPP-46 and add the references, the relevant sentence has been revised from “TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization” to “TEPP-46 is a selective allosteric activator for PKM2, showing little or no effect on other pyruvate isoforms. It promotes the tetramerization of PKM2, thereby diminishing its nuclear translocation (Anastasiou et al., 2012, Angiari et al., 2020).”

      Reviewing Editor (Recommendations For The Authors):

      The reviewing editor would appreciate it if the original blots from the western blot analysis, which were used to generate the final figures, could be provided.

      We thank the reviewing editor for this comment. Accordingly, we will provide the original blots for the western blot analysis.

      References

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      Escartin C, Guillemaud O, Carrillo-de Sauvage M-A. Questions and (some) answers on reactive astrocytes. Glia 2019;67(12):2221-47.

      Ferrara G, Benzi A, Sturla L, Marubbi D, Frumento D, Spinelli S, et al. Sirt6 inhibition delays the onset of experimental autoimmune encephalomyelitis by reducing dendritic cell migration. Journal of neuroinflammation 2020;17(1):228.

      Lin CC, Edelson BT. New Insights into the Role of IL-1β in Experimental Autoimmune Encephalomyelitis and Multiple Sclerosis. Journal of immunology (Baltimore, Md : 1950) 2017;198(12):4553-60.

      Palsson-McDermott Eva M, Curtis Anne M, Goel G, Lauterbach Mario AR, Sheedy Frederick J, Gleeson Laura E, et al. Pyruvate Kinase M2 Regulates Hif-1α Activity and IL-1β Induction and Is a Critical Determinant of the Warburg Effect in LPS-Activated Macrophages. Cell metabolism 2015;21(1):65-80.

      Rao J, Wang H, Ni M, Wang Z, Wang Z, Wei S, et al. FSTL1 promotes liver fibrosis by reprogramming macrophage function through modulating the intracellular function of PKM2. Gut 2022;71(12):2539-50.

      Wheeler MA, Clark IC, Tjon EC, Li Z, Zandee SEJ, Couturier CP, et al. MAFG-driven astrocytes promote CNS inflammation. Nature 2020;578(7796):593-9.

      Zhang J, Feng G, Bao G, Xu G, Sun Y, Li W, et al. Nuclear translocation of PKM2 modulates astrocyte proliferation via p27 and β-catenin pathway after spinal cord injury. Cell Cycle 2015;14(16):2609-18.

    1. Author response:

      We thank the editor and reviewers for their supportive comments about our modeling approach and conclusions, and for raising several valid concerns; we address them briefly below.

      Concerns about model’s biological realism and impact on interpretations

      The goal of this paper was to use an interpretable and modular model to investigate the impact of varying sensorimotor delays. Aspects of the model (e.g. layered architecture, modularity) are inspired by biology; at the same time, necessary abstractions and simplifications (e.g. using an optimal controller) are made for interpretability and generalizability, and they reflect common approaches from past work. The hypothesized effects of certain simplifying assumptions are discussed in detail in Section 3.5. Furthermore, the modularity of our model allows us to readily incorporate additional biological realism (e.g. biomechanics, connectomics, and neural dynamics) in future work. In the revision, we will add citations and edits to the text to clarify these points.

      Concerns that the model is overly complex

      To investigate the impact of sensorimotor delays on locomotion, we built a closed-loop model that recapitulates the complex joint trajectories of fly walking. We agree that locomotion models face a tradeoff between simplicity/interpretability and realism — therefore, we developed a model that was as simple and interpretable as possible, while still reasonably recapitulating joint trajectories and generalizing to novel simulation scenarios. Along these lines, we also did not select a model that primarily recreates empirical data, as this would hinder generalizability and add unnecessary complexity to the model. We do not think these design choices are significant weaknesses of this model; in fact, few comparable models account for all joints involved in locomotion, and fewer explicitly compare model kinematics with kinematics from data. We will add citations and edits to the text to clarify these points in the revision.

      Concerns about the validity of the Kinematic Similarity (KS) metric to evaluate walking

      We chose to incorporate only the first two PCA modes in the KS metric because the kernel density estimator performs poorly for high-dimensional data. Our primary use of this metric was to indicate whether the simulated fly continues walking in the presence of perturbations. For technical reasons, it is not feasible to perform equivalent experiments on real walking flies, which is one of the reasons we explore this phenomenon with the model. We note the dramatic shift from walking to non-walking as delay increases (Figure 5). To be thorough, in the revision, we will investigate the effect of incorporating additional PCA modes, and whether this affects the interpretation of our results. We will additionally edit the discussion and presentation of the KS metric to clarify its purpose in this study. We agree with the reviewers that the KS metric is too coarse to reflect fine details of joint kinematics; indeed, in the unperturbed case, we evaluate our model’s performance using other metrics based on comparisons with empirical data (Figures 2, 7, 8).
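
      As a rough illustration of how a density-based similarity score of this kind can separate walking from non-walking, the toy sketch below scores 2-D points under a Gaussian kernel density estimate built from reference points. It is not the implementation used in the study: the function names, the bandwidth, and the synthetic circular "gait cycle" are all assumptions made for illustration, and the points are assumed to have already been projected onto the first two PCA modes.

```python
import math

def gaussian_kde_2d(reference, bandwidth):
    """Build a 2-D Gaussian kernel density estimate from reference points."""
    n = len(reference)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(point):
        x, y = point
        total = 0.0
        for rx, ry in reference:
            d2 = (x - rx) ** 2 + (y - ry) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

def mean_log_density(points, density, floor=1e-300):
    """Average log-density of the points; the floor avoids log(0)."""
    return sum(math.log(max(density(p), floor)) for p in points) / len(points)

# Hypothetical reference data: a closed loop standing in for a periodic
# gait cycle in the plane of the first two PCA modes.
reference = [(math.cos(2 * math.pi * k / 200), math.sin(2 * math.pi * k / 200))
             for k in range(200)]
density = gaussian_kde_2d(reference, bandwidth=0.1)

# Points on the loop ("walking") versus points far from it ("non-walking").
walking = [(math.cos(t), math.sin(t)) for t in (0.1, 1.3, 2.9, 4.4)]
stalled = [(3.0, 3.0)] * 4

walking_score = mean_log_density(walking, density)
stalled_score = mean_log_density(stalled, density)
print(walking_score > stalled_score)
```

      A score like this only reports whether simulated kinematics fall in a region that reference walking occupies, which is consistent with treating it as a coarse walking/non-walking indicator rather than a measure of fine joint kinematics.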

    1. Author response:

      We thank the reviewers for their engagement and constructive comments. This provisional response aims to clarify key misconceptions, address major criticisms, and outline our revision plans.

      A primary concern of the reviewers appears to be our model's limitations in addressing a broad range of empirical findings. This, however, misinterprets our core contribution. Our work centers on a cautionary tale: before advocating for newly discovered cell types and their purported special roles in spatial cognition (an approach prevalent in the field), such claims must be tested against alternative (null) hypotheses that may contradict intuitive expectations. We present such an alternative hypothesis regarding spatial cells and their assumed privileged roles. We show that key findings in the field, namely spatial “cell types”, arise in a set of null models without spatial grounding (including untrained variants), even though these models are not models of spatial processing. We also found that these cells had no privileged role in representing spatial information.

      Our proposal is not a new model attempting to explain the brain, and therefore we do not aim to capture every empirical finding. Indeed, we would not expect an object recognition model (and its untrained variant) with no explicit spatial grounding to account for all phenomena in spatial cognition. This underscores our key point: if there exists a basic, spatially agnostic model that can explain certain degrees of empirical findings using criteria from the literature (i.e. place, head-direction and border cells), what implications does this have for the more complex theories and models proposed as underlying mechanisms of special cell types?

      Regarding concerns about the limited scope and generalizability of our setting, we will clarify that we considered multiple DNN architectures, both trained and untrained, on multiple decoding tasks (position, head direction, and nearest-wall distance). We plan to extend our experiments further as detailed in the revision plan below.

      Further, there was a methodological concern about using a linear decoder on a fixed DNN for spatial decoding tasks being a form of "hacking". However, linear readout is standard practice in neuroscience to characterize information available in a neural population. Moreover, our tests on untrained networks also showed spatial decoding capabilities, suggesting it's not solely due to the linear readout.
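
      To make the idea of a linear readout from a fixed, untrained network concrete, the toy sketch below fits a ridge-regularized linear decoder of 1-D position on top of frozen random features. This is an illustrative assumption-laden example, not the pipeline used in the paper: the random tanh units stand in for DNN activations, and the names `features`, `fit_linear_readout`, the ridge value, and the position grid are all hypothetical.

```python
import math
import random

def features(x, params):
    """Frozen, untrained random features of position x, plus a bias term."""
    return [1.0] + [math.tanh(w * x + b) for w, b in params]

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / M[r][r]
    return z

def fit_linear_readout(X, y, ridge=1e-3):
    """Ridge regression via the normal equations; only these weights are fit."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)

random.seed(0)
params = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]  # never trained

train_x = [k / 50 for k in range(51)]
test_x = [(k + 0.5) / 50 for k in range(50)]
weights = fit_linear_readout([features(x, params) for x in train_x], train_x)

def predict(x):
    return sum(w * f for w, f in zip(weights, features(x, params)))

test_mse = sum((predict(x) - x) ** 2 for x in test_x) / len(test_x)
mean_train = sum(train_x) / len(train_x)
baseline_mse = sum((mean_train - x) ** 2 for x in test_x) / len(test_x)
print(test_mse < baseline_mse)
```

      The feature parameters are never updated here; only the readout weights are estimated, so any decodable position information must already be present in the fixed representation.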

      For our full revision plan:

      (1) We will revise the manuscript to better reflect these above points, clarifying our paper's stance and improving the writing to reduce misconceptions.

      (2) We will address individual public reviews in more detail.

      (3) We intend to address key reviewer recommendations, focusing on better situating our work within the broader context of the existing literature whilst emphasizing the null hypothesis perspective.

      (4) In general, we will consider additional aspects of the literature and conduct new experiments to strengthen the relevance of our work to existing work. We highlight a number of potential experiments which we believe can address reviewer concerns:

      a. Blurring the visual inputs to DNNs to match rodent perception.

      b. Vary environmental settings to verify whether our findings are more generalizable (which we predict to be the case).

      c. Vary the environment to assess remapping effects, which will strengthen the connection of our work to the literature.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we have tempered our claims about such differences and use more caution in the interpretation of these data (Results p. 6 and Discussion p.10). Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of Injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g., if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at each volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, we kept all the other parameters across animals constant (see new Supplementary Table 1), all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we now provide results for each AAV-S5E2 injection case separately in a new Supplementary Table 2. This table shows that the results are indeed rather consistent across cases, with slightly greater specificity for injection volumes in the range of 105-180 nl.

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the Abstract, we indeed summarize the overall data and round the decimals, and state that these percentages are upper bounds that vary by serotype and layer, while in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we have changed 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We have amended this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract (“86-90%”) is also slightly exaggerated relative to the results: “Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl.”

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised Abstract we now state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer”.
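
      For readers less familiar with the two measures discussed above, the short sketch below shows how specificity and coverage (efficiency) are commonly computed from cell counts. The definitions are an assumption here, since this response does not spell them out: specificity is taken as the percentage of reporter-expressing cells that are also marker-positive (GABA+ or PV+), and coverage as the percentage of marker-positive cells that express the reporter. The function name and the counts are hypothetical, not data from the study.

```python
def specificity_and_coverage(n_reporter, n_marker, n_double):
    """Return (specificity %, coverage %) from cell counts in one ROI.

    n_reporter: cells expressing the viral reporter (e.g. tdTomato+)
    n_marker:   cells positive for the marker (e.g. GABA+ or PV+)
    n_double:   cells positive for both
    """
    if n_double > min(n_reporter, n_marker):
        raise ValueError("double-labeled count cannot exceed either single count")
    specificity = 100.0 * n_double / n_reporter
    coverage = 100.0 * n_double / n_marker
    return specificity, coverage

# Hypothetical counts for a single ROI (illustrative only).
spec, cov = specificity_and_coverage(n_reporter=50, n_marker=60, n_double=45)
print(spec, cov)  # 90.0 75.0
```

      Because the two measures have different denominators, a vector can be highly specific while covering only part of the target population, which is why upper-bound values for one layer should not be read as overall values.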

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAVPHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we now provide a new Supplementary Table 1 with titers and other information for each viral vector injection. We also provide information regarding viral preparation in a new section of the Methods entitled “Viral Preparation” (p. 12).

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: “as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington),” and delete “Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato.” These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      These data are indeed anecdotal, but we felt this could be useful information, potentially preventing other primate labs from wasting resources, animals and time, particularly, as some of these vectors have been reported to be selective and efficient in primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that failed either to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent in our hands. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, following the reviewer’s suggestion, we have deleted this information.

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins as this was unnecessary. The native, non-amplified tdT signal was strong. This is now stated in the Methods (p. 12).

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1, and which we have addressed in our reply to Reviewer 1. For clarity and convenience, below we copy our response to Reviewer 1.

      “We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      Authors’ reply. In the revised version of the manuscript, we have corrected ambiguous language throughout.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      My Public Review comments can be addressed by dialing down the interpretation of the data or providing appropriate caveats in the presentation of the relevant results and their discussion.

      We have done so. See text additions on p. 6 of the Results and p.10 of the Discussion.

      Minor comments:

      92% of PV+ neurons in the marmoset cortex were GABAergic. Can the authors speculate on the identity of the 8% PV+/GABA- neurons (e.g., on the basis of morphology)? Are they likely excitatory? Are they more likely to represent failures of GABA staining?

      We do not know what the other 8% of PV+/GABA- neurons are because we did not perform any other kind of IHC staining. Our best guess is that at least to some extent these represent failures of GABA staining, which is always challenging to perform in primate cortex. However, in mouse PV expression has been demonstrated in a minority of excitatory neurons.

      "Coverage of the PV-AAV was high, did not depend on injection volume.." The fact that the coverage did not depend on injection volume presumably depends, at least in part, on how ROIs were selected. Surely different volumes of injection transduce different numbers of neurons at different distances from the injection track. This should be clarified.

      The ROIs were selected at the center of the injected site/expression core from sections in which the expression region encompassed all cortical layers. Of course, larger volumes of injection resulted in larger transduced regions and therefore an overall larger number of transduced neurons, but we counted cells only within 100 µm wide ROIs at the center of the injection, and the percent of transduced PV cells in this core region did not vary significantly across volumes. We have clarified the methods of ROI selection (see Methods p. 13).

      Figure 2. What is meant by “absolute” in the legend for Figure 2? (How does “mean absolute density” differ from “mean density?”)

      We meant not relative, but this is obvious from the units, so we have removed the word “absolute” in the legend.

      Some non-significant p-values are indicated by "p>0.05" whereas others are given precisely (e.g., p = 1). Please provide precise p-values throughout. Also, the p-value from a surprisingly large number of comparisons in the first section of the results is "1". Is this due to rounding? Is it possible to get significance in a Bonferroni-corrected Kruskal-Wallis test with only 6 observations per condition?

      We now report exact p values throughout the manuscript, with a couple of exceptions where, to avoid reporting a large number of p values that would interrupt the flow of the manuscript, we provide the upper bound value and state that all those comparisons were below that value. The minimum sample size for the Kruskal-Wallis test is 5 for each group being compared, and our sample is 6 per group.
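
      The reviewer's question about whether significance is attainable with 6 observations per condition can be explored with a small worked example. The sketch below computes the Kruskal-Wallis H statistic for the most extreme case of two groups of six (every value in one group exceeds every value in the other) and converts it to a p value with the chi-square approximation for one degree of freedom. This is only an illustration under stated assumptions: the exact test procedure, the number of groups per comparison, and the number of Bonferroni comparisons used in the manuscript are not given here, and `n_comparisons` below is a hypothetical value.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for groups of distinct values (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # valid only without ties
    n_total = len(pooled)
    sum_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * sum_term - 3.0 * (n_total + 1)

def chi2_sf_df1(h):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))

# Most extreme possible outcome for two groups of six: complete separation.
low = [1, 2, 3, 4, 5, 6]
high = [7, 8, 9, 10, 11, 12]

h = kruskal_wallis_h([low, high])
p = chi2_sf_df1(h)  # roughly 0.004 under the asymptotic approximation

n_comparisons = 6   # hypothetical number of Bonferroni comparisons
p_bonferroni = min(1.0, p * n_comparisons)
print(round(h, 4), p < 0.05, p_bonferroni < 0.05)
```

      Under these assumptions, the smallest attainable uncorrected p value is about 0.004, so a pairwise comparison with complete separation can remain below 0.05 after Bonferroni correction only when the number of comparisons is modest (about a dozen or fewer).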

      Figure 3: The density of tdTomato-expressing cells appears to be greater at the AAV9 injection site than at the AAV1 injection site in the example sections shown. Might some of the differences between serotypes be due to this difference? I would imagine that resolving individual cells with certainty becomes more difficult as the amount of tdTomato expression increases.

      There was an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. Hence the density of tdTomato appeared lower than it is. Moreover, the tdT expression region shown in Fig. 3A is a merge of two sections, while it is only from a single section in panels B and C, leading to the impression of higher density of infected cells in panel A. The pipette used for the injection in panel A was not inserted perfectly vertical to the cortical surface, resulting in an injection site that did not span all layers in a single section; thus, to demonstrate that the injection indeed encompassed all layers (and that the virus infected cells in all layers), we collapsed label from two sections. We have now corrected the magnification of panel C so that it matches the scale bar in panel A, and specify in the figure legend that panel A label is from two sections.

      Text regarding Figure 3: The term “injection sizes” is confusing. I think it is intended to mean “the area over which tdTomato-expressing cells were found” but this should be clarified.

      Throughout the manuscript, we have changed the term injection site to “viral-expression region”.

      Figure 3: What were the titers of the three AAV-h56D vectors?

      Titers are now reported in the new Supplementary Table 1.

      Figure 3: The yellow box in Figure 3C is slightly larger than the yellow boxes in 3A and 3B. Is this an error or should the inset of Figure 3 have a scale bar that differs from the 50 µm scale bar in 3A?

      There were indeed errors in scale bars in this figure, which we have now corrected. Now all boxes have the same scale bar.

      Was MM423 one of the animals that received the AAV-h56D injections or one of the three that received AAV-S5E2 injection?

      This is an animal that received a 315nl injection of AAV-PHP.eB-S5E2.tdTomato. This is now specified in the Methods (see p. 12) and in the new Supplementary Table 1.

      Please provide raw cell counts and post-injection survival times for each animal.

      We now provide this information in Supplementary Tables 1 and 2.

      How were the different injection volumes of the AAV-S5E2 virus arranged by animal? Which volume of the AAV-S5E2 virus was injected into the two animals who received single injections?

      We now provide this information in Supplementary Table 1.

      Figure 6A: the point is made in the text that "[the distribution of tdT+ and PV+ neurons] did not differ significantly... peaking in L2/3 and 4C " Is the fact that the number of tdT+ and PV+ peak in layers 2/3 and 4C a consequence of these layers being thicker than the others? If so, this statement seems trivial.

      No, and this is the reason why we measured density in addition to percent of cells across layers in Figure 2. Figure 2B shows that even when measuring density, therefore normalizing by area, GABA+ and PV+ cell density still peaks in L2/3 and 4. Thus, these peaks do not simply reflect the greater thickness of these layers.

      Do the authors have permission to use data from Xu et al. 2010?

      Yes, we do.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      "Viral strategies to restrict gene expression to PV neurons have also been recently developed (Mehta et al., 2019; Vormstein-Schneider et al., 2020)." Mich et al. should also be cited here. Cell Rep. 2021;34(13):108754.

      We thank the reviewer for pointing out this missing reference. It is now cited.

      “GABA density in L4C did not differ from any other layers, but the percent of GABA+ cells in L4C was significantly higher than in L1 (p=0.009) and 4A/B (p=<0.0001).” This and other similar observations depend on calculating the percentage of cells relative to the total number of DAPI-labeled cells in each layer. Since it is apparent that there must be considerable variability between layers, it would be helpful to add a histogram showing the densities of all DAPI-labeled cells for each layer.

      This is not how we calculated density. Density, as now clarified in the Results on p. 4, was defined as the number of cells per unit area. Counts in each layer were divided by that layer’s counting area. This corrects for differences in the total number of labeled cells per layer. Therefore, reporting DAPI density is not necessary (we did not count DAPI cell density per layer).
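
      The density definition above (cell count in a layer divided by that layer's counting area) can be sketched as follows. This is a minimal illustration only: the function name `density_per_layer`, the layer labels, and all counts and areas are hypothetical and are not data from the study.

```python
def density_per_layer(counts, areas_mm2):
    """Return cells per mm^2 for each layer: count divided by that layer's counting area."""
    return {layer: counts[layer] / areas_mm2[layer] for layer in counts}


# Hypothetical cell counts and counting areas (mm^2); not data from the study.
counts = {"L1": 10, "L2/3": 40, "L4C": 30}
areas_mm2 = {"L1": 0.0125, "L2/3": 0.05, "L4C": 0.025}

densities = density_per_layer(counts, areas_mm2)

# In this made-up example L2/3 has four times the raw count of L1 but the
# same density, because its counting area is four times larger.
for layer, d in densities.items():
    print(f"{layer}: {d:.0f} cells/mm^2")
```

      Normalizing by area in this way is what allows a peak in density to be distinguished from a peak that only reflects a thicker layer.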

      "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes, suggesting the different serotypes have different capacity of infecting cortical neurons. AAV7 produced the smallest injection site, which additionally was biased to the superficial and deep layers, with only few cells expressing tdT in the middle layers (Fig. 3B). AAV9 (Fig. 3A) and AAV1 (Fig. 3C) resulted in larger injection sites and infected all cortical layers." Differences noted here might reflect either differences related to the AAV serotype or to differences in titers. Please add details about titers for each vector and add comments as appropriate. Another interpretation would be that there are differences in viral spread within the tissue.

      We have now added Supplementary Table 1, which reports titers in addition to other information about injections. The titers and volumes used for AAV9 and AAV7 were identical, while the titer for AAV1 was higher. Therefore, the differences in infectivity, particularly the much smaller expression region obtained with AAV7, cannot be attributed to titer. Likely this is due to differences in tropism and/or viral spread among serotypes. This is now discussed (see Results p. 5 bottom and p. 6 top).

      “Recently, several viral vectors have been identified that selectively and efficiently restrict gene expression to GABAergic neurons and their subtypes across several species, but a thorough validation and characterization of these vectors in primate cortex has lacked.” Is this really a fair statement, or is the characterization presented here also lacking? Methods used by others for quantifying specificity and efficiency are essentially the same as used here. See for example Mich et al. (which is not cited).

      The original validation in primates of the vectors examined in our study was based on small tissue samples and did not examine the laminar expression profile of transgene expression induced by these enhancer-AAVs. For example, the validation of the h56D-AAV in marmoset cortex in the original paper by Mehta et al. (2019) was performed on a tissue biopsy with no knowledge of which cortical layers were included in the tissue sample. The only study that shows laminar expression in primate cortex (Mich et al., which is now cited) shows only qualitative images of viral expression across layers, reporting total specificity and coverage pooled across samples; moreover, the study by Mich et al. deals with different PV-specific enhancers than the ones characterized in our study. Unlike any of the previous studies, here we have quantified specificity and coverage across layers.

      "Specifically, we have shown that the GABA-specific AAV9-h56D (Mehta et al., 2019) induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% coverage, and the PV-specific AAV-PHP.eB-S5E2 (Vormstein-Schneider et al., 2020) induces transgene expression in PV cells with up to 98% specificity and 86-90% coverage." These statements in the discussion repeat the somewhat exaggerated coverage numbers noted above for the Abstract.

      The averages across all layers are reported in the Results. The Abstract and Discussion report upper limits; this is made clear by stating “up to”, and we have now also added “depending on layer”.
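
      The difference between a layer-averaged value and an “up to” value can be illustrated with a short Python sketch. It assumes the common definitions of specificity (percent of reporter-expressing cells that are also marker-positive) and coverage (percent of marker-positive cells that also express the reporter); these definitions, the function names, and all counts below are assumptions made for illustration and are not data from the study.

```python
def specificity(n_double, n_reporter):
    """Percent of reporter+ cells that are also marker+ (assumed definition)."""
    return 100.0 * n_double / n_reporter


def coverage(n_double, n_marker):
    """Percent of marker+ cells that are also reporter+ (assumed definition)."""
    return 100.0 * n_double / n_marker


# Hypothetical per-layer counts: (double-labeled, reporter+, marker+).
layer_counts = {
    "L2/3": (45, 50, 60),
    "L4C": (49, 50, 56),
    "L5": (40, 50, 64),
}

spec = {layer: specificity(d, r) for layer, (d, r, m) in layer_counts.items()}
cov = {layer: coverage(d, m) for layer, (d, r, m) in layer_counts.items()}

# Simple unweighted mean across layers versus the best single layer ("up to").
mean_spec = sum(spec.values()) / len(spec)
max_spec = max(spec.values())
mean_cov = sum(cov.values()) / len(cov)
max_cov = max(cov.values())

print(f"specificity: mean {mean_spec:.1f}%, up to {max_spec:.1f}%")
print(f"coverage:    mean {mean_cov:.1f}%, up to {max_cov:.1f}%")
```

      The unweighted mean across layers used here is a simplification; pooling counts across layers before computing the percentage would weight layers by their cell numbers.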

      Reviewer #3 (Recommendations For The Authors):

      Abstract:

      • Ln 2: Can you be more specific about what you mean by the 'various functions of inhibition'? e.g. do you mean 'the various inhibitory influences on the local microcircuit' or similar?

      These are listed in the introduction to the paper but there is no space in the abstract to do so. Now the sentence reads: “various computational functions of…”.

      • Ln 5: 'has' to 'is'/'has been'.

      The grammar here is correct “has derived”.

      • Ln 6: humans are primates! Maybe change this to 'nonhuman primates'?

      We have added “non-human”

      • Ln n-1: 'viral vectors represent' -> 'viral vectors are'.

      We have changed it to “are”

      Intro:

      • Many readers may expect 'VIP' to be listed as the third major sub-class of interneurons. Could you note that the 5HT3a receptor-expressing group includes VIP cells?

      Done (p.3).

      • "Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans" - this seems close to being circular logic (not quite, but close). Could you modify this sentence to reflect why understanding cortical function and dysfunction in NHP may be of interest?

      This sentence now reads (p. 3): “Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans, where cortical inhibitory neuron dysfunction has been implicated in many neurological and psychiatric disorders, such as epilepsy, schizophrenia and Alzheimer’s disease (Cheah et al., 2012; Verret et al., 2012; Mukherjee et al., 2019)”. We also note that this was already stated in the previous version of the paper, but in the Discussion section, which read (and still reads on p. 9, 2nd paragraph): “It is important to study inhibitory neuron function in the primate, because it is unclear whether findings in mice apply to higher species, and inhibitory neuron dysfunction in humans has been implicated in several neurological and psychiatric disorders (Marin, 2012; Goldberg and Coulter, 2013; Lewis, 2014).”

      • "In particular, two recent studies have developed recombinant adeno-associated viral vectors (AAV) that restrict gene expression to GABAergic neurons". This sentence places the emphasis on the wrong component of the technology. The fact that AAV was used is irrelevant; these constructs could equally have been packaged in a lenti, CAV, HSV, rabies, etc. The emphasis should be on the recently developed regulatory elements (the enhancers/promoters).

      Same problem with the following excerpts; this text implies that the serotype/vector confers cell-type selectivity, but the results presented do not support this assertion (the promoter/enhancer is what confers the selectivity).

      • "specifically, three serotypes of an AAV that restricts gene expression to GABAergic neurons".

      • "one serotype of an AAV that restricts gene expression to PV cells".

      • "GABA- and PV-specific AAVs".

      • "GABA-specific AAV" (in results).

      • "PV-specific AAVs".

      • "In this study, we have characterized several AAV vectors designed to restrict expression to GABAergic cells" (in discussion).

      • "GABA-virus". GABA is a NT, not a virus.

      We have modified the language in all these sections and throughout the manuscript.

      Results:

      • Enhancer specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      We agree, and in fact we are not making comparisons between different enhancers (i.e., S5E2 and h56D).

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      The authors need to either: (1) replicate the h56D virus injections in (at least) a second animal, or (2) rewrite the paper to focus on the AAV.PhP mDlx virus alone - for which they have adequate data - and mention the h56D data as an anecdotal result, with clear warnings about the preliminary nature of the observations due to lack of replication.

      We agree about the lack of sufficient data to make strong statements about the differences between serotypes for the h56D-AAV. In the revised version of the manuscript, following the Reviewers’ suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data. We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species. Our edits in regard to this point can be found in the Results on p. 6 and Discussion on p. 10.

      • Did the authors compare h56D vs mDlx? This would be a useful and interesting comparison.

      We did not.

      • 3 tissue sections were used for analysis. How were these selected? Did the authors use a stereological approach?

      For the analysis in Fig. 2, the 3 sections were randomly selected, and for the positioning of the ROIs we selected a region in dorsal V1 anterior to the posterior pole (to avoid laminar distortions due to the curvature of the brain). This is now specified (see p. 4).

      • "both GABA+ and PV+ cells peak in layers" revise for clarity (e.g., the counts peak).

      It now reads “GABA+ and PV+ cell percent and density” (see p. 4).

      • "we refer to this virus as GABA-AAV" these are 3 different viruses!

      The idea here was to use an abbreviation instead of using the full viral name every single time. Clearly the reviewer does not like this, so we have removed this convention throughout the paper and now specify the entire viral name each time.

      • "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes". Do you mean 'resulted in different volumes of expression'?

      Yes. We have now rephrased this as follows: “…resulted in viral expression regions that differed in both size as well as laminar distribution” (p.5).

      • “suggesting the different serotypes have different capacity of infecting cortical neurons”. You can’t draw any firm conclusions from a single injection. The rest of this section of the results, along with the whole of Figure 4, and Figure 7a-d, is in danger of being misleading. Please remove. The best you can do here is to say ‘we injected 3 different viruses that express reporter under the h56D promoter. The results are shown in Figure 3, but these are anecdotal, as only a single injection of each virus was performed’. You could then note in the discussion to what extent these results are consistent with the existing literature (e.g., AAV9 often produces good coverage in NHP – anterograde and retrograde, AAV1 also works well in the CNS, although generally doesn’t infect as aggressively as AAV9. I’m not familiar with any attempts to use AAV7).

      With respect to Fig. 4, our approach in the revised version is detailed above. For convenience we copy it below. With respect to Fig. 7A-D, we feel the results are more robust because the data from the 3 serotypes were pooled together, as the 3 serotypes similarly downregulated GABA and PV expression at the injection site, and we do not make any statement about differences among serotypes for the data shown in Fig. 7A-D.

      “In the revised version of the manuscript, following the Reviewer’s suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data (see revised text in the Results on p. 6 and in the Discussion on p. 10). We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      • Figure 3: why the large variation in tissue quality? Are the 3 upper images taken at the same magnification? If not, they need different scale bars. The cells in A (upper row) look much smaller than those in B and C, and the size of the 'inset' box varies.

      We thank the reviewer for noticing this. We discovered an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. We have now corrected the error in scale bars. We have also fixed the different box sizes.

      • "Overall, across all layers coverage ranged from 78% ± 1.9 for injection volumes >300nl to 81.6% ± 1.8 for injection volumes of 100nl." Coverage didn't differ between layers, so revise this to: "Overall, across all layers coverage ranged from 78% to 81.6%." or give an overall mean (~80%).

      We have corrected the sentence as suggested by the Reviewer (see p. 8 first paragraph).

      • "extending farther from the borders" -> "extending beyond the borders".

      We have corrected the sentence as suggested by the Reviewer (see p. 8).

      • "The reduced GABA and PV immunoreactivity caused by the viruses implies that the specificity of the viruses we have validated in this study is likely higher than estimated". Yes, but for balance you should also note that they may harm the physiology of the cell.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Discussion:

      • "but a thorough validation and characterization of these vectors in primate cortex has lacked" better to say "has been limited", because Dimidschstein 2016 (marmoset V1) and Vormstein-schneider 2020 (macaque S1 and PFC) both reported expression in NHP.

      We have added the following sentence to this paragraph of the Discussion. “In particular, previous studies have not characterized the specificity and coverage of these vectors across cortical layers.”(see p. 8).

      • "whether finding in mice" -> 'whether findings in mice'.

      Corrected, thanks.

      • The discussion re: species differences is missing reference to Krienen 2020 (10.1038/s41586-020-2781-z).

      This reference has been added. Thanks.

      • “Injections of about 200nl volume resulted in higher specificity (95% across layers) and coverage” – this is misleading. The coverage was not statistically different among injection volumes.

      We have added the following sentence: “although coverage did not differ significantly across volumes.” (see p. 10).

      • "it is possible that subtle alteration of the cortical circuit upon parenchymal injection of viruses (including AAVs) leads to alteration of activity-dependent expression of PV and GABA." Or (and I would argue, more likely) the expression of large quantities of your big reporter protein compromised the function of the cell, leading to reduced expression of native proteins. You don't mention any IHC to amplify the RFP signal, so I'm assuming that your images are of direct expression. If so, you are expressing A LOT of reporter protein.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Methods:

      • It's difficult to piece together which viruses were injected in which monkeys, at what volumes, and at what titer. Please compile this info into a table for ease of reference (including any other relevant parameters).

      We now provide a Supplementary Table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this manuscript characterize a new anion-conducting channelrhodopsin, called MsACR1, that is more red-shifted in its spectrum than prior variants. An additional mutant variant of MsACR1 that is renamed raACR has a 20 nm red-shifted spectral response with faster kinetics. Due to the spectral shift of these variants, the authors proposed that it is possible to inhibit the expression of MsACR1 and raACR with light at 635 nm in vivo and in vitro. The authors were able to demonstrate some inhibition in vitro and in vivo with 635 nm light. Overall the new variants with unique properties should be able to suppress neuronal activities with red-shifted light stimulation.

      Strengths:

      The authors were able to identify a new class of anion-conducting channelrhodopsin and have variants that respond strongly to light with wavelengths >550 nm. The authors were able to demonstrate that this variant, MsACR1, can alter behavior in vivo with 635 nm light. The second major strength of the study is the development of a mutant of MsACR1 that has faster kinetics and is red-shifted by 20 nm as a result of a single mutation.

      Weaknesses:

      The red-shifted raACR appears to work much less efficiently than MsACR1 even with 635 nm light illumination both in vivo (Figure 4) and in vitro (Figure 3E) despite the 20 nm red-shift. This is inconsistent with the benefits and effects of red-shifting the spectrum in raACR. This usually would suggest raACR either has a lower conductance than MsACR1 or that the membrane/overall expression of raACR is much weaker than MsACR1. Neither of these is measured in the current manuscript.

      Thank you for addressing this crucial issue. We posit that the diminished efficiency of raACR in comparison to MsACR1 WT can be attributed to the tenfold acceleration of its photocycle. As noted by Reviewer 1, the anticipated advantages associated with a red-shifted opsin, particularly in in vivo preparations, are offset by its accelerated off-kinetics. Consequently, the shorter dwell time of the open state leads to a reduced number of conducted ions per photon. Nevertheless, the operational light sensitivity is not drastically altered compared to MsACR WT (Fig. 3C). We believe that the rapid kinetics offer interesting applications, such as the precise inhibition of single action potentials through holography.

      There are limited comparisons to existing variants of ACRs under the same conditions in the manuscript overall. There should be more parallel comparison with gtACR1, ZipACR, and RubyACR in identical conditions in cultured cell lines, cultured neurons, and in vivo. This should be in terms of overall performance, efficiency, and expression in identical conditions. Without this information, it is unclear whether the effects at 635 nm are due to the expression level which can compensate for the spectral shift.

      We compared MsACR1 and raACR with GtACR1 in ND cells in supplemental figure 4. We concur that further comparisons could be useful to emphasise both the strengths of MsACRs and applications where they may not be as suitable. We are currently in the process of outlining a separate article. We firmly believe that each ACR variant occupies a distinct application niche, which necessitates a more comprehensive electrophysiological comparison to provide valuable insights to the scientific community.

      There should be more raw traces from the recordings of the different variants in response to short pulse stimulation and long pulse stimulation to different wavelengths. It is difficult to judge what the response would be like when these types of information are missing.

      We appreciate Reviewer 1's feedback and have compiled a collection of raw photoresponses, encompassing various pulse widths and wavelengths, which can be found in the Supplementary materials (Supplementary Figures 4 and 5).

      Despite being able to activate the channelrhodopsin with 635 nm light, the main utility of the variant should be transcranial stimulation which was not demonstrated here.

      We concur with Reviewer 1's assessment that the prime application of MsACR is indeed transcranial stimulation. However, it's worth emphasising that the full advantages of transcranial optical stimulation become most apparent when animals are truly freely moving without any tethered patch cords. Our ongoing research in the laboratory is dedicated to the development of a wireless LED system that can be securely affixed to the animal's skull. We aim to demonstrate the potential of these novel optogenetic approaches in the field of behavioural neuroscience in the coming year.

      Figure 3B is not clearly annotated and is difficult to match the explanation in the figure legend to the figure. The action potential spikings of neurons expressing raACR in this panel are inhibited as strongly as MsACR1.

      We have enhanced the figure caption and annotations for clarity. The traces presented in Figure 3B are intended to demonstrate the overall effectiveness of each variant. However, it is in the population data analysis, as depicted in Figure 3E, where the meaningful insights are revealed.

      For many characterizations, the number of 'n's are quite low (3-7).

      We acknowledge Reviewer 1's suggestion regarding the in vivo data and agree with the importance of including more animals, as well as control animals. However, we are committed to adhering to the principles of the 3Rs (Replacement, Reduction, Refinement) in animal research, and given the robustness of our observed effects, we will add animals to reach the minimal number of animals per condition (n = 2) to minimise unnecessary animal usage while ensuring statistical power.

      We will continue to adhere to the established standards in the field, aiming for a range of 3 to 7 cells per condition, sourced from at least two independent preparations, to ensure the robustness and reliability of our in vitro data.

      Reviewer #2 (Public Review):

      Summary:

      The authors identified a new chloride-conducting Channelrhodopsin (MsACR1) that can be activated at low light intensities and within the red part of the visible spectrum. Additional engineering of MsACR1 yielded a variant (raACR1) with increased current amplitudes, accelerated kinetics, and a 20nm red-shifted peak excitation wavelength. Stimulation of MsACR1 and raACR1 expressing neurons with 635nm in mice's primary motor cortices inhibited the animals' locomotion.

      Strengths:

      The in vitro characterization of the newly identified ACRs is very detailed and confirms the biophysical properties as described by the authors. Notably, the ACRs are very light sensitive and allow for efficient in vitro inhibition of neurons in the nano Watt/mm^2 range. These new ACRs give neuroscientists and cell biologists a new tool to control chloride flux over biological membranes with high temporal and spatial precision. The red-shifted excitation peaks of these ACRs could allow for multiplexed application with blue-light excited optogenetic tools such as cation-conducting channelrhodopsins or green-fluorescent calcium indicators such as GCaMP.

      Weaknesses:

      The in-vivo characterization of MsACR1 and raACR1 lacks critical control experiments and is, therefore, too preliminary. The experimental conditions differ fundamentally between in vitro and in vivo characterizations. For example, chloride gradients differ within neurons which can weaken inhibition or even cause excitation at synapses, as pointed out by the authors. Notably, the patch pipettes for the in vitro characterization contained low chloride concentrations that might not reflect possible conditions found in the in vivo preparations, i.e., increasing chloride gradients from dendrites to synapses.

      We appreciate Reviewer 2’s feedback regarding missing control experiments. We will respond to these concerns in another section of our manuscript, as suggested.

      Regarding the chloride gradient, we understand the concerns of Reviewer 2, yet we chose these ionic conditions, particularly as they were used in the initial electrical characterization of GtACR1 in a neuronal context (Mahn et al., 2016). We will make sure to provide this context in our manuscript to justify our choice of ionic conditions.

      Interestingly, the authors used soma-targeted (st) MsACR1 and raACR1 for some of their in vitro characterization yielding more efficient inhibition and reduction of co-incidental "on-set" spiking. Still, the authors do not seem to have utilized st-variants in vivo.

      At the time of submission, due to the long-term absence of our lab technician, we were not able to produce purified viruses. Therefore, we decided to move on with the submission. We have now had the virus produced externally and will provide the experiments.

      Most importantly, critical in vivo control experiments, such as negative controls like GFP or positive controls like NpHR, are missing. These controls would exclude potential behavioral effects due to experimental artifacts. Moreover, in vivo electrophysiology could have confirmed whether targeted neurons were inhibited under optogenetic stimulations.

      We have several non-injected control animals that we used to calibrate this particular paradigm and never saw similar responses. However, we acknowledge the suggestion of Reviewer 2 and will include the GFP-injected control as recommended.

      Some of these concerns stem from the fact that the pulsed raACR stimulation at 635 nm at 10Hz (Fig. 3E) was far less efficient compared to MsACR1, yet the in vivo comparison yielded very similar results (Fig. 4D).

      As outlined previously, the accelerated photocycle of raACR results in a reduction in photocurrent amplitude, consequently diminishing the potency of inhibition per photon. In the context of in vitro stimulation, where single action potentials are recorded, this reduction in inhibition efficiency is resolved. However, in the realm of in vivo behavioural analysis, the observed effect is not contingent on single action potentials but rather stems from the disruption of the entire M1 motor network. In this context, despite the reduced efficiency of the fast-cycling raACR, it still manages to interrupt the M1 network, leading to similar behavioural outcomes.

      Also, the cortex is highly heterogeneous and comprises excitatory and inhibitory neurons. Using the synapsin promoter, the viral expression paradigm could target both types and cause differential effects, which has not been investigated further, for example, by immunohistochemistry. An alternative expression system, for example, under VGLUT1 control, could have mitigated some of these concerns.

      Indeed, we acknowledge the limitations of our current experimental approach. We are in the process of planning and conducting additional experiments involving Cre-dependent expression of st-MsACR and st-raACR in PV-Cre mice.

      Furthermore, the authors applied different light intensities, wavelengths, and stimulation frequencies during the in vitro characterization, causing varying spike inhibition efficiencies. The in vivo characterization is notably lacking this type of control. Thus, it is unclear why the 635nm, 2s at 20Hz every 5s stimulation protocol, which has no equivalent in the in vitro characterization, was chosen.

      We appreciate the valuable comment from the reviewer. The objective of our in vitro characterization is to elucidate the general effects of specific stimulation parameters on the efficiency of neuronal inhibition. For instance, we aim to demonstrate that lower light intensities result in less efficient inhibition, or that pulse stimulation may lead to a less complete inhibition, albeit significantly reducing the energy input into the system.

      In the in vivo characterization, we face constraints such as animal welfare considerations and limitations in available laser lines, which prevent us from exploring the entire parameter space as comprehensively as in the in vitro preparation. Additionally, it is important to note that membrane capacitance tends to be higher in vivo compared to dissociated hippocampal neurons. Consequently, we have opted for a doubled stimulation frequency from 10 Hz to 20 Hz and the stimulation pattern of 2 seconds “on” and 5 seconds “off”. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and variants.
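      The stimulation pattern described above can be summarized numerically. The following is a minimal sketch, not part of the original analysis; the function name is hypothetical, and it covers both readings of the protocol that appear in this exchange (2 s on followed by 5 s off, or a 2 s train starting every 5 s).

```python
def train_summary(pulse_rate_hz, train_on_s, train_off_s):
    """Summarize a pulsed-train protocol (hypothetical helper for illustration)."""
    pulses_per_train = round(pulse_rate_hz * train_on_s)  # pulses in one train
    cycle_s = train_on_s + train_off_s                    # full on+off period
    on_fraction = train_on_s / cycle_s                    # share of time with trains running
    return pulses_per_train, cycle_s, on_fraction


# Reading 1: 2 s "on" followed by 5 s "off" (as worded in the response).
print(train_summary(20, 2, 5))
# Reading 2: a 2 s train starting every 5 s (as worded by the reviewer), i.e. 3 s off.
print(train_summary(20, 2, 3))
```

      Under the first reading the trains occupy 2/7 of the session (about 29%); under the second, 2/5 (40%). The pulse width within each train is not stated here, so the per-pulse duty cycle is left out.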

      In summary, the in vivo experiments did not confirm whether the observed inhibition of mouse locomotion occurred due to the inhibition of neurons or experimental artifacts.

      In addition, the authors' main claim of more efficient neuronal inhibition would require them to threshold MsACR1 and raACR1 against alternative methods such as the red-shifted NpHR variant Jaws or other ACRs to give readers meaningful guidance when choosing an inhibitory tool.

      The light sensitivity of MsACR1 and raACR1 is impressive and well characterized in vitro. However, the authors only reported the overall light output at the fiber tip for the in vivo experiments: 0.5 mW. Without context, it is difficult to evaluate this value. Calculating the light power density at certain distances from the light fiber or thresholding against alternative tools such as NpHR, Jaws, or other ACRs would allow for a more meaningful evaluation.

      We thank the reviewers for their comments.
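      To put the reported 0.5 mW fiber output into context, the power density at the fiber tip can be estimated by dividing the power by the core cross-section. This is only a sketch: the fiber core diameter is not given in this exchange, so the 200 µm value below is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
import math


def tip_irradiance_mw_per_mm2(power_mw, core_diameter_um):
    """Irradiance at the fiber tip, assuming uniform emission over the core area."""
    radius_mm = core_diameter_um / 2.0 / 1000.0
    return power_mw / (math.pi * radius_mm ** 2)


# 0.5 mW is the value reported for the in vivo experiments;
# 200 um is an ASSUMED core diameter, not taken from the manuscript.
print(f"{tip_irradiance_mw_per_mm2(0.5, 200):.1f} mW/mm^2")  # about 15.9
```

      The value at the tip is an upper bound for tissue further away, since scattering, absorption, and beam divergence reduce the irradiance with distance.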

      Reviewer #1 (Recommendations For The Authors):

      The study would be much strengthened if the authors can perform more experiments and characterization to support their claims, in addition to showing more raw electrophysiological traces/results and not just summary charts and graphs.

      As outlined above, further experiments are planned. We appreciate the suggestion to include more raw electrophysiological traces. Photocurrent traces of all included mutants of MsACR1 measured in ND cells and traces of hippocampal neuronal measurements of non- and soma-targeted MsACR1 and raACR will be included as supplemental figures.

      Reviewer #2 (Recommendations For The Authors):

      Major concern:

      It is unclear if the optogenetic light stimulation in Fig. 4 caused direct inhibition of neuronal activity in M1, which cell types were targeted, and how MsACR1 and raACR1 compare to other optogenetic inhibitors.

      Also, the rationale for the light stimulation (635 nm, 2s, 20Hz, every 5s) is not clear.

      I would suggest the following to address these concerns:

      (1) M1 expression and stimulation of a negative control such as GFP to exclude that experimental artifacts cause the observed behavioral outcomes.

      We are now preparing the required GFP control, and will add it to the new version of the manuscript.

      (2) Expression and stimulation of NpHR as a positive control.

      We will use st-GtACR1 as a positive control.

      (3) Electrophysiological measurements of neuronal activity under optogenetic stimulation to confirm the effectiveness of neuronal inhibition, i.e. suppression of spontaneous firing under light etc.

      We concur with Reviewer 2 regarding the potential value of incorporating such in vivo optrode recordings into our manuscript to enable readers to assess the effectiveness of MsACR. As part of our plan for the next version of the manuscript, we intend to conduct these experiments.

      (4) ChR2 or other cation-conducting channelrhodopsins with the same expression paradigm could be used to observe diametrically opposite effects.

      As Reviewer 2 has already pointed out, the complex interactions that can occur in our viral strategy when an inhibitory opsin is expressed in both excitatory and inhibitory neurons make us sceptical about the possibility of an excitatory opsin leading to opposing effects.

      Considering the non-linear input-output function of cortical circuits, optogenetic activation of neurons, even when expressed in either inhibitory or excitatory neurons, is likely to result in the perturbation of the cortical network, which will likely also lead to locomotor arrest.

      (5) The authors should confirm whether the expression under synapsin preferentially targeted excitatory or inhibitory cells because inhibiting inhibitory cells could lead to the disinhibition of the principal cells. Synapsin promoters can drive expression in glutamatergic and GABAergic neurons. An alternative expression system under VGLUT1 promoter could yield better targeting.

      We concur with Reviewer 2 and will conduct the next set of experiments using the PV-Cre mouse line. Additionally, we will employ in vivo electrophysiology to further confirm the inhibition of the motor cortex network.

      (6) Titration of optogenetic stimulation: The authors should test whether increasing or decreasing light intensities and stimulation frequencies as well as different wavelengths (550 nm vs 635 nm) cause differences in inhibiting locomotion in vivo as it did for inhibiting the neuronal firing in vitro (Fig. 3B-E).

      The non-linear input-output function within cortical networks, coupled with our sole reliance on behaviour as a readout, will pose challenges in resolving subtle effects on locomotion arrest across various stimulation parameters.

      For our planned in vivo electrophysiology recordings, we will measure cortical firing rates as a proxy rather than relying solely on behavioural observations. This approach will allow us to map the fundamental axes of our parameter space in vivo, considering factors such as wavelength, light intensity, and frequency.

      (7) Explanation of why the 20Hz/2s light stimulation protocol was chosen.

      As outlined above, considering animal welfare and increased membrane capacitance in vivo, we opted for the outlined stimulation protocol. This approach allows the animals to spend less time in an arrested state while still demonstrating the effect of MsACR and variants.

      (8) In vivo thresholding against other inhibitory tools, such as RubyACRs, Jaws, etc would provide critical guidance for the audience and potential users. It would be particularly important to compare the necessary light intensities for reaching similar behavioral outcomes.

      We concur with Reviewer 2 and will prepare data using GtACR1 as a reference.

      (9) The authors should calculate or reasonably estimate the in vivo light intensity during optogenetic stimulation to provide a meaningful comparison to their in vitro characterization. Ideally, they can provide an estimated volume for efficient stimulation of MsACR1 and raACR1 and compare it to other optogenetic tools.

      We will conduct a Monte Carlo simulation and offer a comparison of the effective activation volume across various classes of optogenetic tools.
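      As a rough illustration of what such a Monte Carlo simulation involves, the sketch below follows photon packets through a homogeneous slab and records where their energy is absorbed as a function of depth. It is not the planned simulation: the absorption and reduced scattering coefficients are placeholder values, scattering is treated as isotropic using the reduced scattering coefficient, lateral spread is not tracked, and all names are hypothetical.

```python
import math
import random


def absorbed_depth_profile(n_photons=2000, mu_a=0.06, mu_s_reduced=1.0,
                           max_depth_mm=3.0, n_bins=30, seed=0):
    """Monte Carlo estimate of the absorbed light fraction per depth bin.

    Optical coefficients (per mm) are placeholder values, not measurements.
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s_reduced            # total interaction coefficient
    albedo = mu_s_reduced / mu_t          # fraction of weight surviving each interaction
    bin_width = max_depth_mm / n_bins
    absorbed = [0.0] * n_bins

    for _ in range(n_photons):
        z, uz, weight = 0.0, 1.0, 1.0     # start at the fiber tip, heading into tissue
        while weight > 1e-3:              # simple cutoff (no Russian roulette)
            step = -math.log(1.0 - rng.random()) / mu_t  # exponential free path
            z += uz * step
            if z < 0.0 or z >= max_depth_mm:
                break                     # packet left the simulated slab
            deposit = weight * (1.0 - albedo)
            absorbed[min(int(z / bin_width), n_bins - 1)] += deposit
            weight -= deposit
            uz = 2.0 * rng.random() - 1.0  # isotropic scattering: cos(theta) uniform in [-1, 1]

    return [a / n_photons for a in absorbed]


profile = absorbed_depth_profile()
print(f"fraction of launched light absorbed within the slab: {sum(profile):.3f}")
```

      A full treatment would track all three spatial dimensions, use an anisotropic phase function and wavelength-specific tissue coefficients, and convert fluence into an activation volume using each tool's light sensitivity.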

      Minor concerns:

      (1) Why were st- MsACR1 and raACR1 used in vitro but not in vivo? The viral constructs were described as AAV/DJ-hSyn1-MsACR-mCerulean and AAV/DJ-hSyn1-raACR-mCerulean.

      As mentioned earlier, we were unable to produce purified soma-targeted MsACR variants before the manuscript submission. We will now provide these measurements.

      (2) Light intensities for the spectral measurements are missing.

      During action spectra measurements, a motorised neutral density filter wheel is used to have equal photon flux for all tested wavelengths. Additionally, the light intensity is further reduced by using additional neutral density filters to ensure sufficiently low photocurrents to determine the spectral maximum. Therefore, the light intensity varied between constructs and sometimes measurements. We added the following line to the respective methods section to further clarify this: “(typically in the low µW-range at 𝜆max)”.
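      Keeping the photon flux equal across wavelengths means that the optical power has to scale with the photon energy, i.e. inversely with wavelength. The sketch below illustrates this relationship; the 1 µW reference power is an arbitrary example value, not a measurement from the study, and the function names are hypothetical.

```python
H = 6.62607015e-34   # Planck constant, J*s
C = 299792458.0      # speed of light, m/s


def photon_flux_per_s(power_w, wavelength_nm):
    """Photons per second carried by a beam of the given power and wavelength."""
    return power_w * wavelength_nm * 1e-9 / (H * C)


def power_for_flux(flux_per_s, wavelength_nm):
    """Optical power (W) needed to deliver a given photon flux at a wavelength."""
    return flux_per_s * H * C / (wavelength_nm * 1e-9)


reference_flux = photon_flux_per_s(1e-6, 550)   # ASSUMED 1 uW at 550 nm
for wavelength in (450, 550, 635):
    power_uw = power_for_flux(reference_flux, wavelength) * 1e6
    print(f"{wavelength} nm: {power_uw:.3f} uW for equal photon flux")
```

      For the same photon flux, red light at 635 nm therefore needs about 13% less power than light at 550 nm, which is why intensities expressed in power units differ between wavelengths in an action spectrum.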

      (3) MsACR1 is slower and probably more light-sensitive than raACR1, which is faster but has larger photocurrents. These are complementary tradeoffs, and the audience might wonder how MsACR1 and raACR1 photocurrents compare under similar conditions. Therefore, I suggest an alternative representation in Fig. 2C. That is, the presentation of the excitation spectra under similar light intensities and with absolute photocurrent values.

      Unfortunately, due to the reasons stated above, MsACR1 and raACR action spectra were not recorded with the same light intensity. However, MsACR1 and raACR are compared under the same conditions for Fig. 2B, E, and F (560 nm light at ~3.2 mW/mm²) as well as in Supp. Fig. 4C.

      (4) Figure legends for figures 3F and G are missing details for describing the stimulation paradigm.

      We added more details about the stimulation paradigm.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      This work sets out to elucidate mechanistic intricacies in inflammatory responses in pneumonia in the context of the aging process (Terc deficiency - telomerase functionality).

      Strengths:

      Very interesting, conceptually speaking, approach that is by all means worth pursuing. An overall proper approach to the posited aim.

      We want to thank the reviewer for taking the time to review our manuscript and for providing positive feedback regarding our research question.

      Weaknesses:

      The work is heavily underpowered and may have statistical deficits. This precludes it in its current state from drawing unequivocal conclusions.

      Thank you for this essential and valuable comment. We fully accept that the small sample size of the Tercko/ko mice is a major limitation of our study and transparently discuss this in our manuscript.

      However, because of the strong burden of disease, only a reduced number of mice was approved under Animal Welfare regulations. Consequently, only three non-infected and five infected mice were available to us, which presents a clear limitation to our study. Owing to ethical considerations related to animal welfare and sustainability, as well as compliance with German animal welfare regulations, it is not possible to obtain additional Tercko/ko mice to increase the dataset. The animal studies are an important aspect of our study; however, our hypothesis was also investigated at multiple levels, including in an in vitro co-culture model (Figure 5), to ensure comprehensive analysis.

      Thus, we clearly demonstrated that S. aureus pneumonia in Tercko/ko mice leads to a more severe phenotype, orchestrated by the dysregulation of both innate and adaptive immune response.
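      The effect of the group sizes mentioned above (three non-infected and five infected mice) on statistical resolution can be made concrete. The sketch below computes the smallest two-sided p-value that an exact rank-based two-group comparison can return when the two groups are completely separated and there are no ties. Which statistical test was actually applied is not stated in this response, so the choice of an exact Mann-Whitney-type test is an assumption for illustration only, and the function name is hypothetical.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest two-sided p-value of an exact rank-sum test without ties.

    Only one of the comb(n1 + n2, n1) equally likely rank assignments is the
    most extreme in each direction, hence 2 / comb(n1 + n2, n1).
    """
    return min(1.0, 2 / comb(n1 + n2, n1))


print(f"n = 3 vs 5: {min_two_sided_exact_p(3, 5):.4f}")   # 2/56, about 0.0357
print(f"n = 3 vs 3: {min_two_sided_exact_p(3, 3):.4f}")   # 2/20 = 0.1000
```

      With three versus five animals, even perfectly separated groups cannot yield a p-value below about 0.036 in such a test, which illustrates why trends in small cohorts often fail to reach conventional significance thresholds.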

      Reviewer #2 (Public Review):

      Summary:

      The authors demonstrate heightened susceptibility of Terc-KO mice to S. aureus-induced pneumonia, perform gene expression analysis from the infected lungs, find an elevated inflammatory (NLRP3) signature in some Terc-KO but not control mice, and some reduction in T cell signatures. Based on that, they conclude that dysregulated inflammation and T-cell dysfunction play a major role in these phenomena.

      Strengths:

      The strengths of the work include a problem not previously addressed (the role of the Terc component of the telomerase complex) in certain aspects of resistance to bacterial infection and innate (and maybe adaptive) immune function.

      We would like to thank the reviewer for the positive feedback regarding our aim to investigate the impact of Terc deletion on the pulmonary immune response to S. aureus.

      Weaknesses:

      The weaknesses outweigh the strengths, dominantly because conclusions are plagued by flaws in experimental design, by lack of rigorous controls, and by incomplete and inadequate approaches to testing immune function. These weaknesses are as follows:

      (1)  Terc-KO mice are a genomic knockout model, and therefore the authors need to carefully consider the impact of this KO on a wide range of tissues. This, however, is not the case. There are no attempts to perform cell transfers or use irradiation chimera or crosses that would be informative.

      We thank the reviewer for bringing up this important point. The aim of our study, however, was to investigate the impact of Terc deletion in the lung and on the response to bacterial pneumonia, rather than to provide a comprehensive characterization of the Tercko/ko model itself. This characterization of different tissues and cell types has already been conducted in previous studies. These include studies that characterize the general phenotype of the model (Herrera et al., 1999; Lee et al., 1998; Rudolph et al., 1999) as well as investigations that shed light on the impact of Terc deletion on specific cell types such as microglia (Khan et al., 2015) or T cells (Matthe et al., 2022). The impact of Terc deletion on T cells is also discussed in our manuscript in lines 89 to 105. Furthermore, a section about the general phenotype of the Terc deletion model is included in the introduction in lines 126 to 138. Thus, we discussed the relevant literature regarding Tercko/ko mice in our manuscript and attempted to provide a more in-depth characterization of the lung by investigating the inflammatory response to infection as well as changes in the gene expression (Figure 2-4).

      (2)  Throughout the manuscript the authors invoke the role of telomere shortening in aging, and according to them, their Terc-KO mice should be one potential model for aging. Yet the authors consistently describe major differences between young Terc-KO and naturally aging old mice, with no discussion of the implications. This further confuses the biological significance of this work as presented.

      Thank you for mentioning this relevant point. We want to apologize for the confusion regarding this matter. While Tercko/ko mice are a well-established model for premature aging, these effects become more apparent with increasing generations (G); thus, G5 and G6 mice are the most affected by Terc deletion (Lee et al., 1998; Wong et al., 2008).

      This accelerated aging phenotype is therefore predominantly apparent in later-generation (G5 and G6) or aged Tercko/ko mice (Lee et al., 1998; Wong et al., 2008). Since the aim of this study was to analyze the impact of Terc deletion on the lung and its immune response to bacterial infections, rather than the impact of telomere shortening and telomerase dysfunction, young G3 Tercko/ko mice (8 weeks) were used. This is also mentioned in lines 131-134. In this study, Tercko/ko mice were used not as a model of aging, but rather as a model specifically for Terc deletion. The old WT mice function as a control cohort to observe possible common but also deviating effects between aging and Terc deletion. In our sequencing data, we observe that uninfected young WT mice are very similar to uninfected Tercko/ko mice. Other studies have also reported this lack of major differences between uninfected WT and Tercko/ko mice in the G3 knockout mice (Kang et al., 2018). Conversely, uninfected old WT and Tercko/ko mice exhibited great differences, for instance, regarding the numbers of differentially expressed genes (Supplemental Figure 1H). Thus, differences between naturally aged mice and young G3 Tercko/ko mice are not surprising. To clarify this aspect, we restructured the paragraph discussing the Tercko/ko mice (lines 126-134). Additionally, we added a paragraph explaining the purpose of the naturally aged mice in lines 134 to 138:

      “As control cohort age-matched young WT mice were utilized. To investigate whether Terc deletion, beyond critical telomere shortening, impacts the pulmonary immune response, we used young Tercko/ko mice. Additionally, naturally aged mice (2 years old) were infected to explore the potential link to a fully developed aging phenotype.”

      (3)  Related to #2, group design for comparisons lacks a clear rationale. The authors stipulate that Terc-KO will mimic natural aging, but in fact, the only significant differences seen between groups in susceptibility to S. aureus are, contrary to the authors' expectation, between young Terc-KO and naturally old mice (Figures 1A and B, no difference between young Terc-KO and young wt); or there are no significant differences at all between groups (Figures 1C, D).

      We thank the reviewer for this essential comment. As mentioned above, the Tercko/ko mice in this study were not selected to model natural aging. To model telomerase dysfunction and accelerated aging, later-generation or aged Tercko/ko mice would have been more suitable.

      The lack of statistical significance in some figures is likely due to the heterogeneity of disease phenotype of S. aureus infection in mice, which is a limitation of our study that we discuss in our discussion section in lines 577-583. The phenotype of S. aureus infection can vary greatly within a mouse population, highlighting the limitations of mice as a model for S. aureus infections. To account for this heterogeneity we divided the infected Tercko/ko mice cohort into different degrees of severity based on the clinical score and the presence of bacteria in organs other than the lung (mice with systemic infection).

      Despite the heterogeneity especially within the Tercko/ko mice cohort the differences between the knockout and young as well as old WT mice were striking. Including the fatal infections, 80% of the Tercko/ko mice had a severe course of disease, while none of the WT mice displayed a severe course (Figure 1A, B and Supplemental Figure 1A, B). This hints towards a clear role of Terc in the response to S. aureus infection in mice. Thus while in some figures the differences are not significant, strong trends towards a more severe phenotype of S. aureus infection in the Tercko/ko mice regarding bacterial load, score and inflammatory response could be observed in our study.

      Another example of inadequate group design is when the authors begin dividing their Terc-KO groups by clinical score into animals with or without "systemic infection" (the condition where a bacterium spreads uncontrollably across the many organs and via blood, which should be properly called sepsis), and then compare this sepsis group to other groups (Supplementary Figures 1G; Figure 2; lines 374-376 and 389-391). This gives them significant differences in several figures, but because they did not clearly indicate where they applied this stratification in the figure legends, the data are somewhat confusing. Most importantly, methodologically it is highly inappropriate to compare one mouse with sepsis to another one without. If Terc-KO mice with sepsis are a comparator group, then their controls have to be wild-type mice with sepsis, who are dealing with the same high bacterial load across the body and are presumably forced to deploy the same set of immune defenses.

      We sincerely appreciate the significant time and effort you have invested in reviewing our manuscript. However, with all due respect, we must point out that the definition of sepsis you have referenced is considered outdated. According to the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3), sepsis is defined as "a life-threatening organ dysfunction caused by a dysregulated host response to infection" (Singer et al., 2016, JAMA). Given this fundamental misunderstanding of our findings, we find the comment regarding the inadequacy of our groups to be both dismissive and lacking in scientific merit. We would like to emphasize that the group size used in our study is consistent with accepted standards in infection research. We strongly reject any insinuations of inadequacy that have been repeatedly mentioned throughout the review.

      In order to provide a nuanced investigation of disease severity in Tercko/ko mice, we added the term “systemic infection” to the figures whenever the mice were divided into groups of mice with and without systemic infection. This is the case for Figure 2A and Supplemental Figure 1C-E. The division into mice with and without systemic infection is also mentioned in the figure legend of Figure 2A in lines 933 to 936 and for Supplemental Figure 1 in lines 1053-1054. We agree that Supplemental Figure 1G is somewhat confusing as the mice with systemic infection are highlighted in this graph but not included as a separate group within our sequencing analysis. We added a sentence to the figure legend clarifying this (lines 1042-1045):

      “Nevertheless, the infected Tercko/ko mice were considered one group for the expression analysis and not split into separate groups for the subsequent analysis.”

      Additionally, we revised the section regarding this grouping in different degrees of severity in our Material and Methods section to clarify that this division was only performed for specific analysis (line 191):

      “…for the indicated analysis.”

      Furthermore, as mentioned above, the mice classified as systemically infected were not septic. We classified those mice as systemically infected based on their clinical score and the presence of bacteria in organs other than the lung, as stated in lines 188-191 and 377-382.

      Bacteremia is a symptom of very severe cases of hospital-acquired pneumonia with a very high mortality (De la Calle et al., 2016).

      Therefore, the systemically infected mice or rather mice with bacteremia display an especially severe pneumonia phenotype, which is distinct from sepsis. The presence of this symptom in our Tercko/ko mice further highlights the clinical relevance of our study. This aspect was added to the manuscript in the lines 569-571.

      “The detection of bacteria in extra pulmonary organs is of particular interest, as bacteremia is a symptom of severe pneumonia and is associated with high mortality (De la Calle et al., 2016).”

      (4)  The authors conclude that dysregulated inflammation and T-cell dysfunction play a major role in S. aureus susceptibility. This may or may not be an important observation, because many KO mice are abnormal for a variety of reasons, and until such reasons are mechanistically dissected, the physiological importance of the observation will remain unclear.

      Two points are important here. First, there is no natural counterpart to a Terc-KO, which is a complete loss of a key non-enzymatic component of the telomerase complex starting in utero.

      Second, the authors truly did not examine the key basic features of their model, including the features of basic and induced inflammatory and immune responses. This analysis could be done either using model antigens in adjuvants, defined innate immune stimuli (e.g. TLR, RLR, or NLR agonists), or microbial challenge. The only data provided along these lines are the baseline frequencies of total T cells in the spleen of the three groups of mice examined (not statistically significant, Figure 4B). We do not know if the composition of naïve to memory T cell subsets may have been different, and more importantly, we have no data to evaluate whether recruitment of the immune response (including T cells) to the lung upon microbial challenge is similar or different. So, what are the numbers and percentages of T cells and alveolar macrophages in the lung following S. aureus challenge and are they even comparable or are there issues in mobilizing the T cell response to the site of infection? If, for example, Terc-KO mice do not mobilize enough T cells to the lung during infection, that would explain the paucity in many T-cell-associated genes in their transcriptomic set that the authors report. That in turn may not mean dysfunction of T cells but potentially a whole different set of defects in coordinating the response in Terc-KO mice.

      We thank the reviewer for highlighting these important aspects. Regarding the first point, indeed there is no naturally occurring deletion of Terc in humans. However, studies reported reduced expression of Terc and Tert in the tissues of aged mice and rats (Tarry-Adkins et al., 2021; Zhang et al., 2018). Terc itself has been found to have several important immunomodulatory functions such as the activation of the NF-κB or PI3-kinase pathway (Liu et al., 2019; Wu et al., 2022). As those aforementioned pathways are relevant for the immune response to S. aureus infections, the authors were interested in exploring the impact of Terc deletion on the pulmonary immune response. The potential immunomodulatory functions of Terc are discussed in lines 106-121. To further clarify our rationale we added a sentence to the introduction in lines 121-125.

      “Interestingly, downregulation of Terc and Tert expression in tissues of aged mice and rats has been found (Tarry-Adkins, Aiken, Dearden, Fernandez-Twinn, & Ozanne, 2021; Zhang et al., 2018).

      Therefore, as a potential immunomodulatory factor reduced Terc expression could be connected to age-related pathologies.”

      Regarding the second point, as we focused on the effect of Terc deletion in the lung and its role in S. aureus infection, we investigated inflammatory and immune response parameters relevant to this setting. For instance, inflammation parameters in the lungs of all three mouse cohorts were measured to investigate differences in the inflammatory response in the non-infected and infected mice (Figure 2A). Those measurements showed no baseline difference in key inflammatory parameters between young WT and Tercko/ko mice, which is consistent with previous findings (Kang et al., 2018). The inflammatory response to infection with S. aureus in the Tercko/ko mouse cohort differed significantly from the other cohorts (Figure 2A), hinting towards a dysregulated inflammatory response due to Terc deletion. Furthermore, we investigated the frequencies of general immune cell populations such as dendritic cells, macrophages, and B cells in the spleen of all three mouse cohorts to gain a baseline understanding of the general immune cell populations. In our manuscript, only total T cell frequencies were included because of their relevance to our T cell data (Figure 4B). These data showed no difference in the total T cell frequency in the spleen across the three mouse cohorts. For a more detailed insight into our analysis, we added the frequencies of the other immune cell populations analyzed in the spleen as Supplemental Figure 3B-F. Additionally, a figure legend for the graphs was added.

      Therefore, while we did not analyze baseline frequencies of specific populations of T cells, we analyzed and characterized the inflammatory and immune response of our model in a way relevant to our research question.

      The differences observed in T cell marker and TCR gene expression were also partly present between the uninfected and infected Tercko/ko mice, such as the complete absence of CD247 expression in infected Tercko/ko mice, which is, however, expressed in uninfected mice of this cohort (Figure 4A, C and D). Thus, this effect cannot be solely attributed to an inadequate mobilization of T cells to the lung after infectious challenge. However, we agree that a more detailed insight into the immune cells recruited to the lung or the frequencies of different T cell populations could contribute to a better understanding of the proposed mechanism and would be an interesting experiment to conduct in further studies. We accept this as a limitation of our study and included it in our discussion section in lines 720-724:

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      (5)  Related to that, immunological analysis is also inadequate. First, the authors pull signatures from the total lung tissue, which is both imprecise and potentially skewed by differences, not in gene expression but in types of cells present and/or their abundance, a feature known to be affected by aging and perhaps by Terc deficiency during infection. Second, to draw any conclusions about immune responses, the authors would have to track antigen-specific T cells, which is possible for a wide range of microbial pathogens using peptide-MHC multimers. This would allow highly precise analysis of phenomena the authors are trying to conclude about. Moreover, it would allow them to confirm their gene expression data in populations of physiological interest.

      We thank the reviewer for highlighting this important and relevant point. In our study, we aimed to investigate the role of Terc expression in modulating inflammation and the immune response to S. aureus infection in the lung. To address this, we examined the overall impact of age, genotype, and infection on lung inflammation and gene expression. Therefore, sequencing of total lung tissue was essential for addressing the research question posed. Our findings demonstrate that Tercko/ko mice exhibit a more severe phenotype following S. aureus infection, characterized by an increased bacterial load and heightened lung inflammation (Figures 1 and 2). Furthermore, our data suggest that Terc plays a role in regulating inflammation through activation of the NLRP3 inflammasome, along with the dysregulation of several T cell marker genes (Figures 2, 4, and 5). However, this study lacks a detailed analysis of distinct T cell populations, including antigen-specific T cells, as noted earlier. Investigating these aspects in future studies would be valuable to validate and expand upon our findings. We have incorporated these suggestions into the discussion section (lines 720-724):

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      Nevertheless, our study provides first evidence of a potential connection between T cell functionality and Terc expression.

      Third, the authors co-incubate AM and T cells with S. aureus. There is no information here about the phenotype of T cells used. Were they naïve, and how many S. aureus-specific T cells did they contain? Or were they a mix of different cell types, which we know will change with aging (fewer naïve and many more memory cells of different flavors), and maybe even with a Terc-KO? Naïve T cells do not interact with AM; only effector and memory cells would be able to do so, once they have been primed by contact with dendritic cells bringing antigen into the lymphoid tissues, so it is unclear what the authors are modeling here. Mature primed effector T cells would go to the lung and would interact with AM, but it is almost certain that the authors did not generate these cells for their experiment (or at least nothing like that was described in the methods or the text).

      Thank you for bringing up this important question. For the co-cultivation experiment of T cells and alveolar macrophages, total CD4+ T cells of both young WT and Tercko/ko were used. We did not select for a specific population of T cells. Our sequencing data indicated the complete downregulation of CD247 expression, which is an important part of the T cell receptor, in the lungs of infected Tercko/ko mice (Figure 4A, C and D). Given that this factor is downregulated under chronic inflammatory conditions, we investigated the impact of the inflammatory response in alveolar macrophages on the expression of various T cell-derived cytokines, as well as CD247 expression (Figure 5D, E) (Dexiu et al., 2022). This aspect is also highlighted in the discussion in lines 623-637. Therefore, a co-cultivation model of T cells and alveolar macrophages was established and confronted with heat-killed S. aureus to elicit an inflammatory response of the macrophages. To emphasize this purpose, we have revised our statement about the model setup in lines 517-519 of the manuscript:

      “An overactive inflammatory response could be a potential explanation for the dysregulated TCR signaling.”

      The authors hope this will clarify the intent behind the model setup.

      (6)  Overall, the authors began to address the role of Terc in bacterial susceptibility, but to what extent that specifically involves inflammation and macrophages, T cell immunity, or aging remains unclear at present.

      We thank the reviewer for the helpful and relevant comments. The authors accept the limitations of the presented study, such as the reduced number of Tercko/ko mice and the limitations of murine models for S. aureus infection itself, and discuss them in the discussion section (lines 559-561, 577-583, 690-692, and 720-726). However, we hope that our responses have provided sufficient evidence to convince the reviewer that our data support a clear role for Terc expression in regulating the immune response to bacterial infections, particularly with respect to inflammation and its potential connection to T cell functionality.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance as well as the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (summer with the following characteristics: hot with no precipitation, and winter with the following characteristics: cooler and wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture decomposers' contribution. They proposed a new hypothesis that was aiming to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers by using pitfall traps across both experiment seasons. This study was carried out in Israel and based on a single litter species that is native to all seven sites. The authors found that microorganism contribution dominated in winter while macrofauna decomposition dominated the overall decomposition in summer. These seasonality differences combined with the differences in different decomposers groups fluctuation along precipitation resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition.

      Strengths:

      Well-designed study with a combination of different approaches (methods) and consideration of seasonality to generalize patterns.

      The study expands the current understanding of litter decomposition and of the interactions between factors affecting the process (here climate and decomposers).

      Weaknesses:

      The study was only based on a single litter species.

      We now discuss the advantages and limitations of this approach in the methods and devote a completely new paragraph to this important point in the discussion (lines 394-401).

      Reviewer #2 (Public Review):

      Summary: Torsekar et al. use a leaf litter decomposition experiment across seasons, and in an aridity gradient, to provide a careful test of the role of different-sized soil invertebrates in shaping the rates of leaf litter decomposition. The authors found that large-sized invertebrates are more active in the summer and small-sized invertebrates in the winter. The summed effects of all invertebrates then translated into similar levels of decomposition across seasons. The system breaks down in hyper-arid sites.

      Strengths: This is a well-written manuscript that provides a complete statistical analysis of a nice dataset. The authors provide a complete discussion of their results in the current literature.

      Weaknesses:

      I have only three minor comments. Please standardize the color across ALL figures (use the same color always for the same thing, and be friendly to color-blind people).

      Thank you for this important suggestion. We have now changed all figures to standardize all colors and chose a more color-blind friendly palette.

      Fig 1 may benefit from separating the orange line (micro and meso) into two lines that reflect your experimental setup and results. I would mention the dryland decomposition conundrum earlier in the Introduction.

      We based our novel hypotheses on a thorough literature search. Accordingly, decomposition is expected to be positively associated with moisture, regardless of the decomposer body size. Our contribution to theory was to suggest that macro-detritivores may respond very differently to climatic conditions and dominate litter decomposition in warm arid-lands (we listed the reasons in the text). Consequently, we did not distinguish between microorganisms and mesofauna. We assumed that both groups inhabit the litter substrate and have limited adaptation to dry conditions. Our results provide strong evidence that this presumption is likely wrong and that mesofauna respond to climate very differently from micro-decomposers. Yet, we cannot use hindsight understanding to improve our original hypothesis. We now emphasize this important point in the discussion as an important future direction.

      Although we are very appreciative of and pleased with the reviewer's enthusiasm to highlight the importance of our work as a possible solution to the longstanding dryland decomposition conundrum (DDC), we decided not to move it to the introduction. This is because we think that our work is not centred on resolving the DDC but provides more general principles that may lead to a paradigm shift in the way ecologists study nutrient cycling across ecosystems.

      And the manuscript is full of minor grammatical errors. Some careful reading and fixing of all these minor mistakes here and there would be needed.

      We apologize and have done our best to find and fix those mistakes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance as well as the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (summer with the following characteristics: hot with no precipitation, and winter with the following characteristics: cooler and wetter winter) along a precipitation gradient. The authors made use of 1050 litter baskets with different mesh sizes to capture decomposers contribution. They proposed a new hypothesis that was aiming to understand the "dryland decomposition conundrum". They combined their decomposition experiment with the sampling of decomposers by using pitfall traps across both experiment seasons. This study was carried out in Israel and based on a single litter species that is native to all seven sites. The authors found that microorganism contribution dominated in winter while macrofauna decomposition dominated the overall decomposition in summer. These seasonality differences combined with the differences in different decomposers groups fluctuation along precipitation resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition. Below I provide my general and specific comments.

      General comments:

      (1) Study in general is well designed and well thought beforehand,

      (2) Study aims to expand the current understanding of the dryland decomposition conundrum

      (3) The authors should add a caveat about the fact that they only use one litter species and call for examining litter mixtures along the same gradient.

      (4) Please check the way you reduce the random effects from your initial model, I have provided a better way to do so in my specific comments

      (5) For Figure 1, authors can check my comment on this and see if they could revise the figure.

      Thank you for the positive feedback and your valuable comments. We have tried our best to address all comments and suggestions for improvement and clarification.

      Specific comments

      Line # 57 Please write "Theory suggests" instead of "Theory suggest"

      We changed the text as suggested

      Line # 70, please write "Indeed, handful evidence shows" instead of "Indeed, handful evidence show"

      We changed the text as suggested

      Figure 1: I like this conceptual framework. I have a silly question: why is the slope of the whole community at the beginning (between Hyperarid and Arid) the same as that of the macrofauna? I would think the slope should be higher, as this is adding up, right? The same goes for the decomposition of the whole community later on. To me, this should reflect the adding or summing up; if I am right, the authors should think about how this could be reflected in the figure.

      We agree with your interpretation that the whole community decomposition reflects the addition by constituent decomposers. The slope of the whole community decomposition between hyper-arid and arid is slightly higher than the one of macro decomposition to reflect the additive effect of macro with meso+micro decomposition. We have now changed the figure slightly to make this point more visible (Line 106).

      Line # 111 Please make "Methods" bold as well to be consistent with others headings.

      We changed the formatting as suggested

      Line #125 and in other lines as well please replace "X" by "x" to denote multiplication.

      We changed the formatting as suggested

      Table 1 Please add "*" to climate like this "Climate*" so that the end note of the table could make sense

      Thank you for this suggestion. We have now added the asterisk referring to the note below the Table.

      Figure 2, please consider putting at line #133, mean annual precipitation (MAP), as such for line # 135 You can directly says The precipitation map ....

      We made both changes as suggested.

      Line # 138 I would not use different units for the same values. I do understand that you want to emphasize the accuracy, but I would instead write 3 ± 0.001 g

      We changed the units as suggested.

      Line # 145, how is the litter basket customized to rest at 1 cm above ground level?

      We have now clarified that we cut open windows one centimeter above the cage floor. The cages were positioned on the soil (line 144).

      Lines # 181-183, I like the approach of checking the necessity of having the random effects. However, it has been reported that likelihood ratio tests (LRTs) are not really reliable for testing random effects. I suggest you use permutations instead. I think the function is confint(MODEL); you need to specify the number of permutations, and the higher the better, but you should start with 99 first and see how the results look; if they are promising, you can even go to 9999. But it will need computation power and time.

      Thank you for the suggestion. We now used a simulation-based exact test, instead of an LRT, to examine the random effect, as recommended by the authors of the “lme4” package. As recommended, we used 9999 simulations. The simulation test yielded results similar to those originally reported (see lines 181-183).
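
      The exchange above is about building a null distribution by resampling rather than relying on an asymptotic likelihood ratio test. The following Python sketch illustrates that general idea only; it is not the authors' analysis, which used a simulation-based exact test in R. It permutes group labels and counts how often the variance of the group means is at least as large as the observed one. The function name, the choice of statistic, and the toy mass-loss values are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def permutation_group_test(values, groups, n_perm=999, seed=0):
    """Permutation p-value for 'the grouping factor explains variance'.

    Statistic (an illustrative choice): population variance of the group means.
    """
    rng = random.Random(seed)

    def stat(labels):
        # Collect the values belonging to each group label.
        by_group = {}
        for v, g in zip(values, labels):
            by_group.setdefault(g, []).append(v)
        return pvariance([mean(vs) for vs in by_group.values()])

    observed = stat(groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between label and value
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: litter mass loss at three made-up sites.
mass_loss = [0.9, 1.1, 1.0, 2.0, 2.1, 1.9, 3.0, 3.2, 2.9]
site = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
observed, p_value = permutation_group_test(mass_loss, site, n_perm=999)

# Null case: identical values, so the grouping cannot explain anything.
_, p_null = permutation_group_test([1.0] * 9, site, n_perm=99)
```

      With 999 permutations the smallest attainable p-value is 0.001, which is why a larger number of resamples (such as the 9999 mentioned above) gives a finer-grained result at the cost of computation time.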

      Line # 187, 188, 188, please do not use capital letter to start mesofauna, macrofauna and whole-community

      We changed the formatting as suggested

      Line # 205 Please add the version number of R in the text.

      We now included the version number as suggested.

      Line # 209-211, could you please check whether "then" is the word you want to use or "than"

      Our bad; we indeed meant “than” and have made the appropriate changes.

      Line # 227 and in other places as well please provide the second degree of freedom of the F test.

      Thank you for this important comment. We have now added the second degree of freedom to the relevant results (lines 229, 232).

      Figure 3 and Figure 4 show some results that are negative, can you please explain what might be the reasons behind this?

      We now explain this important point in the figures’ captions.

      Figure 5 Please add label to the x-axis.

      Thank you; we have now included a label.

      Line # 357, the sentence "... meso-decomposition, like microbial decomposition,...", I don't understand which criteria authors used to classify microbial decomposition as "meso-decomposition"?

      We now remove this potential cause of confusion by using the term ‘meso-decomposition’ to distinguish from microbial decomposition (Line 366).

      Line # 380 Kindly put "per se" in italic.

      We changed the formatting as suggested

      References

      The reference format is not consistent. For example, for the same journal (say Trends in Ecology and Evolution) the authors sometimes wrote the full name, as at line # 36 (note also that "vol" should not be written as such), but wrote the abbreviation at line #42

      Our bad; we apologize and have carefully checked all references to make sure the style is consistent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) Combined Public Reviews:

      Strengths:

      This work investigates the role of DNAH3 in sperm mobility and male infertility and utilised gold-standard molecular biology techniques, showing strong evidence of its role in male infertility. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.

      We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.

      Weaknesses:

      (1.1) The manuscript lacks a comparison with previous studies on DNAH3 in the Discussion section.

      We thank the reviewers for their comments.

      Recently, Meng et al. identified bi-allelic variants in DNAH3 from patients diagnosed with asthenoteratozoospermia, revealing multiple morphological defects and a disrupted “9+2” arrangement in the patients' sperm (https://doi.org/10.1093/hropen/hoae003, PMID: 38312775). Furthermore, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities with a normally structured “9+2” microtubule arrangement. In our study, we observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. These findings indicate that DNAH3 may play crucial yet distinct roles in human and mouse male reproduction. Additionally, our TEM analysis demonstrated a notable absence of IDAs in sperm from both DNAH3-deficient patients and Dnah3 KO mice, resembling the findings of Meng et al. To investigate further, we conducted immunofluorescent staining and western blotting to assess the levels of IDA-associated proteins (DNAH1, DNAH6 and DNALI1) and ODA-associated proteins (DNAH8, DNAH17 and DNAI1) in sperm samples from both our DNAH3-deficient patients and Dnah3 KO mice. Our data revealed a reduction in IDA-associated protein levels and comparable ODA-associated protein levels in comparison to normal controls and WT mice, respectively, thus corroborating the TEM observations. These results suggest that DNAH3 is involved in sperm flagellar development in humans and mice, specifically through its role in the assembly of IDAs.

      Intriguingly, in our study, none of the patients with DNAH3 deficiency reported experiencing any of the principal symptoms associated with PCD. Additionally, our Dnah3 KO mice exhibited normal ciliary development in the lung, brain, eye, and oviduct. Similarly, Meng et al. did not mention any PCD symptoms in their DNAH3-deficient patients, and their Dnah3 KO mice also demonstrated normal ciliary morphology in the trachea and brain. These combined observations suggest that DNAH3 may play a more significant role in sperm flagellar development than in other motile cilia functions. Given that DNAH3 is expressed in ciliary tissues, its role in these tissues remains intriguing and could be elucidated through sequencing of larger cohorts of individuals with PCD.

      We have added these discussions in lines 267 to 283 and lines 300 to 303.

      (1.2) The variants of DNAH3 in four infertile men were identified through whole-exome sequencing. Providing an overview of the WES data would be beneficial to offer additional insights into whether other variants may contribute the infertility. This could also help explain why ICSI only works for two out of four patients with DNAH3 variants.

      We thank the reviewer for the helpful suggestions.

      We have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467). The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed below (Table R1). A summary of WES has been presented in Table S1.

      Author response table 1.

      Quality of whole exome sequencing on infertile men.

      The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.

      Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of the corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, assuming a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.

      We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.

      Additionally, we did not identify any pathogenic variants associated with fertilization failure and early embryonic development in the two patients with failed ICSI outcomes. Therefore, these different ICSI outcomes might be attributed to additional unexplained factors from the female partners.
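
      The screening steps described above amount to a chain of filters followed by a per-gene check for bi-allelic status. The Python sketch below shows one way such a chain could be written; it is an illustration under stated assumptions, not the authors' pipeline (which relied on Exomiser, HPA, MGI, OMIM, and manual literature review). The `Variant` fields, the consequence labels, the gene names, and all values are hypothetical, and counting two heterozygous variants in one gene as bi-allelic is a simplification that ignores phasing.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    max_east_asian_af: float  # highest allele frequency across the databases consulted
    consequence: str          # hypothetical labels, e.g. "missense", "splice_site"
    predicted_damaging: bool
    testis_expressed: bool    # expressed in human or mouse testis
    unrelated_disease: bool   # linked only to diseases unrelated to male infertility
    zygosity: str             # "het" or "hom"


# Assumed set of consequences affecting coding exons or canonical splice sites.
CODING_OR_SPLICE = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_variants(variants, af_cutoff=0.01):
    """Criteria (1)-(3): rare, coding or splice-site, predicted damaging."""
    return [
        v for v in variants
        if v.max_east_asian_af < af_cutoff
        and v.consequence in CODING_OR_SPLICE
        and v.predicted_damaging
    ]


def biallelic_genes(candidates):
    """Keep testis-expressed, relevant genes carrying at least two variant alleles."""
    allele_count = {}
    for v in candidates:
        if not v.testis_expressed or v.unrelated_disease:
            continue
        # Simplification: a homozygous variant counts as two alleles,
        # a heterozygous one as a single allele (phase is not checked).
        allele_count[v.gene] = allele_count.get(v.gene, 0) + (2 if v.zygosity == "hom" else 1)
    return sorted(g for g, n in allele_count.items() if n >= 2)


# Entirely made-up records for illustration.
toy = [
    Variant("GENE_A", 0.0001, "missense", True, True, False, "het"),
    Variant("GENE_A", 0.0000, "frameshift", True, True, False, "het"),
    Variant("GENE_B", 0.0002, "missense", True, True, False, "het"),   # monoallelic
    Variant("GENE_C", 0.0500, "missense", True, True, False, "hom"),   # too common
    Variant("GENE_D", 0.0001, "nonsense", True, False, False, "hom"),  # not in testis
    Variant("GENE_E", 0.0001, "intronic", True, True, False, "hom"),   # non-coding
]

remaining = biallelic_genes(candidate_variants(toy))
```

      In this toy example only `GENE_A` survives every step, mirroring how a single gene can remain after successive exclusions.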

      (1.3) Quantification of images would help substantiate the conclusions, particularly in Figures 2, 3, 4, and 6. Improved images in Figures 3A, 4B, and 4C, would help increase confidence in the claims made.

      We thank the reviewer for the valuable suggestions. We presume that the reviewer means quantification of images in Figure S6, not Figure 6.

      We have compiled statistics for results shown in Figures 2, 3, 4, and S6. Specifically:

      - The percentages of abnormal flagellar morphology in normal controls and patients, associated with the observations in Figure 2A, have been shown in Figure S1A.

      - The percentages of aberrant axonemal ultrastructure in different cross-sections of sperm from normal controls and patients, corresponding to the findings in Figure 3A, have been presented in Figure S1B.

      - The percentages of abnormal flagellar morphology in WT mice and Dnah3 KO mice have been shown in Figure S7A.

      - The percentages of aberrant axonemal arrangement in different cross-sections of sperm from WT mice and Dnah3 KO mice, corresponding to the findings in Figure 4B, have been presented in Figure S7C.

      - The percentages of microtubule doublets presenting IDAs in sperm from WT mice and Dnah3 KO mice, related to Figure 4B, have been detailed in Figure S7D.

      - The percentages of malformed mitochondria in the midpiece of sperm from WT mice and Dnah3 KO mice, associated with the observations in Figure 4C, have been presented in Figure S7E.

      Moreover, we have revised Figures 3A, 4B, and 4C by replacing the unclear TEM images.

      (2) Reviewer #1 (Recommendations for The Authors):

      (2.1) Please add reference(s) that support what is claimed in lines 83-84.

      We are very grateful for the reviewer's careful comments. We have added a reference describing the homology and expression of DNAH3.

      (2.2) In line 286, change "suggested" to "suggest".

      Thanks for the reviewer's comments. We have corrected the grammar.

      (2.3) Please add reference(s) that support what is claimed in lines 359-360.

      According to the reviewer’s suggestions, we have included references detailing the STA-PUT velocity sedimentation for isolation of single human and mouse testicular cells.

      (2.4) In line 365, change "in" to "into".

      Thanks for the reviewer’s careful comments. We have corrected this word.

      (2.5) In Figure 7, I suggest changing "patients" to "wife or partners of patient". Given that the results are indeed from the spouses of the infertile men, I suggest making this small change to keep the consistency and clarity of what the authors did.

      In response to the reviewer’s kind suggestions, we have replaced “Patient” with “partners of Patient” and revised Figure 7.

      (3) Reviewer #2 (Recommendations for The Authors):

      (3.1) A summary of the WES data would be needed (i.e. number of reads, mapping quality, etc). As mentioned in the public review, it would be beneficial to present a summary of all variants identified in the data and clarify whether DNAH3 is the only gene that contains variants and whether these variants have been validated.

      Many thanks for the reviewer’s kind suggestions.

      The clean reads, sequencing depth, sequencing coverage, and mapping quality of the WES on the patients are listed (see author response table 1). A summary of WES has been presented in Table S1.

      The variants identified through WES were annotated and filtered using Exomiser. Next, the variants were screened to obtain candidate variants based on the following criteria: (1) the allele frequency in the East Asian population was less than 1% in any database, including the ExAC Browser, gnomAD, and the 1000 Genomes Project; (2) the variants affected coding exons or canonical splice sites; (3) the variants were predicted to be possibly pathogenic or damaging.

      Following filtering and screening, the numbers of candidate variants obtained were as follows: Patient 1: 98, Patient 2: 101, Patient 3: 67, and Patient 4: 91 (Table S1). Subsequently, we utilized the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) and Mouse Genome Informatics (MGI) database (https://informatics.jax.org/) to analyze the expression patterns of the corresponding genes. Variants whose corresponding genes were not expressed in the human or mouse testis were excluded from further consideration. We also consulted the OMIM database and reviewed relevant literature to exclude variants associated with diseases unrelated to male infertility. Additionally, assuming a recessive inheritance pattern, we excluded all monoallelic variants. Ultimately, only bi-allelic variants in DNAH3 (NG_052617.1, NM_017539.2, NP_060009.1) remained, suggesting that they are the pathogenic variants responsible for the infertility of the patients (Table S1). These DNAH3 variants were verified by Sanger sequencing on DNA from the patients' families.

      We have added the overview of the WES in Table S1 and supplemented the analysis process of WES data in lines 100 to 106 and lines 348 to 360.

      (3.2) It would be beneficial to the scientific community if the raw data of WES could be uploaded to a public data repository, such as GEO.

      According to the reviewer's suggestion, we have deposited the raw whole-exome sequencing data in the National Genomics Data Center (NGDC) (https://ngdc.cncb.ac.cn/, accession number: HRA007467) and described its availability in the "Data Availability" section.

      (3.3) In line 115, it is not clear how the prediction was made. Clarifying them by adding citations or describing methods that predict these pathways/functions would help strengthen it.

      Thanks for the reviewer's comments.

      SIFT, PolyPhen-2, MutationTaster and CADD assess the deleteriousness of genetic variants by considering genomic features and evolutionary constraint of the surrounding sequence, or the structural and chemical property alterations caused by the amino acid substitutions. We have added websites and references for these tools in the manuscript (lines 116 to 118).

      Here are the principles of these tools.

      - SIFT considers the position at which the change occurred and the type of amino acid change to predict whether an amino acid substitution in a protein will affect protein function [https://sift.bii.a-star.edu.sg/, PMID: 12824425].

      - The PolyPhen-2 predicts the impact of an amino acid substitution on a human protein by considering several features, including sequence, phylogenetic, and structural information [http://genetics.bwh.harvard.edu/pph2/, PMID: 20354512].

      - The MutationTaster utilizes a Bayes classifier to predict the functional consequences of amino acid substitutions, intronic and synonymous changes, short insertions/deletions (indels), etc. [https://www.mutationtaster.org/, PMID: 24681721].

      - The CADD scores are based on diverse genomic features derived from surrounding sequence context, gene model annotations, evolutionary constraint, epigenetic measurements, and functional predictions [https://cadd.gs.washington.edu/, PMID: 30371827].

      (4) Reviewer #3 (Recommendations for The Authors):

      (4.1) Please ensure that all gene names used in your manuscript have been approved by the HUGO nomenclature committee. For example, "c.3590C>T (p.P1197L)" should be described as "c.3590C>T (Pro1197Leu)".

      In response to the reviewer's suggestion, we have revised all gene and variant names according to the HUGO nomenclature committee and the HGVS Variant Nomenclature Committee, respectively.

      (4.2) For Table 1, the authors should provide the rates of abnormal sperm morphologies using the sperm cells from normal male controls.

      Thanks for the reviewer’s careful comments. Consistent with the WHO laboratory manual (World Health Organization. WHO laboratory manual for the examination and processing of human semen. World Health Organization, 2021.), our routine semen analysis establishes 4% as the minimum rate of sperm with normal morphology but does not define the maximum rate of various tail defects. However, we reviewed the routine semen analysis of the normal controls in our study, and the approximate distribution of sperm with various flagellar morphologies in the normal controls was as follows: normal flagella, 78.6%; absent flagella, 1.7%; short flagella, 0.6%; coiled flagella, 12.5%; bent flagella, 7.9%; irregular flagella, 1.8%.

      (4.3) In Table 2, "Mutation Tester" or "Mutation Taster"?

      We thank the reviewer for the comments. It should be "MutationTaster", and we have corrected this mistake in Table 2 and the manuscript.

      (4.4) In Figure 2B, the bars for patient 1 should be aligned. 

      Following the reviewer's valuable suggestion, we have ensured consistent scale bar alignment in Figure 2B and implemented this alignment throughout all other figures.

      (4.5) In Figure 3A, what about the ultrastructure for sperm heads in DNAH3 deficient sperm cell? The authors previously mentioned abnormalities in sperm head morphologies (Figure 2B) in patients with DNAH3 mutations.

      We thank the reviewers for their kind comments. A small fraction of abnormal sperm heads from our patients was captured under TEM, manifested as round heads with loose chromatin (Author response image 1).

      Author response image 1.

      Ultrastructure of sperm head from DNAH3-deficient infertile men. TEM analysis revealed a fraction of round head with loose chromatin in patients harboring DNAH3 variants. Scale bars, 200 nm.

      (4.6) In Figure S6, the authors should provide the rates of abnormal sperm morphologies for Dnah3 KO male mice.

      In response to the reviewer's valuable suggestion, we have quantified morphological defects in spermatozoa from both Dnah3 KO and WT mice. About 37% of sperm from Dnah3 KO mice showed morphological abnormalities, compared to about 17% of sperm from WT mice. The results are presented in the revised Figure S7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study provides solid evidence that both psychiatric dimensions (e.g. anhedonia, apathy, or depression) and chronotype (i.e., being a morning or evening person) influence effort-based decision-making. Notably, the current study does not elucidate whether there may be interactive effects of chronotype and psychiatric dimensions on decision-making. This work is of importance to researchers and clinicians alike, who may make inferences about behaviour and cognition without taking into account whether the individual may be tested or observed out-of-sync with their phenotype.

      We thank the three reviewers for their comments, and the Editors at eLife. We have taken the opportunity to revise our manuscript considerably from its original form, not least because we feel a number of the reviewers’ suggested analyses strengthen our manuscript considerably (in one instance even clarifying our conclusions, leading us to change our title)—for which we are very appreciative indeed. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affects decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning. Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

      We appreciate the Reviewer’s positive view of our research and agree with their assessment of its weaknesses; the study was not designed to assess chronotype-mental health interactions. We hope that our new title and contextualisation makes this clearer. We respond in more detail point-by-point below.

      Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy and anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. One potential concern is that the range of models which were tested was narrow, and other models might have been considered. For example, the Authors might have also tried to fit models with an overall inverse temperature parameter to capture decision noise. One reason for doing so is that some variance in the bias parameter might be attributed to noise, which was not modeled here. Another concern is that the manuscript discusses effort-based choice as a transdiagnostic feature - and there is evidence in other studies that effort deficits are a transdiagnostic feature of multiple disorders. However, because the present study does not investigate multiple diagnostic categories, it doesn't provide evidence for transdiagnosticity, per se.

      We appreciate Reviewer 2’s assessment of our research and agree generally with its weaknesses. We have now addressed the Reviewer’s comments regarding transdiagnosticity in the discussion of our revised version and have addressed their detailed recommendations below (see point-by-point responses).

      In addition to the below specific changes, in our Discussion section, we now have also added the following (lines 538 – 540):

      “Finally, we would like to note that our study is based on a general population sample, rather than a clinical one. Hence, we cannot speak to transdiagnosticity on the level of multiple diagnostic categories.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to potential chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria. I have some suggestions for improvements.

      Weaknesses:

      (1) The novel findings in this manuscript are those pertaining to transdiagnostic and circadian phenotypes. The authors report two separate but "overlapping" effects: individuals high on anhedonia/apathy are less willing to accept offers in the task, and similarly, individuals tested off their chronotype are less willing to accept offers in the task. The authors claim that the latter has implications for studying the former. In other words, because individuals high on anhedonia/apathy predominantly have a late chronotype (but might be tested early in the day), they might accept less offers, which could spuriously look like a link between anhedonia/apathy and choices but might in fact be an effect of the interaction between chronotype and time-of-testing. The authors therefore argue that chronotype needs to be accounted for when studying links between depression and effort tasks.

      The authors argue that, if X is associated with Y and Z is associated with Y, X and Z might confound each other. That is possible, but not necessarily true. It would need to be tested explicitly by having X (anhedonia/apathy) and Z (chronotype) in the same regression model. Does the effect of anhedonia/apathy on choices disappear when accounting for chronotype (and time-of-testing)? Similarly, when adding the interaction between anhedonia/apathy, chronotype, and time-of-testing, within the subsample of people tested off their chronotype, is there a residual effect of anhedonia/apathy on choices or not?

      If the effect of anhedonia/apathy disappeared (or got weaker) while accounting for chronotype, this result would suggest that chronotype mediates the effect of anhedonia/apathy on effort choices. However, I am not sure it renders the direct effect of anhedonia/apathy on choices entirely spurious. Late chronotype might be a feature (induced by other symptoms) of depression (such as fatigue and insomnia), and the association between anhedonia/apathy and effort choices might be a true and meaningful one. For example, if the effect of anhedonia/apathy on effort choices was mediated by altered connectivity of the dorsal ACC, we would not say that ACC connectivity renders the link between depression and effort choices "spurious", but we would speak of a mechanism that explains this effect. The authors should discuss in a more nuanced way what a significant mediation by the chronotype/time-of-testing congruency means for interpreting effects of depression in computational psychiatry.

      We thank the Reviewer for pointing out this crucial weakness in the original version of our manuscript. We have now thought deeply about this and agree with the Reviewer that our original results did not warrant our interpretation that reported effects of anhedonia and apathy on measures of effort-based decision-making could potentially be spurious. At the Reviewer’s suggestion, we decided to test this explicitly in our revised version—a decision that has now deepened our understanding of our results, and changed our interpretation thereof.  

      To investigate how the effects of neuropsychiatric symptoms and the effects of circadian measures relate to each other, we have followed the Reviewer’s advice and conducted an additional series of analyses (see below). Surprisingly (to us, but perhaps not the Reviewer) we discovered that all three symptom measures (two of anhedonia, one of apathy) have separable effects from circadian measures on the decision to expend effort (note we have also re-named our key parameter ‘motivational tendency’ to address this Reviewer’s next comment that the term ‘choice bias’ was unclear). In model comparisons (based on leave-one-out information criterion which penalises for model complexity) the models including both circadian and psychiatric measures always win against the models including either circadian or psychiatric measures. In essence, this strengthens our claims about the importance of measuring circadian rhythm in effort-based tasks generally, as circadian rhythm clearly plays an important role even when considering neuropsychiatric symptoms, but crucially does not support the idea of spurious effects: statistically, circadian measures contribute separably from neuropsychiatric symptoms to the variance in effort-based decision-making. We think this is very interesting indeed, and certainly clarifies (and corrects the inaccuracy in) our original interpretation—and can only express our thanks to the Reviewer for helping us understand our effect more fully.

      In response to these new insights, we have made numerous edits to our manuscript. First, we changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”. In the remaining manuscript we now refrain from using the word ‘overlapping’ (which could be interpreted as overlapping in explained variance), and instead opted to describe the effects as parallel. We hope our new analyses, title, and clarified/improved interpretations together address the Reviewer’s valid concern about our manuscript’s main weakness.

      We detail these new analyses in the Methods section as follows (lines 800 – 814):

      “4.5.2. Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”

      Results of the outlined analyses are reported in the results section as follows (lines 356 – 383):

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”
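      The effects above are reported as a posterior mean with a 95% highest density interval (HDI), and an effect is read as present when that interval excludes zero. As a minimal sketch of that reporting convention only, and not the authors' analysis code, the interval can be approximated from posterior draws as the narrowest window containing 95% of the sorted samples. The function names `hdi` and `excludes_zero` and the simulated draws are hypothetical and chosen for illustration.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples; keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def excludes_zero(interval):
    """True if the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical posterior draws for one regression coefficient
# (simulated here; these are NOT the study's posterior samples).
rng = random.Random(0)
draws = [rng.gauss(-0.24, 0.09) for _ in range(4000)]
low, high = hdi(draws)
print(f"M={sum(draws) / len(draws):.2f}, 95% HDI=[{low:.2f},{high:.2f}], "
      f"excludes zero: {excludes_zero((low, high))}")
```

      On this reading, an interval such as [-0.30,0.002] is described as narrowly including zero because its upper bound lies just above it.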

      We have now edited parts of our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that the neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      (2) It seems that all key results relate to the choice bias in the model (as opposed to reward or effort sensitivity). It would therefore be helpful to understand what fundamental process the choice bias is really capturing in this task. This is not discussed, and the direction of effects is not discussed either, but potentially quite important. It seems that the choice bias captures how many effortful reward challenges are accepted overall which maybe captures general motivation or task engagement. Maybe it is then quite expected that this could be linked with questionnaires measuring general motivation/pleasure/task engagement. Formally, the choice bias is the constant term or intercept in the model for p(accept), but the authors never comment on what its sign means. If I'm not mistaken, people with higher anhedonia but also higher apathy are less likely to accept challenges and thus engage in the task (more negative choice bias). I could not find any discussion or even mention of what these results mean. This similarly pertains to the results on chronotype. In general, "choice bias" may not be the most intuitive term and the authors may want to consider renaming it. Also, given the sign of what the choice bias means could be flipped with a simple sign flip in the model equation (i.e., equating to accepting more vs accepting less offers), it would be helpful to show some basic plots to illustrate the identified differences (e.g., plotting the % accepted for people in the upper and lower tertile for the SHAPS score etc).

      We apologise that this was not made clear previously: the meaning and directionality of “choice bias” is indeed central to our results. We also thank the Reviewer for pointing out that the previously used term “choice bias” itself might not be intuitive. We have now changed this to ‘motivational tendency’ (see below) as well as added substantial details on this parameter to the manuscript, including additional explanations and visualisations of the model as suggested by the Reviewer (new Figure 3) and model-agnostic results to aid interpretation (new Figure S3). Note the latter is complex due to our staircasing procedure (see new figure panel D further detailing our staircasing procedure in Figure 2). This shows that participants with more pronounced anhedonia are less likely to accept offers than those with low anhedonia (Fig. S3A), a model-agnostic version of our central result.

      Our changes are detailed below:

      After careful evaluation we have decided to term the parameter “motivational tendency”, hoping that this will present a more intuitive description of the parameter.

      To aid with the understanding and interpretation of the model parameters, and motivational tendency in particular, we have added the following explanation to the main text:

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighed by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (𝛼) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”

      Further, we have included a new figure, visualizing the model. This demonstrates how the different model parameters contribute to the model (A), and how different values on each parameter affect the model (B-D).

      We agree that plotting model agnostic effects in our data may help the reader gain intuition of what our task results mean. We hope to address this with our added section on “Model agnostic task measures relating to questionnaires”. We first followed the reviewer’s suggestion of extracting subsamples with high and low anhedonia (as measured with the SHAPS, highest and lowest quartile) and plotted the acceptance proportion across effort and reward levels (panel A in figure below). However, due to our implemented task design, this only shows part of the picture: the staircasing procedure individualises which effort-reward combination a participant is presented with. Therefore, group differences in choice behaviour will lead to differences in the development of the staircases implemented in our task. Thus, we plotted the count of offered effort-reward combinations for the subsamples of participants with high vs. low SHAPS scores by the end of the task, averaged across staircases and participants.

      As the aspect of task development due to the implemented staircasing may not have been explained sufficiently in the main text, we have included panel (D) in figure 2.

      Further, we have added the following figure reference to the main text (lines 189 – 193):

      “The development of offered effort and reward levels across trials is shown in figure 2D; this shows that as participants generally tend to accept challenges rather than reject them, the implemented staircasing procedure develops toward higher effort and lower reward challenges.”

      To statistically test effects of model-agnostic task measures on the neuropsychiatric questionnaires, we performed Bayesian GLMs with the proportion of accepted trials predicted by SHAPS and AES. This is reported in the text as follows.

      Supplement, lines 172 – 189:

      “To explore the relationship between model agnostic task measures to questionnaire measures of neuropsychiatric symptoms, we conducted Bayesian GLMs, with the proportion of accepted trials predicted by SHAPS scores, controlling for age and gender. The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02]. A visualisation of model agnostic task measures relating to symptoms is given in Fig. S4, comparing subgroups of participants scoring in the highest and lowest quartile on the SHAPS. This shows that participants with a high SHAPS score (i.e., more pronounced anhedonia) are less likely to accept offers than those with a low SHAPS score (Fig. S4A). Due to the implemented staircasing procedure, group differences can also be seen in the effort-reward combinations offered per trial. While for both groups, the staircasing procedure seems to devolve towards high effort – low reward offers, this is more pronounced in the subgroup of participants with a lower SHAPS score (Fig S4B).”
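      The subgroup comparison described above (highest versus lowest SHAPS quartile) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function name, the input layout (one SHAPS sum score and one overall acceptance proportion per participant) and the toy values are hypothetical, and the quartile cut-offs use the default method of Python's `statistics.quantiles`, which may differ from the one used in the study.

```python
from statistics import mean, quantiles


def acceptance_by_extreme_quartiles(participants):
    """Mean acceptance proportion in the lowest and highest score quartile.

    `participants` is a list of (shaps_score, proportion_accepted) pairs.
    Returns (mean_low_quartile, mean_high_quartile).
    """
    scores = [score for score, _ in participants]
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points
    low = [p for score, p in participants if score <= q1]
    high = [p for score, p in participants if score >= q3]
    return mean(low), mean(high)


# Toy data, invented purely for illustration (not study data).
toy = [(0, 0.9), (1, 0.8), (2, 0.8), (3, 0.7),
       (4, 0.7), (5, 0.6), (6, 0.5), (7, 0.4)]
low_mean, high_mean = acceptance_by_extreme_quartiles(toy)
print(f"lowest quartile: {low_mean:.2f}, highest quartile: {high_mean:.2f}")
```

      Because the staircasing procedure individualises the offers each participant sees, a raw comparison of this kind only shows part of the picture, as noted in the response.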

      (3) None of the key effects relate to effort or reward sensitivity which is somewhat surprising given the previous literature and also means that it is hard to know if choice bias results would be equally found in tasks without any effort component. (The only analysis related to effort sensitivity is exploratory and in a subsample of N=56 per group looking at people meeting criteria for MDD vs matched controls.) Were stimuli constructed such that effort and reward sensitivity could be separated (i.e., are uncorrelated/orthogonal)? Maybe it would be worth looking at the % accepted in the largest or two largest effort value bins in an exploratory analysis. It seems the lowest and 2nd lowest effort level generally lead to accepting the challenge pretty much all the time, so including those effort levels might not be sensitive to individual difference analyses?

      We too were initially surprised by the lack of effect of neuropsychiatric symptoms on reward and effort sensitivity. To address the Reviewer’s first comment, the nature of the ‘choice bias’ parameter (now motivational tendency) is its critical importance in the context of effort-based decision-making: it is not modelled or measured explicitly in tasks without effort (such as typical reward tasks), so it would be impossible to test this in tasks without an effort component. 

      For the Reviewer’s second comment, the exploratory MDD analysis is not our only one related to effort sensitivity: the effort sensitivity parameter is included in all of our central analyses, and (like reward sensitivity), does not relate to our measured neuropsychiatric symptoms (e.g., see page 15). Note most previous effort tasks do not include a ‘choice bias’/motivational tendency parameter, potentially explaining this discrepancy. However, our model was quantitatively superior to models without this parameter, for example with only effort- and reward-sensitivity (page 11, Fig. 3).

      Our three model parameters (reward sensitivity, effort sensitivity, and choice bias/motivational tendency) were indeed uncorrelated/orthogonal to one another (see parameter orthogonality analyses below), making it unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity. As per the Reviewer’s suggestion, we also examined whether the lowest two effort levels might not be sensitive to individual differences; in fact, we found that the proportion of accepted trials on the lowest effort levels alone was nevertheless predicted by anhedonia (see ceiling effect analyses below).

      Specifically, in terms of parameter orthogonality:

      When developing our task design and computational modelling approach we were careful to ensure that meaningful neurocomputational parameters could be estimated and that no spurious correlations between parameters would be introduced by modelling. By conducting parameter recoveries for all models, we showed that our modelling approach could reliably estimate parameters, and that estimated parameters are orthogonal to the other underlying parameters (as can be seen in Figure S1 in the supplement). It is thus unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity.

      And finally, regarding the possibility of a ceiling effect for low effort levels:

      We agree that visual inspection of the proportion of accepted results across effort and reward values can lead to the belief that a ceiling effect prevents the two lowest effort levels from capturing any inter-individual differences. To test whether this is the case, we ran a Bayesian GLM with the SHAPS sum score predicting the proportion of accepted trials (controlling for age and gender), in a subset of the data including only trials with an effort level of 1 or 2. We found the SHAPS has a predictive value for the proportion of accepted trials in the lowest two effort levels: M=-0.05; 95%HDI=[-0.07,-0.02]. This is noted in the text as follows.

      Supplement, lines 175 – 180:

      “The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02].”

      (4) The abstract and discussion seem overstated (implications for the school system and statements on circadian rhythms which were not measured here). They should be toned down to reflect conclusions supported by the data.

      We thank the Reviewer for pointing this out, and have now removed these claims from the abstract and Discussion; we hope they now better reflect conclusions supported by these data directly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Suggestions for improved or additional experiments, data or analyses.

      - For a non-computational audience, it would be useful to unpack the influence of the choice bias on behavior, as it is less clear how this would affect decision-making than sensitivity to effort or reward. Perhaps a figure showing accept/reject decisions when sensitivities are held and choice bias is high would be beneficial.

      We thank the Reviewer for suggesting additional explanations of the choice bias parameter to aid interpretation for non-computational readers; as per the Reviewer’s suggestion, we have now included additional explanations and visualisations (Figure 3) to make this as clear as possible. Please note also that, in response to one of the other Reviewers and after careful considerations, we have decided to rename the “choice bias” parameter to “motivational tendency”, hoping this will prove more intuitive.

      To aid with the understanding and interpretation of this and the other model parameters, we have added the following explanation to the main text.

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighed by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (𝛼) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”

      Additionally, we add the following explanation to the Methods section.

      Lines 698 – 709:

      First, a cost function transforms costs and rewards associated with an action into a subjective value (SV):

      with separate sensitivity parameters for reward and effort, and ℛ and 𝐸 for reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be the case in random decision-making.

      This SV is then transformed to an acceptance probability by a softmax function:

      with p(accept) for the predicted acceptance probability and 𝛼 for the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).
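      The two steps described above can be sketched in code. Because the equations themselves are not reproduced in this response, the sketch rests on assumptions that should be checked against the published Methods: a linear cost function is used purely for illustration (the form of the authors' cost function is not stated here), the two-option softmax is written as a logistic function with motivational tendency as an additive intercept, and all function and argument names are hypothetical.

```python
import math


def subjective_value(reward, effort, reward_sens, effort_sens):
    """Subjective value of an offer.

    ASSUMPTION: a linear cost function, in which reward adds value and
    effort subtracts it, each scaled by its own sensitivity parameter.
    """
    return reward_sens * reward - effort_sens * effort


def p_accept(sv, motivational_tendency):
    """Acceptance probability for an accept/reject choice.

    ASSUMPTION: logistic (two-option softmax) form, with motivational
    tendency acting as the intercept added to the subjective value.
    """
    return 1.0 / (1.0 + math.exp(-(motivational_tendency + sv)))


# Same offer and sensitivities, different motivational tendency:
sv = subjective_value(reward=3, effort=2, reward_sens=0.5, effort_sens=0.5)
for alpha in (-1.0, 0.0, 1.0):
    print(f"alpha={alpha:+.1f} -> p(accept)={p_accept(sv, alpha):.3f}")
```

      Under these assumptions, the same subjective value maps to a higher acceptance probability as motivational tendency increases, which is the behaviour described in the text, while the sensitivities control how strongly changes in reward and effort move the subjective value.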

      Our new figure (panels A-D in figure 3) visualizes the model. This demonstrates how the different model parameters come into play in the model (A), and how different values on each parameter affect the model (B-D).

      - The early and late chronotype groups have significant differences in ages and gender. Additional supplementary analysis here may mitigate any concerns from readers.

      The Reviewer is right to notice that our subsamples of early and late chronotypes differ significantly in age and gender, but it is important to note that all our analyses comparing these two groups take this into account, statistically controlling for age and gender. We regret that this was previously only mentioned in the Methods section, so this information was not accessible where most relevant. To remedy this, we have amended the Results section as follows.

      Lines 317 – 323:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46]) and motivational tendency (higher in early chronotypes; M=-0.248, 95% HDI=[-0.37,-0.11]), as well as an interaction between chronotype and time-of-day on motivational tendency (M=0.309, 95% HDI=[0.15,0.48]).”

      (2) Recommendations for improving the writing and presentation.

      - I found the term 'overlapping' a little jarring. I think the authors use it to mean both neuropsychiatric symptoms and chronotypes affect task parameters, but they are not tested to be 'separable', nor is an interaction tested. Perhaps being upfront about how interactions are not being tested here (in the introduction, and not waiting until the discussion) would give an opportunity to operationalize this term.

      We agree with the Reviewer that our previously-used term “overlapping” was not ideal: it may have been misleading, and was not necessarily reflective of the nature of our findings. We now state explicitly that we are not testing an interaction between neuropsychiatric symptoms and chronotypes in our primary analyses. Additionally, following suggestions made by Reviewer 3, we ran new exploratory analyses to investigate how the effects of neuropsychiatric symptoms and circadian measures on motivational tendency relate to one another. These results in fact show that all three symptom measures have separable effects from circadian measures on motivational tendency. This supports the Reviewer’s view that ‘overlapping’ was entirely the wrong word—although it nevertheless shows the important contribution of circadian rhythm as well as neuropsychiatric symptoms in effort-based decision-making. We have changed the manuscript throughout to better describe this important, more accurate interpretation of our findings, including replacing the term “overlapping”. We changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”.

      To clarify the intention of our primary analyses, we have added the following to the last paragraph of the introduction.

      Lines 107 – 112:

      “Next, we pre-registered a follow-up experiment to directly investigate how circadian preference interacts with time-of-day on motivational decision-making, using the same task and computational modelling approach. While this allows us to test how circadian effects on motivational decision-making compare to neuropsychiatric effects, we do not test for possible interactions between neuropsychiatric symptoms and chronobiology.”

      We detail our new analyses in the Methods section as follows.

      Lines 800 – 814:

      “4.5.2 Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”

      Results of the outlined analyses are reported in the Results section as follows.

      Lines 356 – 383:

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”

      In addition to the title change, we edited our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that the neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      - A minor point, but it could be made clearer that many neurotransmitters have circadian rhythms (and not just dopamine).

      We agree this should have been made clearer, and have added the following to the Introduction.

      Lines 83 – 84:

      “Bi-directional links between chronobiology and several neurotransmitter systems have been reported, including dopamine47.

      (47) Kiehn, J.-T., Faltraco, F., Palm, D., Thome, J. & Oster, H. Circadian Clocks in the Regulation of Neurotransmitter Systems. Pharmacopsychiatry 56, 108–117 (2023).”

      - Making reference to other studies which have explored circadian rhythms in cognitive tasks would allow interested readers to explore the broader field. One such paper is: Bedder, R. L., Vaghi, M. M., Dolan, R. J., & Rutledge, R. B. (2023). Risk taking for potential losses but not gains increases with time of day. Scientific reports, 13(1), 5534, which also includes references to other similar studies in the discussion.

      We thank the Reviewer for pointing out that we failed to cite this relevant work. We have now included it in the Introduction as follows.

      Lines 97 – 98:

      “A circadian effect on decision-making under risk is reported, with the sensitivity to losses decreasing with time-of-day66.

      (66) Bedder, R. L., Vaghi, M. M., Dolan, R. J. & Rutledge, R. B. Risk taking for potential losses but not gains increases with time of day. Sci Rep 13, 5534 (2023).”

      (3) Minor corrections to the text and figures.

      None, clearly written and structured. Figures are high quality and significantly aid understanding.

      Reviewer #2 (Recommendations For The Authors):

      I did have a few more minor comments:

      - The manuscript doesn't clarify whether trials had time limits - so that participants might fail to earn points - or instead they did not and participants had to continue exerting effort until they were done. This is important to know since it impacts on decision-strategies and behavioral outcomes that might be analyzed. For example, if there is no time limit, it might be useful to examine the amount of time it took participants to complete their effort - and whether that had any relationship to choice patterns or symptomatology. Or, if they did, it might be interesting to test whether the relationship between choices and exerted effort depended on symptoms. For example, someone with depression might be less willing to choose effort, but just as, if not more likely to successfully complete a trial once it is selected.

      We thank the Reviewer for pointing out this important detail in the task design, which we should have made clearer. The trials did indeed have a time limit which was dependent on the effort level. To clarify this in the manuscript, we have made changes to Figure 2 and the Methods section. We agree it would be interesting to explore whether the exerted effort in the task related to symptoms. We explored this in our data by predicting the participant average proportion of accepted but failed trials by SHAPS score (controlling for age and gender). We found no relationship: M=0.01, 95% HDI=[-0.001,0.02]. However, it should be noted that the measure of proportion of failed trials may not be suitable here, as there are only few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting. As an alternative measure, we explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the amount to which subjects exert “extra effort”. We have now added this to the manuscript.

      In Figure 2, which explains the task design in the results section, we have added the following to the figure description.

      Lines 161 – 165:

      “Each trial consists of an offer with a reward (2,3,4, or 5 points) and an effort level (1,2,3, or 4, scaled to the required clicking speed and time the clicking must be sustained for) that subjects accept or reject. If accepted, a challenge at the respective effort level must be fulfilled for the required time to win the points.”

      In the Methods section, we have added the following.

      Lines 617 – 622:

      “We used four effort-levels, corresponding to a clicking speed at 30% of a participant’s maximal capacity for 8 seconds (level 1), 50% for 11 seconds (level 2), 70% for 14 seconds (level 3), and 90% for 17 seconds (level 4). Therefore, in each trial, participants had to fulfil a certain number of mouse clicks (dependent on their capacity and the effort level) in a specific time (dependent on the effort level).”
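
      The mapping from effort level to a click target can be illustrated with a short Python sketch. The percentages and durations are taken from the passage above; expressing capacity in clicks per second and rounding the target up to a whole click are assumptions of this sketch, as is the function name `required_clicks`.

```python
import math

# Effort levels from the text: (proportion of maximal capacity, duration in seconds).
EFFORT_LEVELS = {1: (0.30, 8), 2: (0.50, 11), 3: (0.70, 14), 4: (0.90, 17)}


def required_clicks(max_clicks_per_second, level):
    """Number of clicks needed to complete a challenge at the given effort level.

    Assumptions: capacity is expressed in clicks per second, and the target is
    rounded up to a whole click.
    """
    proportion, duration = EFFORT_LEVELS[level]
    return math.ceil(proportion * max_clicks_per_second * duration)


# Hypothetical participant with a maximal capacity of 6 clicks per second.
targets = {level: required_clicks(6.0, level) for level in EFFORT_LEVELS}
```

      Because both the required speed and the duration grow with the effort level, the click target rises faster than either quantity alone.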

      In the Supplement, we have added the additional analyses suggested by the Reviewer.

      Lines 195 – 213:

      “3.2 Proportion of accepted but failed trials

      For each participant, we computed the proportion of trials in which an offer was accepted, but the required effort was then not fulfilled (i.e., failed trials). There was no relationship between average proportion of accepted but failed trials and SHAPS score (controlling for age and gender): M=0.01, 95% HDI=[-0.001,0.02]. However, there are intentionally few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting.”

      “3.3 Exertion of “extra effort”

      We also explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the amount to which subjects exert “extra effort”.”
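
      A minimal Python sketch of the click-exceedance measure follows. The trial dictionary keys are hypothetical, and taking a plain mean over trials (rather than first averaging within each effort-by-reward cell) is a simplifying assumption.

```python
from statistics import mean


def exceedance_factor(clicks_made, clicks_required):
    """Factor by which the required number of clicks was exceeded."""
    return clicks_made / clicks_required


def subject_exceedance(trials):
    """Average exceedance over accepted and succeeded trials only.

    Each trial is a dict with hypothetical keys: 'accepted', 'succeeded',
    'clicks_made', 'clicks_required'.
    """
    factors = [
        exceedance_factor(t["clicks_made"], t["clicks_required"])
        for t in trials
        if t["accepted"] and t["succeeded"]
    ]
    return mean(factors) if factors else None


# Hypothetical data: two completed trials and one rejected offer.
trials = [
    {"accepted": True, "succeeded": True, "clicks_made": 12, "clicks_required": 10},
    {"accepted": True, "succeeded": True, "clicks_made": 22, "clicks_required": 20},
    {"accepted": False, "succeeded": False, "clicks_made": 0, "clicks_required": 30},
]
score = subject_exceedance(trials)
```

      Rejected and failed trials are excluded before averaging, so the score reflects only effort exerted beyond the target on completed challenges.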

      - Perhaps relatedly, there is evidence that people with depression show less of an optimism bias in their predictions about future outcomes. As such, they show more "rational" choices in probabilistic decision tasks. I'm curious whether the Authors think that a weaker choice bias among those with stronger depression/anhedonia/apathy might be related. Also, are choices better matched with actual effort production among those with depression?

      We think this is a very interesting comment, but unfortunately feel our manuscript cannot properly speak to it: as in our response to the previous comment, our exploratory analysis linking the proportion of accepted but failed trials to anhedonia symptoms (i.e. less anhedonic people making more optimistic judgments of their likelihood of success) did not show a relationship between the two. However, this null finding may be the result of our task design, which is not laid out to capture such an effect (and in fact minimizes trials of this nature). We have added the following to the Discussion section.

      Lines 442 – 445:

      “It is possible that a higher motivational tendency reflects a more optimistic assessment of future task success, in line with work on the optimism bias95; however our task intentionally minimized unsuccessful trials by titrating effort and reward; future studies should explore this more directly.

      (95) Korn, C. W., Sharot, T., Walter, H., Heekeren, H. R. & Dolan, R. J. Depression is related to an absence of optimistically biased belief updating about future life events. Psychological Medicine 44, 579–592 (2014).”

      - The manuscript does not clarify: How did the Authors ensure that each subject received each effort-reward combination at least once if a given subject always accepted or always rejected offers?

      We have made the following edit to the Methods section to better explain this aspect of our task design.

      Lines 642 – 655:

      “For each subject, trial-by-trial presentation of effort-reward combinations was made semi-adaptively by 16 randomly interleaved staircases. Each of the 16 possible offers (4 effort-levels x 4 reward-levels) served as the starting point of one of the 16 staircases. Within each staircase, after a subject accepted a challenge, the next trial’s offer on that staircase was adjusted (by increasing effort or decreasing reward). After a subject rejected a challenge, the next offer on that staircase was adjusted by decreasing effort or increasing reward. This ensured subjects received each effort-reward combination at least once (as each participant completed all 16 staircases), while individualizing trial presentation to maximize the trials’ informative value. Therefore, in practice, even in the case of a subject rejecting all offers (and hence the staircasing procedures always adapting by decreasing effort or increasing reward), the full range of effort-reward combinations will be represented in the task across the starting points of all staircases (and therefore before adaptation takes place).”
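
      The staircase logic can be sketched in Python as follows. This is an illustration under assumptions rather than the task code: the number of trials per staircase, the random choice of which dimension to adjust, the clamping at the range limits, and the `choose` callback are all hypothetical. The effort and reward levels are those given in the Figure 2 description.

```python
import random

EFFORTS = [1, 2, 3, 4]
REWARDS = [2, 3, 4, 5]


def next_offer(effort, reward, accepted, rng):
    """Adjust one staircase after a choice.

    Accepted offers become less attractive (more effort or less reward);
    rejected offers become more attractive. Which dimension changes, and the
    clamping at the range limits, are assumptions of this sketch.
    """
    step = 1 if accepted else -1
    if rng.random() < 0.5:
        effort = min(max(effort + step, EFFORTS[0]), EFFORTS[-1])
    else:
        reward = min(max(reward - step, REWARDS[0]), REWARDS[-1])
    return effort, reward


def run_session(choose, trials_per_staircase=4, seed=0):
    """Present offers from 16 randomly interleaved staircases.

    `choose(effort, reward)` is a hypothetical callback returning True to accept.
    """
    rng = random.Random(seed)
    # One staircase per starting offer (4 effort levels x 4 reward levels).
    staircases = [(e, r) for e in EFFORTS for r in REWARDS]
    order = [i for i in range(len(staircases)) for _ in range(trials_per_staircase)]
    rng.shuffle(order)
    presented = []
    for i in order:
        effort, reward = staircases[i]
        presented.append((effort, reward))
        staircases[i] = next_offer(effort, reward, choose(effort, reward), rng)
    return presented


# Even a subject who rejects everything sees all 16 combinations,
# because each staircase starts at a different one.
offers = run_session(lambda effort, reward: False)
```

      The first offer drawn from each staircase is always its starting point, which is why coverage of all 16 combinations does not depend on the subject's choices.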

      - The word "metabolic" is misspelled in Table 1

      - Figure 2 is missing panel label "C"

      - The word "effort" is repeated on line 448.

      We thank the Reviewer for their attentive reading of our manuscript and have corrected the mistakes mentioned.

      Reviewer #3 (Recommendations For The Authors):

      It is a bit difficult to get a sense of people's discounting from the plots provided. Could the authors show a few example individuals and their fits (i.e., how steep was effort discounting on average and how much variance was there across individuals; maybe they could show the mean discount function or some examples etc)

      We appreciate very much the Reviewer's suggestion to visualise our parameter estimates within and across individuals. We have implemented this in Figure S2.

      It would be helpful if correlations between the various markers used as dependent variables (SHAPS, DARS, AES, chronotype etc) could plotted as part of each related figure (e.g., next to the relevant effects shown).

      We agree with the Reviewer that a visual representation of the various correlations between dependent variables would be a better and more accessible communication than our current paragraph listing the correlations. We have implemented this by adding a new figure plotting all correlations in a heat map, with asterisks indicating significance.

      The authors use the term "meaningful relationship" - how is this defined? If undefined, maybe consider changing (do they mean significant?)

      We understand how our use of the term “(no) meaningful relationship” was confusing here. As we conducted most analyses in a Bayesian fashion, ‘meaningful’ has a formal definition: the 95% highest density interval does not span across 0. However, we do not want this to be misunderstood as frequentist “significance” and agree clarity can be improved here. To avoid confusion, we have amended the manuscript where relevant (i.e., we now state “we found a (/no) relationship / effect” rather than “we found a meaningful relationship”).
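
      The decision rule described here (an effect is reported when the 95% HDI does not span 0) can be sketched in Python. The posterior draws below are simulated for illustration only, and the helper names `hdi` and `excludes_zero` are assumptions. In practice the interval would be computed from the fitted model's posterior samples.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of fixed sample count and keep the narrowest one.
    start = min(range(n - window + 1), key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


def excludes_zero(interval):
    """Decision rule from the text: report an effect if the HDI does not span 0."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical posterior draws for two regression coefficients.
rng = random.Random(1)
clear_effect = [rng.gauss(-0.25, 0.07) for _ in range(4000)]
no_effect = [rng.gauss(0.01, 0.10) for _ in range(4000)]
```

      Unlike an equal-tailed interval, the HDI is the shortest interval with the requested mass, which matters for skewed posteriors.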

      The authors do not include an inverse temperature parameter in their discounting models; can they motivate why? If a participant chose nearly randomly, which set of parameter values would they get assigned?

      Our decision to not include an inverse temperature parameter was made after an extensive simulation-based investigation of different models and task designs. A series of parameter recovery studies including models with an inverse temperature parameter revealed the inverse temperature parameter could not be distinguished from the reward sensitivity parameter. Specifically, inverse temperature seemed to capture the variance of the true underlying reward sensitivity parameter, leading to confounding between the two. Hence, including both reward sensitivity and inverse temperature would not have allowed us to reliably estimate either parameter. As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space. We have now added these simulations to our supplement.

      Nevertheless, we believe our models can capture random decision-making. The parameters of effort and reward sensitivity capture how sensitive one is to changes in effort/reward level. Hence, random decision-making can be interpreted as low effort and reward sensitivity, such that one’s decision-making is not guided by changes in effort and reward magnitude. With low effort/reward sensitivity, the motivational tendency parameter (previously “choice bias”) would capture to what extent this random decision-making is biased toward accepting or rejecting offers.

      The simulation results are now detailed in the Supplement.

      Lines 25 – 46:

      “1.2.1 Parameter recoveries including inverse temperature

      In the process of task and model space development, we also considered models incorporating an inverse temperature parameter. To this end, we conducted parameter recoveries for four models, defined in Table S3.

      Parameter recoveries indicated that parameters can be recovered reliably in model 1, which includes only effort sensitivity and inverse temperature as free parameters (on-diagonal correlations: .98 > r > .89, off-diagonal correlations: .04 > |r| > .004). However, as a reward sensitivity parameter is added to the model (model 2), parameter recovery seems to be compromised, as parameters are estimated less accurately (on-diagonal correlations: .80 > r > .68), and spurious correlations between parameters emerge (off-diagonal correlations: .40 > |r| > .17). This issue remains when motivational tendency is added to the model (model 4; on-diagonal correlations: .90 > r > .65; off-diagonal correlations: .28 > |r| > .03), but not when inverse temperature is modelled with effort sensitivity and motivational tendency, but not reward sensitivity (model 3; on-diagonal correlations: .96 > r > .73; off-diagonal correlations: .05 > |r| > .003).

      As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space.”
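
      How on-diagonal and off-diagonal recovery correlations are tabulated can be sketched in Python. This is a simplified illustration: the simulate-and-refit step is replaced by adding noise to the true values, and treating "off-diagonal" as the correlation between a true parameter and a different recovered parameter is an assumption about the convention used. Parameter names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def recovery_matrix(true_params, recovered_params):
    """Correlate every true parameter with every recovered parameter.

    Entries with matching names are the on-diagonal correlations (should be
    high); all others are off-diagonal (should be near zero).
    """
    return {
        (t, r): pearson(true_params[t], recovered_params[r])
        for t in true_params
        for r in recovered_params
    }


# Stand-in for a real simulate-and-refit loop: 'recovered' values are the true
# values plus noise, which is an assumption made to keep the sketch short.
rng = random.Random(0)
names = ["effort_sensitivity", "inverse_temperature"]
true_params = {k: [rng.uniform(0, 1) for _ in range(500)] for k in names}
recovered = {k: [v + rng.gauss(0, 0.1) for v in vals] for k, vals in true_params.items()}
matrix = recovery_matrix(true_params, recovered)
```

      A confound of the kind described for reward sensitivity and inverse temperature would show up as lowered on-diagonal entries together with sizeable off-diagonal entries.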

      And we now discuss random decision-making specifically in the Methods section.

      Lines 698 – 709:

      “First, a cost function transforms costs and rewards associated with an action into a subjective value (SV):

      with and for reward and effort sensitivity, and ℛ and 𝐸 for reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be in random decision-making.

      This SV is then transformed to an acceptance probability by a softmax function:

      with for the predicted acceptance probability and 𝛼 for the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).”

      The pre-registration mentions effects of BMI and risk of metabolic disease; those are briefly reported in the factor loadings, but not discussed afterwards, although the authors stated hypotheses regarding these measures in their preregistration. Were those hypotheses supported?

      We reported these results (albeit only briefly) in the factor loadings resulting from our PLS regression and results from follow-up GLMs (see below). We have now amended the Discussion to enable further elaboration on whether they confirmed our hypotheses (this evidence was unclear, but we have subsequently followed up in a sample with type-2 diabetes, who also show reduced motivational tendency).

      Lines 258 – 261:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression.”

      We have added the following paragraph to our discussion.

      Lines 491 – 502:

      “To our surprise, we did not find statistical evidence for a relationship between effort-based decision-making and measures of metabolic health (BMI and risk for type-2 diabetes). Our analyses linking BMI to motivational tendency reveal a numeric effect in line with our hypothesis: a higher BMI relating to a lower motivational tendency. However, the 95% HDI for this effect narrowly included zero (95%HDI=[-0.19,0.01]). Possibly, our sample did not have sufficient variance in metabolic health to detect dimensional metabolic effects in a current general population sample. A recent study by our group investigates the same neurocomputational parameters of effort-based decision-making in participants with type-2 diabetes and non-diabetic controls matched by age, gender, and physical activity105. We report a group effect on the motivational tendency parameter, with type-2 diabetic patients showing a lower tendency to exert effort for reward.”

      “(105) Mehrhof, S. Z., Fleming, H. A. & Nord, C. A cognitive signature of metabolic health in effort-based decision-making. Preprint at https://doi.org/10.31234/osf.io/4bkm9 (2024).”

      R-values are indicated as a range (e.g., from 0.07-0.72 for the last one in 2.1 which is a large range). As mentioned above, the full correlation matrix should be reported in figures as heatmaps.

      We agree with the Reviewer that a heatmap is a better way of conveying this information – see Figure 1 in response to their previous comment.  

      The answer on whether data was already collected is missing on the second preregistration link. Maybe this is worth commenting on somewhere in the manuscript.

      This question appears missing because, as detailed in the manuscript, we felt that technically some data *was* already collected by the time our second pre-registration was posted. This is because the second pre-registration detailed an additional data collection, with the goal of extending data from the original dataset to include extreme chronotypes and increase precision of analyses. To avoid any confusion regarding the lack of reply to this question in the pre-registration, we have added the following disclaimer to the description of the second pre-registration:

      “Please note the lack of response to the question regarding already collected data. This is because the data collection in the current pre-registration extends data from the original dataset to increase the precision of analyses. While this original data is already collected, none of the data collection described here has taken place.”

      Some referencing is not reflective of the current state of the field (e.g., for effort discounting: Sugiwaka et al., 2004 is cited). There are multiple labs that have published on this since then including Philippe Tobler's and Sven Bestmann's groups (e.g., Hartmann et al., 2013; Klein-Flügge et al., Plos CB, 2015).

      We agree absolutely, and have added additional, more recent references on effort discounting.

      Lines 67 – 68:

      “Higher costs devalue associated rewards, an effect referred to as effort-discounting33–37.”

      (33) Sugiwaka, H. & Okouchi, H. Reformative self-control and discounting of reward value by delay or effort. Japanese Psychological Research 46, 1–9 (2004).

      (34) Hartmann, M. N., Hager, O. M., Tobler, P. N. & Kaiser, S. Parabolic discounting of monetary rewards by physical effort. Behavioural Processes 100, 192–196 (2013).

      (35) Klein-Flügge, M. C., Kennerley, S. W., Saraiva, A. C., Penny, W. D. & Bestmann, S. Behavioral Modeling of Human Choices Reveals Dissociable Effects of Physical Effort and Temporal Delay on Reward Devaluation. PLOS Computational Biology 11, e1004116 (2015).

      (36) Białaszek, W., Marcowski, P. & Ostaszewski, P. Physical and cognitive effort discounting across different reward magnitudes: Tests of discounting models. PLOS ONE 12, e0182353 (2017).

      (37) Ostaszewski, P., Bąbel, P. & Swebodziński, B. Physical and cognitive effort discounting of hypothetical monetary rewards. Japanese Psychological Research 55, 329–337 (2013).

      There are lots of typos throughout (e.g., Supplementary martial, Mornignness etc)

      We thank the Reviewer for their attentive reading of our manuscript and have corrected our mistakes.

      In Table 1, it is not clear what the numbers given in parentheses are. The figure note mentions SD, IQR, and those are explicitly specified for some rows, but not all.

      After reviewing Table 1 we understand the comment regarding the clarity of the number in parentheses. In our original manuscript, for some variables, numbers were given per category (e.g. for gender and ethnicity), rather than per row, in which case the parenthetical statistic was indicated in the header row only. However, we now see that the clarity of the table would have been improved by adding the reported statistic for each row—we have corrected this.

      In Figure 1C, it would be much more helpful if the different panels were combined into one single panel (using differently coloured dots/lines instead of bars).

      We agree visualizing the proportion of accepted trials across effort and reward levels in one single panel aids interpretability. We have implemented it in the following plot (now Figure 2C).

      In Sections 2.2.1 and 4.2.1, the authors mention "mixed-effects analysis of variance (ANOVA) of repeated measures" (same in the preregistration). It is not clear if this is a standard RM-ANOVA (aggregating data per participant per condition) or a mixed-effects model (analysing data on a trial-by-trial level). This model seems to only include within-subjects variable, so it isn't a "mixed ANOVA" mixing within and between subjects effects.

      We apologise that our use of the term "mixed-effects analysis of variance (ANOVA) of repeated measures" is indeed incorrectly applied here. We aggregate data per participant and effort-by-reward combination, meaning there are no between-subject effects tested. We have corrected this to “repeated measures ANOVA”.

      In Section 2.2.2, the authors write "R-hats>1.002" but probably mean "R-hats < 1.002". ESS is hard to evaluate unless the total number of samples is given.

      We thank the Reviewer for noticing this mistake and have corrected it in the manuscript.

      In Section 2.3, the inference criterion is unclear. The authors first report "factor loadings" and then perform a permutation test that is not further explained. Which of these factors are actually needed for predicting choice bias out of chance? The permutation test suggests that the null hypothesis is just "none of these measures contributes anything to predicting choice bias", which is already falsified if only one of them shows an association with choice bias. It would be relevant to know for which measures this is the case. Specifically, it would be relevant to know whether adding circadian measures into a model that already contains apathy/anhedonia improves predictive performance.

      We understand the Reviewer’s concerns regarding the detail of explanation we have provided for this part of our analysis, but we believe there may have been a misunderstanding regarding the partial least squares (PLS) regression. Rather than identifying a number of factors to predict the outcome variable, a PLS regression identifies a model with one or multiple components, with various factor loadings of differing magnitude. In our case, the PLS regression identified a model with one component to best predict our outcome variable (motivational tendency, which in our previous version we called choice bias). This one component had factor loadings of our questionnaire-based measures, with measures of apathy and anhedonia having highest weights, followed by lesser weighted factor loadings by measures of circadian rhythm and metabolic health. The permutation test tests whether this component (consisting of the combination of factor loadings) can predict the outcome variable out of sample.

      We hope we have improved clarity on this in the manuscript by making the following edits to the Results section.

      Lines 248 – 251:

      “Permutation testing indicated the predictive value of the resulting component (with factor loadings described above) was significant out-of-sample (root-mean-squared error [RMSE]=0.203, p=.001).”

      Further, we hope to provide a more in-depth explanation of these results in the Methods section.

      Lines 755 – 759:

      “Statistical significance of obtained effects (i.e., the predictive accuracy of the identified component and factor loadings) was assessed by permutation tests, probing the proportion of root-mean-squared errors (RMSEs) indicating stronger or equally strong predictive accuracy under the null hypothesis.”
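
      The logic of such a permutation test can be sketched as follows. This is a simplified illustration, not the authors' analysis: the PLS regression is replaced by a one-predictor least-squares fit so that the example needs only the standard library, out-of-sample error is estimated by leave-one-out cross-validation, and all names and data are hypothetical.

```python
import random


def loo_rmse(x, y):
    """Leave-one-out RMSE of a one-predictor least-squares fit (x, y are lists)."""
    n = len(x)
    squared_error = 0.0
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mean_x) ** 2 for a in xs)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        slope = sxy / sxx if sxx else 0.0
        prediction = mean_y + slope * (x[i] - mean_x)
        squared_error += (y[i] - prediction) ** 2
    return (squared_error / n) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Proportion of label permutations whose RMSE is as low as, or lower than, the observed RMSE."""
    rng = random.Random(seed)
    observed = loo_rmse(x, y)
    shuffled = list(y)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between predictor and outcome
        if loo_rmse(x, shuffled) <= observed:
            as_good += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (as_good + 1) / (n_perm + 1)


# Hypothetical data with a clear linear relationship plus small alternating noise.
x = [float(i) for i in range(20)]
y = [0.5 * a + (0.3 if i % 2 else -0.3) for i, a in enumerate(x)]
rmse, p = permutation_p_value(x, y)
print(rmse, p)
```

      In this sketch the null distribution is built by shuffling the outcome values, so a small p-value means that few shuffled datasets are predicted as accurately as the real one.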

      In Section 2.5, the authors simply report "that chronotype showed effects of chronotype on reward sensitivity", but the direction of the effect (higher reward sensitivity in early vs. late chronotype) remains unclear.

      We thank the Reviewer for pointing this out. While we did report the direction of effect, this was only presented in the subsequent parentheticals and could have been made much clearer. To assist with this, we have made the following addition to the text.

      Lines 317 – 320:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46])”

      In Section 4.2, the authors write that they "implemented a previously-described procedure using Prolific pre-screeners", but no reference to this previous description is given.

      We thank the Reviewer for bringing our attention to this missing reference, which has now been added to the manuscript.

      In Supplementary Table S2, only the "on-diagonal correlations" are given, but off-diagonal correlations (indicative of trade-offs between parameters) would also be informative.

      We agree with the Reviewer that off-diagonal correlations between underlying and recovered parameters are crucial to assess confounding between parameters during model estimation. We reported this in Figure S1D, where we present the full correlation matrix between underlying and recovered parameters in a heatmap. We have now noticed that this plot was missing axis labels, which have now been added.

      I found it somewhat difficult to follow the results section without having read the methods section beforehand. At the beginning of the Results section, could the authors briefly sketch the outline of their study? Also, given they have a pre-registration, could the authors introduce each section with a statement of what they expected to find, and close with whether the data confirmed their expectations? In the current version of the manuscript, many results are presented without much context of what they mean.

      We agree a brief outline of the study procedure before reporting the results would be beneficial to following the subsequent text and have added the following to the end of our Introduction.

      Lines 101 – 106:

      “Here, we tested the relationship between motivational decision-making and three key neuropsychiatric syndromes: anhedonia, apathy, and depression, taking both a transdiagnostic and categorical (diagnostic) approach. To do this, we validate a newly developed effort-expenditure task, designed for online testing, and gamified to increase engagement. Participants completed the effort-expenditure task online, followed by a series of self-report questionnaires.”

      We have added references to our pre-registered hypotheses at multiple points in our manuscript.

      Lines 185 – 187:

      “In line with our pre-registered hypotheses, we found significant main effects for effort (F(1,14367)=4961.07, p<.0001) and reward (F(1,14367)=3037.91, p<.001), and a significant interaction between the two (F(1,14367)=1703.24, p<.001).”

      Lines 215 – 221:

      “Model comparison by out-of-sample predictive accuracy identified the model implementing three parameters (motivational tendency a, reward sensitivity , and effort sensitivity ), with a parabolic cost function (subsequently referred to as the full parabolic model) as the winning model (leave-one-out information criterion [LOOIC; lower is better] = 29734.8; expected log posterior density [ELPD; higher is better] = -14867.4; Fig. 31ED). This was in line with our pre-registered hypotheses.”

      Lines 252 – 258:

      “Bayesian GLMs confirmed evidence for psychiatric questionnaire measures predicting motivational tendency (SHAPS: M=-0.109; 95% highest density interval (HDI)=[-0.17,-0.04]; AES: M=-0.096; 95%HDI=[-0.15,-0.03]; DARS: M=-0.061; 95%HDI=[-0.13,-0.01]; Fig. 4A). Post-hoc GLMs on DARS sub-scales showed an effect for the sensory subscale (M=-0.050; 95%HDI=[-0.10,-0.01]). This result of neuropsychiatric symptoms predicting a lower motivational tendency is in line with our pre-registered hypothesis.”

      Lines 258 – 263:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no meaningful relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression. This null finding for dimensional measures of circadian rhythm and metabolic health was not in line with our pre-registered hypotheses.”

      Lines 268 – 270:

      “For reward sensitivity, the intercept-only model outperformed models incorporating questionnaire predictors based on RMSE. This result was not in line with our pre-registered expectations.”

      Lines 295 – 298:

      “As in our transdiagnostic analyses of continuous neuropsychiatric measures (Results 2.3), we found evidence for a lower motivational tendency parameter in the MDD group compared to HCs (M=-0.111, 95% HDI=[-0.20,-0.03]) (Fig. 4B). This result confirmed our pre-registered hypothesis.”

      Lines 344 – 355:

      “Late chronotypes showed a lower motivational tendency than early chronotypes (M=-0.11, 95% HDI=[-0.22,-0.02])—comparable to effects of transdiagnostic measures of apathy and anhedonia, as well as diagnostic criteria for depression. Crucially, we found motivational tendency was modulated by an interaction between chronotype and time-of-day (M=0.19, 95% HDI=[0.05,0.33]): post-hoc GLMs in each chronotype group showed this was driven by a time-of-day effect within late, rather than early, chronotype participants (M=0.12, 95% HDI=[0.02,0.22], such that late chronotype participants showed a lower motivational tendency in the morning testing sessions, and a higher motivational tendency in the evening testing sessions; early chronotype: 95% HDI=[-0.16,0.04]) (Fig. 5A). These results of a main effect and an interaction effect of chronotype on motivational tendency confirmed our pre-registered hypothesis.”

      Lines 390 – 393:

      “Participants with an early chronotype had a lower reward sensitivity parameter than those with a late chronotype (M=0.27, 95% HDI=[0.16,0.38]). We found no effect of time-of-day on reward sensitivity (95%HDI=[-0.09,0.11]) (Fig. 5B). These results were in line with our pre-registered hypotheses.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths: 

      Overall the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, disruption of VIP neuron activity by abeta and tau oligomers is a more likely explanation.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we have revised the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method together with automatic analysis allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We have included the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      - 3xTg mice show early Aß accumulation in VIP-positive interneurons.

      - 3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      - 3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      - 3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      - 3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      - 3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Limitations:

      (1) The authors should describe their model and state the age at which these mice start depositing amyloid plaques and neurofibrillary tangles. Readers might not be familiar with this model. It is also important to mention that circuit disruptions are assessed prior to plaque and tangle formation.

      We have included a detailed description of the 3xTg-AD mouse model in the Introduction section, including information on the age at which amyloid plaques and neurofibrillary tangles begin to appear. Additionally, we have clarified that circuit disruptions were assessed before the formation of plaques and tangles. These details have been added to both the Introduction and the Results sections to ensure clarity for readers unfamiliar with the model.

      (2) Ns are presented in Supplemental Table 1. Units are presented in a note to Supplementary Table 1. It would be advisable to specify Ns and units as the data is being presented in the results section or figure legends for easy access.

      We have now included the Ns (sample sizes), specifying the number of cells or sections and the number of experimental animals, directly within the Results section and in the figure legends. This ensures that readers have immediate access to this information without needing to refer to the supplementary materials.

      (3) Several typos require correction:

      a. "mamory" - Line 22, page 5.

      b. The term "Interneurons" is abbreviated as both "INs" and "IN" throughout the manuscript. The author should consistently choose one abbreviation.

      We have corrected the typo "mamory" to "memory" on line 22, page 5. Additionally, we have standardized the abbreviation for "Interneurons" to "INs" throughout the manuscript for consistency.

      (4) Note 2 in Supplementary Table 1 states that animals of both sexes with equal distribution were used throughout the study. It would be best for the reader to assess the data distribution based on sex. Thus, it is advisable for the authors to depict male and female data points as distinct symbols throughout the figures.

      Unfortunately, we do not have detailed sex-disaggregated data for all datasets, which limits our ability to depict male and female data points separately across all figures. Therefore, we have opted to pool data from both sexes for a more comprehensive analysis. We believe this approach maintains the robustness of our findings.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      - To keep the logical line of reasoning and to be able to interpret the results, it would be important to use the same metrics when comparing the population activity of O/A interneurons and principal cells in the different behavioral conditions.

      We have revised Figures 4 and 5 to enhance the coherence in data presentation. This includes using consistent metrics for comparing the population activity of both O/A interneurons and principal cells across different behavioral conditions. These changes ensure a clearer and more logical interpretation of the results.

      - Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality. Would it be possible to test if manipulating VIP neurons one could obtain such specific results? Alternatively, it could be discussed more in detail how the decrease in disinhibition could lead to the changes in network activity demonstrated here.

      We agree with the reviewer that establishing causality between VIP neuron deficits and changes in network activity would be very important. However, demonstrating causality would require a new line of investigation, involving the use of specific mouse models to selectively manipulate VIP neurons. This is an exciting direction that we plan to prioritize in our future research. For this study, we have included a discussion on the potential mechanisms by which decreased disinhibition might lead to the observed changes in network activity. Specifically, we propose that in young adult 3xTg-AD mice, the altered firing of I-S3 cells may lead to enhanced inhibition of principal cells. This could shift the excitation/inhibition balance, input integration and firing output of principal cells thereby impacting overall network activity. These points are discussed in detail in the revised Discussion section.

      - Along the same lines, the correlations shown in the manuscript would be more robust if there was an in vivo demonstration that 3xTg mice indeed show decreased activity in vivo. The same experiments could also clarify if VIP cells in control animals are more active at the time of decision-making and during object exploration as suggested in the manuscript.

      Thank you for your comment. In response to the point raised, we would like to highlight that we have recently documented the increased activity of VIP-INs in the D-zone of the T-maze and during object exploration in a study published in Cell Reports (Tamboli et al., 2024). This publication is now referenced in our manuscript to support our findings. Regarding the in vivo activity of 3xTg mice, our observations indicated no significant differences in major behavioral patterns such as locomotion, rearing, and exploration of the T-maze when comparing Tg and non-Tg mice. These findings are presented in detail in Figure 4c and Supplementary Fig. 5. We believe these data support the robustness of our correlations by demonstrating that the overall behavioral activity of 3xTg mice is comparable to that of non-transgenic controls, thus focusing attention on the specific roles of VIP-INs in early prodromal state of AD pathology.

      Minor Points:

      - Figure 1c: Heading of VIP-Tg should have capital letters.

      Thank you for pointing that out. We have corrected the heading to "VIP-Tg" with capital letters in Figure 1c.

      - Figure 1d: The finding that no change was observed in the percentage of VIP+/CR+ is based on three animals and 3-4 slices per mouse. However, the result of VIP+CR+ in tg-mice has an outlier that might bias the results. I would suggest increasing the number of animals to confirm these results.

      Thank you for your insightful suggestion. We addressed the potential impact of the outlier in the VIP+/CR+ cell density analysis by recalculating the results after removing the outlier using the interquartile range method. This reanalysis revealed a statistically significant difference in the VIP+/CR+ cell density between non-Tg and Tg mice, which we have now detailed in the Results section. Despite this, we have chosen to retain the outlier in our final presentation to accurately represent the biological variability observed in our sample. We agree that increasing the number of animals would further validate these findings and will consider this in future studies.
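
      As an illustration of the outlier screen mentioned above, the sketch below flags values outside Tukey-style fences. It rests on assumptions: the response names only "the interquartile range method", so the 1.5 × IQR multiplier, the quartile interpolation rule, the function name, and the example values are all hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into (kept, flagged) using fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged


# Hypothetical per-slice percentages, with one value far above the rest.
kept, flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 40])
print(kept, flagged)
```

      With small samples the fences depend noticeably on how quartiles are interpolated, which is one reason to report both the screened and unscreened results, as done here.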

      - Figure 3d: Would it be possible to identify the recorded interneurons? Is it expected that most of those are OLM cells?

      Thank you for your question. We were unable to fully recover all recorded cells using biocytin staining. However, for those cells with preserved axonal structures, we identified both OLM and bistratified cells, which are the primary targets of I-S3 cells. We have now included this information in the Results section to clarify the types of interneurons identified.

      - Figure 3: Why quantify VGat terminals instead of quantification of VIP-GFP terminals? Combined with the Calretinin labeling it would be more useful to indicate that no changes were observed at the morphological bouton level specifically in disinhibitory interneurons. Please also describe which ImageJ plugin was used for the quantification.

      Thank you for your question. Our primary objective was to quantify the synaptic terminals of CR+ INs in the CA1 O/A region, which are predominantly formed by I-S3 cells. Therefore, VGaT and CR co-localization was used to guide this analysis. GFP expression in axonal boutons can sometimes be inconsistent and less reliable for precise quantification. For this analysis, we utilized the “Analyze Particles” function in ImageJ, combined with watershed segmentation, which is now specified in the Methods section.

      - Figure 4g: How was the statistical test performed? If data was averaged across mice, please add error bars and data points in the figure.

      Thank you for your question. To compare the alternation percentage between non-Tg and Tg mice, we used Fisher’s Exact test as detailed in Supplementary Table 1. In this analysis, we considered each animal's choice individually, comparing the preference for correct versus incorrect choices between the two groups. Since Fisher’s Exact test is designed for analyzing qualitative data rather than quantitative data, averaging across mice was not applicable, and therefore, we did not include error bars or data points in the figure.
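
      A minimal sketch of a two-sided Fisher's exact test on a 2×2 table of per-animal choices is given below. The function name is hypothetical, and the example counts are an assumption: they use the numbers of incorrect choices reported elsewhere in this response (4 of 9 non-Tg and 1 of 6 Tg mice), which may not be the exact table behind Figure 4g.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are groups, columns are outcomes. The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Integer weights keep the "no more likely than observed" comparison exact.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(total, col1)


# Illustrative table: (correct, incorrect) choices for non-Tg and Tg mice.
p_value = fisher_exact_two_sided(5, 4, 5, 1)
print(p_value)
```

      Because each animal contributes a single categorical outcome, the input is a table of counts rather than a set of per-animal means, which is why error bars do not apply.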

      - Figure 4h: To conclude that the increase in activity is larger in the 3xTg mice, there should be a statistical comparison for the magnitude of change between the decision and the stem zone for control and 3xTg mice. To show that there is no significant difference in this measurement in the control mice is insufficient.

      Thank you for your suggestion. We performed a statistical comparison of the magnitude of change in activity between the stem zone and the D-zone for non-Tg and 3xTg mice, as recommended. Our analysis showed no significant difference in this magnitude of change between the two genotypes. These results have now been included in the Results section. However, we would like to highlight an important finding regarding the nature of these changes. In the 3xTg mice, there was a consistent increase in the activity of O/A INs when entering the D-zone. In contrast, non-Tg mice displayed a range of responses, including both increases and decreases in activity. This indicates a higher reliability in the firing of O/A INs in the D-zone of 3xTg mice. Our recent study suggests that VIP-INs are particularly active in the D-zone (Tamboli et al., 2024). Therefore, the absence or reduced input from VIP-INs in 3xTg mice may lead to the observed higher engagement of O/A INs in this zone. We believe this observation is crucial for understanding the differential yet nuanced changes in neural dynamics in these mice.

      - In the methods, it is stated that there was a pre-selection of animals depending on learning performance. Would it be possible to also show the data from animals that did not properly learn? Alternatively, it would be useful to plot the correlation between performance in this test and the difference between activity in the stem and the decision-making zone. The reason to ask for this is that there is a trend for control animals to show reduced alternations (50 vs 80%, although not significant, it is a big difference). Considering that there is also a trend in control animals to show increased activity in the decision-making zone, it would be important to confirm that this is not only due to differences in performance. The current statistical procedure does not allow discarding this.

      In this study, we excluded from the analysis the animals that refused to explore the T-maze and spent all their time in the stem corner, or refused to explore the objects and stayed in the open field maze (OFM) corner. These exclusions applied to both non-Tg (n = 6) and Tg (n = 5) groups, indicating that low exploratory activity is not necessarily linked to AD-related mutations. During the T-maze test, we also observed several animals that made incorrect choices (4 out of 9 non-Tg and 1 out of 6 Tg mice). However, due to the low number of animals making incorrect choices, we were unable to form a separate group for analysis based on incorrect choices. These details are now provided in the Methods section.

      - Figure 4i. It is not clear when exactly cell activity was measured. If it was during the entire recording time, I think it would be interesting to see if the activity of O/A interneurons is different specifically during interaction with the object in 3xTg mice.

      Cell activity was indeed measured throughout the entire recording session and analyzed in relation to animal behavior (immobility to walking; Fig. 4d,e), and periods specifically related to interaction with objects were extracted for analysis (Figure 4i).

      - Why was the object modulation measured during a different task in which both objects were the same? The figure is misleading in that sense, as it suggests the experiment was the same as for the other panels with two different objects. It would be important to correct this if the authors want to correlate the deficits in NOR in 3xTg mice and changes in IN activity.

      The study specifically investigated object-modulated neural activity during the Sampling phase. Therefore, two identical objects were placed in the arena for animal exploration. As mentioned above, because several animals failed to explore the OFM and objects on the second day, they were excluded from the analysis, which prevented us from conducting the novel-object exploration Test Trial. Both non-Tg and Tg mice showed a lack of exploration in the OFM and T-maze, for reasons that remain unclear. Consequently, we opted to present robust data on neural activity during the initial sampling of two identical objects. However, further investigation is needed to understand how this activity relates to deficits observed in the classical NOR test.

      - Figure. 5c-f. I would strongly suggest performing the same quantification and displaying similar figures for the fiber photometry experiments in interneurons and principal cells. It would help to interpret the data.

      We have taken the reviewer's suggestion into account and standardized the data analysis and presentation. Figures 4d, e and 5c, d now depict the walk-induced activity in INs and PCs, respectively. Figures 4h and 5f compare activity between the stem and D-zone in the T-maze. Additionally, Figures 4j and 5h illustrate the object modulation of INs and PCs, respectively.

      - Although velocity and mobility were quantified, it would be important to show also that they are not different during those times when activity was dissimilar, as in the decision zone.

      We have analyzed these data and found no significant differences between the two genotypes in terms of velocity and mobility during these periods. This analysis is now presented in Supplementary Figure 5e, f and detailed in the Results section.

      - Figure 5g-h. Similarly, I would suggest using the same metrics in order to correlate the results from interneuron and principal cell activity photometry.

      We have updated this figure to align with the presentation of interneurons (Figure 4j) and included RMS analysis to emphasize lower variance in object modulation of PCs as an indicator of increased network inhibition.

      - Was object modulation variance also different for INs depending on the mouse phenotype?

      We conducted this additional analysis but did not find any significant difference.

      - Figure S4: would it be possible to identify the postsynaptic partners?

      As mentioned above, for those cells with preserved axonal structures, we identified both OLM and bistratified cells. We have now included this information in the Results section to clarify the types of interneurons identified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors address a fundamental unresolved question in cerebellar physiology: do synapses between granule cells (GCs) and Purkinje cells (PCs) made by the ascending part of the axon (AA) have different synaptic properties from those made by parallel fibers? This is an important question, as GCs integrate sensorimotor information from numerous brain areas with a precise and complex topography.

      Summary:

      The authors argue that GCs located close to PCs essentially contact PC dendrites via the ascending part of their axons. They demonstrate that joint high-frequency (100 Hz) stimulation of distant parallel fibers and local GCs potentiates AA-PC synapses, while parallel fiber-PC synapses are depressed. On the basis of paired-pulse ratio analysis, they concluded that evoked plasticity was postsynaptic. When individual pathways were stimulated alone, no LTP was observed. This associative plasticity appears to be sensitive to timing, as stimulation of parallel fibers first results in depression, while stimulation of the AA pathway has no effect. NMDA, mGluR1 and GABAA receptors are involved in this plasticity.

      Strengths:

      Overall, the associative modulation of synaptic transmission is convincing, and the experiments carried out support this conclusion. However, weaknesses limit the scope of the results.

      Weaknesses:

      One of the main weaknesses of this study is the suggestion that high-frequency parallel-fiber stimulation cannot induce long term potentiation unless combined with AA stimulation. Although we acknowledge that the stimulation and recording conditions were different from those of other studies, according to the literature (e.g. Bouvier et al 2016, Piochon et al 2016, Binda et al, 2016, Schonewille et al 2021 and others), high-frequency stimulation of parallel fibers leads to long-term postsynaptic potentiation under many different experimental conditions (blocked or unblocked inhibition, stimulation protocols, internal solution composition). Furthermore, in vivo experiments have confirmed that high-frequency parallel fibers are likely to induce long-term potentiation (Jorntell and Ekerot, 2002; Wang et al, 2009).

      This article provides further evidence that long-term plasticity (LTP and LTD) at this connection is a complex and subtle mechanism underpinned by many different transduction pathways. It would therefore have been interesting to test different protocols or conditions to explain the discrepancies observed in this dataset.

      Even though this is not the main result of this study, we acknowledge that the control experiments done on PF stimulation add a puzzling result to an already contradictory literature. High frequency parallel fibre stimulation (in isolation) has been shown to induce long term potentiation in vitro, but not always, and most importantly, this has been shown in vivo. This was the reason for choosing that particular stimulation protocol. Examination of in vitro studies, however, shows that the results are variable and even contradictory. Most were done in the presence of GABAA receptor antagonists, including the SK channel blocker Bicuculline, whereas in the study by Binda (2016), LTP was blocked by GABAA receptor inhibition. In some studies, LTP was under the control of NMDAR activation only, whereas in Binda (2016), it was under the control of mGluR activation. Moreover, most experiments were done in mice, whereas our study was done in rats. Our results reveal multiple mechanisms working together to produce plasticity, which are highly sensitive to in vitro conditions. We designed our experiments to be close to the physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to reproduce PF-LTP, but it was not the aim of this study to dissect the subtleties of the different experimental protocols and models.

      We have modified the Discussion to cover that point fully.

      Another important weakness is the lack of evidence that the AAs were stimulated. Indeed, without filling the PC with fluorescent dye or biocytin during the experiment, and without reconstructing the anatomical organization, it is difficult to assess whether the stimulating pipette is positioned in the GC cluster that is potentially in contact with the PC with the AAs. According to EM microscopy, AAs account for 3% of the total number of synapses in a PC, which could represent a significant number of synapses. Although the idea that AAs repeatedly contact the same Purkinje cell has been propagated, to the best of the review author's knowledge, no direct demonstration of this hypothesis has yet been published. In fact, what has been demonstrated (Walter et al 2009; Spaeth et al 2022) is that GCs have a higher probability of being connected to nearby PCs, but are not necessarily associated with AAs.

      We fully agree with the reviewer that we have not identified morphologically ascending axon synapses, and we stress this fact both in the first paragraph of the Results section, and again at the beginning of the Discussion. Our point is mainly topographical, given the well documented geometrical organisation of the cerebellar cortex. Strictly speaking, inputs are local (including AAs) or distal (PFs). Similarly, the studies by Isope and Barbour (2002) and Walter et al. (2009), just like Sims and Hartell (2005 and 2006), have coined the term ‘ascending axon’ when drawing conclusions about locally stimulated inputs. Moreover, our results do not rely on or assume multiple contacts, stronger connections, or higher probability of connections between ascending axons and Purkinje cells. Our results only demonstrate a different plasticity outcome for the two types of inputs. Therefore, our manuscript could be rephrased with the terms ‘local’ and ‘distal’ granule cell inputs, but this would have no more implication for the results or the computation performed in Purkinje cells. In our experience, however, these terms are more confusing and, to remain consistent with the literature, we do not wish to make this modification. We have nevertheless modified the abstract of the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a form of synaptic plasticity at synapses from granule cells onto Purkinje cells in the mouse cerebellum, which is specific to synapses proximal to the cell body but not to distal ones. This plasticity is induced by the paired or associative stimulation of the two types of synapses because it is not observed with stimulation of one type of synapse alone. In addition, this form of plasticity is dependent on the order in which the stimuli are presented, and is dependent on NMDA receptors, metabotropic glutamate receptors and to some degree on GABAA receptors. However, under all experimental conditions described, there is a progressive weakening or run-down of synaptic strength. Therefore, plasticity is not relative to a stable baseline, but relative to a process of continuous decline that occurs whether or not there is any plasticity-inducing stimulus.

      As highlighted by the reviewer, we observed a postsynaptic rundown of the EPSC amplitude for both input pathways. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation, and the progressive decrease of the EPSC amplitude during the course of an experiment leads to an underestimate of the absolute potentiation. We have taken the view to provide a strong set of control data rather than selecting experiments based on subjective criteria or applying a cosmetic compensation procedure. We have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown. Comparison shows a highly significant potentiation of the ascending axon EPSC. Depression of the parallel fibre EPSC, on the other hand, was not significantly different from rundown, and we have not spoken of parallel fibre long term depression. The data thus show very clearly that ascending axon and parallel fibre synapses behave differently following the costimulation protocol.

      Strengths:

      The focus of the authors on the properties of two different synapse-types on cerebellar Purkinje cells is interesting and relevant, given previous results that ascending and parallel fiber synapses might be functionally different and undergo different forms of plasticity. In addition, the interaction between these two synapse types during plasticity is important for understanding cerebellar function. The demonstration of timing and order-dependent potentiation of only one pathway, and not another, after associative stimulation of both pathways, changes our understanding of potential plasticity mechanisms. In addition, this observation opens up many new questions on underlying intracellular mechanisms as well as on its relevance for cerebellar learning and adaptation.

      Weaknesses and suggested improvements:

      A concern with this study is that all recordings demonstrate "rundown", a progressive decrease in the amplitude of the EPSC, starting during the baseline period and continuing after the plasticity-induction stimulus. In the absence of a stable baseline, it is hard to know what changes in strength actually occur at any set of synapses. Moreover, the issues that are causing rundown are not known and may or may not be related to the cellular processes involved in synaptic plasticity. This concern applies in particular to all the experiments where there is a decrease in synaptic strength.

      We have provided an answer to that point directly below the summary paragraph. We will just add here that if the phenomenon causing rundown was involved in plasticity, it should affect plasticity of both inputs, which was not the case, clearly distinguishing the ascending axon and parallel fibre inputs.

      The authors should consider changes in the shape of the EPSC after plasticity induction, as in Fig 1 (orange trace) as this could change the interpretation.

      Figure 1 shows an average response composed of evoked excitatory and inhibitory synaptic currents. The third section of Supplementary material (supplementary figure 3) shows that this complex shape is given by an EPSC followed by a delayed disynaptic IPSC. We would like to point out that while separating EPSC from IPSC might appear difficult from average traces due to the averaged jitter in the onset of the synaptic currents, boundaries are much clearer when analysing individual traces. In the same section we discuss the results of experiments in which transient applications of SR 95531 before and after the induction protocol allowed us to measure the EPSC, while maintaining the same experimental conditions during induction. Analysis of the kinetics of the EPSCs during SR application at the beginning and end of experiments showed that there is no change in the time to peak of either the AA or the PF response. The decay times of AA- and PF-EPSCs are slightly longer at the end of the experiment, even if the difference is not significant for AA inputs. This analysis has been added to the Supplementary material. Our analysis, which uses as a template the EPSC kinetics measured at the beginning and at the end of the experiments, takes these changes directly into account. The results show clearly that the presence of disynaptic inhibition doesn’t significantly affect the measure of the peak EPSC after the induction protocol nor the estimate of plasticity.

      In addition, the inconsistency with previous results is surprising and is not explained; specifically, that no PF-LTP was induced by PF-alone repeated stimulation.

      In our experimental conditions, PF-LTP was not induced when stimulating PF only, the condition that reproduces experiments in the literature. As discussed in our response to reviewer 1, a close look at the literature, however, reveals variabilities and contradictions behind seemingly similar results. They reveal intricate mechanisms working together to produce plasticity, which are sensitive to in vitro conditions. We designed our experiments to be close to physiological conditions, with inhibition preserved and a physiological chloride gradient. It is likely that experimental differences have given rise to the variability of the results and our inability to observe PF-LTP. We have modified the Discussion section to cover that point thoroughly in the context of past results. 

      The authors test the role of NMDARs, GABAARs and mGluRs in the phenotype they describe. The data suggest that the form of plasticity described here is dependent on any one of the three receptors. However, the location of these receptors varies between the Purkinje cells, granule cells and interneurons. The authors do not describe a convincing hypothetical model in which this dependence can be explained. They suggest that there is crosstalk between AA and PF synapses via endocannabinoids downstream of mGluR or NO downstream of NMDARs. However, it is not clear how this could lead to the long-term potentiation that they describe. Also, there is no long-lasting change in paired-pulse ratio, suggesting an absence of changes in presynaptic release.

      We suggest in the result section that the transient change in paired pulse ratio (PPR) is linked to a transient presynaptic effect, but there was no significant long term change of the PPR, suggesting that the long term effects observed are linked to postsynaptic changes. We now stress this point in the Results and Discussion sections.

      Concerning the involvement of multiple molecular pathways, investigators often tested for the involvement of NMDAR or mGluRs in cerebellar plasticity, rarely both. Here we showed that both pathways are involved. The conjunctive requirement for NMDAR and mGluR activation could easily be explained based on the dependence of cerebellar LTP and LTD on the concentrations of both NO and postsynaptic calcium (Coesmans et al., 2004; Safo and Regehr, 2005; Bouvier et al., 2016; Piochon et al., 2016).

      We also observed an effect of GABAergic inhibition. GABAergic inhibition was elegantly shown by Binda (2016) to regulate calcium entry together with mGluRs, and control plasticity induction. A similar mechanism could contribute to our results, although inhibition might have additional effects. We have modified the Discussion of the manuscript to clarify the pathways involved in plasticity and added a diagram to highlight the links between the different molecular pathways, potential cross talk mechanisms, and the location of receptors.

      Is the synapse that undergoes plasticity correctly identified? In this study, since GABAergic inhibition is not blocked for most experiments, PF stimulation can result in both a direct EPSC onto the Purkinje cell and a disynaptic feedforward IPSC. The authors do address this issue with Supplementary Fig 3, where the impact of the IPSC on the EPSC within the EPSC/IPSC sequence is calculated. However, a change in waveform would complicate this analysis. An experiment with pharmacological blockade will make the interpretation more robust. The observed dependence of the plasticity on GABAA receptors is an added point in favor of the suggested additional experiments.

      We did consider that due to long recording times there might be kinetic changes, and that’s the reason why the experiments of Supplementary figure 3 were done with pharmacological blockade of GABAAR with SR, both before and again after LTP induction. The estimate of the amplitude of the EPSC is based on the actual kinetics of the response at both times.

      A primary hypothesis of this study is that proximal, or AA, and distal, or PF, synapses are different and that their association is specifically what drives plasticity. The alternative hypothesis is that the two synapse-types are the same. Therefore, a good control for pairing AA with PF would be to pair AA with AA and PF with PF, thereby demonstrating that pairing with each other is different from pairing with self.

      Pairing AA with AA would be difficult because stimulation of AA can only be made from a narrow band below the PC and we would likely end up stimulating overlapping sets of synapses. However, Figure 5 shows the effect of stimulating PF and PF, while also mimicking the sparse and dense configuration of the control experiment. It shows that sparse PF do not behave like AA. Sims and Hartell (2006) also performed an experiment with sparse PF inputs and observed clear differences between sparse local (AA) and sparse distal (PF) synapses.

      It is hypothesized that the association of a PF input with an AA input is similar to the association of a PF input with a CF input. However, the two are very different in terms of cellular location, with the CF input being in a position to directly interact with PF-driven inputs. Therefore, there are two major issues with this hypothesis: 1) how can subthreshold activity at one set of synapses affect another located hundreds of micrometers away on the same dendritic tree? 2) There is evidence that the CF encodes teaching/error or reward information, which is functionally meaningful as a driver of plasticity at PF synapses. The AA synapse on one set of Purkinje cells is carrying exactly the same information as the PF synapses on another set of Purkinje cells further up and down the parallel fiber beam. It is suggested that the two inputs carry sensory vs. motor information, which is why this form of plasticity was tested. However, the granule cells that lead to both the AA and PF synapses are receiving the same modalities of mossy fiber information. Therefore, one needs to presuppose different populations of granule cells for sensory and motor inputs or receptive field and contextual information. As a consequence, which granule cells lead to AA synapses and which to PF synapses will change depending on which Purkinje cell you're recording from. And that's inconsistent with there being a timing dependence of AA-PF pairing in only one direction. Overall, it would be helpful to discuss the functional implications of this form of plasticity.

      We do not hypothesise that association of the AA and PF inputs is similar to the association of PF and climbing fibre inputs. We compare them because it is the other known configuration triggering associative plasticity in Purkinje cells. It is indeed interesting to observe that even if the inputs are very small compared to the powerful climbing fibre input, they can be effective at inducing plasticity. Physiologically, the climbing fibre signal has been linked to error and reward signals, but reward signals are also encoded by granule cell inputs (Wagner et al., 2017). We have modified the discussion to make sure that we do not suggest equivalence with CF induced LTD.

      Moreover, we fully agree that AA and PF synapses made up by a given granule cell carry the same information, and cannot encode sensory and motor information at the same time. AA synapses from a local granule cell deliver information about the local receptive field, but PF synapses from the same granule cell will deliver contextual information about that receptive field to distant Purkinje cells. In the context of sensorimotor learning, movement is learnt with respect to a global context, not in isolation, therefore learning a particular association must be relevant. The associative plasticity we describe here could help explain this functional association. We have clarified the discussion.

      Reviewer #3 (Public Review):

      Granule cells' axons bifurcate to form parallel fibers (PFs) and ascending axons (AAs). While the significance of PFs on cerebellar plasticity is widely acknowledged, the importance of AAs remains unclear. In the current paper, Conti and Auger conducted electrophysiological experiments in rat cerebellar slices and identified a new form of synaptic plasticity in the AA-Purkinje cell (PC) synapses. Upon simultaneous stimulation of AAs and PFs, AA-PC EPSCs increased, while PFs-EPSCs decreased. This suggests that synaptic responses to AAs and PFs in PCs are jointly regulated, working as an additional mechanism to integrate motor/sensory input. This finding may offer new perspectives in studying and modeling cerebellum-dependent behavior. Overall, the experiments are performed well. However, there are two weaknesses. First, the baseline of electrophysiological recordings is influenced significantly by run-down, making it difficult to interpret the data quantitatively. The amplitude of AA-EPSCs is relatively small and the run-down may mask the change. The authors should carefully reexamine the data with appropriate controls and statistics. Second, while the authors show AA-LTP depends on mGluR, NMDA receptors, and GABA-A receptors, which cell types express these receptors and how they contribute to plasticity is not clarified. The recommended experiments may help to improve the quality of the manuscript.

      As highlighted by the reviewer and developed above in response to reviewer 2, we observed a postsynaptic rundown of the EPSC amplitude. Rundown could be mistaken for a depression of synaptic currents, not for a potentiation. Moreover, we have conducted control experiments with no induction (n = 17), which give a good indication of the speed and amplitude of the rundown, and provide a baseline. Comparison shows a highly significant potentiation of the ascending axon EPSC, relative to baseline and relative to these control experiments. Depression of the parallel fibre EPSC on the other hand was not significantly different from rundown. For that reason we have not spoken of parallel fibre long term depression. The data, however, show that ascending axon and parallel fibre synapses behave very differently following the costimulation protocol.

      We have discussed above in our response to reviewer 2 the potential involvement of mGluRs, NMDARs and GABAARs. We have clarified the discussion of the pathways involved in plasticity and added a diagram to highlight the links between the different molecular pathways, potential cross talk mechanisms, and the location of receptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - If Chloride concentration cannot be modified, recordings should be performed at the Chloride reversal potential to avoid strong bias in amplitude measurements (e.g. in Figures 3 and 5 outward current was observed while not visible in Figures 1 and 4).

      The balance between excitation and inhibition dictates whether there is a visible outward component, and this varies with the connections tested. Careful control experiments with SR application presented in supplementary figure 3 show that the delay of the IPSC does not significantly affect measurement of the peak amplitude of the EPSC. The reversal potential for Cl⁻ in our study (-85 mV), chosen to reproduce the physiological gradient in Purkinje cells, is too low to record from Purkinje cells at this potential in good conditions as it activates the hyperpolarisation activated cation current Ih, generating huge inward currents.

      - It is not clear whether, during the current clamp, the potential was maintained at -65 mV throughout the induction protocol.

      The potential was set and maintained around -65 mV during the induction protocol. The Methods section has been amended to specify that point.

      - Experiments using GABAB or endocannabinoid antagonists would have been interesting to assess the role of presynaptic plasticity occluding postsynaptic plasticity.

      We are not sure why the reviewer suggested these particular experiments to test for the role of presynaptic plasticity. GABAB and endocannabinoid receptor activation both have presynaptic effects at granule cell to Purkinje cell synapses. They decrease release probability, and as a result increase the paired pulse ratio (Dittman and Regehr, 1997; Safo and Regehr, 2005). Here we only observed a transient decrease of the paired pulse ratio. Additionally, presynaptic endocannabinoid receptor activation, linked to postsynaptic mGluR1 activation and release of endocannabinoids, was shown to be required for induction of postsynaptic PF-LTD (Safo and Regehr, 2005). This effect required climbing fibre stimulation and mGluR activation. Here we show that mGluR1 inhibition did not inhibit the PF depression nor affect the transient change in PPR. Therefore there is no indication that activation of these receptors could induce a pre-synaptic depression occluding postsynaptic plasticity.

      - To give credit to this new plasticity in contradiction with many previous studies, induction pathways should be addressed more deeply.

      As developed earlier in response to the public review, this study does not contradict previous studies, except maybe that by Binda et al. (2016), conducted on mice. From our point of view, our study in fact reconciles past results which have alternatively involved the mGluR or NMDAR pathways, whereas the molecular downstream pathways they recruit can easily cooperate. We aim to describe a new phenomenon and we cannot cover the mechanistic dissection which has been performed to date on plasticity in the cerebellar cortex.

      - The quality of the figures could be enhanced by modifying the dashed line.

      We have made the dashed line more discreet.

      Reviewer #2 (Recommendations For The Authors):

      - Is there cross-talk between the two synaptic pathways?

      In order to explain the associative nature of AA-LTP we suggest that a signal is generated at the AA input during the induction protocol only when the PF input is also stimulated, i.e. a form of cross-talk takes place between the two synaptic territories. We have not tested for cross-talk during control conditions but we discuss the fact that given the size of the Purkinje cell dendritic tree, the size of the inputs and their geometrical configuration, it is highly unlikely. We discuss possible cross-talk mechanisms.

      - Clarification question: "While the peak amplitude of the first response in the pair of stimulations showed a progressive decline, the peak amplitude of the second response of both AA and PF underwent either LTP or LTD respectively..." Does this mean that all LTP/LTD figures show the amplitude of the second EPSC in the paired pulse stimulation, and that the first EPSC has a different response? If so, this should be mentioned in the Methods section and implications discussed.

      All figures show the amplitude of both the first and second EPSCs in the pair of stimulations. In Figures 1A, 3A, 4A and 5B the paired stimulation protocol is depicted with colours and symbols used in the associated graphs, with closed symbols for the first and open symbols for the second EPSC. Figure legends have been amended to clarify this point. The average values given in the Results section and figure legends relate to the first EPSC only for clarity. As can be seen from the figures, long term plasticity affected the first and second EPSC in a very similar manner. However, individual symbols show that during a transient period, the first and second EPSCs are differentially affected by the induction protocol, resulting in a transient change of the PPR.

      Minor suggestions:

      - It would be helpful to have a reference for the statement that 1-2% of stimulated fibers come from nearby GCs when stimulation is distal.

      We have modified the text to explain our calculation based on the data of Pichitpornchai et al., 1994 (p4, Results section).

      - Does the shading over the plasticity time course traces come from the standard error of the mean?

      Shading over the plasticity time course plots shows the standard error of the mean. This is now clearly stated in figure legends.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Whether the plasticity between AAs and PCs is regulated by the post-synaptic or pre-synaptic mechanisms should be addressed or discussed. Based on the results of PPR (mostly unchanged after induction), the post-synaptic mechanism may be more significant. Supplemental Figure 2C shows a trend toward a positive correlation between AA-LTP and the number of spikes, suggesting intracellular calcium levels in the post-synaptic Purkinje cells may be important. Whether this is true or not can be directly tested by the addition of BAPTA in the recording pipettes.

      The absence of a long lasting effect on the paired pulse ratio (PPR) indicates that postsynaptic mechanisms are involved in long term changes. This is in line with the dependence of plasticity induced with similar protocols on the concentrations of NO and postsynaptic calcium, both affecting postsynaptic targets, as developed in our response to reviewer 2. BAPTA interferes with calcium and mGluR signalling, and could be used to further confirm the involvement of a postsynaptic mechanism; however, we did not wish to pursue the dissection of the signalling cascade further. We have modified the Results and Discussion sections to include a discussion of pre and postsynaptic mechanisms.

      (2) Most results from the plasticity experiments are shown as average/sem and do not include individual data, making it hard to appreciate the magnitude of the changes. The authors could show the individual data at some time points (e.g. 5 min before and 30 min after induction), plot bar-graphs (Figure 2C with individual data), or boxplots to compare different conditions and perform statistics.

      Individual data points are now visible for plasticity induction in Figure 2C and Supplementary Figure 2 for a number of conditions. Statistics have been performed as detailed in the text and legend of Fig 2.

      (3) In addressing point #2, it is strongly recommended that the authors include the values for controls without induction because AA/PF-EPSCs undergo significant run-down. In most experiments, the authors compare the magnitude of plasticity with baseline changes in Supplemental Figure 1. This should not be appropriate for some experiments, such as Figures 3 & 4, where pharmacological treatments are performed. The authors should carefully consider including the appropriate controls from baseline recording to rule out significant confound by the run-down.

      We agree that control experiments without stimulation (no Stim) are only appropriate controls for the initial synchronous stimulation and AA and PF only experiments (Fig 1). All the other experiments were compared to the synchronous stimulation experiments, not to control No Stim. The synchronous stimulation protocol is strictly the same as that applied in experiments with pharmacological treatments and the appropriate control to test whether treatments affected plasticity. This is now systematically specified in the Results section.

      (4) The authors recorded mixed EPSC/IPSCs and used a fitting approach to extract EPSCs. Applying AMPA-receptor blockers to check that extracted IPSCs are correctly predicted may solidify the reliability of the approach. An additional concern is that this approach can only be used if the waveform of EPSC/IPSC does not change with plasticity. The authors should compare the waveforms between conditions to address this point.

      Fits were not used to extract EPSCs. EPSCs were isolated by blocking IPSCs with SR95531, and the IPSCs were then extracted by subtraction from the mixed EPSC/IPSC. Fits were then done of the isolated EPSC and the extracted IPSC. This procedure was applied both at the start of the experiment and at the end to avoid changes in kinetics that would influence measurements. A section of supplementary material is devoted to this analysis. Isolating IPSCs using AMPAR blockers is not possible as IPSCs are disynaptic. AMPAR blockers would fully suppress inhibition.

      (5) While the AA-LTP depends on NMDA-Rs, which cell type is responsible is not clear. Recording NMDA components in AA/PF-EPSCs should be informative in addressing this point. Cesana et al suggested that AA induces significant activation of NMDA-Rs in Golgi cells (PMID: 23884948). Whether AA stimuli could significantly evoke NMDA current in the experimental condition used in this paper could provide essential information.

      The granule cell to Purkinje cell EPSCs are devoid of an NMDAR component (Llano et al., 1991), and there are no postsynaptic NMDARs at granule cell to PC synapses, but a proportion of presynaptic boutons show the presence of NMDARs (Bidoret et al, 2009). This is now stated clearly on p8. Presynaptic NMDARs have been involved in LTP and LTD of parallel fibre synapses (Casado et al., 2002; Bouvier et al., 2016; Schonewille et al., 2021), and linked to the activation of NOS in granule cell axons. However, we do not know whether presynaptic NMDARs are also present at AA synapses. NMDAR and NOS are also expressed by molecular layer interneurons, and have sometimes been involved in LTD induction (Kono et al., 2019), although this is disputed. In the paper by Cesana (2013), white matter stimulation activated mossy fibre inputs to granule cells, and as a consequence, granule cell to Golgi cell disynaptic EPSCs. The authors identified AA synapses on the basolateral dendrites of Golgi cells, and showed NMDAR activation associated with the mossy fibre to granule cell EPSC. Granule cell to Golgi cell synapses were shown to activate both postsynaptic AMPA and NMDA receptors (Dieudonné, 1999). But to our knowledge, Golgi cells do not express NOS. Therefore it is unlikely that activation of NMDARs in Golgi cells is linked to synaptic plasticity in Purkinje cells.

      (6) Pharmacological experiments in Figure 3 show that AA-LTP is dependent on mGluR. The authors mentioned that it could be explained by the presence and absence of mGluRs in PFs and AAs, respectively. This is an important and reasonable possibility and should be tested. The authors could simply check whether slow EPSCs can be recorded by the AA activation.

      Activation of the mGluR slow EPSC by AA stimulation would reveal the presence of mGluRs at AA inputs. We know, however, that sparse PF stimulation does not activate the mGluR slow EPSC or endocannabinoid release unless glutamate transporters are blocked (Marcaggi and Attwell, 2005). This is thought to reflect insufficient glutamate buildup in the sparse configuration to activate mGluR1s. AA inputs are sparsely distributed and are not expected to activate the slow EPSC either, and this is confirmed by our own experiments (CA personal communication). However, mGluR1-mediated Ca2+ release from stores shows a higher sensitivity to glutamate than the slow EPSC (Canepari and Ogden, 2006) and might take place with sparse inputs, but Ca2+ signals have not been investigated in this configuration. Therefore, the absence of the slow EPSC is not sufficient proof that mGluR1s are not activated and not present at AA synapses. This is now further discussed on p12.

      Minor points:

      (1) The authors should describe how they adjusted the stimulation strength for both AAs and PFs.

      Adjustment of the stimulation intensity is now described in the Methods section.

      (2) A rationale explaining why the authors chose the current induction protocol (synchronous stimulation of both inputs) should be included. This will help the readers to understand the background of the study.

      Papers by Sims and Hartell (2005, 2006) and experimental evidence indicated that AA and PF inputs may have different properties, and as a result may play different roles. Moreover, based on the morphology of the cerebellar granule cell and Purkinje cell, AA and PF inputs can carry different information to a given Purkinje cell. We reasoned that co-presentation of the inputs might represent an important piece of information for the circuit, signalling functional association, and lead to plasticity, as seen for motor command and sensory feedback in cerebellar-like structures, or for PF and climbing fibre. We have tried to convey that rationale in the abstract and introduction.

      (3) Supplemental Figure 2B: the x-axis may be labeled incorrectly. Is the x-axis of the top graph for PF-EPSC? The x-axis for the bottom graphs should be the summation of AA- and PF-EPSCs.

      This has been corrected.

      (4) "mglur1" on page 10 should be mGluR1.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please reorder the supplementary figures in the order they are referred to in the Results section for ease of reading. Supp Fig 5 b - should read 'Mean normalized fluorescence of LC ROIs (n = 87) during immobile periods aligned to the switch from familiar to novel environment.’

      We thank the reviewer for highlighting these issues and have reordered the supplementary figures and edited the figure legends appropriately.

      Reviewer #2 (Recommendations For The Authors):

      The authors should include sample size justifications (e.g. based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors).

      In response to this concern, we have added a statement to the “Imaging Sessions” section of the methods. Here we highlight that sample sizes were largely based on previous studies and/or limited by the difficulty of recordings and the limited number of visible axons per imaging session.

      Reviewer #3 (Recommendations For The Authors):

      The addition of Supp. Fig 5 partially addresses my previous point 3. However, the claim of dissociation between VTA-CA1 and LC-CA1 would be strengthened by showing that VTA-CA1 axons do not respond to the darkness -> familiar environment in Supp Fig 5. This is particularly important given that (1) the additional 2 VTA-CA1 axons in the revision were not recorded during transitions to novel environments and (2) the overall concern of the reviewers that the low n and heterogeneity of the VTA-CA1 dataset may lead to a false negative. Providing VTA-CA1 data for the darkness -> familiar environment would provide a within-manuscript replication that these axons are not responding to environment changes; a major claim of this manuscript.

      While we agree that data of VTA-CA1 axons during the switch from darkness to the familiar environment would provide additional evidence that these axons are not responding to environment changes, unfortunately, VTA axons were not recorded during the switch from familiar to novel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. Because the anatomic features in the new specimens were neither properly revealed nor correctly interpreted, the evidence for several conclusions is inadequate. 

      We thank the Senior Editor, Reviewing Editor and three reviewers for their work, and for their comments aimed at improving this project and manuscript. We have engaged with all the comments in detail, in order to strengthen our work. This includes adding additional data to support that all Acanthomeridion specimens belong to a single species, running further phylogenetic analyses including more trilobite terminals to test the specific hypothesis and interpretation raised by Reviewer 2, and visualising our results in treespace in order to determine support for the different interpretations of the ventral structures and their implications for the evolution of Artiopoda. We have also greatly expanded the introduction, which we feel adds clarity to areas misunderstood by some reviewers in the previous version of the manuscript.

      Our point-by-point responses to the public reviews are outlined below. We have also made changes resulting from the additional suggestions which are not public, which we have not reproduced below. We submit a new version of the main text, and can provide a tracked changes version if required. The new main text includes 9 figures and is 8624 words including captions and reference list.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. Authors assigned their specimens to Acanthomeridion serratum and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with phylogenetic results, lead authors to conclude that the cephalic sutures originated multiple times within the Artiopoda. 

      We thank Reviewer 1 for their comments on the strengths and weaknesses of the previous version of the manuscript. We hope that the revised version strengthens our conclusions that Acanthomeridion anacanthus is a junior synonym of A. serratum.

      Strengths: 

      New specimens are highly qualified and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenic analyses. 

      Weaknesses: 

      The weaknesses of this work are obvious in a number of aspects. Technically, ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might be helpful to resolve the taxonomic and ontogenetic uncertainties. 

      We appreciate that the reviewer was not convinced by our synonymisation in the first version of the manuscript. The recommendation of the reviewer to provide linear morphometric support for our synonymisation was much appreciated. We have provided measurements of the length and width of the thorax (Figure 6 in the new version), visualising the position of specimens previously assigned to A. anacanthus, to show this morphological continuity. These act as a complement to Figure 5, which shows the fossils in an ontogenetic trend.

      I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. hypostome, and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed either morphologically or functionally.

      We appreciate that the brevity of the introduction in the previous version led to some misunderstandings and some confusion. We have provided a greatly expanded introduction, including a new Figure 1, which outlines the possible homologies of the ventral plates and the three hypotheses considered in this study. The function of the cephalic and dorsal suture are now discussed in more detail both in introduction and discussion.

      Finally, the authors claimed that phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can explain a deep homology of the cephalic suture at molecular level and multiple co-options within the Artiopoda. 

      A deep molecular origin is difficult to demonstrate using solely fossil material from an extinct group such as Artiopoda. Thus our study focuses on morphological origins. The number of losses required for a deep morphological origin means that we favour multiple independent morphological origins.

      Reviewer #2 (Public Review): 

      Overall: This paper describes new material of Acanthomeridion serratum that the authors claim supports its synonymy with Acanthomeridion anacanthus. The material is important and the description is acceptable after some modification. In addition, the paper offers thoughts and some exploration of the possibility of multiple origins of the dorsal facial suture among artiopods, at least once within Trilobita and also among other non-trilobite artiopods. Although this possibility is real and apparently correct, the suggestions presented in this paper are both surprising and, in my opinion, unlikely to be true because the potential homologies proposed with regard to Acanthomeridion and trilobite-free cheeks are unconventional and poorly supported. 

      What to do? I can see two possibilities. One, which I recommend, is to concentrate on improving the descriptive part of the paper and omit discussion and phylogenetic analysis of dorsal facial suture distribution, leaving that for more comprehensive consideration elsewhere. The other is to seek to improve both simultaneously. That may be possible but will require extensive effort. 

      We thank the reviewer for their detailed comments and suggestions for multiple ways in which we might revise the manuscript. We have taken the option that is more effort, but we hope more reward, in interrogating the larger question alongside improving the descriptive part of the paper. This has taken a long time and incorporation of new techniques, but has in our opinion greatly strengthened the work.

      Major concerns 

      Concern 1 - Ventral sclerites as free cheek homolog, marginal sutures, and the trilobite doublure 

      Firstly, a couple of observations that bear on the arguments presented - the eyes of A. serratum are almost marginal and it is not clear whether a) there is a circumocular suture in this animal and b) if there was, whether it merged with the marginal suture. These observations are important because this animal is not one in which an impressive dorsal facial suture has been demonstrated - with eyes that near marginal it simply cannot do so. Accordingly, the key argument of this paper is not quite what one would expect. That expectation would be that a non-trilobite artiopod, such as A. serratum, shows a clear dorsal facial suture. But that is not the case, at least with A. serratum, because of its marginal eyes. Rather, the argument made is that the ventral doublure of A. serratum is the homolog of the dorsal free cheeks of trilobites. This opens up a series of issues. 

      We appreciate that the reviewer disagrees with both interpretations we offered for the ventral plates, and has offered a third interpretation for the homology of this feature with the doublure of trilobites. Support for our original interpretation comes from the position of the eye stalks in Acanthomeridion, which fall very close to the suture between the ventral plate and the rest of the cephalon. However, we appreciate that the reviewer has a valid interpretation, that the ventral plates might be homologues of the doublure alone.

      To clarify the (two, now three) hypotheses of homology for the ventral plates considered in this study, we provide a new summary figure (Figure 1). In addition, the introduction has been greatly lengthened with further discussion of the different suture types in trilobites, their importance for trilobite classification schemes, and extensive references to older literature are now included. Further, we add background to the hypotheses around the origins of dorsal ecdysial sutures. 

      We add that the interpretation of A. serratum as having features homologous to the dorsal sutures of trilobites is already present in the literature, and so while the reviewer may disagree with it, it is certainly a hypothesis that requires testing.

      The paper's chief claim in this regard is that the "teardrop" shaped ventral, lateral cephalic plates in Acanthomeridion serratum are potential homologs of the "free cheeks" of those trilobites with a dorsal facial suture. There is no mention of the possibility that these ventral plates in A. serratum could be homologs of the lateral cephalic doublure of olenelloid trilobites, which is bound by an operative marginal suture or, in those trilobites with a dorsal facial suture, that it is a homolog of only the doublure portions of the free cheeks and not with their dorsal components. 

      We include this third possibility in our revised analyses and manuscript. To test this properly required adding an olenelloid trilobite to our matrix, as we needed a terminal that had both a marginal and a circumocular suture that were not fused. We chose Olenellus getzi for this purpose, as it is the only Olenellus with some appendages known (the antennae). We also added further characters to the morphological matrix, and additional trilobites from which soft tissues are known, in order to better resolve this part of the tree. Trilobites in the final analyses were: Anacheirurus adserai, Cryptolithus tesselatus, Eoredlichia intermedia, Olenoides serratus, Olenellus getzi, Triarthrus eatoni.

      However, addition of these trilobites added a further complication. Under unconstrained analysis, Olenellus getzi was resolved with Eoredlichia intermedia as a clade sister to all other trilobites.

      Thus the topology of Paterson et al. 2019 (PNAS) was not recovered, and so the hypothesis of Reviewer 2 could not be robustly tested. In order to achieve a topology comparable to Paterson et al., we ran a further three analyses, where we constrained a clade of all trilobites except for O. getzi. This recovered a topology where the earliest diverging trilobites had unfused sutures, and thus one suitable for considering the role of Acanthomeridion serratum ventral plates as homologues of the doublure of trilobites.

      Unfortunately, for these analyses (both constrained and unconstrained), Acanthomeridion was not resolved as sister to trilobites, but instead elsewhere in the tree (see Table 1 in main text, Fig. 9, and SFig 9). Thus our analyses do not find support for the reviewer’s hypothesis, as multiple origins of this feature are still required.

      It was still an excellent point that we should consider this hypothesis, and we have retained it, and discussion surrounding it, in our manuscript.

      The introduction to the paper does not inform the reader that all olenelloids had a marginal suture - a circumcephalic suture that was operative in their molting and that this is quite different from the situation in, say, "Cedaria" woosteri in which the only operative cephalic exoskeletal suture was circumocular. The conservative position would be that the olenelloid marginal suture is the homolog of the marginal suture in A. serratum: the ventral plates thus being homolog of the trilobite cephalic doublure, not only potential homolog to the entire or dorsal only part of the free cheeks of trilobites with a dorsal facial suture. As the authors of this paper decline to discuss the doublure of trilobites (there is a sole mention of the word in the MS, in a figure caption) and do not mention the olenelloid marginal suture, they give the reader no opportunity to assess support for this alternative. 

      At times the paper reads as if the authors are suggesting that olenelloids, which had a marginal cephalic suture broadly akin to that in Limulus, actually lacked a suture that permitted anterior egression during molting. The authors are right to stress the origin of the dorsal cephalic suture in more derived trilobites as a character seemingly of taxonomic significance but lines such as 56 and 67 may be taken by the non-specialist to imply that olenelloids lacked a forward egression-permitting suture. There is a notable difference between not knowing whether sutures existed (a condition apparently quite common among soft-bodied artiopods) and the well-known marginal suture of olenelloids, but as the MS currently reads most readers will not understand this because it remains unexplained in the MS. 

      As noted in response to a previous point (above) we now have a greatly expanded introduction which should give the reader an opportunity to assess support for this alternative hypothesis. We now include Olenellus getzi in our analyses, and have added characters to the morphological matrix to make this clear.

      A reference to the case of ‘Cedaria’ woosteri is made in the introduction to highlight further the variability of trilobites, as is a reference to Foote’s analysis of cranidial shapes and the support this provides for a single origin of the dorsal suture.

      With that in mind, it is also worth further stressing that the primary function of the dorsal sutures in those which have them is essentially similar to the olenelloid/limulid marginal suture mentioned above. It is notable that the course of this suture migrated dorsally up from the margin onto the dorsal shield and merged with the circumocular suture, but this innovation does not seem to have had an impact on its primary function - to permit molting by forward egression. Other trilobites completely surrendered the ability to molt by forward egression, and there are even examples of this occurring ontogenetically within species, suggesting a significant intraspecific shift in suture functionality and molting pattern. The authors mention some of this when questioning the unique origin of the dorsal facial suture of trilobites, although I don't understand their argument: why should the history of subsequent evolutionary modification of a character bear on whether its origin was unique in the group? 

      We include reference to evolutionary modification and loss of this character as it is important to stress that if a character is known to have been lost multiple times it is possible that it had a deeper root (in an earlier diverging member of Artiopoda than Trilobita) and was lost in olenelloids. This is the question that we seek to address in our manuscript.

      The bottom line here is that for the ventral plates of A. serratum to be strict homologs of only the dorsal portion of the dorsal free cheeks, there would be no homolog of the trilobite doublure in A. serratum. The conventional view, in contrast, would be that the ventral plates are a homolog of the ventral doublure in all trilobites and ventral plates in artiopods. I do not think that this paper provides a convincing basis for preferring their interpretation, nor do I feel that it does an adequate job of explaining issues that are central to the subject. 

      We stress that our interpretations – that the ventral plates are not homologous to any artiopodan feature or that they are homologous to the free cheeks of trilobites – have both been raised in the literature before, whereas we could not find mention of the reviewer’s ‘conventional view’ relating to Acanthomeridion. We appreciate that this view is still valid and worth investigating, which we have done in the further analyses conducted. However, we did not find support for it. Instead, we find some support for ventral plates both as homologues of free cheeks and as unique structures within Artiopoda.

      Concern 2. Varieties of dorsal sutures and the coexistence of dorsal and marginal sutures 

      The authors do not clarify or discuss connections between the circumocular sutures (a form of dorsal suture that separates the visual surface from the rest of the dorsal shield) and the marginal suture that facilitates forward egression upon molting. Both structures can exist independently in the same animal - in olenelloids for example. Olenelloids had both a suture that facilitated forward egression in molting (their marginal suture) and a dorsal suture (their circumocular suture). The condition in trilobites with a dorsal facial suture is that these two independent sutures merged - the formerly marginal suture migrating up the dorsal pleural surface to become confluent with the circumocular suture. (There are also interesting examples of the expansion of the circumocular suture across the pleural fixigena.) The form of the dorsal facial suture has long figured in attempts at higher-level trilobite taxonomy, with a number of character states that commonly relate to the proximity of the eye to the margin of the cephalic shield. The form of the dorsal facial suture that they illustrate in Xanderella, which is barely a strip crossing the dorsal pleural surface linking marginal and circumocular suture, is comparable to that in the trilobites Loganopeltoides and Entomapsis but that is a rare condition in that clade as a whole. The paper would benefit from a clear discussion of these issues at the beginning - the dorsal facial suture that they are referring to is a merged circumcephalic suture and circumocular suture - it is not simply the presence of a molt-related suture on the dorsal side of the cephalon. 

      We have added in an expanded introduction where these points are covered in detail. We appreciate that this was not clear in the earlier version, and this suggestion has greatly improved our work.

      Concern 3. Phylogenetics 

      While I appreciate that the phylogenetic database is a little modified from those of other recent authors, still I was surprised not to find a character matrix in the supplementary information (unless it was included in some way I overlooked), which I would consider a basic requirement of any paper presenting phylogenetic trees - after all, there's no space limit. It is not possible for a reviewer to understand the details of their arguments without seeing the character states and the matrix of state assignments. 

      A link to a morphobank project was included in the first submission. This project has been updated for the current submission, including an additional matrix to treat the reviewer’s hypothesis for the ventral plates. Morphobank Project #P4290. Email address: P4290, reviewer password: Acanthomeridion2023, accessible at morphobank.org. We have added in additional details for the reviewer and others to help them access the project:

      The project can be accessed at morphobank.org, using the below credentials to log in:  Email address: P4290, Password: Acanthomeridion 2023.

      The section "phylogenetic analyses" provides a description of how tree topology changes depending on whether sutures are considered homologous or not using the now standard application of both parsimony and maximum likelihood approaches but, considering that the broader implications of this paper rest on the phylogenetic interpretation, I also found the absence of detailed discussion of the meaning and implications of these trees to be surprising, because I anticipated that this was the main reason for conducting these analyses. The trees are presented and briefly described but not considered in detail. I am troubled by "Circles indicate presence of cephalic ecdysial sutures" because it seems that in "independent origin of sutures" trilobites are considered to have two origins (brown color dot) of cephalic ecdysial sutures - this may be further evidence that the team does not appreciate that olenelloids have cephalic ecdysial sutures, as the basal condition in all trilobites. Perhaps I'm misunderstanding their views, but from what's presented it's not possible to know that. Similarly, in the "sutures homologous" analyses why would there be two independent green dots for both Acanthomeridion and Trilobita, rather than at the base of the clade containing them both, as cephalic ecdysial sutures are basal to both of them? Here again, we appear to see evidence that the team considers dorsal facial sutures and cephalic ecdysial sutures to be synonymous - which is incorrect. 

      We appreciate that the reviewer misunderstood the meaning of the dots, leading to confusion. The dots indicated how features were coded in the phylogenetic analysis. In our revised version of this figure (Figure 8 in the new version), these dots are now clearly labelled as indicating ‘coding in phylogenetic matrix’. Further, with the revised character list, we now can provide additional detail for the types of sutures (relevant as we now include more trilobite terminals).

      This point aside, and at a minimum, that team needs to do a more thorough job of characterizing and considering the variety of conditions of dorsal sutures among artiopods, their relationships to the marginal suture and to the circumocular suture, the number, and form of their branches, etc. 

      We thank the reviewer for this summary, and appreciate their concerns and thorough review. Our revised version takes into account all these points raised, and they have greatly improved the clarity, scope and thoroughness of the work.

      Reviewer #3 (Public Review): 

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are shown to be associated with ventral plates that the authors very reasonably homologise with the free cheeks of trilobites. A slight update of a phylogenetic dataset developed by Du et al, then refined slightly by Chen et al, then by Schmidt et al, and again here, permits another attempt to optimise the number of origins of dorsal ecdysial sutures in trilobites and their relatives. 

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variations within a single species. New microtomographic data shed some light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites. 

      We thank Reviewer 3 for their positive comments about the manuscript. We appreciate the constructive comments for improvements, and detailed corrections, which we have incorporated into our revised work.

      Weaknesses:

      The main conclusion remains clouded in ambiguity because of a poorly resolved Bayesian consensus and is consistent with work led by the lead author in 2019 (thus compromising the novelty of the findings). The Bayesian trees being majority rules consensus trees, optimising characters onto them (Figure 7b, d) is problematic. Optimising on a consensus tree can produce spurious optimisations that inflate tree length or distort other metrics of fit. Line 264 refers to at least three independent origins of cephalic sutures in artiopodans but the fully resolved Figure 7c requires only two origins. 

      We thank the reviewer for pointing this out. However, now that the analyses have been re-run, we have new results to consider. The results still support multiple origins of sutures. We also note that the dots indicated how terminals were coded. This is now clearer in the revised version of this figure (Figure 8 in the new version).

      We have extended our interrogation of the trees by incorporating treespace analyses. These add support for the nodes of interest (around the base of trilobites), showing that the coding of Acanthomeridion ventral plate homologies impacts its position in the tree, and thus has implications for our understanding of the evolution of sutures in trilobites.

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion, and Trilobita. To their credit, the authors acknowledge this (lines 62-65). The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). 

      The following points are not meant to be "Weaknesses" but rather are refinements: 

      I recommend changing the title of the paper from "cephalic sutures" to "dorsal ecdysial sutures" to be more precise about the character that is being tracked evolutionarily. Lots of arthropods have cephalic sutures (e.g., the ventral marginal suture of xiphosurans; the Y-shaped dorsomedian ecdysial line in insects). The text might also be updated to change other instances of "cephalic sutures" to a more precise wording. 

      We appreciate this point and have changed the title as suggested. 

      The authors have provided (but not explicitly identified) support values for nodes in their Bayesian trees but not in their parsimony ones. Please do the jackknife or bootstrap for the parsimony analyses and make it clear that the Bayesian values are posterior probabilities. 

      With the addition of further trilobite terminals to our parsimony analyses, the results became poor. Specifically, the internal relationships of trilobites did not conform to any previous study, and Olenellus getzi was not resolved as an early diverging member of the group. This meant that these analyses could not be used for addressing the hypothesis of Reviewer 2. We decided to exclude parsimony analysis results from this version to avoid confusion.

      We have added a note that the values reported at the nodes are posterior probabilities to figures S8, S9 and S10 where we show the full Bayesian results.

      In line 65 or somewhere else, it might be noted that a single origin of the dorsal facial sutures in trilobites has itself been called into question. Jell (2003) proposed that separate lineages of Eutrilobita evolved their facial sutures independently from separate sister groups within Olenellina. 

      We have added this to the introduction (Line 98). Thank you for raising this point.

      I have provided minor typographic or terminological corrections to the authors in a list of recommendations that may not be publicly available. 

      We appreciate the points made by the reviewer and their detailed corrections, which we have incorporated in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper the authors provide a characterisation of auditory responses (tones, noise, and amplitude modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC, in awake and anaesthetised mice. Dan Llano's group have previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from auditory cortex. They here use 2P calcium imaging combined with an implanted prism to get, for the first time, functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC, in both awake and anaesthetised mice, appear to be more responsive to more complex sounds (amplitude modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little in the matrix. The authors now compare bimodal interactions under anaesthesia and awake states and find that effects differ in some cases between the awake and anaesthetised states, particularly related to bimodal suppression and enhancement in the modules.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      The manuscript is improved by the response to reviewers. The authors have addressed my comments by adding new figures and panels, streamlining the analysis between awake and anaesthetised data (which has led to a more nuanced, and better supported conclusion), and adding more examples to better understand the underlying data. In streamlining the analyses between anaesthetised and awake data I would probably have opted for bringing these results into merged figures to avoid repetitiveness and aid comparison, but I acknowledge that that may be a matter of style. The added discussions of differences between awake and anaesthesia in the findings and the discussion of possible reasons why these differences are present help broaden the understanding of what the data looks like and how anaesthesia can affect these circuits.

      As mentioned in my previous review, the strength of this study is in its demonstration of using prism 2p imaging to image the lateral shell of IC to gain access to its neurochemically defined subdivisions, and they use this method to provide a basic description of the auditory and multisensory properties of lateral cortex IC subdivisions (and compare it to dorsal cortex of IC). The added analysis, information and figures provide a more convincing foundation for the descriptions and conclusions stated in the paper. The description of the basic functionality of the lateral cortex of the IC is useful for researchers interested in basic multisensory interactions and auditory processing and circuits. The paper provides a technical foundation for future studies (as the authors also mention), exploring how these neurochemically defined subdivisions receiving distinct descending projections from cortex contribute to auditory and multisensory based behaviour.

      Minor comment:

      - The authors have now added statistics and figures to support their claims about tonotopy in DC and LC. This is what I asked for, and I think it allows readers to better understand the tonotopical organisation in these areas. One of the conclusions by the authors is that the quadratic fit is a better fit than a linear fit in DCIC. Given the new plots shown and previous studies this is likely true, though it is worth highlighting that adding parameters to a fitting procedure (as in the case when moving from linear to quadratic fit) will likely lead to a better fit due to the increased flexibility of the fitting procedure.

      Thank you for the suggestion. We have highlighted that the quadratic function allowed the regression model to include the cells tuned to higher frequencies at the rostromedial part of the DC and resulted in a better fit, which is consistent with the previously described tonotopic organization, as shown in the text (lines 208-211).
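      The reviewer's caution about nested fits can be illustrated with a minimal sketch. The data below are hypothetical (arbitrary positions and best frequencies, not taken from the study), and the helper names `fit_polynomial`, `residual_sum_of_squares` and `aic` are introduced here for illustration only. The sketch shows that a least-squares quadratic can never have a larger residual sum of squares than a least-squares line on the same data, which is why a penalised criterion such as AIC is often used to compare the two.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef

def residual_sum_of_squares(x, y, coef):
    """Sum of squared residuals of the polynomial defined by coef."""
    return sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
        for xi, yi in zip(x, y)
    )

def aic(rss, n_obs, n_params):
    """Gaussian-error AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Hypothetical data: cell position vs. best frequency, arbitrary units.
position = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
best_freq = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 10.1]

rss_linear = residual_sum_of_squares(position, best_freq, fit_polynomial(position, best_freq, 1))
rss_quadratic = residual_sum_of_squares(position, best_freq, fit_polynomial(position, best_freq, 2))
aic_linear = aic(rss_linear, len(position), 2)
aic_quadratic = aic(rss_quadratic, len(position), 3)

print("RSS linear:", rss_linear, "RSS quadratic:", rss_quadratic)
print("AIC linear:", aic_linear, "AIC quadratic:", aic_quadratic)
```

      A lower residual error for the quadratic is therefore expected by construction; only a comparison that penalises the extra parameter indicates whether the added curvature is warranted.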

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      A major achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons) and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it and the writing is not quite as precise as it could be.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were overall more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was different in the awake prep, where modular neurons became more responsive to somatosensory stimuli. Thus, to this reviewer, one of the most intriguing results of the present study is the extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggests that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, the limitations of two-photon imaging for tracking neural activity are acknowledged, and appropriate statistical tests were used.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Increase font size of scale bars on figure 6.

      Thank you for the suggestion. We have increased the font size of the scale bar.

      Reviewer #2 (Recommendations For The Authors):

      Line 505: typo: 'didtinction'

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 506).

      No further comments.

      Reviewer #3 (Recommendations For The Authors):

      Line 543: Change "contripute" to "contribute"

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 544).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      The authors indicated that the adherence of ETEC is to intestinal epithelial cells. However, it is also possible that the majority of ETEC may reside in the intestinal mucus, particularly under in vivo infection conditions. The colonization of ETEC in the jejunum and colon of piglets (Fig 2C) and in the intestines of mice (Fig S2A) does not necessarily reflect the adherence of ETEC to epithelial cells. Please verify these observations with other methods, such as immunostaining. Also, while Salmonella enterica serovar Typhimurium or Listeria monocytogenes can invade organoids within 1 hour, it is unknown whether ETEC invade organoids in this study. Clarifying this will help resolve whether A. muciniphila blocks the adherence and/or invasion of ETEC. Please also address whether A. muciniphila metabolites could prevent ETEC infection in the organoid models.

      In the original manuscript, the sentence “ETEC K88 adheres to intestinal epithelial cells and induces gut inflammation (Yu et al., 2018)” in line 447 was a cited reference that served as a transition between the preceding and following text; it is not our result. We have deleted this sentence on line 457. Previous studies have shown that ETEC enter intestinal epithelial cells after only one hour of infection (Xiao et al., 2022; Qian et al., 2023). Whether A. muciniphila metabolites prevent ETEC infection in the organoid models is not the focus of this manuscript; it may be further explored by other members of the research group in the future.

      References:

      Xiao K, Yang Y, Zhang Y, Lv QQ, Huang FF, Wang D, Zhao JC, Liu YL. 2022. Long-chain PUFA ameliorate enterotoxigenic Escherichia coli-induced intestinal inflammation and cell injury by modulating pyroptosis and necroptosis signaling pathways in porcine intestinal epithelial cells. Br. J. Nutr. 128(5):835-850.

      Qian MQ, Zhou XC, Xu TT, Li M, Yang ZR, Han XY. 2023. Evaluation of Potential Probiotic Properties of Limosilactobacillus fermentum Derived from Piglet Feces and Influence on the Healthy and E. coli-Challenged Porcine Intestine. Microorganisms. 11(4).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      After revision, the bioinformatics section of the methods is still jumbled and may indicate issues in the pipeline. Important parameters are not included to replicate analyses. Merging the forward and reverse reads may represent a problem for denoising. Chimera detection was performed prior to denoising.

      Potential denoising issues for NovaSeq data were not addressed in the response. The authors did not clarify if multiple testing correction was applied; however, it may be assumed not as written. The raw sequencing data made available through the SRA accession (if for the correct project) indicates it was a MiSeq platform; however, the sample names do not appear to link up to this experimental design and the metadata are not sufficient to replicate analyses.

      We have rewritten the description of the microbiome sequencing analysis method on lines 298-327.
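      For context on the reviewer's question about multiple testing correction, the following is a minimal sketch of the Benjamini-Hochberg procedure. It is not the authors' pipeline: the p-values are hypothetical and the function name `benjamini_hochberg` is introduced here for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-taxon p-values from a differential abundance test.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)

raw_hits = sum(p < 0.05 for p in p_values)
adjusted_hits = sum(q < 0.05 for q in q_values)
print("significant before correction:", raw_hits)
print("significant after correction:", adjusted_hits)
```

      In this toy example four taxa pass an uncorrected 0.05 threshold but only two remain after correction, which is why reporting whether a correction was applied matters for replication.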

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      SRA accession must be confirmed and metadata made available.

      We updated the SRA data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the first paragraph of the Results section it is not clear why the authors introduce the function of p53ΔAS/ΔAS in thymocytes and then mention fibroblasts. The authors should clarify this point. The authors should also explain the rationale for using doxorubicin and nutlin to analyze p53 activity (Figure 1 and figure S1). 

      We thank the reviewer for this comment. In the revised manuscript, we corrected this by mentioning, at the beginning of the Results section: “We analyzed cellular stress responses in thymocytes, known to undergo a p53-dependent apoptosis upon irradiation (Lowe et al., 1993), and in primary fibroblasts, known to undergo a p53-dependent cell cycle arrest in response to various stresses - e.g. DNA damage caused by irradiation or doxorubicin (Kastan et al., 1992), and the Nutlin-mediated inhibition of Mdm2, a negative regulator of p53 (Vassilev et al., 2004).”

      (2) The authors should provide quantification for the western blot in figure 2D because the reduction of p53 protein level in mutant vs wt tumors is not striking. 

      In the previous version of the manuscript, the quantification of p53 bands had been included, but quantification results were mentioned below the actin bands, rather than the p53 bands, and this was probably confusing. We have corrected this in the revised version of the manuscript. The quantification results are now provided just below the p53 bands in Figs. 1B and 2D, which should clarify this point. For Figure 2D, the quantifications show a strong decrease in p53 levels for 3 out of 4 analyzed mutant tumors. For consistency purposes, in the revised manuscript the quantification results also appear below Myc bands in Fig. 2C.

      (3) In the discussion section, the authors propose that a difference in Ackr4 expression may have prognostic value and that measuring ACKR4 gene expression in male patients with Burkitt lymphoma could be useful to identify the patients at higher risk. However, the authors perform a lot of correlative analysis, both in mice and in patients, but the manuscript lacks functional experiments that could help to functionally characterize Ackr4 and Mt2 in the etiology of B-cell lymphomas in males (both in mouse and in human models).

      In the previous version of the manuscript, we proposed that Ackr4 might act as a suppressor of B-cell lymphomagenesis by attenuating Myc signaling. This hypothesis relied on studies showing that Ackr4 impairs the Ccr7 signaling cascade, which may lead to decreased Myc activity (Ulvmar et al., 2014; Shi et al., 2015; Bastow et al., 2021) and that the loss of Ccr7 may delay Myc-driven lymphomagenesis (Rehm et al., 2011). Furthermore, we proposed that the increased expression of Mt2 in p53ΔAS/ΔAS Eμ-Myc male splenic cells reflected an increase in Myc activity, because Mt2 is known to be regulated by Myc (Qin et al., 2021) and because the Mt2 promoter is bound by Myc in B cells according to experiments reported in the ChIP-Atlas database. However, in the first version of the manuscript this hypothesis might have appeared only partially supported by our data because an increase in Myc activity could be expected to have a more general impact, i.e. an impact not only on the expression of Mt2, but also on the expression of many canonical Myc target genes. In the revised manuscript, we show that this is indeed the case. We performed a gene set enrichment analysis (GSEA) comparing the RNAseq data from p53ΔAS/ΔAS Eμ-Myc and p53+/+ Eμ-Myc male splenic cells and found an enrichment of hallmark Myc targets in p53ΔAS/ΔAS Eμ-Myc cells. These new data, which strengthen our hypothesis of differences in Myc signaling intensity, are presented in Fig. 3K and Table S2.

      Importantly, we now go beyond correlative analyses by providing direct experimental evidence that ACKR4 impacts on the behavior of Burkitt lymphoma cells. We used a CRISPR-Cas9 approach to knock-out ACKR4 in Raji Burkitt lymphoma cells and found that ACKR4 KO cells exhibited a 4-fold increase in chemokine-guided cell migration. These new data are presented in Figure 4F and the supplemental Figures S5-S7.  

      Finally, following a suggestion of Reviewer#2, we now also point out that “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.”

      In sum, we now mention in the Discussion that a decrease in Ackr4 expression might promote B-cell lymphomagenesis through three non-exclusive mechanisms.

      Reviewer #2 (Recommendations For The Authors): 

      (1) A great addition would be to demonstrate how p53AS specifically contributes to the regulation of Ackr4. In particular, is there evidence that p53AS might be preferentially recruited on p53 RE within that gene as compared to WT? The availability of specific antibodies that distinguish between AS and WT p53 might help to address this (experimentally complex) question. As a note, usage of such antibodies would also strengthen Fig 1B, in which the AS isoform appears as a mere faint shadow under p53, thus making its "disappearance" in trp53ΔAS/ΔAS difficult to evaluate. 

      We agree with the referee that efficient antibodies against p53-AS isoforms would have been useful. In fact, we tried a non-commercial antibody developed for that purpose, but it led to many nonspecific bands in western blots and did not appear reliable. Importantly however, our luciferase assays clearly show that both p53-α and p53-AS can transactivate Ackr4, a result that might be expected because these isoforms share the same DNA binding domain. Furthermore, because p53-α isoforms appear more abundant than p53-AS isoforms at the protein and RNA levels (Figs. 1B and S1A), and because the loss of p53-AS isoforms leads to a significant decrease in p53-α protein levels (Figs. 1B and 2D), we think that in p53ΔAS/ΔAS cells the reduction in p53-α levels might be the main reason for a decreased transactivation of Ackr4. This is now more clearly discussed in the revised manuscript.

      (2) A most interesting observation is in Fig 3A and Fig S3, showing that spleen cells of p53ΔAS Eμ-Myc males (but not females) were enriched in pre-B and immature B cells as compared to WT counterparts. This observation points to a possible defect in the B cell maturation process. It would be most interesting to determine whether this particular defect is directly mediated by a p53AS-Ackr4 axis. The hypothesis raised by the authors in the Discussion section is that increased Ackr4 expression may delay lymphomagenesis, but data in Fig 3A and S3 actually suggest that ΔAS increases the pool of immature B cells that may be prone to lymphomagenesis. 

      We thank the reviewer for this useful comment, which we integrated in the Discussion of the revised manuscript. Ackr4 was shown to regulate B cell differentiation (Kara et al. (2018) J Exp Med 215, 801–813), so this is indeed one of the possible mechanisms by which a deregulation of the p53-Ackr4 axis might promote lymphomagenesis. We now mention: “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.” This is presented as one of three possible mechanisms by which decreased Ackr4 levels may promote tumorigenesis, the two others being the impact of Ackr4 on the chemokine-guided migration of lymphoma cells and its apparent effect on Myc signalling.

      (3) The concordance with a male-specific prognostic effect of Ackr4 is most interesting in itself but is only of correlative evidence with respect to the study. Is there any information on whether p53AS expression is also a prognostic factor in BL? And is there evidence that Ackr4 may also be a male-specific prognostic factor in other B-cell malignancies, e.g. Multiple Myeloma?

      We have now performed the CRISPR-mediated knock-out of ACKR4 in Burkitt lymphoma cells and found that it leads to a dramatic increase in chemokine-guided cell migration, which goes beyond correlation. This significant new result is mentioned in the revised abstract and presented in detail in Figures 4F and S5-S7.

      Regarding p53-AS isoforms, they are murine-specific isoforms (Marcel et al. (2011) Cell Death Diff 18, 1815-1824), so there is no information on p53-AS expression in Burkitt lymphoma. Human p53 isoforms with alternative C-terminal domains are p53β and p53γ isoforms, but the datasets we analyzed did not provide any information on the relative levels of p53α (the canonical isoform), p53β or p53γ isoforms. We agree with the referee that this is an interesting question, but it cannot be answered with currently available datasets.

      Regarding the different types of B-cell malignancies, we had already shown that Ackr4 is a male-specific prognostic factor in Burkitt lymphomas but not in Diffuse Large B cell lymphomas, which indicated that it is not a prognostic factor in all types of B cell lymphomas. For this revision, we also searched for its potential prognostic value in multiple myeloma, and found that, as for DLBCL, it is not a prognostic factor in this cancer type. This new analysis is presented in Figure S4C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: This article explores the role of Ecdysone in regulating female sexual receptivity in Drosophila. The researchers found that PTTH, through its role as a positive regulator of ecdysone production, negatively affects the receptivity of adult virgin females. Indeed, loss of larval PTTH before metamorphosis significantly increases female receptivity right after adult eclosion and also later. However, during metamorphic neurodevelopment, Ecdysone, primarily through its receptor EcR-A, is required to properly develop the P1 neurons since its silencing led to morphological changes associated with a reduction in adult female receptivity. Nonetheless, the results shown in this manuscript shed light on how Ecdysone plays a dual role in female adult receptivity, inhibiting it during larval development and enhancing it during metamorphic development. Unfortunately, this dual and opposite effect in two temporally different developmental stages has not been highlighted or explained. 

      Strengths: This paper exhibits multiple strengths in its approach, employing a well-structured experimental methodology that combines genetic manipulations, behavioral assays, and molecular analysis to explore the impact of Ecdysone on regulating virgin female receptivity in Drosophila. The study provides clear and substantial findings, highlighting that removing PTTH, a positive Ecdysone regulator, increases virgin female receptivity. Additionally, the research expands into the temporal necessity of PTTH and Ecdysone function during development. 

      Weaknesses: 

      There are two important caveats with the data that are reflecting a weakness: 

      (1) Contradictory Effects of Ecdysone and PTTH: One notable weakness in the data is the contrasting effects observed between Ecdysone and its positive regulator PTTH. PTTH loss of function increases female receptivity, while ecdysone loss of function reduces it. Given that PTTH positively regulates Ecdysone, one would expect that the loss of function of both would result in a similar phenotype or at least a consistent directional change. 

      A1. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level in the whole body of newly formed PTTH -/- and PTTH -/+ prepupae. Indeed, PTTH -/- flies showed increased EcR-A expression in the whole body of newly formed prepupae compared with PTTH -/+ flies. Given the role of EcR-A in regulating gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain whether EcR-A expression in pC1 neurons is increased relative to genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. So, the feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      (2) Discordant Temporal Requirements for Ecdysone and PTTH: Another weakness lies in the different temporal requirements for Ecdysone and PTTH. The data from the manuscript suggest that PTTH is necessary during the larval stage, as shown in Figure 2 E-G, while Ecdysone is required during the pupal stage, as indicated in Figure 5 I-K. Ecdysone is a crucial developmental hormone with precisely regulated expression throughout development, exhibiting several peaks during both larval and pupal stages. PTTH is known to regulate Ecdysone during the larval stage, specifically by stimulating the kinetics of Ecdysone peaking at the wandering stage. However, it remains unclear whether pupal PTTH, expressed at higher levels during metamorphosis, can stimulate Ecdysone production during the pupal stage. Additionally, given the transient nature of the Ecdysone peak produced at wandering time, which disappears shortly before the end of the prepupal stage, it is challenging to infer that larval PTTH will regulate Ecdysone production during the pupal stage based on the current state of knowledge in the neuroendocrine field.  

      Considering these two caveats, the results suggest that the authors are witnessing distinct temporal and directional effects of Ecdysone on virgin female receptivity.  

      A2. First of all, it is necessary to clarify the precise timing of the manipulation of the Ptth gene and PTTH neurons. In Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are contiguous. In addition, the pC1-Gal4 expressing neurons also appear at the start of the prepupal stage. So, it is possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons. 

      Reviewer #1 (Recommendations For The Authors): 

      In light of the significant caveat previously discussed, I will just make a few general suggestions: 

      (1) The paper primarily focuses on robust phenotypes, particularly in PTTH mutants, with a well-detailed execution of several experiments, resulting in thorough and robust outcomes. However, due to the caveat previously presented (opposite effect in larva and pupa), consider splitting the paper into two parts: Figures 1 to 4 deal with the negative effect of PTTH-Ecdysone on early virgin female receptivity, while Figures 5 to 7 focus on the positive metamorphic effect of Ecdysone in P1 metamorphic neurodevelopment. However, in this scenario, the mechanism by which PTTH loss of function increases female receptivity should be addressed.

      A3. Splitting the paper into two parts, one on PTTH function and one on EcR function in pC1 neurons, would be a good suggestion if PTTH could not act on female receptivity through EcR-A in pC1 neurons. However, because of the feedforward relationship between PTTH and EcR-A in newly formed prepupae, and because the periods of manipulating Ptth and EcR-A in pC1 neurons are contiguous, it is possible that these two functions are not independent of each other. We have therefore kept the original structure.

      (2) Validate the PTTH mutants by examining homozygous mutant phenotypes and the dose-dependent heterozygous mutant phenotype using existing PTTH mutants. This could also be achieved using RNAi techniques.

      A4. We could not obtain other existing PTTH mutants. Instead, we decreased PTTH expression in PTTH neurons and in dsx+ neurons, but did not detect a phenotype similar to that of PTTH -/-. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      (3) Clarify if elav-Gal4 is not expressed in PTTH neurons and discuss how the rescue mechanisms work (hormonal, paracrine, etc.) in the text.

      A5. We tested for overlap between the elav-Gal4>GFP signal and PTTH stained with a PTTH antibody, and did not detect any. This suggests that elav-Gal4 is not expressed in PTTH neurons. However, we detected PTTH (with the PTTH antibody) in the CNS when PTTH was overexpressed using elav-Gal4>UAS-PTTH in the PTTH -/- background. Furthermore, this rescued the female receptivity phenotype of PTTH -/-. Insect PTTH isoforms share a similar probable signal peptide for secretion. Indeed, besides the axonal projection to the PG, PTTH also has an endocrine function, acting on its receptor Torso in light sensors to regulate larval light avoidance. PTTH overexpressed in other neurons through elav-Gal4>UAS-PTTH may act on the PG through this endocrine route and then induce ecdysone synthesis and release. Thus, although elav-Gal4 is not expressed in PTTH neurons, ecdysone synthesis triggered by PTTH from the hemolymph may account for the rescue of the PTTH -/- female receptivity phenotype.

      (4) Consider renaming the new PTTH mutant to avoid confusion with the existing PTTHDelta allele. 

      A6. We have renamed our new PTTH mutant as PtthDelete.

      (5) Include the age of virgin females in each figure legend, especially for Figures 2 to 7, to aid in interpretation. This is essential information since wild-type early virgins -day 1- show no receptivity. In contrast, they reach a typical 80% receptivity later, and the mechanism regulating the first phase might differ from the one occurring later.

      A7. We have included the age of virgin females in each figure legend. 

      (6) Explain the relevance of observing that PTTH adult neurons are dsx-positive, as it's unclear why this observation is significant, considering that these neurons are not responsible for the observed receptivity effect in virgin females. Alternatively, address this in the context of the third instar larva or clarify its relevance.  

      A8. We decreased DsxF expression in PTTH neurons and did not detect a significant change in female receptivity. Almost all neurons regulating female receptivity, including pC1 neurons, express DsxF. We suppose that PTTH neurons have some relationship with other DsxF-positive neurons that regulate female receptivity. Indeed, we detected overlap of dsx-LexA>LexAop-RFP and torso-Gal4>UAS-GFP during the larval stage. Furthermore, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity.

      These results suggest that PTTH regulates female receptivity not only through ecdysone, but possibly also by directly regulating other neurons associated with female receptivity, especially DsxF-positive neurons.

      Reviewer #2 (Public Review): 

      Summary: The authors tried to identify novel adult functions of the classical Drosophila juvenile-adult transition axis (i.e. ptth-ecdysone). Surprisingly, larval ptth-expressing neurons expressed the sex-specific doublesex gene, thus belonging to the sexual dimorphic circuit. Lack of ptth during late larval development caused enhanced female sexual receptivity, an effect rescued by supplying ecdysone in the food. Among many other cellular players, pC1 neurons control receptivity by encoding the mating status of females. Interestingly, during metamorphosis, a subtype of pC1 neurons required Ecdysone Receptor A in order to regulate such female receptivity. A transcriptomic analysis using pC1-specific Ecdysone signaling down-regulation gives some hints of possible downstream mechanisms.

      Strengths: the manuscript showed solid genetic evidence that lack of ptth during development caused enhanced copulation rate in female flies, which includes ptth mutant rescue experiments by overexpressing ptth as well as by adding ecdysone-supplemented food. They also present elegant data dissecting the temporal requirements of ptth-expressing neurons by shifting animals from non-permissive to permissive temperatures, in order to inactivate neuronal function (although not exclusively ptth function). By combining different drivers together with a EcR-A RNAi line authors also identified the Ecdysone receptor requirements of a particular subtype of pC1 neurons during metamorphosis. Convincing live calcium imaging showed no apparent effect of EcR-A in neural activity, although some effect on morphology is uncovered. Finally, bulk RNAseq shows differential gene expression after EcR-A down-regulation. 

      Weaknesses: the paper has three main weaknesses. The first one refers to temporal requirements of ptth and ecdysone signaling. Whereas ptth is necessary during larval development, the ecdysone effect appears during pupal development. ptth induces ecdysone synthesis during larval development but there is no published evidence about a similar role for ptth during pupal stages. Furthermore, larval and pupal ecdysone functions are different (triggering metamorphosis vs tissue remodeling). The second caveat is the fact that ptth and ecdysone loss-of-function experiments render opposite effects (enhancing and decreasing copulation rates, respectively). The most plausible explanation is that both functions are independent of each other, also suggested by differential temporal requirements. Finally, in order to identify the effect in the transcriptional response of down-regulating EcR-A in a very small population of neurons, a scRNAseq study should have been performed instead of bulk RNAseq. 

      In summary, despite the authors providing convincing evidence that ptth and ecdysone signaling pathways are involved in female receptivity, the main claim that ptth regulates this process through ecdysone is not supported by results. More likely, they'd rather be independent processes. 

      B1. Clarification: in Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larval stage to the end of the wandering larval stage (the start of the prepupal stage). In Figure 5, the “pupal stage” runs from the start of the prepupal stage to the end of the pupal stage. This “pupal stage” includes prepupa formation, when the ecdysone peak has not yet disappeared. The periods in which Ptth and EcR-A in pC1 neurons were manipulated are continuous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. So, it is possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      B2. During prepupa formation, ptth-Gal4>UAS-Grim flies display changes in gene expression in response to a high-titer ecdysone pulse that are similar to those of genetic control flies. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A by quantifying the EcR-A mRNA level of PTTH -/- and PTTH -/+ in the whole body of newly formed prepupae. Indeed, EcR-A was increased in PTTH -/- compared with PTTH -/+ flies. Because EcR-A functions in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is not certain that EcR-A expression in pC1 neurons is increased compared with genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. So, the feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      B3. In the future, we will perform single-cell sequencing of pC1 neurons to explore the detailed molecular mechanism of female receptivity.

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments and suggestions: 

      - torso LOF in the PG to determine whether or not the ecdysone peak regulated by ptth (there is a 1-day delay in pupation) is responsible for the ptth effect in L3. In the same line, what happens if torso is downregulated in the pC1 neurons? Is there any effect on copulation rates? 

      B4. Because of the loss of phm-Gal4, we could not test female receptivity when decreasing Torso expression in the PG. However, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity. This suggests that PTTH regulates female receptivity not only through ecdysone but also by directly regulating dsx+ pC1 neurons.

      - What is the effect of down-regulating ptth in the dsx+ neurons? No ptth RNAi experiments are shown in the paper. 

      B5. We decreased PTTH expression in dsx+ neurons but did not detect a change in female receptivity. We also decreased PTTH expression in PTTH neurons using PTTH-Gal4, and again did not detect a change in female receptivity. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      - Why are most copulation rate experiments performed between 4-6 days after eclosion? ptth LOF effect only lasts until day 3 after eclosion (but very weak-fig 1). Again, this supports the idea that ptth and ecdysone effects are unrelated.

      B6. Like most other studies in flies, we performed most behavioral experiments 4-6 days after eclosion, because female receptivity reaches its peak at that time. Ptth LOF enhanced female receptivity from the first day after eclosion, which resembles precocious puberty. Wild-type females reach high receptivity 2 days after eclosion (about 75% within 10 min). We suppose that the Ptth LOF effect only lasts until day 3 after eclosion because the receptivity of control flies is by then too high to exceed.

      It is not certain whether the effect of PTTH -/- on female receptivity disappears after the 3rd day of adulthood. It is therefore also not certain whether the effects of PTTH and of EcR-A in pC1 neurons are unrelated.

      - The fact that pC1d neuronal morphology changes (and not pC1b) does not explain the effect of EcR-A LOF. Despite it is highlighted in the discussion, data do not support the hypothesis. How do these pC1 neurons look like in a ptth mutant animal regarding Calcium imaging and/or morphology? 

      B7. We examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and EcR-A expression in newly formed prepupae, PTTH deletion induced less established pC1-d neurons, the opposite of the effect of EcR-A reduction in pC1 neurons. However, it is not certain that the expression of EcR-A in pC1 neurons is increased when PTTH is deleted. Furthermore, on the one hand, manipulation of PTTH has a general effect on neurodevelopment and does not only regulate pC1 neurons. On the other hand, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity, could not be seen clearly when EcR-A was decreased in pC1 neurons or when PTTH was deleted. So, abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      - The discussion is incomplete, especially the link between ptth and ecdysone; discuss why the phenotype is the opposite (ptth as a negative regulator of ecdysone in the pupa, for instance); the difference in size due to ptth LOF might be related to differential copulation rates.  

      B8. We have revised the discussion. We cannot exclude an effect of body size on female receptivity when PTTH was deleted or PTTH neurons were manipulated, although there is not enough evidence for such an effect.

      - scheme of pC neurons may help. 

      B9. We tried to label pC1 neurons with GFP and to sort them by flow cytometry, but did not succeed. This may be because the number of pC1 neurons in one brain is too low. We will try single-cell sequencing in the future.

      - Immunofluorescence images are too small.

      B10. We have resized the small images.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript shows that mutations that disable the gene encoding PTTH cause an increase in female receptivity (they mate more quickly), a phenotype that can be reversed by feeding these mutants the molting hormone, 20-hydroxyecdysone (20E). The use of an inducible system reveals that inhibition or activation of PTTH neurons during the larval stages increases and decreases female receptivity, respectively, suggesting that PTTH is required during the larval stages to affect the receptivity of the (adult) female fly. Showing that these neurons express the sex-determining gene dsx leads the authors to show that interfering with 20E actions in pC1 neurons, which are dsx-positive neurons known to regulate female receptivity, reduces female receptivity and increases the arborization pattern of pC1 neurons. The work concludes by showing that targeted knockdown of EcRA in pC1 neurons causes 527 genes to be differentially expressed in the brains of female flies, of which 123 passed a false discovery rate cutoff of 0.01; interestingly, the gene showing the greatest down-regulation was the gene encoding dopamine beta-monooxygenase.

      Strengths 

      This is an interesting piece of work, which may shed light on the basis for the observation noted previously that flies lacking PTTH neurons show reproductive defects ("... females show reduced fecundity"; McBrayer, 2007; DOI 10.1016/j.devcel.2007.11.003). 

      Weaknesses: 

      There are some results whose interpretation seem ambiguous and findings whose causal relationship is implied but not demonstrated. 

      (1) At some level, the findings reported here are not at all surprising. Since 20E regulates the profound changes that occur in the central nervous system (CNS) during metamorphosis, it is not surprising that PTTH would play a role in this process. Although animals lacking PTTH (rather paradoxically) live to adulthood, they do show greatly extended larval instars and a corresponding great delay in the 20E rise that signals the start of metamorphosis. For this reason, concluding that PTTH plays a SPECIFIC role in regulating female receptivity seems a little misleading, since the metamorphic remodeling of the entire CNS is likely altered in PTTH mutants. Since these mutants produce overall normal (albeit larger--due to their prolonged larval stages) adults, these alterations are likely to be subtle. Courtship has been reported as one defect expressed by animals lacking PTTH neurons, but this behavior may stand out because reduced fertility and increased male-male courtship (McBrayer, 2007) would be noticeable defects to researchers handling these flies. By contrast, detecting defects in other behaviors (e.g., optomotor responses, learning and memory, sleep, etc) would require closer examination. For this reason, I would ask the authors to temper their statement that PTTH is SPECIFICALLY involved in regulating female receptivity.  

      C1. We agree that it is not surprising that PTTH regulates, through ecdysone, the profound changes that occur in the CNS during metamorphosis. Also, the behavioral changes in PTTH mutants are not limited to female receptivity. We will temper the statement about the function of PTTH in female receptivity.

      We think our study makes two new points, although more evidence is needed in the future. First, PTTH deletion and the reduction of EcR-A in pC1 neurons during metamorphosis have opposite effects on female receptivity. Second, the development of pC1-b neurons regulated by EcR-A during metamorphosis is important for female receptivity.

      (2) The link between PTTH and the role of pC1 neurons in regulating female receptivity is not clear. Again, since 20E controls the metamorphic changes that occur in the CNS, it is not surprising that 20E would regulate the arborization of pC1 neurons. And since these neurons have been implicated in female receptivity, it would therefore be expected that altering 20E signaling in pC1 neurons would affect this phenotype. However, this does not mean that the defects in female receptivity expressed by PTTH mutants are due to defects in pC1 arborization. For this, the authors would at least have to show that PTTH mutants show the changes in pC1 arborization shown in Fig. 6. And even then the most that could be said is that the changes observed in these neurons "may contribute" to the observed behavioral changes. Indeed, the changes observed in female receptivity may be caused by PTTH/20E actions on different neurons.

      C2. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression in response to a high-titer ecdysone pulse that are similar to those of genetic control flies. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A by quantifying the EcR-A mRNA level of PTTH -/- and PTTH -/+ in the whole body of newly formed prepupae. Indeed, EcR-A was upregulated in the whole body of newly formed PTTH -/- prepupae compared with PTTH -/+ flies. We also examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and EcR-A expression in newly formed prepupae, PTTH deletion induced less established pC1-d neurons, the opposite of the effect of EcR-A reduction in pC1 neurons.

      However, it is not certain that the expression of EcR-A in pC1 neurons increases compared with genetic controls when PTTH is deleted. Furthermore, on the one hand, manipulation of PTTH has a general effect on neurodevelopment. On the other hand, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity through EcR-A function in pC1 neurons, could not be seen clearly. So, abnormal development of pC1-b neurons, if it occurs, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      (3) Some of the results need commenting on, or refining, or revising:  a- For some assays PTTH behaves sometimes like a recessive gene and at other times like a semidominant, and yet at others like a dominant gene. For instance, in Fig. 1D-G, PTTH[-]/+ flies behave like wildtype (D), express an intermediate phenotype (E-F), or behave like the mutant (G). This may all be correct but merits some comment.

      C3. Female receptivity increases with age after eclosion, in PTTH mutants as well as in wild-type flies. On the first day after eclosion (Figure 1D), the loss of PTTH in PTTH[-]/+ flies may not be enough to cause the sexual precocity seen in PTTH -/-. From the second day after eclosion onward (Figure 1E-G), the loss of PTTH in PTTH[-]/+ flies is sufficient to enhance female receptivity compared with wild-type flies. However, after the 2nd day of adulthood, female receptivity of flies of all genotypes increases sharply. From the 3rd day of adulthood onward, female receptivity of PTTH -/- reaches its peak, and the receptivity of PTTH[-]/+ comes closer to that of PTTH -/- as flies get older.

      b - Some of the conclusions are overstated. i) Although Fig. 2E-G does show that silencing the PTTH neurons during the larval stages affects copulation rate (E) the strength of the conclusion is tempered by the behavior of one of the controls (tub-Gal80[ts]/+, UAS-Kir2.1/+) in panels F and G, where it behaves essentially the same as the experimental group (and quite differently from the PTTH-Gal4/+ control; blue line).(Incidentally, the corresponding copulation latency should also be shown for these data.). ii) For Fig. 5I-K, the conclusion stated is that "Knock-down of EcR-A during pupal stage significantly decreased the copulation rate." Although strictly correct, the problem is that panel J is the only one for which the behavior of the control lacking the RNAi is not the same as that of the experimental group. Thus, it could just be that when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental. Again, the results shown in J are strictly speaking correct but the statement is too definitive given the behavior of one of the controls in panels I and K. Note also that panel F shows that the UAS-RNAi control causes a massive decrease in female fertility, yet no mention is made of this fact.

      C4. i) For all figures in the text, we describe the assay group as significantly different only when all control groups were significantly different from it. In Figure 2E-G, both control groups differed from the assay group only at the larval stage. The difference between the two control groups may be due to genetic background. We have described the statistical analysis in more detail in the legend. In addition, the corresponding copulation latency is now shown. ii) For Figure 5, we have revised the conclusion in the text to “when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental.” In addition, we now mention that the UAS-RNAi control causes a massive decrease in female fertility in panel F.

      Reviewer #3 (Recommendations For The Authors): 

      (1) I am not sure that PTTH neurons should be referred to as "PG neurons". I am aware that this name has been used before but the PG is a gland that does not have neurons; it is not even innervated in all insects. 

      C5. Agree. “PG neurons” has been changed into “PTTH neurons”.

      (2) Fig. 1A warrants some explanation. One can easily imagine what it shows but a description is warranted. 

      C6. Explanation has been added.

      (3) When more than one genotype is compared it would be more useful to use letters to mark the genotypes that are not statistically different from each other rather than simply using asterisks. For instance, in the case of copulation latencies shown in Fig. 1E-G, which result does the comparison refer to? For example, since the comparisons are the result of ANOVAs, which comparison receives "*" in Fig. 1F? Is it PTTH[-]/+ vs PTTH[-]/PTTH[-] or vs. +/+? 

      C7. Referred genotypes and conditions were marked in all figure legends.

      (4) Fig. 1H: Why is copulation latency of PTTH[-]/PTTH[-]+elav-GAL4 significantly different from that of PTTH[-]/PTTH[-]? This merits a comment. Also, why was elav-GAL4 used to effect the rescue and not the PTTH-GAL4 driver? 

      C8. We cannot explain this phenomenon. It may be due to the different genetic backgrounds of the controls. We have mentioned this in the figure legend.

      (5) Fig. 2C, the genotype is written in a confusing order, GAL4+UAS should go together as should LexA+LexAop. 

      C9. We have revised this to avoid confusion.

      (6) In Fig. 2, is "larval stage" the same period that is shown in Fig. 3A? Please clarify.

      C10. We have clarified this in text and legends.

      (7) Fig. 6. The fact that pC1 neurons can be labeled using the pC1-ss2-Gal4 at the start of the pupal stage does not mean that this is when these neurons appear (are born), only when they start expressing this GAL4. Other types of evidence would be needed to make a statement about the birthdate of these neurons. 

      C11. We have revised the description for the appearance of pC1-ss2-Gal4>GFP. The detailed birth time of pC1 neurons will be tested in future.

      (8) The results shown in Fig. 7 are not pursued further and thus appear like a prelude to the next manuscript. Unless the authors have more to add regarding the role of one of the differentially expressed genes (e.g., dopamine beta-monooxygenase, which they single out) I would suggest leaving this result out. 

      C12. We have left this out.

      (9) Female flies lacking PTTH neurons were reported to show lower fecundity by McBrayer et al. (2007) and should be cited. 

      C13. This important study has been cited in the first manuscript. In this revision, we have cited it again when mentioning the lower fecundity of female flies lacking PTTH neurons.

      (10) Line 230: when were PTTH neurons activated? Since they are dead by 10h post-eclosion it isn't clear if this experiment even makes sense. 

      C14. Yes, we did this to double-check that PTTH neurons do not affect female receptivity at the adult stage.

      (11) Line 338: the statements in the figures say that PTTH function is required during the larval stages, not during metamorphosis 

      C15. This has been revised as “The result suggested that EcR-A in pC1 neurons plays a role in virgin female receptivity during metamorphosis. This is consistent with that PTTH regulates virgin female receptivity before the start of metamorphosis.”

      (12) Did the authors notice any abnormal behavior in males? McBrayer et al. (2007) mention that males lacking PTTH neurons show male-male courtship. This may remit to the impact of 20E on other dsx[+] neurons. 

      C16. Yes, we have noticed that males lacking PTTH show male-male courtship. It is possible that PTTH deletion induces male-male courtship through the impact of 20E on other dsx+ or fru+ neurons. We have added the corresponding discussion.

      (13) Line 145: please define CCT at first use 

      C17. CCT has been defined.

      (14) Overall the manuscript is well written; however, it would still benefit from editing by a native English speaker. I have marked a few corrections that are needed, but I probably missed some. 

      + Line 77: "If female is not willing..." should say "If THE female is not willing..." 

      + Line 78 "...she may kick the legs, flick the wings," should say "...she may kick HER legs, flick HER wings," 

      + Lines 93-94 this sentence is unclear: "...while the neurons in that fru P1 promoter or dsx is expressed regulate some aspects..." 

      + Line 108 "...similar as the function of hypothalamic-pituitary-gonadal (HPG).." should say "...similar TO the function of hypothalamic-pituitary-gonadal (HPG).."

      + Line 152 "Due to that 20E functions through its receptor EcR.." should say "BECAUSE 20E ACTS through its receptor EcR.."

      + Lines 155, 354 "unnormal" is not commonly used (although it is an English word); "abnormal" is usually used instead. 

      + Line 273: "....we then asked that whether ecdysone regulates" delete "that"

      + Sentences lines 306-309 need to be revised.

      C18. Thank you for your suggestions. We have revised the text as you advised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The manuscript lacks the conclusion section to summarize their finding. The rebuttal is too simple to state where and in which way the authors have made their revisions. In this case, please return this revision to the authors and ask them revise their contribution carefully.

      We now indicate in detail where and how we have made revisions. Specific revisions to sentences or words are marked in blue in the main text where necessary. A conclusion is now provided at the end of the main text (lines 264-275). Other major revisions include:

      (1) We have added Fig. 5 as a new figure to reconstruct the ovule structure of Alasemenia and to compare three- and four-winged ovules. This is followed by Fig. 6, which relates to the mathematical analysis.

      (2) We have reorganized (the order of some) paragraphs and revised sentences in the Discussion, and divided the Discussion into three parts: “Late Devonian acupulate ovules and their functions” (lines 124-150), “Late Devonian winged ovules and evolution of ovular wings” (lines 151-179), “Mathematical analysis of wind dispersal of ovules with 1-4 wings” (lines 180-262).

      (3) We have moved the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section from the supplementary information to the main text as the third part of the Discussion (lines 180-262). The original paragraph headed Mathematical analysis in the Results has been modified and inserted into the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph of the original Supplementary information has been greatly modified and is presented at the end of the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      (4) With the move of the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section from the supplementary information to the main text, five references have been added to the list accordingly (lines 278-282, 296-300, 329-330).

      (5) We have changed the format of reference citations in the main text.

      We have therefore returned your manuscript to you to allow you to make the updates necessary to address the editors' comments. Please ensure that you also update your preprint with the newly revised version once complete.

      Many thanks for this opportunity. We have now made the necessary updates to address the editors’ and reviewers’ comments. The new version is also provided as a preprint.

      Reviewer #1 (Public Review):

      Summary:

      Winged seeds or ovules from the Devonian are crucial to understanding the origin and early evolutionary history of wind dispersal strategy. Based on exceptionally well-preserved fossil specimens, the present manuscript documented a new fossil plant taxon (new genus and new species) from the Famennian Series of Upper Devonian in eastern China and demonstrated that three-winged seeds are more adapted to wind dispersal than one-, two- and four-winged seeds by using mathematical analysis.

      Many thanks for these positive comments by the reviewer.

      Strengths:

      The manuscript is well organised and well presented, with superb illustrations. The methods used in the manuscript are appropriate.

      Many thanks for the reviewer’s positive comments.

      Weaknesses:

      I would only like to suggest moving the "Mathematical analysis of wind dispersal of ovules with 1-4 wings" section from the supplementary information to the main text, leaving the supplementary figures as supplementary materials.

      Following this suggestion, we have moved the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now forms the third part of the Discussion. The original paragraph headed Mathematical analysis in the Results has been modified and inserted into the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph of the original Supplementary information has been greatly modified and is presented at the end of the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      Reviewer #2 (Public Review):

      Summary:

      This manuscript described the second earliest known winged ovule without a cupule in the Famennian of Late Devonian. Using mathematical analysis, the authors suggest that the integuments of the earliest ovules without a cupule, as in the new taxon and Guazia, evolved functions in wind dispersal.

      Yes, these include our description, mathematical analysis and suggestion.

      Strengths:

      The new ovule taxon's morphological part is convincing. It provides additional evidence for the earliest winged ovules, and the mathematical analysis helps to understand their function.

      Many thanks for these positive comments of the reviewer.

      Weaknesses:

      The discussion should be enhanced to clarify the significance of this finding. What is the new advance compared with the Guazia finding? The authors can illustrate the character transformations using a simplified cladogram. The present version of the main text looks flat.

      To clarify the significance of this finding, the discussion is now enhanced in the following respects. We now re-organize the contents of Discussion and divide it into three parts. These three parts are entitled “Late Devonian acupulate ovules and their functions” (lines 124-150), “Late Devonian winged ovules and evolution of ovular wings” (lines 151-179), “Mathematical analysis of wind dispersal of ovules with 1-4 wings” (lines 180-262). The third part is transformed from the original Supplementary information.

      Regarding new advance (Alasemenia) compared with Guazia and illustration of the character transformations:

      (1) we now provide a new figure (Fig. 5) to reconstruct the ovule of Alasemenia and to compare the structure of these two ovules.

      (2) in the second part of Discussion, we now say “As in Alasemenia (Fig. 5a), the integumentary wings of acupulate ovule of Guazia are broad, thin and fold inwards along the abaxial side, but their numbers are four in each ovule and their free portions usually arch centripetally (Fig. 5c; Wang et al., 2022, Figure 5).”

      (3) also in the second part of Discussion, we now say “Compared to Warsteinia with short and straight wings and Guazia with long but distally inwards curving wings, Alasemenia with longer and outwards extending wings would efficiently reduce the rate of descent and be more capably moved by wind. Furthermore, the quantitative analysis in mathematics indicates that three-winged ovules such as Alasemenia are more adapted to wind dispersal than four-winged ovules including Warsteinia and Guazia (see following).”

      (4) in the third part of Discussion, we now say “Significantly, the maximum windward area of each wing of Alasemenia is greater than that of Guazia and Warsteinia with four wings. All these factors suggest that Alasemenia is well adapted for anemochory.”

      (5) in Conclusion, we now say “Compared to Famennian four-winged ovules of Warsteinia and Guazia, Alasemenia with three distally outwards extending wings shows advantage in anemochory.”

      Recommendations for the authors:

      Ok, we have made some revisions and retained some of the original content.

      Reviewer #1 (Recommendations For The Authors):

      I would only like to suggest moving the "Mathematical analysis of wind dispersal of ovules with 1-4 wings" section from the supplementary information to the main text, leaving the supplementary figures as supplementary materials.

      Ok, following the suggestion, we now move this “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now represents the third part of Discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) The mathematical part as the supplement can be incorporated into the text.

      Ok, following the suggestion, we now move this “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section to the main text (lines 180-262). It now represents the third part of Discussion. The original paragraph headed Mathematical analysis in Results is now modified and inserted into the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 250-256). The last paragraph of the original Supplementary information is now greatly modified and presented at the end of the “Mathematical analysis of wind dispersal of ovules with 1-4 wings” section (lines 256-262).

      (2) The comparisons between three- or four-winged ovules are not addressed enough.

      We now add Fig. 5 as a new figure. Based on this figure and revisions, the comparisons between three- and four-winged ovules now include:

      a) “Their integumentary wings illustrate diversity in number (three or four per ovule), length, folding or flattening, and being straight or curving distally. As in Alasemenia (Fig. 5a), the integumentary wings of acupulate ovule of Guazia are broad, thin and fold inwards along the abaxial side, but their numbers are four in each ovule and their free portions usually arch centripetally (Fig. 5c; Wang et al., 2022, Figure 5). In contrast to Alasemenia, Warsteinia has four integumentary wings without folding and their free portions are short and straight (Rowe, 1997, TEXT-FIG. 4).” (lines 154-160).

      b) “Furthermore, the quantitative analysis in mathematics indicates that three-winged ovules such as Alasemenia are more adapted to wind dispersal than four-winged ovules including Warsteinia and Guazia (see following).” (lines 166-168).

      c) “The relative wind dispersal efficiency of three-winged seeds is obviously better than that of single- and two- winged seeds, and is close to that of four-winged seeds (Fig. 6). In addition, three-winged seeds have the most stable area of windward, which also ensures the motion stability in wind dispersal. Significantly, the maximum windward area of each wing of Alasemenia is greater than that of Guazia and Warsteinia with four wings.” (lines 256-261).

      d) “Compared to Famennian four-winged ovules of Warsteinia and Guazia, Alasemenia with three distally outwards extending wings shows advantage in anemochory.” (lines 272-274).

      (3) The significance of this finding should be well summarized with solid evidence.

      It has been summarized in Abstract (lines 19-28) and is now further summarized especially in the newly provided Conclusion (lines 264-275).

    1. Author response:

      Reviewer #1

      - The entire study is based on only 2 adult animals, that were used for both the single cell dataset and the HCR. Additionally, the animals were caught from the ocean preventing information about their age or their life history. This makes the n extremely small and reduces the confidence of the conclusions. 

      This statement is incorrect. While the scRNAseq was indeed performed in two animals (n=2), the HCR-FISH was performed in 3-5 animals (depending on the probe used). These were different animals from those used for the scRNAseq. We are partly responsible for this confusion, since we did not state the number of animals used for the HCR-FISH in the manuscript.

      - All the fluorescent pictures present in this manuscript present red nuclei and green signals being not color-blind friendly. Additionally, many of the images lack sufficient quality to determine if the signal is real. Additional images of a control animal (not eviscerated) and of a negative control would help data interpretation. Finally, in many occasions a zoomed out image would help the reader to provide context and have a better understanding of where the signal is localized. 

      Fluorescent photos will be changed to color-blind friendly colors. 

      Diagrams, arrows and new photos will be included to guide readers to the signal or labeling in cells. In the original manuscript, 6 out of 7 cluster validations included a photo of a normal, non-eviscerated control. We will make certain that this is highlighted in the resubmission and that ALL figures with HCR-FISH labeling include data from control animals.

      - The Authors frequently report the percentage of cells with a specific feature (either labelled or expressing a certain gene or belonging to a certain cluster). This number can be misleading since that is calculated after cell dissociation and additional procedures (such as staining or sequencing and dataset cleanup) that can heavily bias the ratio between cell types. Similarly, the Authors cannot compare cell percentage between anlage and mesentery samples since that can be affected by technical aspects related to cell dissociation, tissue composition and sequencing depth. 

      The Reviewer has correctly identified the limitations of using cell percentages in scRNA-seq analyses. However, these percentages do offer a general overview of the sequenced cell populations and highlight potential differences between samples. In addition, as the Reviewer notes, these percentages not only expose the shortcomings of the dissociation methods but also provide some explanation for the absence of particular cell populations, as we describe in the manuscript. In our future resubmission, we will acknowledge these limitations and inform readers of any potential biases introduced by relying on these numbers.

      - The Authors decided to validate only a few clusters and in many cases there are no positive controls (such as specific localization, specific function, changes between control and regenerating animals, co-stain) that could actually validate the cluster identity and the specificity of the selected marker. There is no validation of the trajectory analysis and there is no validation of the proliferating cluster with H3P or BrdU stainings. 

      We validated the seven clusters that were important to reach our conclusions. Six of these had controls of normal (uneviscerated) intestine.  Nonetheless we will increase the number of cluster validations and include the dividing cell cluster using BrdU.

      - It is not clear what is already known about holothurian intestine regeneration and what are the new findings in this manuscript. The Authors reference several papers throughout the whole Results section, mentioning that the steps of regeneration, the proliferating cells, some of the markers and some of the cell composition of mesenteries and anlages were already known. 

      The manuscript presents several novel findings on holothurian intestine regeneration, including:

      - The integration of multiple cellular processes, reported for the first time within a single species, along with the identification of the specific mRNAs expressed by each involved cell population.

      - A comparative analysis of the sea cucumber anlage structure, highlighting its similarities to previously described blastemal structures.

      - The identification of the potential dedifferentiated cell populations that form the foundation of the anlage, serving as the epicenter for proliferating and differentiating cells.

      We will ensure that these and other significant findings are prominently emphasized in the resubmitted manuscript.

      Reviewer #2

      - The spatial context of the RNA localization images is not well represented, making it difficult to understand how the schematic model was generated from the data. In addition, multiple strong statements in the conclusion should be better justified and connected to the data provided.

      As explained above we will make an effort to provide a better understanding of the cellular/tissue localization of the labeled cells. Similarly, we will revise the conclusions so that the statements made are well justified.

      Reviewer #3

      - Possible theoretical advances regarding lineage trajectories of cells during sea cucumber gut regeneration, but the claims that can be made with this data alone are still predictive.

      We are conscious that the results from these lineage trajectories are still predictive and will emphasize this in the text. Nonetheless, they are an important part of our analyses and provide the theoretical basis for future experiments.

      - Better microscopy is needed for many figures to be convincing. Some minor additions to the figures will help readers understand the data more clearly.

      As explained above, we will make an effort to provide a better understanding of the cellular/tissue localization of the labeled cells. Similarly, we will revise the conclusions so that the statements made are well justified.

    1. Author response:

      We sincerely appreciate the reviewers' time, effort, and thoughtful feedback, which have significantly contributed to our research.

      A key concern raised was the potential overinterpretation of our data. While the reviewers acknowledged our identification of a possible synchronization mechanism among active mitral and tufted cells (MTCs) that is distance-independent, they correctly pointed out that we did not provide direct evidence showing how ensemble MTCs synchronize. We concur with their assessment and will address this in our forthcoming response to ensure a precise interpretation of our findings.

      Another concern raised involves the interpretation of results obtained under ketamine anesthesia. Since ketamine is an antagonist of NMDA receptors, which play a crucial role in MTC-GC reciprocal synapses, this might impact our conclusions. To address this, we will include analyses demonstrating that optogenetic activation of granule cells (GCs) in an anesthetized state inhibits recorded MTCs during baseline but does not affect odor-evoked MTC firing rates. Additionally, we will thoroughly discuss the potential influence of ketamine anesthesia on GC-MTC synapses and its implications for our findings.

      Lastly, in our detailed response to the reviewers' comments, we will discuss several recent studies that are particularly relevant to our research. We will also expand on our hypothesis that parvalbumin-positive cells in the olfactory bulb may serve as key mediators of the activity- and distance-dependent lateral inhibition observed in our findings.

    1. Author response:

      General comments, factual mistakes:

      Reviewer 1 - Summary: “This study builds on the observation that the kynurenine pathway is required in the conceptus, as HOO null embryos are sensitive to maternal deficiency of NAD precursors (vitamin B3) and tryptophan, and narrows the window of sensitivity to a 3-day period.”

      Correction:

      Vitamin B3 should not be in parentheses, because vitamin B3 and tryptophan are both NAD precursors. We also suggest that the second half of this sentence is changed to “…and narrows the window of sensitivity to a 3-day period from embryonic day 7.5 to E10.5.” Currently, it reads as if Haao-null embryos are sensitive to any 3-day period of maternal NAD precursor restriction.

      Reviewer 1 – Strengths: “Abnormalities develop under conditions of maternal vitamin B3 deficiency, indicating…”

      Correction:

      We suggest replacing “vitamin B3 deficiency” with “NAD deficiency”, as this is more accurate.

      Reviewer 2 – Strengths: “…and then re-analysis of RNA-seq datasets suggested the endoderm was the cell source of NAD synthesis.”

      Correction:

      We suggest re-phrasing this sentence to “…and then re-analysis of RNA-seq datasets suggested the yolk sac endoderm cells are the source of NAD de novo synthesis.”

      Reviewer 1 (Public Review):

      However, without analysis of embryos at later stages in this experiment it is not known how long is needed for NAD synthesis to be recovered - and therefore until when the period of exposure to insufficient NAD lasts. This information would inform the understanding of the developmental origin of the observed defects.

      We are currently seeking funds to investigate the developmental origin of the observed defects. This study includes assessing how the timing of maternal NAD precursor restriction corresponds to the timing of NAD deficiency in the embryo.

      More importantly, there is still a question of whether in addition to the yolk sac, there is HAAO activity within the embryo itself prior to E12.5 (when it has first been assayed in the liver - Figure 1C).

      We have additional data showing that at E11.5 the embryo has no HAAO activity. We also tested E14.5 embryos with their livers removed, and these also do not have HAAO activity. We are planning to include these data sets in the revised version of this manuscript.

      Reviewer 2 (Public Review):

      Page 4 and Table S4. The descriptors for malformations of organs such as the kidney and vertebrae are quite vague and uninformative. More specific details are required to convey the type and range of anomalies observed as a consequence of NAD deficiency.

      Kidney defects were classified as described in Cuny et al. 2020 PNAS (PMID:32015132). In brief, kidneys with a tip-to-tip length of ≤ 1.5 mm were counted as hypoplastic, because the average length of a normal kidney at E18.5 is 2.98 mm (range 2.75-3.375 mm). The one dysmorphic kidney we observed in our dataset had a cyst. We plan to include this information, plus more details of the observed vertebral defects, in the revised version of this manuscript.

      Can the authors define whether the role of the NAD pathway in a couple of tissue or organ systems is the same? By this I mean, is the molecular or cellular effect of NAD deficiency the same in the vertebrae and organs such as the kidney? What unifies the effects on these specific tissues and organs and are all tissues and organs affected? If some are not, can the authors explain why they escape the need for the NAD pathway?

      We agree that this is a very important question, but consider it beyond the scope of this manuscript. To elucidate the underlying cellular and molecular mechanisms in individual organs will require a multiomic approach because NAD is involved in hundreds of molecular and cellular processes affecting gene expression, protein levels, metabolism, etc. For details of NAD functions that have relevance to embryogenesis see Dunwoodie et al 2023 https://doi.org/10.1089/ars.2023.0349. Furthermore, organs develop at different times during embryogenesis with both distinct, but in some cases shared, molecular and cellular processes. Relating these to specific NAD functions is the challenge. We are currently seeking funds to investigate how NAD deficiency disrupts organogenesis.

      Page 5 and Figure 6C. The expectation and conclusion for whether specific genes are expressed in particular cell types in scRNA-seq datasets depend on the number of cells sequenced, the technology (methodology) used, the depth of sequencing, and also the resolution of the analysis. It is therefore essential to perform secondary validation of the analysis of scRNA-seq data. At a minimum, the authors should perform in situ hybridization or immunostaining for Tdo2, Afmid, Kmo, Kynu, Haao, Qprt, and Nadsyn1 or some combination thereof at multiple time points during early mouse embryogenesis to truly understand the spatiotemporal dynamics of expression and NAD synthesis.

      We have tested antibodies against HAAO, KYNU, and QPRT in adult mouse liver samples (the liver being the main site of NAD de novo synthesis), and these antibodies produced non-specific bands with western blotting. Therefore, in situ immunostaining studies on embryonic tissues are not feasible. We will investigate the possibility of effectively localizing transcripts of NAD de novo synthesis enzymes using in situ hybridization.

      Absolute functional proof of the yolk sac endoderm as being essential and required for NAD synthesis in the context of CNDD might require conditional deletion of Haao in the yolk sac versus embryo using appropriate Cre driver lines or, in the absence of a conditional allele, could be performed by tetraploid embryo-ES cell complementation approaches. But temporal dietary intervention can also approximate the same thing by perturbing NAD synthesis when the yolk sac is the primary source versus when the liver becomes the primary source in the embryo.

      Reviewer 1 has a related comment. We have additional data showing that at E11.5 the embryo, like the placenta, has no HAAO activity. Similarly, E14.5 embryos with their livers removed do not have HAAO activity either. We believe this provides sufficient proof that the yolk sac endoderm is the only site of NAD de novo synthesis activity in the conceptus until the liver has formed and takes over this function.

    1. Author response:

      We are grateful to the reviewers for recognizing the importance of our work and for their helpful suggestions. We will revise our manuscript accordingly. In the meantime, we would like to provide provisional responses to the key questions and comments from the reviewers.

      (1) Both reviewers asked why we chose 24-120 hpf to measure the apoptotic rates. We chose this time window for the following two reasons: 1) Previous studies showed that although the motor neuron death time windows vary in chick (E5-E10), mouse (E11.5-E15.5), rat (E15-E18) and human (11-25 weeks of gestation), the common feature of these time windows is that they are all developmental periods when motor neurons make contact with muscle cells. The contact between zebrafish motor neurons and muscle cells occurs before 72 hpf, which is included in our observation time window. 2) Zebrafish complete hatching during 48-72 hpf, and most organs form before 72 hpf. More importantly, zebrafish start swimming around 72 hpf, indicating that motor neurons are fully functional.

      Thus, we are confident that this 24-120 hpf time window covers the time window during which motor neurons undergo programmed cell death during zebrafish early development. We frequently used “early development” in this manuscript to describe our observation. However, we missed “early” in our title. We will add “early” in the title in the revised version.

      (2) Both reviewers also asked about the neurogenesis of motor neurons. Previous studies have shown that the production of spinal cord motor neurons largely ceases before 48 hpf and then the motor neurons remain largely constant until adulthood. Our observation time window covers the major motor neuron production process. Therefore, we believe that neurogenesis will not affect our data and conclusions.

      (3) Both reviewers questioned the specificity of using the mnx1 promoter to label motor neurons. The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons. Furthermore, a few of those green cell bodies turned into blue apoptotic bodies inside the spinal cord and changed to blue axons in the muscle regions at the same time, which strongly suggests that those apoptotic neurons are not interneurons. Although the mnx1 promoter might have labeled some interneurons, this will not affect our major finding that only a small portion of motor neurons died during zebrafish early development.

      (4) Reviewer 2 is concerned that the estimated 50% of motor neuron death was in limb-innervating motor neurons but not in body wall-innervating motor neurons. The death of limb-innervating motor neurons has been extensively studied in chicks and rodents, as limbs are amenable to operations such as amputation. However, previous studies have shown that this dramatic motor neuron death occurs not only in limb-innervating motor neurons but also in other spinal cord motor neurons. In our manuscript, we studied the naturally occurring motor neuron death in the whole spinal cord during the early stage of zebrafish development.

      (5) Reviewer 2 mentioned that we ignored the death of an identified motor neuron. Our study examined overall motor neuron apoptosis rather than the death of a specific type of motor neuron, so we did not emphasize the death of VaP motor neurons. We agree that the dead motor neurons observed in our manuscript include VaP motor neurons. However, other types of dead motor neurons were also observed in our study. The reasons are as follows: 1) VaP primary motor neurons die before 36 hpf, but our study found that motor neurons died after 36 hpf and even at 84 hpf. 2) The VaP motor neuron is located alongside the CaP motor neuron, that is, at the caudal region of the motor neuron cluster. Although it is rare, we did observe the death of motor neurons in the rostral region of the motor neuron cluster. 3) There is at most one VaP motor neuron in each hemisegment. Although our data showed that usually one motor neuron died in each hemisegment, we did observe that sometimes more than one motor neuron died in the motor neuron cluster. We will include this information in the revised manuscript.

      (6) For the morpholinos, we did not confirm the downregulation of the target genes. These morpholino-related data are a minor part of our manuscript and should not affect our major findings. Thus, we did not think we had missed “important” controls. We will perform experiments to confirm the efficiency of the morpholinos or remove these morpholino-related data from the revised version.

    1. Author Response:

      We would like to thank the editors and reviewers for the careful consideration of our manuscript and their many helpful comments. We would like to provide provisional author responses to address the public reviews.

      Response to Reviewer 1:

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 has a role beyond mitochondrial fission in zygotes. However, there are several possible reasons why the Drp1 KO zygotes differ from the somatic cell Drp1 KO models.

      First, the reviewer mentions that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images in Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al. Current Biology 2014, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study.

      These mitochondrial morphologies in Drp1-deficient oocytes/zygotes may be attributed to the unique mitochondrial architecture in these cells. Mitochondria in oocytes are small and spherical, with irregular cristae located peripherally or transversely. These structural features might be the cause of insensitivity or resistance to inner membrane fusion. In addition, in our previous study (Wakai et al., Molecular Human Reproduction 2014, Fig. 2), overexpression of mitochondrial fusion factors in oocytes resulted in mitochondrial aggregation when the outer membrane fusion factors Mfn1/Mfn2 were overexpressed, while overexpression of Opa1 did not cause any morphological changes. Thus, while mitochondria in oocytes/zygotes divide actively, complete fusion, including the inner membrane, as seen in somatic cells, is unlikely to occur.

      As for mitochondrial transport, we do not entirely discard its role. Although mitochondrial intrinsic dynamics such as fission are of primary importance for the mitochondrial distribution and partitioning in embryos, the regulation of dynamics by the cytoskeleton may be important and thus needs further study, as the reviewer pointed out.

      Response to Reviewer 2:

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      We will indicate the time after hCG as the reviewer pointed out. The only problem is that in this experiment, there may be a slight deviation from the actual mitochondrial distribution change (Fig. S1A) due to the manipulation time for Trim-Away (since it was performed outside of the incubator). Also, no significant delay in pronuclear formation or embryonic development was observed in Drp1-depleted zygotes.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, because various RNAs were injected to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 hours of incubation so that the fluorescent proteins were sufficiently expressed. Therefore, for the western blotting analysis, samples were collected to reflect their condition at the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      We would like to add quantitative data on mitochondrial aggregation in Drp1-depleted embryos.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      We would like to add quantitative results for the band intensity in the western blot analysis. The number of embryos analyzed is described in the figure legends; pooled samples of 20 (Fig. 4) to 30 (Fig. 2) embryos were used.

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      We will present quantitative results on the accumulation of ROS.

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      It has been reported that Drp1 regulates the meiotic spindle through the spindle assembly checkpoint (SAC) (Zhou et al., Nature Communications 2022). We would like to mention the possibility raised by the reviewer in the Discussion.

      Response to Reviewer 3:

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      - Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      We would like to add a comment regarding cristae morphology.

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      ATeam fluorescence is analyzed using a regular fluorescence microscope, not a confocal laser microscope, in order to analyze the intensity in the whole embryo (or the whole blastomere). Therefore, we are currently unable to obtain images of localized areas within the cell (e.g., around the spindle) as expected by the reviewer; as shown in the images in Figure 3-figure supplement 1C, there is a tendency to see high ATP levels at the cell periphery, but further analysis is needed for clear and definitive results.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Aggregated mitochondria are localized toward the cell center, but they are not preferentially concentrated near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca2+ response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We assume that the reviewer's interpretation is correct. However, although we were able to show a bias in Ca2+ store levels between the blastomeres of Drp1-depleted embryos, we did not stain mitochondria simultaneously. We therefore cannot say in detail whether, for example, the blastomere that inherited more mitochondria has more Ca2+ stores, or the blastomere with more aggregated mitochondria has fewer Ca2+ stores.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked accumulation of mitochondria around the spindle is unique to the first cleavage and seems to coincide with the migration of the pronuclei toward the center. Since the assembly of the male and female pronuclei is also an event unique to the first cleavage, abnormalities such as binucleation due to mitochondrial misplacement are thought to occur only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is still thought to occur, some abnormalities in embryonic development are likely to be observed.