  1. Dec 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Weaknesses: One minor weakness in this study is the conclusion that the guide RNAs did not seem to have distinct effects on GnRH cFos expression or the reproductive phenotypes. Although the data indicate a 60-70% knockdown for both gRNA2 and gRNA3, 3 of the 4 gRNA2 mice had no cFos expression in GnRH neurons at the time of the LH surge, whereas all mice receiving gRNA3 had at least some cFos/GnRH co-expression. In addition, when mice were re-categorized based on reduction (>75%) in kisspeptin expression, most of the mice in the unilateral or bilateral groups had received gRNA2, whereas many of the mice that received gRNA3 were in the "normal" group with no disruption in kisspeptin expression. Thus, even if the efficacy of ESR1 knockdown was comparable, additional experiments with larger sample sizes are needed before concluding that these two gRNAs do not have distinct reproductive effects.

      Response: A drawback of the CRISPR approach is the substantial mosaicism in gene knockdown, which is unavoidable because DNA repair in each cell relies on several competing pathways. As a result, knockdown varies from mouse to mouse, as shown in Fig. 1C. For the correlation between RP3V ESR1 knockdown and cFos in GnRH neurons (Fig. 4C), three gRNA3 and four gRNA2 mice look very similar, with two gRNA3 mice having knockdown but normal cFos activation. The reasons for this are not known, and it is very likely by chance that these two (of nine) mice happened to have received gRNA3. This issue is exacerbated when group sizes unintentionally become smaller with the re-grouping on the basis of kisspeptin expression. The key point is that each “kisspeptin grouping” remains mixed in terms of gRNA2 and gRNA3 mice, so gRNA3 mice did contribute to the “bilateral group”, even if only one of four mice. The practicalities of repeating this work are substantial and we do not think it is justified. We would note that we previously used Kiss-Cre mice to undertake CRISPR knockdown of ESR1 in RP3V kisspeptin neurons, but this failed to target sufficient cells with Cas9 to be experimentally useful.

      In Figure 2B (gRNA2), there appear to be 4 mice (4 lines) that have a normal cycle length and then drop to 0 for the cycle length. However, in the Figure legend, it states that there were 3 gRNA2 mice that had a cycle length of 0. Can the authors clarify if it was 4 mice (as indicated in Figure 2B) or 3 mice (as indicated in the legend) that received gRNA2 and exhibited constant estrus?

      Response: We have now clarified in the text that 3 gRNA2 mice went into constant estrus; the fourth mouse was in constant diestrus, which was also scored as “0” cycles.

      In Figure 3H, there is one green data point that has an LH level of around 0.15 and % VGAT with ESR1 around 10%. However, that data point does not appear in Figures 3I and 3J, where you would expect it to be in a similar place (~10%) on the x-axis. Was it excluded? If so, please elaborate on the justification for excluding that data point.

      Response: This was one of the three mice that exhibited no LH pulses, so we were only able to report on mean LH levels.

      Similarly, in Figure 3K, there is a blue data point that is almost at 0 for both the x-axis and the y-axis. However, that data point does not show up in Figures 3L and 3M around 0 on the x-axis as you would expect. Can the authors clarify where this data point went in Figures 3L and 3M?

      Response: This was one of the three mice that exhibited no LH pulses, so we were only able to report on mean LH levels.

      Reviewer #2 (Recommendations For The Authors):

      Finally, the study leaves unanswered the role of GABA itself. As there was no evident phenotype for the ESR1 knockdown in GABA neurons that do not coexpress kisspeptin, this suggests that GABA neurotransmission in the preoptic area is not involved in the estrogen regulation of LH secretion.

      Response: The current evidence for no substantial role of GABA from RP3V neurons in the LH surge agrees with our prior in vivo work showing that low-frequency optogenetic stimulation of RP3V kisspeptin neurons, which evokes only GABA release, has no impact on LH secretion (doi: 10.1523/JNEUROSCI.0658-18.2018).

      1. Title. The present data do not clearly demonstrate the blockade of the LH surge. Thus, the statement that "abolishes the preovulatory surge" is an overinterpretation of the findings.

      Response: We agree and now use “suppresses the preovulatory surge”.

      1. Fig. 3. The numbers of individual data points per group change for the different LH pulse parameters, but they should not (Fig. 3 E-G).

      Response: This occurs because one mouse in each group had no LH pulses so that only a mean value was available for these mice.

      1. Fig. 4. (4B) The use of only one terminal blood collection is insufficient to comprehensively characterize the LH surge. It is not possible to conclude what the actual effect on the LH surge was: a blockade, or altered amplitude or timing. Serial blood samples at 30- or 60-minute intervals should be used. For comparison, pulsatile LH secretion, which does not seem to be a major outcome of the study, was fully characterized (Fig. 3). (4C) The linear correlation between c-Fos/GnRH and RP3V/ESR1 appears to be well fitted for gRNA2 (blue) but not for gRNA3 (green). Although this is interpreted as an important result of the study, its description and consistency are not clear. The authors should perform an ANOVA/Kruskal-Wallis analysis of these data as a column graph (as in Fig. 4A, B) and discuss the discrepancies between gRNA2 and gRNA3.

      Response: We agree that a single-point LH measurement is a relatively inaccurate assessment of the LH surge and very likely underlies much of the substantial variability between mice. However, because cFos expression in GnRH neurons persists for an extended period at the time of the surge, it is a much more accurate “single point” indicator, and we feel that these results better reflect the state of surge activation. Both points were noted in the original manuscript.

      The linear correlations for the different preoptic regions were performed on the complete data set, not on individual gRNA groups, because of the low N in the subdivided groups. However, column graphs of the RP3V and MPN data look the same as Fig. 4A and would not change the current interpretation. Please see our response to Reviewer #1 regarding the discrepancies between gRNA2 and gRNA3.

      1. Table. It is unclear why the % VGAT with ESR1 was not statistically reduced in the "bilateral" animals. Would this mean that the ESR1 knockdown was not effective in this subgroup with the more consistent effects?

      Response: Yes, this would be a reasonable interpretation, suggesting that the overall impact on ESR1 in VGAT neurons may have been slightly different in mice with kisspeptin ablation. However, this was not discernible from examining the anatomical distribution of the AAV.

      1. Discussion, 1st paragraph. The authors interpret that mice lacking kisspeptin expression "failed to exhibit an LH surge". This should be revised.

      Response: We believe that this is a correct statement. Mice lacking kisspeptin had LH surge values between 0.8 and 2.1 ng/ml, which we would not consider consistent with a surge.

      1. Immunohistochemistry. It is not clear from the text how cross-reaction between goat anti-rabbit 568 (ERα) and goat anti-rabbit/streptavidin 647 (mCherry) was avoided when they were used in the same reaction.

      Response: We were forced into this option by the lack of primary antisera to ESR1 and mCherry raised in different species. We first stained for ESR1 with the rabbit antiserum, detected by biotinylated anti-rabbit/streptavidin-647, which resulted in confined nuclear staining (pseudo-blue; far red). The subsequent staining for mCherry with the rabbit antiserum was detected by goat anti-rabbit 568, which will indeed cross-react by binding to any free epitopes on the rabbit ESR1 primary antibody. However, this would not compromise interpretation, as additional 568 labelling of the nucleus is essentially irrelevant when examining far-red 647 nm emission, and only cytoplasmic mCherry immunoreactivity was used to define the anatomical spread of the AAV. This is now clearly explained in the Methods section.

      1. Statistical analysis. It is unclear when repeated measures Wilcoxon tests were used in the manuscript.

      Response: Thank you for pointing this out. Only Wilcoxon paired tests were used; this has been amended.

      1. Data Availability. Further reference to supplementary information files was not found in the manuscript.

      Response: A supplementary file with individual data for each mouse is now attached.

      Reviewer #3 (Recommendations For The Authors):

      Weaknesses:

      One aspect about which I have mixed feelings is the minimal level of detail regarding the HPG axis and its regulation by estrogens. This limited detail allows for an easy read, with the well-articulated introduction quickly presenting the framework of the study. Although neither presenting the axis itself nor mentioning the position of GnRH neurons in this axis or their lack of ERα expression is detrimental to understanding the study, presenting at least the position of GnRH neurons in the axis and their critical role in fertility would likely broaden the impact of this work beyond a rather specialist audience.

      Response: We agree that this would provide a more complete picture and have modified the Introduction.

      The expression of kisspeptin constitutes a key element for the analysis and conclusion of the present work. However, the quality of the kisspeptin immunostaining seems suboptimal based on the representative images. The staining primarily consists of light punctate structures, and it is very difficult to delineate cytoplasmic immunoreactive material defining the shape of neurons in LacZ animals. For some of the cells marked by an arrow, it is also difficult to determine whether the staining for ESR1 and Kp is in the same focal plane and thus belongs to the same neurons. Although this co-expression is not critical for the conclusions of the study, this raises the question of whether Kp expression was determined directly at the microscope (where the focal plane can be adjusted) or on the picture (without possible focal adjustment). Moreover, in the representative image of Kp loss, several nuclei stained for Fos (black) show superimposed brown staining that looks like a dense nucleus (but smaller than an actual nucleus). This suggests some sort of condensed accumulation of Kp immunoproduct in the nucleus, which is not commented upon. Given the critical importance of this reported change in Kp expression for the interpretation of the present results, it is important to provide strong evidence of the quality and nature of this staining and its analysis, which may help interpret the observed functional phenotype.

      Response: The kisspeptin immunoreactivity represents both fiber and cytoplasmic staining, which can be difficult to distinguish in some cases. The reviewer can be assured that all counts were undertaken “live” at the microscope, so that the plane of focus could be adjusted to establish co-labelling. Please note that the nuclear immunoreactivity is for ESR1, not cFos. Regardless, we struggle to see the condensed brown staining over the black nuclei suggested by the Reviewer. The kisspeptin staining is light brown and confined to just a few fibers in Fig. 5B.

      As acknowledged in the introduction, this study is not the first to use in vivo CRISPR-Cas editing to demonstrate the role of kisspeptin neurons in the control of positive feedback. Although the present work achieved this indirectly by targeting VGAT neurons, I was surprised that the paper did not include more comparison of its results with those of Wang et al., 2019. In particular, why was the present approach more successful in achieving both a lack of surge and complete acyclicity?

      Response: Wang et al. reported an ~60% reduction in ESR1 expression in Kiss1-Cre (Elias) driven Cas9-expressing cells in the AVPV. As they did not examine kisspeptin expression itself, it is unknown to what degree their editing affected kisspeptin neurons. The other differentiating factor was that Wang et al. focused on the AVPV, which contains only a minority of the preoptic kisspeptin population, whereas we targeted the AVPV and PeN together. Thus, we suspect that the Wang et al. phenotype arises from insufficient ESR1 knockdown in just the AVPV subpopulation of preoptic kisspeptin neurons. We have added a comment to the Discussion as requested.

      Moreover, why is it that targeting ESR1 in a selected fraction of GABAergic neurons can lead to a near-complete absence of Kp expression in this region? This is briefly discussed in the penultimate paragraph but mostly focuses on the non-kisspeptinergic GABA neurons rather than those co-expressing the two markers.

      Response: We have modified this section to make it clear that it is very likely that all RP3V kisspeptin neurons would have been targeted to express Cas9 in this mouse model. Our very recent unpublished RNAscope data show that >80% of RP3V kisspeptin neurons express Vgat mRNA in adult mice.

      • Unless I have missed it, the target sequence of the guide RNAs is not mentioned. For reproducibility purposes and to allow comparison with Wang et al., 2019, this information should be provided.

      Response: The target sequences for gRNA2 and gRNA3 were around exon 3 and are provided in the Supplementary files of McQuillan et al., 2022 (https://doi.org/10.1038/s41467-022-35243-z). The Wang et al. study used the unusual strategy of designing sense and antisense gRNAs against the same sequence in exon 1.

      • The first Results section, devoted to the design and validation of the guide RNAs, reports data that were recently published (McQuillan et al., 2022). It is acknowledged that the design was reported previously, but as written it is not clear whether the actual validation was already reported. This should be stated more clearly.

      Response: Clarified as requested.

      • What was the rationale for choosing gRNA 2 and 3 and not 3 and 6 like in the McQuillan study?

      Response: As all three gRNAs worked equally well, the choice of 2 and 3 was entirely pragmatic and only based upon quantities of packaged AAVs that we had produced and were available at the time.

      • Introduction, 4th paragraph: It would be clearer if “GABAA receptor dynamics” were replaced by “GABAA receptor-mediated neurotransmission” or any other wording that avoids possible confusion with receptor mobility.

      Response: Clarified as requested.

      • The section reporting the location of ESR1 knockdown is really clear about the number of animals included in the functional analyses. This is less clear for the number of mice involved in the evaluation of the extent of ESR1 knockdown in the previous section. Specifically, the text reports that 8 and 9 mice received gRNA3 in the PVpo and MPN, respectively, but the figure shows 7 and 8. This is likely explained by the mouse that was excluded due to normal ESR1 despite correct positioning of the injection site. It is thus unclear whether this mouse was included in the calculation of the mean percentage of neurons reported on the previous page. Logically, this mouse should have been removed from this analysis, and it is assumed that the sample size reported in the text is incorrect.

      Response: Thank you for picking this up; you are correct. In reviewing this point we realized that the gRNA-lacZ RP3V N numbers were also incorrect, and we have re-analyzed the complete data set, resulting in even stronger significance levels.

      • In the section “CRISPR knockdown ESR1 in RP3V GABA-kisspeptin neurons”, the extent of ESR1 knockdown is expressed counterintuitively as “<20%”, which is thought to represent the percentage of cells expressing ESR1 rather than the actual knockdown (>80%). This should be clarified.

      Response: Corrected as noted.

      • Page 6, 3rd line before the last paragraph, there is a mismatch between the highest p value reported in the text (0.242) and the value reported in the table (0.0242).

      Response: Corrected thank you.

      • As with the F values presented for ANOVAs, H values should also be presented for Kruskal-Wallis tests.

      Response: Values have been added.

      • Immunohistochemistry: The origin and reference numbers of all primary antibodies should be reported, as well as citations of studies in which they have been validated. Although these protocols are standard, information on incubation durations is necessary to allow replication or comparison.

      Response: We have included the RRID numbers for each of these antisera and added information on incubation times.

      • The section on data availability mentions the existence of supplementary files, but I see none.

      Response: These have now been attached.

      • There are several typos or redundancies to be corrected. Here are a few examples but the manuscript should be carefully double-checked.

      Introduction, 3rd paragraph, line 4: upregulated

      Introduction, 4th paragraph, 4th line: “to” or “through”, not both.

      Page 7, line 11: Kruskal

      Page 7, 6th line to the end: does this indicate 'the' general utility?

      Page 8, 2nd paragraph, line 13: CRISPR

      Response: Thank you for these edits.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      It is not clear whether the cost-effectiveness cited refers specifically to the PAVE protocol. No line-item costings are given. As far as I know, the AmpFire test is very expensive (about US$6), and AI-assisted colposcopy has, at least formerly, been very expensive.

      Response: As mentioned in the section on "Cost-effectiveness analysis," the cost-effectiveness results refer to "an early exercise to approximate the potential costs and benefits of a highly effective screening campaign delivered to women aged 30-49 years in the ~65 highest burden LMIC (Figure 1; Suppl Materials) and an HPV vaccination program delivered to girls aged 9-14 years". Because this modeling was intended to be a high-level approximation, prior to the availability of micro-costing and of a new microsimulation model reflecting the epidemiology of HPV in PAVE study sites, we used a bundled cost of US$15 per woman screened and managed appropriately, including the ~$6 cost of the ScreenFire test, triage with AVE for women with HPV positivity, and treatment based on risk stratification. Micro-costing and microsimulation model development for PAVE sites are ongoing alongside the study and will be able to reflect setting-specific differences in delivery costs, as well as different burdens of HPV and precancer. These refinements of costing and cost-effectiveness estimates are a high priority of the PAVE consortium.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned above, the description of phase 2 could be improved. I suggest that including Implementation Science frameworks and tools could strengthen the methods for measuring implementation outcomes. If the protocol and scope of the study allow it, I suggest that the authors consider incorporating an assessment of barriers to and facilitators of implementation to inform future scale-up of the PAVE strategy. To do this, an Implementation Science framework such as the Consolidated Framework for Implementation Research (CFIR)1-2 could be useful. In addition, as the authors mention, future dissemination will need an effective communication strategy, and to design it they will carry out a pilot study. Including the CFIR or a similar framework could help identify contextual factors that might affect implementation and help design an accurate implementation and dissemination strategy.

      The authors also mention that if the PAVE strategy is effective, it could replace the current standard of care. This would require a de-implementation process, which needs stakeholder engagement and political will, among other contextual factors (e.g., human resources, organizational changes). Implementation of new strategies requires that implementers perceive them as acceptable, adaptable, compatible and more advantageous than usual practice. In this sense, analysis of implementation outcomes guided by the CFIR could play an important role in this future de-implementation process.

      1. Damschroder, L.J., et al. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Sci 4, 50 (2009). https://doi.org/10.1186/1748-5908-4-50

      2. Damschroder, L.J., Reardon, C.M., Widerquist, M.A.O. et al. The updated Consolidated Framework for Implementation Research based on user feedback. Implementation Sci 17, 75 (2022). https://doi.org/10.1186/s13012-022-01245-0

      Response: Phase 2 covers limited aspects of PAVE implementation, mainly introducing the management algorithms and evaluating their acceptability to providers and patients. Based on preliminary results from the PAVE efficacy analysis, a more comprehensive implementation intervention is being planned.

      Reviewer #3 (Recommendations For The Authors):

      This is a very strong protocol and obviously the synthesis of many years of work. I have some minor suggestions only.

      The issue raised as a weakness could be addressed by specifying that biopsy adequacy is evaluated by the local histopathologist. Those cases that don't contain at least some stroma and only superficial strips of epithelium should probably be assessed as "unsatisfactory" and excluded from triage performance calculations.

      While endocervical curettage is commonly performed in North America, resulting in good quality samples, there is considerable global variation in this practice. The procedure yielding high quality samples is usually somewhat painful due to the cervical dilation and may in fact be more painful than small biopsies.

      Response: We are undertaking a thorough evaluation of histology assessment together with the on-site pathologists and an external expert reviewer. It is critical that the study material be of good quality and that the diagnosis be highly accurate, as these elements are essential for patient management and also for adequate training of the AI algorithm. For endocervical sampling, we recommend a soft sampling device from Histologics, which provides excellent material and is reported to be less painful than a conventional curette. Pathologists are asked to verify the quality of samples obtained with this approach.

      The sentence starting at line 311 could add that clinicians also record transformation zone type and/or colposcopy adequacy.

      Response: Added. The clinicians report the VIA or colposcopy impression and also the visibility of the SCJ.

      The manuscript could be strengthened by specifying what will happen to people who have HPV detected and are triage negative. Will they be recalled for follow-up HPV test at around 12 months or some other interval?

      Finally, will those who have been treated be recalled for a follow-up HPV test at around 12 months, particularly those treated with thermal ablation? Follow-up of people in whom HPV is detected, whether triage negative or positive (and treated) would strengthen the study and enhance participant safety. If this is already planned it would strengthen the manuscript to cover these aspects.

      Response: The PAVE strategy runs under a Consortium agreement, and thus we cannot dictate specific protocols for follow-up. We are very eager to promote adequate follow-up for those with a negative triage test, but monitoring its implementation is beyond the scope of PAVE. Guidelines in all settings include yearly follow-up for any woman receiving thermal ablation and shorter intervals for those receiving LEEP (LLETZ).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study offers an inventory of proteins and their phosphorylated sites that are up- and down-regulated in the adipose tissue and skeletal muscle of women with PCOS. The data were collected and analyzed using rigorous and validated methodology, making it a useful resource for identifying targets and strategies for future PCOS treatments. However, even though some of the predicted targets are compelling, further functional validation is required to ensure the accuracy of these identified targets. If confirmed, the findings of this study would be of significant interest to a wide range of readers.

      Thank you very much for the opportunity to carry out some final revisions to our manuscript and for the invitation to submit a revised version of our work for further consideration in eLife. We are grateful for the very constructive and thorough feedback provided. Consequently, our manuscript has undergone revisions to address the issues raised, providing additional data from mouse models showing that androgen receptor signaling has a direct effect on muscle fiber type.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript, the authors tried to explore the molecular alterations of adipose tissue and skeletal muscle in PCOS by global proteomic and phosphorylation site analysis. The samples in the study are valuable, but there are no replicates for MS and there are no functional studies of the indicated proteins and phosphorylation sites. The authors achieved their aims to some extent, but not fully.

      Response: Indeed, the samples are valuable, but given the relatively high sensitivity and specificity of the method, we do not see why MS replicates would increase the power of the study; increasing the number of tissue samples analyzed would, however. Although no functional studies have been done, we do show that hyperandrogenism is associated with a shift towards fewer type I fibers in skeletal muscle. In the revised manuscript we have added data showing that androgens (dihydrotestosterone, DHT) have a direct effect on reducing the number of type I muscle fibers in a PCOS-like mouse model. Prepubertal DHT exposure led to a dramatic decrease in type I fibers, and this effect was partly prevented by the androgen receptor antagonist flutamide (Fig. 4A). Moreover, while skeletal muscle-specific AR knockout mice presented with fewer type I muscle fibers, they were protected against the DHT-induced loss of type I muscle fibers (Fig. 4B).

      Reviewer #2 (Public Review):

      This study provides proteomic and phosphoproteomic data that improve our understanding of the molecular alterations in adipose tissue and skeletal muscle from women with PCOS. This work is useful for understanding the characteristics of PCOS, as it may provide potential targets and strategies for the future treatment of PCOS. While the manuscript presents interesting omics and phenotypic findings, the lack of in-depth mechanistic exploration limits its potential impact.

      The study primarily presents findings from omics and phenotypic research, but fails to provide a thorough investigation into the underlying mechanisms driving the observed results. Without a thorough elucidation of the mechanistic underpinnings, the significance and novelty of the study are compromised.

      Response: We do provide solid evidence that women with PCOS have a lower expression of proteins specific for type I muscle fibers. A comprehensive exploration of the mechanism driving the observed results is not within the scope of this paper. However, we have included experimental data from a PCOS-like mouse model to strengthen our results that hyperandrogenism has a direct effect on lowering the number of type I fibers. Prepubertal dihydrotestosterone (DHT) exposure led to a dramatic decrease in type I fibers, and this effect was abolished in DHT-exposed mice with skeletal muscle-specific deletion of the androgen receptor (Fig. 4B). Moreover, the decrease in type I fibers was partly prevented by the androgen receptor antagonist flutamide in wild-type mice (Fig. 4A). Notably, unchallenged skeletal muscle-specific AR knockout mice had fewer type I muscle fibers. These data indicate that muscle AR signaling is important for normal muscle development, but that exaggerated muscle AR signaling leads to decreased abundance of type I muscle fibers in adult females.

      Reviewer #1 (Recommendations For The Authors):

      1. For participant recruitment the age should be considered.

      Response: The age of the women is shown in Table 1; the mean age was around 30 years. Cases and controls were matched for age, weight, and BMI at recruitment.

      1. In the current method, biopsies from 10 participants appear to be collected as one sample; analyzing the biopsy from each participant individually by MS, followed by a comprehensive analysis within each group, may be better.

      Response: The skeletal muscle biopsies from the 10 controls and from the 10 women with PCOS at baseline and after 5 weeks of treatment were collected and analyzed as individual samples by MS, with subsequent comprehensive analysis of each group. This has now been further clarified in the Methods (paragraph “Proteomic sample preparation and LC-MS/MS analysis”).

      1. Figure 2C, it is not convincing that "The increased expression of perilipin-1 was confirmed by immunofluorescence staining of muscle biopsies".

      Response: We have quantified perilipin-1 staining in skeletal muscle cells from controls and women with PCOS using ImageJ software (National Institutes of Health, Bethesda, MD, USA). The channels of the images were split and converted to 8-bit. The minimum and maximum thresholds were adjusted and kept constant for all images. Regions of interest were drawn around the cells and around empty space for background intensity measurement. The mean perilipin-1 intensity was measured and corrected by subtracting the background. A total of 28 PCOS and 33 control cells were quantified. The quantification of perilipin-1 staining is included in Fig. 2D. Perilipin-1 staining was more abundant in skeletal muscle cells from women with PCOS.
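      The background correction described above amounts to simple arithmetic, which a minimal Python sketch can make explicit. It assumes the perilipin-1 channel is a 2-D NumPy array and that the cell and background regions of interest are boolean masks of the same shape; the function name is hypothetical, the constant ImageJ thresholds are not modelled, and this is an illustration rather than the authors' ImageJ workflow.

```python
import numpy as np


def background_corrected_mean(image, cell_mask, background_mask):
    """Mean intensity inside a cell ROI minus mean intensity of a background ROI.

    Hypothetical helper for illustration only; assumes `image` is a single
    channel (e.g. the 8-bit perilipin-1 channel after splitting) and that both
    masks are boolean arrays with the same shape as `image`.
    """
    image = np.asarray(image, dtype=float)
    cell_mask = np.asarray(cell_mask, dtype=bool)
    background_mask = np.asarray(background_mask, dtype=bool)
    # An empty ROI would give an undefined (NaN) mean, so reject it explicitly.
    if not cell_mask.any() or not background_mask.any():
        raise ValueError("both ROIs must contain at least one pixel")
    cell_mean = image[cell_mask].mean()
    background_mean = image[background_mask].mean()
    return cell_mean - background_mean
```

      In this sketch, one corrected value would be computed per cell, and the per-cell values for each group (here 28 PCOS and 33 control cells) would then be compared.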

      1. Figs.3F,4C,5C,6B, methods for the quantification are needed respectively.

      Response: For each of the graphs, a detailed description of how the stainings were quantified has been included in the Methods section; Histological analyses and immunofluorescence.

      Fig.3F; Fiber cross-sectional area was automatically determined using MyoVision v1.0 and the proportion of type I fibers was manually counted on ImageJ. A total of 579 fibers from seven controls (60-150 fibers per muscle section) and 177 fibers (15-80 fibers per muscle section) from women with PCOS were quantified. Data are expressed as mean ± SD and graphically depicted with each individual fiber quantified.

      Fig. 4C and 6B; Quantification of picrosirius red staining of adipose tissue before and after treatment with electrical stimulation was performed using a semi-automatic macro in ImageJ software. This macro calculates the total area (µm²) and the percentage of collagen staining in each area after adjustment of the minimum and maximum thresholds. Three random pictures per section (4-5 sections/subject) were taken at 10x or 20x magnification using a regular bright-field microscope (Olympus BX60 & PlanApo, 20x/0.7, Olympus, Japan). All images were analyzed in ImageJ software v1.47 (National Institutes of Health, Bethesda, MD, USA) using this protocol https://imagej.nih.gov/ij/docs/examples/stained-sections/index.html with the following modification: threshold min 0, max 2.

      Fig. 5C: Picrosirius red staining of skeletal muscle before and after treatment with electrical stimulation was quantified using the same semi-automatic macro in ImageJ software v1.47 (National Institutes of Health, Bethesda, MD, USA) and the same protocol as for adipose tissue described above. The percentage of collagen staining was calculated from 8-10 images of different microscopic fields per muscle sample.

      Reviewer #2 (Recommendations For The Authors):

      While the study presents some valuable research findings, it falls short in terms of providing a comprehensive understanding of the mechanistic basis of the observed outcomes. Further exploration and elucidation of the mechanisms involved would greatly enhance the quality and impact of the study. For example, the authors need to provide sufficient evidence to elucidate why PCOS patients exhibit changes in these proteins and phosphorylation sites, as well as how these changes may impact PCOS patients, such as whether they are related to fertility. It would be valuable to provide further mechanistic insights to enhance the scientific rigor of the study.

      I encourage the authors to further refine their research and resubmit the manuscript with a more robust and comprehensive exploration of the mechanistic aspects to strengthen its scientific merit.

      Response: PCOS is characterized by reproductive and metabolic features. Changes in protein expression and phosphorylation sites in skeletal muscle and adipose tissue are likely to affect metabolic function more than fertility. That said, altered muscle function may affect insulin resistance and inflammation, thereby potentially aggravating reproductive status, including ovulatory cyclicity and fertility potential. We found that aldo-keto reductase family 1 members C1 (AKR1C1) and C3 (AKR1C3), which can, for example, convert androstenedione to testosterone, had higher expression in skeletal muscle. AKR1C1 expression correlated strongly with higher circulating testosterone levels (Spearman rho=0.65, P=0.002), suggesting that muscle may produce testosterone via the backdoor pathway (added to the second paragraph of the Results section). Moreover, lower expression of mitochondrial acetyl-CoA synthetase 2 correlated with higher HOMA-IR (Spearman rho=-0.46, P=0.04), suggesting that impaired mitochondrial fatty acid beta-oxidation contributes to insulin resistance. Indeed, various mitochondrial matrix proteins had lower expression in PCOS muscle, some of them involved in mitochondrial fatty acid beta-oxidation: enoyl acyl carrier protein reductase, enoyl-CoA delta isomerase 1 and acyl-CoA thioesterase 11 (R-HSA-77289, q=0.0008) (this has been added to the Discussion).
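      As a reminder of what the reported coefficients measure, Spearman's rho is the Pearson correlation computed on ranks, with tied values assigned their average rank. The sketch below implements that textbook definition on hypothetical data. It does not compute P values, and it is not the statistical software used for the analyses reported here.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical paired measurements (e.g. protein expression vs. hormone level)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman_rho(x, y), 3))  # 0.8
```

      Because only ranks enter the calculation, rho captures monotonic association and is insensitive to outlying absolute values.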

      A comprehensive exploration of the mechanisms driving these changes is beyond the scope of this paper. However, we have added data from PCOS-like mice to strengthen the paper. This mouse model supports our hypothesis that androgens drive the shift towards fewer type I muscle fibers, an effect that can be partly reversed by blocking the androgen receptor with the antagonist flutamide (Fig. 4A). Prepubertal DHT exposure led to a dramatic decrease in type I fibers, but this effect was not observed in DHT-exposed mice with skeletal muscle-specific deletion of the androgen receptor (Fig. 4B). These data strongly indicate that AR signaling drives the decrease in type I muscle fibers in females.

  2. Nov 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Editor and Reviewers

      Terzioglu et al., Mitochondrial temperature homeostasis resists external metabolic stresses

      Editor:

      We greatly appreciate the specific direction of the editors in guiding us as to what experiments are needed to strengthen the manuscript for publication. We here summarize how we have handled this advice (please refer to response to specific reviewer points, below, for the details). Changes to the text are indicated by red text and marginal red boxes numbered as per the responses below.

      Benchmarking: We now include a direct calibration of MTY against temperature. Experiments with temperature probes localized to different subcellular and submitochondrial compartments would be interesting and potentially informative, but would constitute a whole new study requiring a great deal of validation. We hope such a study will be undertaken, but it would not change the basic conclusions of the current study.

      Probe localization: In addition to previously published literature and the existing Figures 3B, 4 and S4, which indicate that both MTY and mito-gTEMP are localized in mitochondria (the latter in the matrix), we have conducted some simple experiments to determine the intramitochondrial localization of MTY, applying standard subfractionation protocols. The findings confirm our previous assumption that MTY is associated with the inner membrane.

      Expected outcomes: Since in most cases it is not possible to verify these simultaneously with fluorescence measurements, we rely mostly on previous literature, which is fully cited, or on measurements conducted in parallel (e.g. respirometry, Fig. S5) or previously in our own laboratories (e.g. flow cytometry on TMRM-stained cells). We accept that specific inferences on causality, e.g. that the effect of anisomycin is mediated by decreased ATP usage, or that Gal medium enforces dependence on OXPHOS, are arguably an over-reach. We have therefore toned down these statements to focus on the mitochondrial temperature response to the treatments, rather than on their imputed downstream physiological effects.

      Confounding factors: We tested (and excluded) possible confounding factors affecting MTY and report the findings in an expanded supplementary figure.

      Discussion of the model(s) proposed by Matta: We have now included this, as far as we considered appropriate for the eLife readership. However, not being theoretical physicists, we would greatly welcome a careful scrutiny of what we have written, by both the reviewer and handling editor.

      Reviewer #1:

      A1. Causality: We agree with the reviewer that we cannot formally distinguish, in this study, whether metabolism is adjusted to maintain mitochondrial temperature, or whether mitochondrial temperature maintenance is a secondary consequence of metabolic changes induced by stress. We have added a note to the Discussion to this effect. On balance, we would argue that the many cases we have documented here tend to favour the former interpretation, although this does not constitute proof. Settling this would require identifying a sensor of mitochondrial temperature changes and an associated signal transduction machinery that orchestrates responses to it, but we are obviously very far from this at present. We have added this point to the Discussion as well.

      A2. Metabolic correlates: We concede that the reviewer has a valid point, although exploring its ramifications in detail is not straightforward. The effects of AOX on respiration and on resistance to OXPHOS inhibitors have been documented previously and are also included in the paper as a check (Fig. S5). Our starting assumptions were that cells grown in low glucose/galactose would depend more on mitochondrial than on glycolytic ATP production, whilst net ATP production in anisomycin-treated cells should be attenuated due to decreased ATP demand. Nevertheless, there are a number of ways this could be achieved, especially if our suggestion is correct that altered ATP production is balanced by decreased or increased futile ATP turnover geared to maintaining mitochondrial temperature. For example, measuring total oxygen consumption, the P/O ratio or steady-state levels of ATP (or any other metabolite) would not be definitive. To accommodate the reviewer's point, we have made clear that the various treatments we applied are predicted to alter metabolism in the specified ways, based on theoretical arguments and previous data. Establishing the exact details of the metabolic changes that accompany these treatments would require tracer-based metabolomics over time (see Jang 2018, 10.1016/j.cell.2018.03.055), followed up by measurements of specified enzyme activities. Whilst such data would be very useful and might illuminate our observations, obtaining them is beyond the scope of the present paper. We hope that future studies will eventually unravel the relationship between metabolic adaptation and mitochondrial temperature.

      A3. Combinations of inhibitors: We were (and remain) reluctant to cram the paper too full of unsubstantiated speculations. Most, though not all, of the combinations of OXPHOS inhibitors that failed to give a stable reading of MTY fluorescence involved oligomycin plus an inhibitor of respiration. Since we already know that a complete loss of membrane potential leads to leakage of the dye, we surmise that this is the most likely reason for the fluorescence instability. In the presence of oligomycin alone, the minimal respiratory electron flow sustained should suffice to maintain a membrane potential if balanced against proton leakage. Conversely, even when respiration is inhibited, ATP synthase alone should be able to generate a membrane potential. However, the membrane potential may collapse when both oligomycin and a respiratory chain inhibitor are simultaneously applied. We expanded our comment on this issue in the Discussion and referred to it, briefly, in the legend of Fig. S3A.

      A4. Figure 4A: We added the panel indicators to the figure.

      A5. Fig. 7C: We have tightened up the wording for clarity. Yes, the blue trace shows the relevant data, but we were comparing the effect of rotenone on cells treated with anisomycin for 1, 2, ... 18 hours with cells not treated with anisomycin at all (i.e. the blue trace, zero-hour time point).

      A6. Meaning of ‘control iMEFS’ (Fig. 7C): We meant iMEFs not expressing AOX. We have made the statement more precise, accordingly.

      A7. Supplementary Movie S1: The movie was sent, to accompany the submission. If it is not accessible for review, please contact the handling editor.

      Reviewer #2:

      B1. Theoretical considerations (‘mitochondrial paradox’): Since we are not theoretical physicists, we have deferred to the reviewer’s expertise in these matters and quoted the suggested literature as succinctly as possible for the largely biological audience of eLife, sticking closely to the reviewer’s own words. In this light, we would invite the reviewer to scrutinize our added text (in a short additional section of the Discussion, for both this and point B3, below), and suggest any rewording that they consider appropriate.

      B2. Biological implications: We appreciate the point, but since the Discussion section is already long, we have just referred the reader to the treatment of Fahimi et al. We hope to expand on these issues in a separate paper, to be published elsewhere.

      B3. Theoretical considerations (Landauer’s principle and ATP synthase electrostatics): Once again, we have mentioned the issue as suggested, but would ask the reviewer to check the exact language we have used and propose any amendments they consider necessary.

      Reviewer #3:

      C1. Benchmark comparisons: We acknowledge that each method of mitochondrial temperature assessment has limitations, and we now explain them more thoroughly in a new section of the Discussion. However, the fact that the two methods give approximately the same result constitutes a crucial validation. In addition, we verified the temperature-responsiveness of MTY fluorescence in free solution at physiological pH (see new supplementary figure panel, Fig. S2D), showing that the response is almost linear over the temperature range inferred in the experiments (35-65 °C). This response curve cannot be used directly for calibration, however, because of the unknown contributions in vivo from cellular autofluorescence and from quenching under OXPHOS-inhibited conditions, which may modify the signal and will vary with the amount of dye taken up in a given experiment. The internal calibration used in each experiment is therefore a far more reliable way of relating observed fluorescence changes to temperature. Note also that if the slight deviation from linearity seen at higher temperatures in the MTY temperature-response curve (dotted line in Fig. S2D) reflects how the dye responds in vivo, MTY-based estimates of mitochondrial temperature may be overestimated by ~2 °C. This is now made clear in the text.
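      To illustrate what a linear temperature-response means in practice, the sketch below fits a straight line to paired (temperature, fluorescence) calibration points by ordinary least squares and inverts it to read a temperature from a fluorescence value. It is a generic sketch, not the per-experiment internal calibration used in this study: the function names and values are hypothetical, and a signal that falls as temperature rises is assumed. Any curvature at higher temperatures would bias estimates obtained from a purely linear inversion.

```python
def fit_line(temps, signals):
    """Ordinary least-squares fit of signal = slope * temp + intercept (hypothetical helper)."""
    n = len(temps)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired calibration points")
    mt = sum(temps) / n
    ms = sum(signals) / n
    sxx = sum((t - mt) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("calibration temperatures must differ")
    slope = sum((t - mt) * (s - ms) for t, s in zip(temps, signals)) / sxx
    return slope, ms - slope * mt


def signal_to_temperature(signal, slope, intercept):
    """Invert the fitted line to estimate temperature from one fluorescence reading."""
    return (signal - intercept) / slope


# Hypothetical calibration points: fluorescence falls linearly as temperature rises
temps = [35.0, 45.0, 55.0, 65.0]
signals = [1000.0, 900.0, 800.0, 700.0]
slope, intercept = fit_line(temps, signals)
print(signal_to_temperature(850.0, slope, intercept))  # 50.0
```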

      C2. Basal temperature: The basal mitochondrial temperature (no inhibitors), as inferred from the mito-gTEMP calibration curve, was already in the paper (zero time points for iMEF(P) and iMEF(AOX) cells, Fig. 7A, 7B).

      C3. Other organelles: In principle, gTEMP could be targeted to other organelles, such as the nucleus, peroxisomes, ER or plasma membrane, which would be highly informative for profiling intracellular temperature heterogeneities. However, this would require further rounds of recloning and expression, each followed by verification of intracellular targeting; this is obviously quite a large study, beyond the scope of our present work. In any case, it would now best be undertaken using the improved, next-generation ratiometric probes (B-gTEMP), work on which is under way. We agree that this is an important question for future experimentation and have added a short extra section to the Discussion accordingly.

      C4. Variation with external temperature: We carried out additional experiments to test this, subjecting cells to a mild heat or cold shock and tracking MTY fluorescence both before and after the subsequent addition of oligomycin, with final internal calibration as before. The results were again qualitatively reproducible, but suggested that the combination of external temperature shock and bioenergetic stress produces responses that differ from those to bioenergetic stress alone. We show the details of these experiments here, for the reviewer and others to inspect and consider. However, since they are not straightforward to interpret, we feel that they should be reserved for a future study that investigates the effects of external temperature changes on intramitochondrial temperature and bioenergetics in much greater detail. For these reasons we show the data only here, and not in the revised paper.

      Both cold shock (38→32 °C) and heat shock (38→41 °C) produced immediate shifts of mt temperature, but by lesser amounts than the external stresses applied, i.e. a cooling of 2-4 °C in the first case and a warming of 0-2 °C in the second. Over the following 10 min the mt temperature of the temperature-shocked cells held steady or drifted only slightly. These observations are broadly consistent with the general conclusion of the paper that mitochondrial temperature resists external stresses. However, the effect of then adding oligomycin was intriguingly different from that seen in control cells. In cold-shocked cells the mt temperature shift produced by oligomycin was several degrees less than in control cells, and mitochondrial temperature then gradually readjusted upwards to near the starting value, suggesting the induction of thermogenic pathways to compensate for the decreased external temperature. In heat-shocked cells, the response to oligomycin was reproducibly triphasic: the initial cooling effect was less pronounced than in control cells, but was followed by rewarming and then by a prolonged and progressive cooling. This is obviously much harder to interpret, and will require substantial further studies to parse.

      C5. Other factors: Although this point is addressed in previous literature, we measured the effects directly in solution (for MTY). Note, however, that it is not feasible to measure membrane potential simultaneously, due to the spectral overlap between e.g. TMRM and MTY. Nevertheless, we were able to test the effects on MTY fluorescence of incremental changes in Ca2+, pH and ROS within the physiological range (see doi: 10.1073/pnas.95.12.6803, doi: 10.1074/jbc.M610491200 and doi: 10.3390/antiox10050731). The results indicate that changes in any of these parameters have no effect on MTY fluorescence (new supplementary figure panels S3E, S3F and S3G).

      C6. Localization of probes: The existing Figures 3B, 4 and S4, as well as previous literature, indicate a mitochondrial localization for both MTY and mito-gTEMP. The matrix localization of proteins of the GFP reporter family tagged with the COX8 matrix-directed targeting signal used here is well established (e.g. see doi: 10.1016/S0076-6879(09)05016-2). To investigate the sub-mitochondrial localization of MTY, we conducted a standard series of fractionation steps using detergents, centrifugation and sonication. Whilst these do not provide absolute purity, they clearly indicate that MTY in energized mitochondria resides in, or is closely associated with, the inner mitochondrial membrane. In two trials in which mitochondria were fractionated into mitoplasts versus outer membrane/intermembrane space fractions, an average of 92% of the MTY fluorescence was retained in the mitoplast fraction (after subtracting autofluorescence from control samples not treated with MTY). After sonication, which should render most of the inner membrane pelletable as 'inside-out' submitochondrial particles (SMPs) while leaving most of the matrix contents in solution, 90% of the MTY fluorescence signal (again based on two trials, with background subtracted) was recovered in the SMP fraction, supporting the proposition that the dye is associated with the inner membrane. These findings are now reported in the Results section and commented on in the appropriate section of the Discussion. We agree with the reviewer that it would be useful to target temperature probes, e.g. B-gTEMP, to specific sub- and extra-mitochondrial compartments (cytosol, MAMs, outer membrane, IMS, inner membrane or even specific protein complexes therein), so as to gauge the nature of intramitochondrial heat conduction between compartments and its radiation to the extramitochondrial environment. However, because this would be an extensive study in its own right, requiring careful validation of targeting, we feel it should be attempted as a follow-up study.
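      For readers checking the arithmetic, one way to express a retained percentage is the background-subtracted signal in one fraction divided by the summed background-subtracted signal of both fractions, then averaged over trials. The sketch below assumes exactly that definition, which is our assumption rather than a stated protocol. The function name and all readings are hypothetical, chosen only to illustrate the calculation, and are not the measured values.

```python
from statistics import mean


def percent_in_fraction(target, other, target_bg, other_bg):
    """Share (%) of background-subtracted fluorescence recovered in the target fraction."""
    t = target - target_bg  # e.g. mitoplast signal minus dye-free autofluorescence
    o = other - other_bg    # e.g. OM/IMS signal minus dye-free autofluorescence
    if t < 0 or o < 0 or t + o == 0:
        raise ValueError("background exceeds signal or total signal is zero")
    return 100.0 * t / (t + o)


# Hypothetical readings per trial: (target, other, target background, other background)
trials = [(960.0, 70.0, 20.0, 10.0), (470.0, 60.0, 20.0, 10.0)]
per_trial = [percent_in_fraction(*tr) for tr in trials]
print(per_trial, mean(per_trial))  # [94.0, 90.0] 92.0
```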

      C7. Use of probes in isolated mitochondria: In principle we see no reason why this should not work, but any result would be non-physiological, since the external environment of isolated mitochondria is not the complex protein- and organelle-rich environment of the cytoplasm, which must play a crucial role in modulating heat diffusion from the organelle. Such an experiment may be useful to assess how much temperature buffering is provided by the rest of the cytoplasm, even though it does not directly address the internal temperature of mitochondria in vivo. Accordingly, we added a sentence to the Discussion foreshadowing such an experiment.

      C8. Other probes and methods: See points C1 and C3 above. The reviewer's suggestion could best be addressed using the superior B-gTEMP reporters engineered for specific expression in the nucleus and cytosol. This would be part of an extensive new study beyond the scope of the present work, but would of course provide further validation of its conclusions. We agree that multiple approaches are needed to address the issue of temperature differences within cells, in light of the surprising findings both of ourselves and of others, such as the study of Okabe et al. (2012) to which the reviewer refers. This point has now also been added to the Discussion.

      C9. Theoretical considerations: The critiques referred to are now briefly addressed in the revised Discussion, along with those raised by Reviewer 2. However, since we are not theoretical physicists, we do not feel qualified to enter the debate further. As Baffou and colleagues point out in https://doi.org/10.1038/nmeth.3552, “In order for the community to come to a consensus, we believe some effort will be required to identify the actual origin of the signal measured in these studies, both theoretically and experimentally”. Our experimental findings provide source data for this debate but do not resolve it.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study reports important findings regarding the systemic function of hemocytes controlling whole-body responses to oxidative stress. The evidence in support of the requirement for hemocytes in oxidative stress responses as well as the hemocyte single-nuclei analyses in the presence or absence of oxidative stress are convincing. In contrast, the genetic and physiological analyses that link the non-canonical DDR pathway to upd3/JNK expression and high susceptibility, and the inferences regarding the function of hemocytes in systemic metabolic control are incomplete and would benefit from more rigorous approaches. The work will be of interest to cell and developmental biologists working on animal metabolism, immunity, or stress responses.

      We would like to thank the editorial team for these positive comments on our manuscript and the constructive suggestions to improve our manuscript. We are now happy to send you our revised manuscript, which we improved according to the suggestions and valuable comments of the referees.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study examines how hemocytes control whole-body responses to oxidative stress. Using single-cell sequencing, they identify several transcriptionally distinct populations of hemocytes, including one subset that shows altered immune and stress gene expression. They also find that knockdown of DNA Damage Response (DDR) genes in hemocytes increases expression of the immune cytokine upd3, and that both upd3 overexpression in hemocytes and hemocyte knockdown of DDR genes lead to increased lethality upon oxidative stress.

      Strengths

      1. The single-cell analyses provide a clear description of how oxidative stress can cause distinct transcriptional changes in different populations of hemocytes. These results add to the emerging theme in the field that there are functionally different subpopulations of hemocytes that can control organismal responses to stress.

      2. The discovery that DDR genes are required upon oxidative stress to limit cytokine production and lethality provides interesting new insight into how the DDR may play non-canonical roles in controlling organismal responses to stress.

      We are grateful to referee 1 for pointing out the importance and novelty of our snRNA-seq data and of our findings on the role of DNA damage-modulated cytokine release by hemocytes during oxidative stress. In the revised manuscript, we extended these analyses by examining the transcriptomic alterations in fat body cells upon oxidative stress in more depth (Figure 4, Figure S4). We also provide additional data to support the connection between DNA damage signaling and regulation of upd3 release from hemocytes (Figure 6F). Here we show that upd3 deficiency can abrogate the increased susceptibility of flies with mei41 and tefu knockdown in hemocytes. In line with this finding, we also show that upd3null mutants show a reduced but not abolished susceptibility to oxidative stress overall (Figure 6F), underlining the role of upd3 as a mediator of the oxidative stress response.

      Weaknesses

      1. In some ways the authors interpretation of the data - as indicated, for example, in the title, summary and model figure - don't quite match their data. From the title and model figure, it seems that the authors suggest that the DDR pathway induces JNK and Upd3 and that the upd3 leads to tissue wasting. However, the data suggest that the DDR actually limits upd3 production and susceptibility to death as suggested by several results:

      Following the referee's suggestion, we revised the manuscript and adjusted our title, abstract and graphical summary to state more precisely that DNA damage signaling seems to have a modulatory or regulatory effect on upd3 release. Furthermore, we now provide additional data to support the connection between DNA damage signaling and upd3 release. For example, we added several genetic “rescue” experiments to strengthen the epistatic evidence that modulation of DNA damage signaling and the higher susceptibility of the fly are connected to altered upd3 levels (Figure 6F). These new data show that loss of upd3 rescues the susceptibility to oxidative stress of flies that are deficient for DDR components in hemocytes.

      a. PQ normally doesn't induce upd3 but does lead to glycogen and TAG loss, suggesting that upd3 isn't connected to the PQ-induced wasting.

      Although our systemic gene expression analysis did not detect a significant induction of upd3 upon PQ feeding, we found upd3 expression in our snRNA-seq data within a distinct cluster of immune-activated hemocytes (Figure 3B, Cluster 6). Upon knockdown of DNA damage signaling in hemocytes, upd3 levels then increase to a detectable level in the whole fly. This supports our assumption that upd3 is needed upon oxidative stress to induce energy mobilization from the fat body, but needs to be tightly controlled to balance energy mobilization against tissue wasting. Furthermore, our new analysis of the snRNA-seq data from fat body cells revealed Jak/STAT activation in one fat body cell cluster, which could point to an interaction between Cluster 6 hemocytes and cluster 6 fat body cells, a hypothesis we aim to explore in future studies.

      b. knockdown of DDR upregulates upd3 and leads to increased PQ-induced death. This would suggest that activation of DDR is normally required to limit, rather than serve as the trigger for upd3 production and death.

      Our data support the hypothesis that DDR signaling in hemocytes “modulates” upd3 levels upon oxidative stress. We have now carefully revised the text and the graphical summary of the manuscript to emphasize that oxidative stress causes DNA damage, which subsequently induces the DNA damage signaling machinery. If this machinery is not sufficiently induced, for example upon knockdown of tefu and mei-41, non-canonical DNA damage signaling is altered, which induces JNK signaling and the release of pro-inflammatory cytokines, including upd3. Because DNA damage itself is only slightly increased in the DDR-deficient lines used (Figure 5C) and hemocytes do not undergo apoptosis (unaltered cell number on PQ, Figure 5B), we conclude that loss of tefu, mei-41 or nbs1 dysregulates inflammatory signaling cascades via non-canonical DNA damage signaling. However, oxidative stress itself also seems to induce upd3 release and DNA damage signaling in the same cell cluster, as shown by our snRNA-seq data (Figure 3B). Hence, we think that DNA damage signaling is needed as a rate-limiting step for upd3 release.

      c. hemocyte knockdown of either JNK activity or upd3 doesn't affect PQ-induced death, suggesting that they don't contribute to oxidative stress-induced death. It’s only when DDR is impaired (with DDR gene knockdown) that an increase in upd3 is seen (although no experiments addressed whether JNK was activated or involved in this induction of upd3), suggesting that DDR activation prevents upd3 induction upon oxidative stress.

      Because double knockdown of upd3 or bsk together with DDR genes resulted in insufficient knockdown efficiency, we instead added a rescue experiment in which we combined upd3null mutants with knockdown of tefu and mei-41 in hemocytes, and found reduced susceptibility of DDR-deficient flies to oxidative stress.

      2. The connections between DDR, JNK and upd3 aren't fully developed. The experiments show that susceptibility to oxidative stress-induced death can be caused by a) knockdown of DDR genes, b) genetic overexpression of upd3, c) genetic activation of JNK. But whether these effects are all related and reflect a linear pathway requires a little more work. For example, one prediction of the proposed model is that the increased susceptibility to oxidative stress-induced death in the hemocyte DDR gene knockdowns would be suppressed (perhaps partially) by simultaneous knockdown of upd3 and/or JNK. These types of epistasis experiments would strengthen the model and the paper.

      As mentioned above, we had technical difficulties combining knockdown of bsk or upd3 with knockdown of DDR genes. However, we added a new experiment showing that the upd3null mutation can rescue the higher susceptibility of flies with tefu and mei41 knockdown in hemocytes.

      3. The (potential) connections between DDR/JNK/UPD3 and the oxidative stress effects on depletion of nutrient (lipids and glycogen) stores was also not fully developed. However, it may be the case that, in this paper, the authors just want to speculate that the effects of hemocyte DDR/upd3 manipulation on viability upon oxidative stress involve changes in nutrient stores.

      In the revised version of the manuscript, we now provide a more thorough snRNA-seq analysis of the fat body to give more insight into the changes in this tissue upon PQ treatment. We added histological images of the abdominal fat body on control food and PQ food, demonstrating the depletion of triglycerides from the fat body with Oil-Red-O staining (Figure S1). We have also now analyzed hemocyte-deficient (crq-Gal80ts>reaper) flies for their triglyceride and carbohydrate levels during oxidative stress, to support our hypothesis that hemocytes are key players in the regulation of energy mobilization during oxidative stress. Loss of hemocytes (and therefore of their regulatory input on energy mobilization from the fat body) results in increased triglyceride storage in the fat body at steady state, with decreased consumption of these triglycerides on PQ food compared to control flies (Figure 1J). In contrast, glycogen storage and mobilization, which mostly take place in muscle, are not altered in these flies during oxidative stress (Figure 1L). Interestingly, free glucose levels are drastically reduced in hemocyte-deficient flies, which could be due to insufficient energy mobilization from the fat body and could in turn result in the higher susceptibility of these flies to oxidative stress (Figure 1K). We would also like to point out that “functional” hemocytes are needed for an effective response to oxidative stress, but that this response has to be tightly balanced (see also the new graphical abstract).

      Reviewer #2 (Public Review):

      Hersperger et al. investigated the importance of Drosophila immune cells, called hemocytes, in the response to oxidative stress in adult flies. They found that hemocytes are essential in this response, and using state-of-the-art single-cell transcriptomics, they identified expression changes at the level of individual hemocytes. This allowed them to cluster hemocytes into subgroups with different responses, which certainly represents very valuable work. One of the clusters appears to respond directly to oxidative stress and shows a very specific expression response that could be related to the observed systemic metabolic changes and energy mobilization. However, the association of these transcriptional changes in hemocytes with metabolic changes is not well established in this work. Using hemocyte-specific genetic manipulation, the authors convincingly show that the DNA damage response in hemocytes regulates JNK activity and subsequent expression of the JAK/STAT ligand Upd3. Silencing of the DNA damage response or excessive activation of JNK and Upd3 leads to increased susceptibility to oxidative stress. This nicely demonstrates the importance of tight control of JNK-Upd3 signaling in hemocytes during oxidative stress. However, it would have been nice to show here a link to systemic metabolic changes, as the authors conclude that it is tissue wasting caused by excessive Upd3 activation that leads to increased susceptibility, but metabolic changes were not analyzed in the manipulated flies.

      We thank the referee for the suggestion to better connect upd3 cytokine levels to energy mobilization from the fat body. We agree that this is an important point in support of our hypothesis. First, we have now added a detailed analysis of fat body cells in our snRNA-seq data to evaluate the changes induced in the fat body upon oxidative stress. We have further added metabolic analyses of hemocyte-deficient flies (crq-Gal80ts>reaper) to support our hypothesis that hemocytes are key players in the regulation of energy mobilization during oxidative stress (see also our answer to referee 1). Loss of the regulatory role of hemocytes in energy mobilization and redistribution leads to decreased consumption of triglycerides on PQ food compared to control flies (Figure 1J). In contrast, glycogen storage and mobilization from muscle are not affected in hemocyte-deficient flies during oxidative stress (Figure 1L). Interestingly, free glucose levels are drastically reduced in hemocyte-deficient flies compared to controls, which could be due to insufficient energy mobilization from the fat body and result in a higher susceptibility to oxidative stress (Figure 1K). These data support our assumption that “functional” hemocytes are needed for an effective response to oxidative stress, but that this response has to be tightly balanced (see also the new graphical summary).

      The overall conclusion of this work, as presented by the authors, is that Upd3 expression in hemocytes under oxidative stress leads to tissue wasting, whereas in fact it has been shown that excessive hemocyte-specific Upd3 activation leads to increased susceptibility to oxidative stress (whether due to increased tissue wasting remains a question). The DNA damage response ensures tight control of JNK-Upd3, which is important. However, what role naturally occurring Upd3 expression plays in a single hemocyte cluster during oxidative stress has not been tested. What if the energy mobilization induced by this naturally occurring Upd3 expression during oxidative stress is actually beneficial, as the authors themselves state in the abstract - for potential tissue repair? It would have been useful to clarify in the manuscript that the observed pathological effects are due to overactivation of Upd3 (an important finding), but this does not necessarily mean that the observed expression of Upd3 in one cluster of hemocytes causes the pathology.

      We agree with the referee that the pathological effects and increased susceptibility to oxidative stress are mediated by over-activated hemocytes and enhanced cytokine release, including release of upd3, during oxidative stress. We have edited the revised manuscript accordingly to indicate a “regulatory” role of upd3, which we suspect to be an important mediator of inter-organ communication between hemocytes and the fat body. Because our model of oxidative stress (15 mM paraquat feeding) is a severe insult from which most flies do not recover, we could not test how upd3 might influence tissue repair after injury, other insults or infection. We believe that this is an important question, which we aim to explore in future studies.

      Reviewer #3 (Public Review):

      In this study, Kierdorf and colleagues investigated the function of hemocytes in oxidative stress response and found that non-canonical DNA damage response (DDR) is critical for controlling JNK activity and the expression of cytokine unpaired3. Hemocyte-mediated expression of upd3 and JNK determines the susceptibility to oxidative stress and systemic energy metabolism required for animal survival, suggesting a new role for hemocytes in the direct mediation of stress response and animal survival.

      Strength of the study:

      1. This study demonstrates the role of hemocytes in oxidative stress response in adults and provides novel insights into hemocytes in systemic stress response and animal homeostasis.

      2. The single-cell transcriptome profiling of adult hemocytes during Paraquat treatment, compared to controls, would be of broad interest to scientists in the field.

      We are grateful for these positive comments on our data and are pleased that the referee pointed out the importance of our snRNA-seq analysis of hemocytes and other cell types during oxidative stress. In the revised version, we have extended this analysis to look not only at hemocytes but also at the changes induced in the fat body (Figure 4).

      Weakness of the study:

      1. The authors claim that the non-canonical DNA damage response mechanism in hemocytes controls the susceptibility of animals through JNK and upd3 expression. However, the link between DDR-JNK/upd3 in oxidative stress response is incomplete and some of the descriptions do not match their data.

      In the revised manuscript, we have aimed to address the weaknesses pointed out by the referee. We have now included additional genetic crosses to validate the connection between DDR signaling in hemocytes and upd3 release. For example, we have added survival studies showing that the upd3null mutation rescues the higher susceptibility to oxidative stress of flies with tefu and mei-41 knockdown in hemocytes. Furthermore, we have added data highlighting the importance of hemocytes themselves as essential regulators of susceptibility to oxidative stress. We analyzed hemocyte-deficient flies (crq-Gal80ts>reaper) for their triglyceride content and carbohydrate levels during oxidative stress (Figure 1 I-L). As outlined above, loss of hemocytes leads to decreased consumption of triglycerides on PQ food compared to control flies (Figure 1J). In contrast, glycogen storage and mobilization from muscle are not affected in hemocyte-deficient flies during oxidative stress (Figure 1L). Interestingly, free glucose levels are drastically reduced in hemocyte-deficient flies, which could be due to insufficient energy mobilization from the fat body and result in a higher susceptibility to oxidative stress (Figure 1K).

      2. The schematic diagram does not accurately represent the authors' findings and requires further modifications.

      We have carefully revised the text describing our results throughout the manuscript and edited the graphical abstract to show that upd3 levels and hemocytes are essential to balance and modulate the response to oxidative stress.

      Reviewer #1 (Recommendations For The Authors):

      The summary doesn't say too much about what the specific discoveries and results of the study are. The description is limited to just one sentence saying, "Here we describe the responses of hemocytes in adult Drosophila to oxidative stress and the essential role of non-canonical DNA damage repair activity in direct "responder" hemocytes to control JNK-mediated stress signaling, systemic levels of the cytokine upd3 and subsequently susceptibility to oxidative stress" which doesn't provide sufficient explanation of what the results were.

      In the revised version of our manuscript, the summary now concisely outlines the main findings of our study for the reader.

      Reviewer #2 (Recommendations For The Authors):

      1. To strengthen the conclusion that the DDR response suppresses JNK, and thus Upd3, rescue of DDR by upd3 null mutation would help (knockdown by Hml>upd3IR might not work, RNAi seems problematic).

      We would like to thank the referee for this suggestion and have now included a genetic experiment in which we combined upd3null mutants with hemocyte-specific knockdown of mei-41 and tefu to test their susceptibility to oxidative stress. Our data indeed provide evidence that loss of upd3 rescues the higher susceptibility of flies with hemocyte-specific knockdown of tefu and mei-41 (Figure 6F). Furthermore, upd3null mutants show a diminished susceptibility to oxidative stress compared to control flies (Figure 6F).

      2. To link the observed effects to systemic metabolic changes, it would be useful to measure glycogen and triglycerides in these flies as well:
      a. crq-Gal80ts>reaper to see what role hemocytes play in the observed metabolic changes.

      b. Hml-Upd3 overexpression and Upd3 null mutant (Upd3 RNAi seems to be problematic, we have similar experiences) to see if Upd3 overexpression leads to even more profound changes as suggested, and if Upd3 mutation at least partially suppresses the observed changes.

      We agree with the referee that the connection between hemocyte activation and metabolic changes should be demonstrated in our manuscript to support our claim that hemocytes are important regulators of energy mobilization during oxidative stress. Hence, we analyzed triglyceride and carbohydrate levels in hemocyte-deficient flies (crq-Gal80ts>reaper) during oxidative stress. Indeed, we found substantial differences in energy mobilization in these flies, supporting the assumption that the higher susceptibility of hemocyte-deficient flies could be caused by a substantial decrease in free glucose and inefficient lipolysis of triglycerides in the fat body (Figure 1I-K).

      3. To test whether the cause of the increased susceptibility to oxidative stress is due to Upd3 overactivation induced by DDR silencing, the authors should attempt to rescue DDR silencing with an Upd3 null mutation.

      We have included the reviewer's suggestion in the revised manuscript and, as outlined above, added this data set (Figure 6F). We can now provide evidence that the upd3null mutation rescues the higher susceptibility of flies with DDR knockdown in hemocytes.

      4. Lethality after PQ treatment varies widely (sometimes from 10 to 90%! as in Figure 5D) - is this normal? In some experiments the variability was much lower. In particular, Figure 5D is very problematic and for example the result with upd3 null mutant compared to control is not very convincing. This could be an important result to test whether Upd3, with normal expression likely coming from cluster 6, actually plays a beneficial role, whereas overexpression with Hml leads to pathology.

      We agree with the referee that the survival experiments would be more convincing with less variation. However, we included a large number of flies and vials in many independent experiments to test our hypothesis, and variation in survival was always present. It can be caused by many factors, for example the amount of food consumed by the flies, genetic background or inserted transgenes. Because the n-numbers across our survival experiments are high, we are confident that the observed effects are valid. This also reflects the power of Drosophila melanogaster as a model organism for such survival studies. With these high n-numbers, our data follow a normal (Gaussian) distribution with a distinct mean susceptibility for each genotype analyzed.

      5. I like the conclusion at the end of the results: line 413: "We show that this oxidative stress-mediated immune activation seems to be controlled by non-canonical DNA damage signaling resulting in JNK activation and subsequent upd3 expression, which can render the adult fly more susceptible to oxidative stress when it is over-activated." This is actually a more appropriate conclusion, but in the summary, introduction and discussion along with the overall schematic illustration, this is not actually stated as such, but rather as Upd3 released from cluster 6 causes the pathology. For example: line 435 "Hence, we postulate that hemocyte-derived upd3, most likely released by the activated plasmatocyte cluster C6 during oxidative stress in vivo and subsequently controlling energy mobilization and subsequent tissue wasting upon oxidative stress."

      We thank the referee for this suggestion and edited our manuscript and conclusions accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1. In Figure 2, the authors showed that PQ treatment changes the hemocyte clusters in a way that suppresses the conventional Hml+ or Pxn+ hemocytes (cluster1) while expanding hemocyte clusters enriched with metabolic genes such as Lpin, bmm etc. It is not clear whether these cells are comparable to the fat body and whether these clusters express any previously known hemocyte marker genes that would justify claiming that they are bona fide hemocytes.

      We have now included a new analysis of our snRNA-seq data in Figure S4, where we clearly show that none of the identified hemocyte clusters has a fat body signature; they are hemocytes that seem to undergo metabolic adaptations (Figure S4A). Furthermore, we show that the identified fat body cells have a clear fat body signature (Figure S4B) and do not express specific hemocyte markers (Figure S4C).

      2. In Figure 4C, the authors showed that comet assays of isolated hemocytes result in a statistically significant increase in DNA damage in DDR-deficient flies before and after PQ treatment. However, the authors conclude that, in lines 324-328, the higher susceptibility of DDR-deficient flies is not due to an increase in DNA damage. To explicitly conclude that "non-canonical" DNA damage response, without any DNA damage, is specifically upregulated during PQ treatment, the authors require further support to exclude the potential activation of canonical DDR.

      The referee is correct that we do not provide direct evidence for non-canonical DNA damage signaling. We have therefore toned down this statement and removed the claim from the title. An increase in DNA damage can, of course, also increase non-canonical DNA damage signaling; however, loss of DNA damage signaling genes such as tefu and mei-41 seems to have only a minor impact on the overall amount of DNA damage acquired by hemocytes under oxidative stress. We therefore concluded that the enhanced immune activation is unlikely to be caused solely by increased DNA damage and might instead be connected to dysregulated non-canonical DNA damage signaling. Canonical DNA damage signaling leads essentially either to DDR, which could be slow in adult hemocytes because they are post-mitotic, or to apoptosis, which we did not observe in the time window analyzed in our experiments. Hemocyte numbers remained stable over the 24 h PQ treatment, without any reduction (Figure 1H).

      3. From Figure 4D-F, the authors showed that loss of DDR in hemocytes induces the expression of unpaired 2 and 3, Socs36E, which represent the JAK/STAT pathway, and thor, InR, Pepck in the InR pathway, and a JNK readout, puc. These results indicate that the DDR pathway normally inhibits the upd-mediated JAK/STAT activation upon PQ treatment, compared to wild-type animals during PQ treatment in Figure 1B-C, which in turn protects the animal during oxidative stress responses. However, the authors claim that "enhanced DNA damage boosts immune activation and therefore susceptibility to oxidative stress (lines 365-366); we show that this oxidative stress-mediated immune activation seems to be controlled by non-canonical DNA damage signaling resulting in JNK activation and subsequent upd3 expression (line 413-416)". These conclusions are not compatible with the authors' data and may require additional data to support or can be modified.

      In the revised manuscript, we have carefully revised the text and our statements to indicate that DNA damage signaling in hemocytes seems to have a regulatory or modulatory effect on the immune response during oxidative stress. Accordingly, we have also adjusted our graphical summary. We agree with the referee and now use the term “non-canonical” DNA damage signaling more carefully throughout the manuscript. The slight increase in DNA damage seen after PQ treatment can contribute to immune activation but does not seem to correlate with the induced cytokine levels or with the susceptibility of the flies to oxidative stress.

      4. In Fig 1I, the authors showed that genetic ablation of hemocytes using UAS-reaper induces susceptibility to PQ treatment. It is possible that inducing cell death in hemocytes itself causes the expression of cytokine upd3 or activates the JNK pathway to enhance the basal level of upd3/JNK even without PQ treatment. If this phenotype is solely mediated by the loss of hemocytes, the results should be repeated by reducing the number of hemocytes with alternative genetic backgrounds.

      Across the different genotypes analyzed in our manuscript, we did not detect hemocyte cell death or a dramatic reduction in hemocyte number (see Figure 1H, Figure 5B, Figure 6C). The higher susceptibility of hemocyte-deficient flies to oxidative stress is most likely caused by the loss of their regulatory role in energy mobilization. We tested triglyceride levels in hemocyte-deficient flies and found decreased triglyceride consumption (lipolysis) together with reduced levels of circulating glucose. These findings support our hypothesis that hemocytes are needed to balance the response to oxidative stress. In contrast, flies with DDR-deficient hemocytes show higher systemic cytokine levels, which most likely enhance energy mobilization from the fat body and therefore result in a higher susceptibility of the fly to oxidative stress. Hence, we conclude that hemocytes and their regulation of systemic cytokine levels are important to balance the response to oxidative stress and ensure the survival of the organism.

      5. Lethality of control animals in PQ treatment is variable and it is hard to estimate the effect of animal susceptibility during 15mM PQ feeding. For example, Fig1A shows that control animals exhibit ~10% death during 15mM PQ which is further enhanced by crq-Gal80>reaper expression to 40% (Fig 1I). However, in Fig 5D-E, the basal lethality of wild-type controls already reaches 40~50%, which makes them hard to compare with other genetic manipulations. Related to this, the authors demonstrated that the expression of upd3 in hemocytes is sufficient to aggravate animal survival upon PQ treatment; however, upd3 null mutants do not rescue the lethality, which indicates that upd3 is not required for hampering animal mortality. These data need to be revisited and analyzed.

      As outlined above, we observe variability in susceptibility to oxidative stress across all of our experiments. This could be due to different factors such as food intake, but also transgene insertion and genetic background. The crq-Gal80ts>reaper flies are healthy but have a shortened life span on normal food (Kierdorf et al., 2020) due to enhanced loss of proteostasis in muscles. We show in the revised manuscript that these flies have a higher susceptibility to oxidative stress and that this effect could be mediated by defects in energy mobilization and redistribution, as shown by reduced triglyceride lysis in the fat body and decreased free glucose levels. This would explain the high mortality rate of these flies at 7 days after eclosion. Paraquat treatment (15 mM) is a severe inducer of oxidative stress, which kills most flies when they are maintained on PQ food for longer time windows. Hence, this model is not suitable for examining and monitoring recovery from this detrimental insult. We extensively re-examined upd3null mutants in this manuscript, and although we could not see full protection of these flies from oxidative stress-induced death, we found a reduced susceptibility compared to control flies (Figure 6F). Furthermore, when we combined upd3null mutants with hemocyte-specific deficiency for tefu and mei-41, the increased susceptibility to oxidative stress was rescued.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) IR reduced mature spines (mushroom) but not immature spines (filopodia) in vitro at 14 days post-2 Gy IR. Please check previous reports by C. Limoli and J. Fike groups (in vivo dendritic spine characterization following proton or photon irradiation).

      We appreciate the reviewer's comments. Although IR did not reduce filopodia in previous studies, no prior study has used the same time point as ours (4 days after 2 Gy IR). Additionally, according to other previous studies, PAK3 inhibition led to an increase in filopodia (J Neurosci. 2004 Dec 1;24(48):10816-25), and IR increased thin-type spines and decreased mushroom-type spines at 7 days after 2 Gy IR (PLoS One. 2012;7(7):e40844). Considering these findings, we believe that the increase in filopodia observed in our study is due to the short-term effects of IR and the consequent PAK3 downregulation. We have added a description of the time point in “Materials and Methods”.

      Page 20, line 439-440; "In the analysis of molecular alterations, cultured neurons were sampled 4 days after irradiation."

      2) Does IR (2 Gy or 5 x 2 Gy) affect the viability in vitro? This could be linked with reduced dendritic structure and F/G-actin ratios.

      As the reviewer suggested, we evaluated neuronal viability following 2 Gy IR exposure. Approximately 80% of the cells survived after IR exposure (Fig. 4A). Although we agree that cognitive abilities may decrease due to neuronal death after IR, we found that PAK3 overexpression restores the F/G-actin ratio in surviving neurons after IR, suggesting that the IR-induced alterations, at least in neuronal plasticity, are mainly mediated by PAK3 rather than by the direct effects of IR. Additionally, neurons that survive IR maintain similar levels of NeuN, a mature neuron marker (Fig. S5A). We have described these additional experiments in “Results”.

      Page 10, line 206-209; "IR decreased neuronal viability in human differentiated neurons, with approximately 80% survival (Fig 4A). However, IR did not alter the mature neuronal marker, NeuN (Fig S5A). These results indicate that IR-induced disruption of PAK3 signaling occurs in surviving neurons following irradiation. Consistent with previous murine neuron data, IR reduced the F/G-actin ratio (Fig. 4B)."

      3) The authors state, "Overall, these results indicated that IR could induce cognitive impairment by disrupting dendritic spine maturation." Dendritic spine damage may not be the only factor contributing to cognitive dysfunction (neural circuit function, neuroinflammation, astrogliosis, etc., needs to be discussed).

      We agree with the reviewer that dendritic spine damage may not be the only factor contributing to cognitive impairment. Since our study has only examined the effects on dendritic spines, which are one part of the complex impact of radiation, we have added to the “Discussion” a statement on the need for further research on the various factors related to IR-induced cognitive dysfunction.

      Page 15, line 317-324; "The dendritic spine is one of the major factors influencing cognitive function. In our study, we observed changes in dendritic spines due to radiation exposure, followed by cognitive impairment. Additionally, we established that regulating PAK3, which affects dendritic spine maturation, can modulate radiation-induced cognitive dysfunction. However, considering that radiation can impact the entire nervous system and that neural circuit function, neuroinflammation, and astrogliosis can also influence cognitive function (Makale et al., 2017), future studies are needed to investigate the mechanisms of factors beyond dendritic spine changes caused by radiation."

      4) Fig 2 and Suppl Fig S2. The in vivo results should be placed in the manuscript Fig 2 as this would provide relevant physiological information on PAK3 downregulation and reduced dendritic spines and cognition.

      We appreciate the reviewer's comment. As suggested, we have moved Fig S2C to Fig 2H.

      Page 33, line 825-827; "(H) Left: the protein levels of phosphorylated LIMK1, LIMK1, phosphorylated cofilin, and cofilin after IR in the frontal cortex and hippocampus. Right: each western blot band was quantified using ImageJ."

      5) miR-206-3p expression was found to be elevated post-IR in the human and mouse neurons in vitro. This was correlated with IR-induced downregulation of PAK3 using an antagonist miR experiment, wherein PAK3, LIMK1, and downstream markers were restored in the irradiated neurons. MiR-206-3p upregulation data should also be confirmed in vivo using an irradiated mouse brain to correlate with the cognitive dysfunction timepoint.

      We observed IR-induced miR-206-3p upregulation (Fig 6D) and consequent PAK3 downregulation (Fig 6G) in vivo at 4 days after IR. Considering that the antagomiR significantly rescues the cognitive dysfunction (Fig 6E) at 1-3 days after IR, we infer that miR-206-3p expression is persistently increased by IR, suppressing the PAK3 signaling pathway and leading to cognitive dysfunction.

      6) Fig 5 shows that in vivo administration of antago-miR-206 reversed IR-induced upregulation of miR-206, reductions in PAK3 and downstream markers, and, importantly, reversed cognitive deficits induced by IR. These data should be supported by in vivo staining for important dendritic markers, including cofilin/p-cofilin, PSD-95, F- and G-actin within the hippocampal and PFC regions.

      We appreciate the reviewer's comment. Previous studies on intranasal administration have shown that the administered substance is delivered to the PFC and hippocampus through the olfactory pathway in both humans and mice (Exp Neurobiol. 2020 Dec 31;29(6):453-469, Stem Cells. 2021 Dec;39(12):1589-1600). Although we did not show direct evidence that antagomiR-206 is delivered to both regions, we confirmed its delivery to the brain using Cy5 fluorescence and examined PAK3 signaling (Fig. 6G) and the F/G-actin ratio (Fig. 6H) in both regions. To show the reliability of the tissue separation, we have added a detailed description of the dissection method in “Materials and Methods”.

      Page 19, line 410-423; "Dissection of prefrontal cortex and hippocampus. The dissection of mouse brain regions was performed following a previous study (Spijker, 2011). First, to obtain the hippocampal region, we gently held the brain and opened the forceps, slowly separating the cortical halves. Once an opening had been created along the midline for approximately 60%, we directed the forceps (in the closed position) counterclockwise by 30–40° to expose the left cortex from the hippocampus, repeatedly opening the forceps as necessary. We then repeated the same procedure for the right cortex by pointing the forceps in a 30–40° clockwise direction until the upper part of the hippocampus became visible. At the most caudal part of the hippocampus/cortex boundary, we moved the small forceps through the cortex and used them to separate the hippocampus from the fornix. After removing the hippocampus, we used the large forceps to fold the cortex back into its original position. Subsequently, we placed the brain with the dorsal side and cut coronal sections to reveal the prefrontal cortex and striatum at different levels. Using a sharp razor blade, we made the first cut to remove the olfactory bulb and cut the section containing the prefrontal cortex."

      7) Does this change in the F/G actin ratios, cofilin, and/or p-cofilin impact any particular neuronal subtypes, including excitatory, inhibitory or any particular layers of major neurons? This point can't be appreciated from the WB data.

      Both excitatory and inhibitory neurons play crucial roles in cognitive function. With respect to radiation, excitatory neurons appear to be more responsive: a previous study showed that spike firing and excitatory synaptic input were reduced by cranial irradiation, while inhibitory input was increased (Neural Regen Res. 2022 Oct;17(10):2253-2259). Additionally, PSD-95 is localized to dense specialized regions within the dendritic spines of excitatory synapses and is associated with synaptic plasticity (Neuron. 2001 Aug 2;31(2):289-303). Indeed, IR decreases the mRNA level of PSD-95 in differentiated human neurons (Fig S5A). Considering the previous research and our data, IR-induced PAK3 downregulation may occur primarily in excitatory neurons.

      8) Discussion: "In this study, we investigated the effect of cranial irradiation on cognitive function and the underlying mechanisms in a mouse model." Please change this statement to "....underlying neuronal mechanisms using in vivo and in vitro models."

      We appreciate the reviewer’s comment. We replaced ‘mechanisms in a mouse model’ with ‘neuronal mechanisms using in vivo and in vitro models.’ in the manuscript.

      Page 14, line 283; "In this study, we investigated the effect of cranial irradiation on cognitive function and the underlying neuronal mechanisms using in vivo and in vitro models."

      9) Discussion: "Furthermore, our study identifies a potential mechanism underlying the cognitive impairment associated with cranial irradiation, which downregulates PAK3 expression." This statement should be supported by the in vivo immunofluorescence data for the synaptic markers, including cofilin, p-cofilin, PSD-95, and F/G-actin staining.

      Although we did not perform in vivo immunofluorescence staining for synaptic markers, we examined PAK3 signaling (Fig. 6G) and the F/G-actin ratio (Fig. 6H) in the hippocampal and PFC regions. Additionally, according to the Allen Mouse Brain Atlas, PAK3 is mainly expressed in the PFC and hippocampus (Fig S2A), suggesting that IR-induced PAK3 downregulation in both regions may have a significant impact on cognitive impairment. Considering these data, we strongly believe that cranial irradiation downregulates PAK3 levels in the PFC and hippocampus, thus inducing cognitive impairment.

      10) miRs modulate function by affecting multiple targets. The other potential neuronal and non-neuronal targets of miR-206-3p were not discussed. This possibility should be confirmed using relevant markers.

      Following the reviewer’s comment, we performed real-time PCR to examine whether miR-206-3p affects the expression of neuronal and non-neuronal markers (Fig S5A and S5B). The post-synaptic marker PSD-95 was reduced by miR-206-3p treatment, whereas a mature neuronal marker (NeuN) and non-neuronal markers (GFAP and IBA-1) did not change. We have added the related description in “Results”.

      Page 12, line 240-243; "Additionally, the post-synaptic marker, PSD-95, was decreased by miR-206-3p treatment. However, a mature neuronal marker (NeuN) and non-neuronal markers (GFAP and IBA-1) were not altered upon miR-206-3p treatment (Fig. S5A and S5B)."

      11) Irradiation procedure: Please confirm that sham (0 Gy)-irradiated mice were also anesthetized for a similar procedure carried out for the 2 Gy or fractionated irradiation.

      As suggested by the reviewer, we have added a description of the sham (0 Gy)-irradiated mice in “Materials and Methods”.

      Page 17, line 359-360; "All mice, including those in the sham (0 Gy) group, were anesthetized with an intraperitoneal (i.p.) injection of zoletil (5 mg/10 g) daily for five days."

      12) 24 mL volume (antagomir treatment) via intra-nasal delivery is a rather unusually high volume. Please clarify if such a procedure was approved by the regulatory committee and if 24 mL volume led to any hemodilution.

      We appreciate the reviewer's comment. We followed the intranasal administration protocol of a previous study (Mol Ther. 2021 Dec 1;29(12):3465-3483) and had made an error in specifying the volume unit. We have corrected it from mL to μL.

      Page 19, line 399-402; "According to the manufacturer’s instructions and a previous study (Zhou et al., 2021), 40 nmol of antagomiR-206-3p (sequence: 5’-CCACACACUUCCUUACAUUCCA-3’) or antagomiR-NC (the antagomiR negative control, its antisense chain sequence: 5’-UCUACUCUUUCUAGGAGGUUGUGA-3’) was dissolved in 1 mL of RNase-free water."

      Page 19, line 402-403; "A total of 24 μL of the solution (1 nmol per mouse) was instilled with a pipette, alternately into the left and right nostrils (1 μL per instillation), with an interval of 3–5 min."
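
      As a quick consistency check on the corrected units, the stated concentration and volume imply the following per-mouse dose. This is simple arithmetic from the two quoted sentences and adds no new data:

```latex
\[
c = \frac{40\ \text{nmol}}{1000\ \mu\text{L}} = 0.04\ \text{nmol}/\mu\text{L},
\qquad
24\ \mu\text{L} \times 0.04\ \text{nmol}/\mu\text{L} = 0.96\ \text{nmol} \approx 1\ \text{nmol per mouse}.
\]
```

      At the same concentration, the originally printed 24 mL would have corresponded to 960 nmol per mouse. The μL value is therefore the one consistent with "1 nmol per mouse".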

      Reviewer #2

      1) To show the relevance of PAK3 in radiation-induced neurocognitive decrements, I suggest using 10 Gy WBI, groups of 15-16 animals and long-term follow-up (>2 months post-RT).

      We appreciate the reviewer's comment. The biologically effective dose (BED) is a standard quantitative predictor of the biological effects of radiation. However, our study aimed to analyze the mechanisms underlying cognitive dysfunction induced not by a total dose of 10 Gy but by repeated 2 Gy fractions, the fractionation used in clinical practice such as prophylactic cranial irradiation (PCI). In this regard, the use of 2 Gy fractions is central to our research.
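
      The dose argument above can be made concrete with the standard linear-quadratic BED formula, where n is the number of fractions and d is the dose per fraction. This worked illustration rests on two assumptions that the response does not state. First, the fractionated schedule is taken to be five 2 Gy fractions, which is consistent with the five daily anesthesia sessions described for irradiation. Second, the α/β ratio is set to an illustrative 2 Gy, a value commonly assumed for late effects in the central nervous system.

```latex
\[
\mathrm{BED} = n\,d\left(1 + \frac{d}{\alpha/\beta}\right)
\]
\[
5 \times 2\ \text{Gy}:\quad \mathrm{BED} = 10\left(1 + \tfrac{2}{2}\right) = 20\ \text{Gy},
\qquad
1 \times 10\ \text{Gy}:\quad \mathrm{BED} = 10\left(1 + \tfrac{10}{2}\right) = 60\ \text{Gy}.
\]
```

      Under these assumptions, the same 10 Gy physical dose given as a single fraction corresponds to about three times the BED of five 2 Gy fractions. This is why the two schedules are not interchangeable.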

      In statistical analysis, larger samples yield more precise estimates. However, for ethical reasons in animal research, we determined the sample size with G*Power 3.1 using an effect size of 1.2, an alpha of 0.05 and three groups. This gave a total sample size of 15, with five mice per group. Despite the relatively small sample size, radiation exposure significantly reduced PAK3 expression with small variance, thereby inducing cognitive impairment.
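
      The sample-size reasoning above can be sanity-checked by simulation. The sketch below is not the G*Power calculation itself, and it makes several assumptions:
      - the effect size of 1.2 is Cohen's f for a one-way ANOVA with three equal groups of five;
      - the group means are placed symmetrically at (-a, 0, a) with unit SD;
      - power at α = 0.05 is estimated by Monte Carlo.

      Because the response does not give the target power, the sketch reports achieved power rather than a required N. The simulated critical value should land near the tabulated F(2, 12) value of about 3.89. The function names are hypothetical.

```python
import math
import random


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulated_power(effect_f=1.2, n_per_group=5, alpha=0.05, n_sims=10000, seed=1):
    """Monte Carlo power of a three-group one-way ANOVA at significance level alpha."""
    rng = random.Random(seed)
    # Means (-a, 0, a) with unit SD give Cohen's f = a * sqrt(2/3); solve for a.
    a = effect_f / math.sqrt(2 / 3)

    def draw(means):
        return [[rng.gauss(mu, 1.0) for _ in range(n_per_group)] for mu in means]

    # Critical F taken as the (1 - alpha) quantile of F under the null (all means equal).
    null_stats = sorted(f_statistic(draw((0.0, 0.0, 0.0))) for _ in range(n_sims))
    f_crit = null_stats[int(round((1 - alpha) * n_sims)) - 1]
    # Power: share of simulated experiments under the alternative that exceed f_crit.
    rejections = sum(f_statistic(draw((-a, 0.0, a))) > f_crit for _ in range(n_sims))
    return rejections / n_sims, f_crit


power, f_crit = simulated_power()
print(f"simulated critical F = {f_crit:.2f}, estimated power = {power:.3f}")
```

      Taking the critical value from a simulated null distribution keeps the sketch dependency-free. With SciPy available, the noncentral F distribution would give the same quantity analytically.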

      As the reviewer mentioned, previous studies suggest that the long-term effects (>2 months) of WBI may include more severe cognitive impairment. Nevertheless, previous research relating mouse age to human age suggests that 2 months in mice is roughly equivalent to 5 years in humans (Life Sci. 2020 Feb 1;242:117242). Given this substantial difference in biological time between humans and mice, 2 months in mice may be an excessively long period. In addition, our study aims to investigate short-term changes rather than long-term effects. Our data indicate that IR-induced PAK3 downregulation induces cognitive impairment at least in the short term, and we believe that our findings may help prevent serious neuronal dysfunction as a long-term side effect of PCI.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      “Peng et al. develop a computational method to predict and rank transcription factors (TFs) according to their likelihood of being pioneer transcription factors, that is, factors capable of binding nucleosomes. The method uses ChIP-seq data for 225 human transcription factors together with MNase-seq and DNase-seq data from five cell lines. The authors developed relatively straightforward, easy-to-interpret computational methods that leverage the potential of MNase-seq to enable relatively precise identification of the nucleosome dyad. They used an established smoothing approach and local peak identification methods to estimate dyad positions. They combined these positions with ChIP-seq peaks and the motifs within those peaks, which they refer to as "ChIP-seq motifs". This allowed them to quantify "motif profiles" and their density in nucleosome regions (NRs) and nucleosome-free regions (NFRs) relative to the estimated dyad positions. Using these profiles, they arrived at an odds-ratio-based motif enrichment score, along with a Fisher's exact test. Together these assess the odds and significance that a given transcription factor's ChIP-seq motifs are enriched in NRs compared to NFRs, and hence its potential to be a pioneer transcription factor. They showed that known pioneer transcription factors had among the highest enrichment scores. They could also identify 32 relatively novel pioneer TFs with high enrichment scores and relatively high expression in the corresponding cell line. They used multiple validation approaches, including:
      (1) calculating the ROC-AUC associated with their enrichment score, using 16 known pioneer TFs among the 225 TFs as positives and the remaining TFs as negatives;
      (2) using the literature to note that known pioneer TFs acting as key regulators of embryonic stem cell differentiation had the highest enrichment scores;
      (3) comparing their enrichment scores with three classes of TFs defined by protein microarray and electrophoretic mobility shift assays (1. strong binders to free and nucleosomal DNA, 2. weak binders to free and nucleosomal DNA, 3. strong binders to free but not nucleosomal DNA); and
      (4) correlating their calculated TF motif nucleosome end/dyad binding ratio with relevant data from an NCAP-SELEX experiment.
      They also characterize the spatial distribution of TF motif binding relative to the dyad by (1) correlating TF motif density and nucleosome occupancy and (2) clustering TF motif binding profiles by distance from the dyad, identifying 6 clusters.
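
      Here is a minimal sketch of an odds-ratio enrichment score with a one-sided Fisher's exact test, written with the standard library only. The review does not specify the exact contingency table. For illustration, the sketch assumes this 2×2 layout:
      - a = the TF's motifs in NRs, b = the TF's motifs in NFRs;
      - c = background motifs in NRs, d = background motifs in NFRs.

      The function names and the pseudocount rule are assumptions, not the study's code.

```python
from math import comb


def odds_ratio(a, b, c, d, pseudocount=0.5):
    """Odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    A Haldane-Anscombe pseudocount is added to every cell only if some cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a) under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    upper = min(row1, col1)
    # P(X = k) = C(col1, k) * C(n - col1, row1 - k) / C(n, row1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)) / denom


# Hypothetical counts: TF motifs in NRs / NFRs vs background motifs in NRs / NFRs
print(odds_ratio(8, 2, 1, 5), fisher_greater(8, 2, 1, 5))
```

      The one-sided test matches the question being asked, namely whether a TF's motifs are over-represented in NRs. A two-sided test would also flag depletion.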

      The strengths of this paper are the use of MNase-seq data to define relatively precise dyad positions and ChIP-seq data together with motif analysis to arrive at relatively accurate TF binding profiles relative to dyad positions in NRs as well as in NFRs. This allowed them to use a relatively simple odds ratio based enrichment score which performs well in identifying known pioneer TFs. Moreover, their validation approaches either produced highly significant or reasonable, trending results.

      The weaknesses of the paper are relatively minor. The most significant one is that they used ROC-AUC to assess the prediction accuracy of their enrichment score on a highly imbalanced dataset with 16 positives and 209 negatives. ROC-AUC is known to be a misleading prediction measure on highly imbalanced data. This is mitigated by the fact that they find an AUC = 0.94 for their best case. Thus, they're likely to find good results using a more appropriate performance measure for imbalanced data. Another minor point is that they did not associate their enrichment score (focus of Figure 2) with their correlation coefficients of TF motif density and nucleosome occupancy (focus of Figure 3). Finally, while the manuscript was clearly written, some parts of the Methods section could have been made more clear so that their approaches could be reproduced. The description of the NCAP-SELEX method could have also been more clear for a reader not familiar with this approach.”

      Reviewer #2 (Public Review):

      “In this study, the authors utilize a compendium of public genomic data to identify transcription factors (TFs) that can recognize their DNA binding motifs in nucleosome-wrapped chromatin and convert it to open chromatin. This class of TFs is termed pioneer TFs (PTFs). A major strength of the study is the concept. Its premise is that motifs bound by PTFs (assessed by ChIP-seq for the respective TFs) should be present both in "closed", nucleosome-wrapped DNA regions (measured by MNase-seq) and in open regions (measured by DNase I-seq), because PTFs are able to open the chromatin. Use of multiple ENCODE cell lines, including the H1 stem cell line, enabled the authors to assess whether binding at motifs changes from closed to open. Typical, non-PTF TFs are expected to bind motifs only in open chromatin regions (measured by DNase I-seq) and not in regions that are closed in any cell type. This study contributes to the field a validation of PTFs that are already known to have pioneering activity and presents an interesting approach to quantify PTF activity.

      For this reviewer, there were a few notable limitations. One was the uncertainty regarding whether expression of the respective TFs across cell types was taken into account. This would help inform if a TF would be able to open chromatin. Another limitation was the cell types used. While understandable that these cell types were used, because of their deep epigenetic phenotyping and public availability, they are mostly transformed and do not bear close similarity to lineages in a healthy organism. Next, the methods used to identify PTFs were not made available in an easy-to-use tool for other researchers who may seek to identify PTFs in their cell type(s) of interest. Lastly, some terms used were not defined explicitly (e.g., meaning of dyads) and the language in the manuscript was often difficult to follow and contained improper English grammar.”

      Reviewer #3 (Public Review):

      Peng et al. designed a computational framework for identifying pioneer factors using epigenomic data from five cell types. The identification of pioneer factors is important for our understanding of the epigenetic and transcriptional regulation of cells. A computational approach toward this goal can significantly reduce the burden of labor-intensive experimental validation. Nevertheless, there are several caveats in the current analysis which may require some modification of the computational methods and additional analysis to maximize the confidence of the pioneer factor prediction results.

      A key consideration that arises during this review is that the current analysis anchors on H1 ESC. It may therefore have biased the results toward the identification of pioneer factors that are relevant to the four other differentiated cell types. The low ranking of the Yamanaka factors and of known pioneer factors such as the NFYs and ESRRB may be due to the setup of the computational framework. The analysis should be repeated using each cell type in turn as the anchor. This would validate the reproducibility of the pioneer factors found so far and show whether TFs related to ESC identity (e.g. Yamanaka factors, NFYs and ESRRB) change significantly in their ranking. Given the potential cell type specificity of the pioneer factors, the extension to more cell types appears to be important for further demonstrating the utility of the computational framework.

      Author Response: We thank all reviewers for their thoughtful and constructive comments and suggestions, which helped us to strengthen our paper. Following the suggestions, we have performed additional analyses to address the reviewers’ comments, and the detailed responses are itemized below.

      Reviewer #1 (Recommendations For The Authors):

      1. The authors should generate precision-recall curves in addition to (or replacing) the ROC-AUC curves shown in Figure 2c. They should also calculate the precision-recall AUC and use that as their measure of enrichment score prediction accuracy. Precision-recall curves and AUC are more appropriate for imbalanced positive-negative data, as is the case in this study.

      Response: Following the reviewer’s suggestion, we have performed precision-recall analysis and calculated Matthews correlation coefficients (MCC) (Figure 2). We have further expanded our validation set to 32 known pioneer transcription factors (Supplementary Table 5) and compared the performance of the enrichment score using different test sets (Supplementary Table 10). Our best results were ROC-AUC = 0.71, PR-AUC = 0.37 and MCC = 0.31 for Test set 1, and ROC-AUC = 0.92, PR-AUC = 0.45 and MCC = 0.49 for Test set 2 (Supplementary Table 11).
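
      For readers comparing these metrics, the sketch below computes three quantities using only the standard library:
      - ROC-AUC;
      - average precision, one common estimator of the area under the precision-recall curve;
      - MCC at a chosen score threshold.

      It is an illustration, not the authors' evaluation code. The MCC threshold is an assumption the caller must supply. A random ranking has an expected PR-AUC equal to the fraction of positives (for example, 16/225 ≈ 0.07), so PR-AUC values should be read against that baseline rather than against 0.5.

```python
import math


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive, scores sorted high to low."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mcc(labels, scores, threshold):
    """Matthews correlation coefficient for the rule 'predict pioneer if score >= threshold'."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


labels = [1, 0, 1, 0]  # toy example: 1 = known pioneer TF
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores), mcc(labels, scores, 0.5))
```

      The pairwise ROC-AUC is quadratic in the number of TFs. This is negligible for a few hundred TFs but would call for a rank-based formula on much larger sets.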

      2. The authors should generate scatter plots of their TF enrichment scores (focus of Figure 2) and motif-density nucleosome occupancy Pearson correlation coefficients (focus of Figure 3) and calculate the corresponding correlation coefficient and p-value.

      Response: We observed a weak but statistically significant correlation between the enrichment scores and the correlation coefficient values (R = 0.32, p = 1e-9).
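
      A correlation and p-value of this kind can be sketched with the standard library alone. The p-value here uses Fisher's z-transform, a large-sample normal approximation that is reasonable for a few hundred TFs. Statistical packages typically use the exact t distribution instead, so values will differ slightly for small samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_with_p(x, y):
    """Pearson r and an approximate two-sided p-value via Fisher's z-transform."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if n <= 3:
        return r, float("nan")  # the z approximation needs n > 3
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p


# Toy data standing in for (enrichment score, motif-density correlation) pairs
r, p = pearson_with_p([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, approximate p = {p:.3f}")
```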

      3. The authors should write their computational methods in the Methods section in such a way that a skilled bioinformatician could reproduce their results. This does not require a major rewrite. They are very close. One example of this is that a minimum distance between neighboring local maxima of the smoothed dyad counts was set to 150 bp. How was this algorithmically done? Suppress/ignore weaker local maxima that are within 150 bp of other stronger local maxima?

      Response: We have revised the Methods section to make it easier to follow and to reproduce the results. To identify the local maxima, we used bwtool with the parameters ‘find local-extrema -maxima -min-sep=150’. As a result, a local maximum located within 150 bp of a neighboring maximum was ignored, which avoids local clusters of extrema.
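
      The rule described in this response can be approximated by a greedy suppression over the smoothed dyad-count track: a local maximum is ignored if it lies within 150 bp of a neighboring, stronger maximum. This is an assumed re-implementation for illustration only. It may differ from bwtool's exact tie-breaking and edge handling, and the function name is hypothetical.

```python
def dyad_peaks(signal, min_sep=150):
    """Positions of strict local maxima, keeping the strongest ones at least min_sep apart."""
    # Strict local maxima (exact plateaus in the smoothed signal are skipped)
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
    ]
    kept = []
    # Visit candidates from strongest to weakest; drop any within min_sep of a kept peak
    for i in sorted(candidates, key=lambda j: signal[j], reverse=True):
        if all(abs(i - k) >= min_sep for k in kept):
            kept.append(i)
    return sorted(kept)


track = [0.0] * 500
track[100], track[200], track[400] = 10.0, 8.0, 9.0
print(dyad_peaks(track))  # the peak at 200 lies within 150 bp of the stronger one at 100
```

      Processing candidates from strongest to weakest means a weak shoulder can never suppress the dominant dyad next to it. This is the behavior the reviewer's question describes.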

      4. Describe the NCAP-SELEX method more clearly so that a reader not familiar with this approach doesn't have to look it up. This can be brief.

      Response: Following the reviewer’s suggestion, we have added a detailed description of the NCAP-SELEX method.

      Reviewer #2 (Recommendations For The Authors):

      To improve the manuscript:

      1. The grammar in the manuscript should be checked for accuracy to improve readability and clarify the exact meaning.

      Response: We have improved the grammar and have clarified the meaning of terms.

      2. The exact meaning of dyads needs to be defined up front. In some places it seems to mean pairs of reads, and in others it seems to refer to nucleosome positioning.

      Response: The meaning of “dyads” has been clarified. The dyad positions were determined by the midpoints of the mapped reads in MNase-seq data and refer to the center of the nucleosomal DNA.
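
      Here is a minimal illustration of this definition. It assumes paired-end MNase-seq fragments given as 0-based half-open (start, end) coordinates on one chromosome; single-end reads would first need to be extended or shifted to the fragment center. Under that assumption, dyad positions are fragment midpoints, and their per-base counts form the track that is then smoothed and scanned for peaks. The function name is hypothetical.

```python
from collections import Counter


def dyad_counts(fragments, chrom_length):
    """Count fragment midpoints (putative nucleosome dyads) at each base of a chromosome."""
    counts = Counter((start + end) // 2 for start, end in fragments)
    # Midpoints falling outside [0, chrom_length) are silently dropped
    return [counts.get(pos, 0) for pos in range(chrom_length)]


# Hypothetical 147 bp mononucleosomal fragments
track = dyad_counts([(0, 147), (10, 157), (300, 447)], chrom_length=500)
print([pos for pos, n in enumerate(track) if n])
```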

      3. The meaning of NCAP-SELEX needs to be defined before the acronym is used.

      Response: We have defined it in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1. The authors found that Yamanaka factors and several other known pioneer factors (e.g. NFY-A, NFY-B, and ESRRB) are lowly ranked in their pioneer factor analysis. Since the analysis was performed by anchoring on H1 ESCs and comparing them to the other four cell lines, the results may only be relevant to differentiated cell types. It is therefore not unexpected that the Yamanaka factors which are important for iPSC reprogramming and the NFYs which have been experimentally shown to replace nucleosomes for maintaining ESC identity from differentiation (PMID: 25132174; PMID: 31296853) would not be enriched in the analysis. I suggest the authors repeat their analysis by anchoring on differentiated cell types and validate the reproducibility of the pioneer factors found so far and also investigate whether TFs related to ESC identity (e.g. Yamanaka factors, NFYs, and ESRRB) would show significant changes in their ranking as pioneer factors.

      Response: Following the reviewer’s suggestions, we have repeated the enrichment analysis by redefining differentially open regions as those closed in differentiated cell lines (HepG2, HeLa-S3, MCF-7 and K562) and open in the H1 embryonic cell line (Supplementary Figure 6). The results indicate that most known PTFs still showed significantly higher enrichment scores than other TFs, especially for the FOXA, GATA and CEBPB families. Interestingly, ESRRB and the Yamanaka pioneer factor POU5F1 (OCT4) also showed significantly high enrichment scores in this analysis (Supplementary Figure 6). This could be explained by the roles of Yamanaka factors in cellular reprogramming: they reprogram differentiated somatic cells into induced pluripotent stem cells.

      2. The authors mentioned the cell-type specificity of TFs being pioneer factors and gave the example of CTCF. This point relates closely to point 1 above; in particular, the correlation analysis of Yamanaka factors and NFYs supports their binding to nucleosomes. Together, these results highlight a potential caveat of the current analysis: it is likely to be limited to the available cell types and may be affected by which cell type was used as the anchor.

      Response: Differentiated and embryonic cell lines were used to ask specific questions about the functional roles of PTFs in cell differentiation and stem cell reprogramming. In the revised manuscript, we have clarified this point and separated our data set into three different sets of PTFs with different functions (Supplementary Table 10). We agree with the reviewer that it would be useful to have data from more cell lines. Unfortunately, the need for matching ChIP-seq, DNase-seq and MNase-seq data sets imposes strict limitations.

      3. The differential and conserved open chromatin regions are defined based on overlaps found between five cell types using their DNase-seq mapping profiles. The limitation of this definition is its lack of quantitativeness. For example, a chromatin region can have more than 80% overlap between H1 and another cell type, but the level of accessibility (e.g. number of reads mapped to this region) can be quite different between cell types. In such a case, I think it is still more appropriate to define such a region as a differential open chromatin region. The authors should explore whether using a more quantitative definition would improve the identification and categorization of differential and conserved open chromatin regions.

      Response: We thank the reviewer for these suggestions. In the revised version, we have clarified the definition and explored different thresholds for defining the differentially open and conserved open chromatin regions in the enrichment analysis (Supplementary Figure 8). Our results were not significantly affected when different thresholds were applied.

      4. While it is mentioned that H3K27ac and H3K4me1 ChIP-seq data from the five human cell lines were used in the study, the information on how enhancers are mapped/defined in these cell types is lacking.

      Response: We have clarified the definition in the text. Enhancer regions were identified as open chromatin regions overlapping both H3K27ac and H3K4me1 ChIP-seq narrow peaks, and we now explain how enhancers are defined in the Methods section. In addition, we have performed an enrichment analysis using NRs located in differentially active enhancer regions and NDRs located in conserved active enhancer regions (Supplementary Figure 7), comparing the H1 embryonic cell line with each of the other differentiated cell lines. The performance of these enrichment scores in PTF classification was slightly worse than that of scores calculated from differentially open and conserved open chromatin regions.
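
      The enhancer definition given here (open chromatin regions that overlap both an H3K27ac and an H3K4me1 narrow peak) can be sketched as a simple interval-overlap filter. The sketch makes the following assumptions:
      - intervals are 0-based half-open (chrom, start, end) tuples, as in BED files;
      - an overlap of at least 1 bp counts as overlapping;
      - a genome-scale pipeline would more likely use bedtools intersect than this quadratic loop.

      The function names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def active_enhancers(open_regions, h3k27ac_peaks, h3k4me1_peaks):
    """Open regions that overlap at least one peak of each histone mark."""
    return [
        region for region in open_regions
        if any(overlaps(region, p) for p in h3k27ac_peaks)
        and any(overlaps(region, p) for p in h3k4me1_peaks)
    ]


open_regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
h3k27ac = [("chr1", 150, 160), ("chr1", 390, 500), ("chr2", 0, 50)]
h3k4me1 = [("chr1", 190, 250), ("chr2", 150, 160)]
print(active_enhancers(open_regions, h3k27ac, h3k4me1))
```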

      5. The description of "genome-wide mapping of transcription factor binding sites" is unclear. For example, what is meant by "In total, ChIP-seq data for 225 transcription factors could be matched with MNase-seq data", and why is this step needed? I would assume that a typical approach for mapping TF binding sites in the five cell types is to obtain the ChIP-seq data for each TF in each cell type and perform sequence alignment to the reference genome. The procedure described by the authors needs a clearer motivation and justification.

      Response: This sentence refers to matching ChIP-seq and MNase-seq data from the same cell type. We now explain in detail how the ChIP-seq data are processed and have clarified this in the paper.

      6. I also suggest the authors clearly justify the use of ROC analyses. Ground truth is available only for positives (e.g. 16 known pioneer factors). The "other transcription factors" treated as negatives are in fact expected to contain unknown pioneer factors, and the analysis procedure should not minimize their identification (which would maximize the ROC).

      Response: (This point was also raised by Reviewer #1.) Treating transcription factors of unknown status as negatives actually lowers the reported ROC scores, because more hits are counted as false positives; it does not maximize them. That is why we state in the paper that the obtained ROC scores can be considered lower-bound estimates. In addition, we have expanded our validation sets to 32 known pioneer factors and compiled three sets of PTFs for validation. Following the reviewers’ suggestions, we have also performed precision-recall (PR) analysis and calculated the Matthews correlation coefficient (MCC) using the three sets of PTFs for validation (Supplementary Table 11 and Supplementary Figure 2).

      7. The analysis of pioneer transcription factor binding sites lacks insight. What can we learn from this analysis other than that TFs from the same families are likely to be clustered in the same group?

      Response: We thank the reviewer for pointing this out and have added a more detailed discussion of these results in the revised manuscript. Very few PTF-nucleosome structural complexes have been solved so far, and the binding modes of most PTFs with nucleosomes remain unknown. Our analysis has identified six distinct clusters of TF binding profiles on nucleosomal DNA, which could provide insight into the binding modes of PTFs with nucleosomes. These clusters point to the diversity of binding motifs. Transcription factors belonging to the same cluster may also compete for binding.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The co-authors and I would like to thank you for overseeing the review, and to thank you and the reviewers for your constructive feedback about the manuscript. Below, we have summarized each suggestion for improving the manuscript and provided our response. In addition, the abstract was revised to include findings from physiological studies of mice with a single Numb cKO and to provide a more concise and conservative concluding statement.

      Reviewer #1 (Recommendations for The Authors):

      1. While the specificity of the observed muscle phenotypes seems clear, the subsequent molecular analysis of Numb protein interactors does not seem to consider the potential involvement of Numb-like. The authors should demonstrate the relative expression levels of Numb and Numb-like in the models used, and establish the specificity of the antibodies used in IP, western and staining experiments.

      Response: Perhaps the most convincing evidence that the anti-Numb antibody did not pull down Numb-like is that this protein was not detected among the immunoprecipitated protein complexes pulled down by the anti-Numb antibody. The antibody used in the immunoprecipitation was validated by the supplier and was previously reported to immunoprecipitate Numb [1, 2]. We previously demonstrated that a morpholino against Numb mRNA almost completely eliminated the band detected by this antibody and that this band was at the expected molecular weight [ref]. In our hands, mRNA levels for Numb-like in skeletal muscle are 5-10-fold lower than those for Numb [3]. We have been unable to detect Numb-like protein in healthy adult skeletal muscle by immunoblotting or immunofluorescence staining. Taking all of these findings together, it seems unlikely that the antibodies used for immunoprecipitating Numb protein complexes pull down Numb-like.

      2. The authors use PCR to investigate Numb isoform expression and conclude that p65 is likely the dominant protein isoform expressed. While this agrees with the single band observed in Supp Figure 4A, a positive control for exon 9 excluded and included isoforms in the PCR reactions would strengthen this conclusion.

      Response: The amplicons shown in Supplemental Figure 4 were sequenced. The clones corresponded to isoforms with exon 3 present or removed. No amplicons containing exon 9 were detected. The following sentence was added to the Analysis of Splice Variants section of Methods to address this point: “PCR products were cloned using the TOPO TA cloning system (ThermoFisher) and multiple resulting clones were sequenced to confirm that the expected products were generated.”

      3. PCR analysis of total Numb and Numb-like expression levels is not shown. This is important because the specificity of the Numb antibodies used for AP-MS experiments is not described, and some Numb antibodies are well known to also recognize Numb-like. Two different Numb antibodies were used for Western blotting and immunoprecipitation, but their specificity for Numb and Numb-like is not described. In particular, does the antibody used in the AP-MS experiment recognize both Numb and Numb-like? Supplementary Table 1 does not list Numb or Numb-like, but presumably peptides were identified?

      Response: As noted above, the specificity of the anti-Numb antibodies was confirmed in previous studies [3]. Importantly, Numb-like mRNA levels are 5-10-fold lower than Numb mRNA levels, and NumbL protein is undetectable in healthy adult skeletal muscle by Western blotting. The physiology data reported in this manuscript support the conclusion that a single KO of Numb is sufficient to recapitulate the physiological phenotype of the Numb/Numb-like KO. We therefore reason that the majority, if not all, of the physiological contribution of these proteins to muscle contractility is due to Numb (Fig. 1).

      4. The validation experiment used the same Numb antibody for immunoprecipitation, followed by immunoblotting for Septin 7. A reciprocal IP of Septin 7, immunoblotted for Numb, should be performed. In addition, a Numb-like IP or immunoblot would be useful to demonstrate the specificity of the interaction. Mapping the interaction between Numb and Septin 7 would also help demonstrate its specificity, and strategies to establish the biological relevance of the interaction would be useful.

      Response: We agree with the reviewer and attempted several IPs with anti-Septin 7 antibodies, but these were unsuccessful. In a new collaboration, Dr. Italo Cavini (University of Sao Paulo) has used machine-learning-based approaches to model binding between Numb and several septins, including Septin 7. The analysis suggests that binding of Numb to septins involves a domain of Numb that has not yet been ascribed a function in protein-protein interactions. These computational predictions require experimental validation. However, they provide a rational starting point for experiments to define the domains responsible for these interactions. Such experiments were included in our recent NIH R01 renewal application. We hope to report the results of experiments testing these computational models in the future.

      5. Other septins were identified in the AP-MS experiment and might have been anticipated to also be disrupted by Numb/Numb-like deletion. Are these septins known to interact in a complex?

      Response: This is an excellent question. Septins share conserved motifs, which gives a clear reason to expect that many different mammalian septins could directly interact with Numb. Septins form hetero-oligomeric complexes of 3, 6 or 8 septins [4]. It is likely that when Numb binds to one septin, antibodies against Numb pull down other septins present in the septin oligomer to which Numb is bound. The following paragraph was added to the discussion: “Our findings suggest that Numb may also interact with other septins such as septins 2, 9 and 10, which were also identified with a high level of confidence as Numb-interacting proteins by our LC/MS/MS analysis. Our data do not allow us to determine whether Numb binds directly to these septins. Septins contain highly conserved regions; consequently, if one such region of septin 7 interacts with Numb, many septins would be expected to bind Numb directly through the same domain. However, because septins self-oligomerize, it is possible that when Numb binds to one septin, antibodies against Numb also pull down other septins present in the same oligomer, regardless of whether they are bound by Numb.”

      6. The text for Figure 5 describes analysis of Septin localization in inducible Numb/Numb-like cKO muscle, but the figure indicates only Numb is knocked out. Please clarify.

      Response: We apologize for this oversight on our part. The Legend to Figure 5 has been corrected.

      7. Supplementary Figure 2 seems to show that TAM treatment increases Numb expression. Please clarify. Also, please correct reference 9.

      Response: The figure was incorrectly labeled. We apologize for this oversight and have corrected the figure in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Overall, the manuscript is well written. I do have a few minor issues/concerns, which are detailed below.

      Abstract: Please be a little more specific about where the tissue came from (i.e. humans, mice, cells) when referring to your previous studies.

      Response: The abstract has been revised as requested.

      Introduction: Please be more specific regarding the technique used for detecting ultrastructural changes. I assume it was done with TEM, but the reference is listed as an "invalid citation" in your reference list.

      Response: The introduction was revised as requested and the citation was updated to reference a valid citation.

      Methods / Numb Co-Immunoprecipitation: Please indicate the level of confluency of the C2C12 cells, as this will alter gene expression.

      Response: As indicated in the updated Methods section, confluent C2C12 cells were switched to differentiation media (low serum) for seven days. When harvested, the cells had differentiated and fused into myotubes.

      Methods / Immunohistochemical Staining: The first sentence needs to be edited regarding plurality and grammar.

      Response: Thank you for this comment. The text was revised accordingly.

      Results / GWAS and WGS Identify...: Please spell out phosphodiesterase (I assume) for PDE4D

      Response: This change was incorporated in the text.

      References cited:

      1. Wu, M., et al., Epicardial spindle orientation controls cell entry into the myocardium. Dev Cell, 2010. 19(1): p. 114-25.

      2. Garcia-Heredia, J.M. and A. Carnero, The cargo protein MAP17 (PDZK1IP1) regulates the immune microenvironment. Oncotarget, 2017. 8(58): p. 98580-98597.

      3. De Gasperi, R., et al., Numb is required for optimal contraction of skeletal muscle. J Cachexia Sarcopenia Muscle, 2022.

      4. Neubauer, K. and B. Zieger, The Mammalian Septin Interactome. Front Cell Dev Biol, 2017. 5: p. 3.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In countries endemic for P vivax the need to administer a primaquine (PQ) course adequate to prevent relapse in G6PD deficient persons poses a real dilemma. On one hand PQ will cause haemolysis; on the other hand, without PQ the chance of relapse is very high. As a result, out of fear of severe haemolysis, PQ has been under-used.

      In view of the above, the Authors have investigated in well-informed volunteers, who were kept under close medical supervision in hospital throughout the study, two different schedules of PQ administration: (1) escalating doses (to a total of 5-7 mg/kg); (2) single 45 mg dose (0.75 mg/kg).

      It is shown convincingly that regimen (1) can be used successfully to deliver within 3 weeks, under hospital conditions, the dose of PQ required to prevent P vivax relapse.

      As expected, acute haemolytic anaemia (AHA) developed in all cases with both regimens. With regimen (2), not surprisingly, the fall in Hb was smaller, although it was abrupt. With regimen (1), the average fall in Hb was about 4 g/dL. In only one subject did the fall in Hb mandate termination of the study.

      Since the data from the Chicago group some sixty years ago, no paper has reported a systematic daily analysis of AHA in so many closely monitored subjects with G6PD deficiency. The individual patient data in the Supplementary material are highly informative and extremely valuable.

      Having said this, I do have some general comments.

      1. Through their remarkable Part 1 study, the Authors clearly wish to set the stage for a revision of the currently recommended PQ regimen for G6PD deficient patients. They have shown that 5-7 mg/kg can be administered within 3 weeks, whereas the currently recommended regimen provides 6 mg/kg over no less than 8 weeks.

      We state in the abstract: “The aim was to explore shorter and safer primaquine radical cure regimens compared to the currently recommended 8-weekly regimen (0.75 mg/kg once weekly), potentially obviating the need for G6PD testing”. This is the primary goal of the study.

      1. Part 2 aims to show that, as was known already, even a single PQ dose of 0.75 mg/kg causes a significant degree of haemolysis: G6PD deficiency-related haemolysis is characteristically markedly dose-dependent. Although they do not state it explicitly in these words (I think they should), the Authors want to make it clear that the currently recommended regimen does cause AHA.

      We also wanted to compare the extent of haemolysis following single dose with the extent of haemolysis following the ascending dose regimens, in the same patients.

      1. Regulatory agencies like to classify a drug regimen as either SAFE or NOT-SAFE; they also like to decide who is 'at risk' and who is 'not at risk'. A wealth of data, including those in this manuscript, show that it is not correct to say that a G6PD deficient person when taking PQ is at risk of haemolysis: he or she will definitely have haemolysis. As for SAFETY, it will depend on the clinical situation when PQ is started and on the severity of the AHA that will develop.

      We agree completely. Haemolysis following primaquine is inevitable. What matters is the rate and extent of haemolysis, and the compensatory response. Importantly, the extent of the haemolysis, even within a specific genotype and for a given drug dose, appears to be highly variable.

      The above three issues are all present in the discussion, but I think they ought to be stated more clearly.

      We have tried to clarify these points in a revised discussion.

      Finally, by the Authors' own statement on page 15, the main limitation is the complexity of this approach. The authors suggest that blister-packed PQ may help; but to me the real complexity is managing patients in the field versus the painstaking hospital care in the hands of experts, of which volunteers in this study have had the benefit. It is not surprising that a fall in Hb of 4 g/dl is well tolerated by most non-anaemic men; but patients with P. vivax in the field may often have mild, moderate or severe anaemia; and certainly they will not have their Hb, retics and bilirubin checked every day. In crude approximation, we are talking of a fall in Hb of 4 g/dl with regimen (1), as against a fall in Hb of 2 g/dl with regimen (2), which is part of the currently recommended regimen: it stands to reason that, in terms of safety, the latter is generally preferable (even though some degree of fall in Hb will recur with each weekly dose). In my view, these difficult points should be discussed deliberately.

      As above, we have tried to clarify these important points in a revised discussion.

      Reviewer #1 (Recommendations For The Authors):

      Page 2 para 3. The decreased haemolysis upon continued PQ administration (originally named the 'resistance phase') is explained by two additive factors. First, reticulocytosis (cells with higher G6PD activity pour into the circulation from the bone marrow); second, the early doses of PQ have caused selective haemolysis of the oldest red cells, which had the lowest G6PD activity. This dual phenomenon is hinted at, but I think it should be stated clearly.

      Thank you. We have added to the Introduction (fourth paragraph in revised version):

      “Continued primaquine administration to G6PD deficient subjects resulted in "resistance" to the haemolytic effect. The selective haemolysis of the older red cells resulted in a compensatory increase in the number of reticulocytes. Thus, the red cell population became progressively younger and increasingly resistant to oxidant stress, so overall haemolysis decreased and a steady state was reached.”

      Page 4 and elsewhere. In the 'Hillmen scale' for haemoglobinuria a value >6 was named a 'paroxysm'; but any value of 2 and above is already frank haemoglobinuria. Incidentally, the chart was published not in ref 17, but in NEJM 350:552, 2004.

      We have changed the reference (now ref 19) to the 2004 paper by Hillmen. We used a value of 6 as the clinical criterion for stopping primaquine. While a score >2 is detectable in dilute urine, >6 refers to clearly red/black urine.

      In Table 1 and throughout the paper, I am surprised that retics are given as %: absolute retic counts are more informative.

      We showed these as percentages because the majority of measurements were taken from blood slide readings, from which it is not possible to obtain an absolute count.

      Page 10. Attenuated haemolysis with continued or recurrent doses of PQ was shown convincingly for G6PD A-. There is also one report in which the time course of AHA was extensively investigated upon deliberate administration of PQ to a subject with G6PD Mediterranean (Blood 25: 92, 1965): there was little or no evidence of a 'resistance phase'.

      We agree that this suggests it might not be possible to attenuate haemolysis with the Mediterranean variant (or variants of similar severity) as even the youngest circulating red cells may be susceptible to haemolysis. More evidence is needed.

      S6, S7. Reticulocytes remain high until PQ is stopped; they return to normal some 17 days after stopping PQ. This should be stated in the main text.

      This has been added to the main text (section “Haemolysis and reticulocyte response”):

      “It took around 2 weeks for the reticulocyte counts to re-normalise.”

      In subject 11, haemoglobinuria was slight on day 12; what was it before?

      We have changed the caption of this Figure (Appendix 5) to:

      “Day 10 urine sample from subject 11 showing slight haemoglobinuria (Hillmen score of 4). The subject had a maximum Hillmen score varying between 2 and 3 on days 4 to 9.”

      I found the individual patient data in S5 and S6 most interesting, especially since the G6PD variant was identified in each case. It would be helpful if, in each case, the total PQ dose were also shown, and, in the interest of visual comparability, the abscissa scale ought to be the same for all cases.

      We have amended Figures S5 and S6 to make them consistent with each other (now Appendix 5). We also amended the figures showing the individual subject data for consistency.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors identified compound heterozygous mutations in CFAP52 recessively cosegregating with male infertility in a non-consanguineous family. The CFAP52-mutant patient exhibits a mixed acephalic spermatozoa syndrome (ASS) and multiple morphological abnormalities of the sperm flagella (MMAF) phenotype. The influence of the mutations on CFAP52 protein function is well validated by in vitro cell experiments and immunofluorescence staining. Cfap52-KO mice were further constructed and perfectly resemble the CFAP52-mutant patient's infertile phenotype, also showing a mixed ASS and MMAF phenotype. The phenotype and the underlying mechanisms of the disruption of the sperm head-tail connection and flagella development are carefully analyzed by TEM, Western blotting, and immunofluorescence staining. The data presented reveal a prominent role for CFAP52 in sperm development, suggesting that CFAP52 is a novel diagnostic target for male infertility with defects of the sperm head-tail connection and flagella development.

      Thank you for your positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to identify the genetic factors underlying asthenoteratozoospermia. Using whole-exome sequencing, they analyzed a family with an infertile male and identified CFAP52 variants. They further knocked out the mouse Cfap52 gene, and the homozygous knockout mice phenocopied the patient. CFAP52 interacts with several other sperm proteins to maintain normal sperm morphology. Finally, CFAP52-associated male infertility in humans and mice could be overcome by intracytoplasmic sperm injection (ICSI).

      Strengths:

      The major strength of this study is the identification of genetic factors contributing to asthenoteratozoospermia and the generation of a mouse knockout model to validate the factor.

      Thank you for your positive comments.

      Weaknesses:

      The authors did not use omics approaches to dissect the potential mechanisms. Instead, they took advantage of direct co-IP experiments to fish out binding partners. They also did not discuss in detail why other motile cilia behave differently.

      Dear reviewer, thank you for your comments. We have tried to answer your two questions as follows.

      In this study, we did not use omics technologies to explore the binding partners of CFAP52 (e.g., IP-MS) or the proteins differentially expressed after the loss of CFAP52 (e.g., proteomics). For IP-MS, unfortunately, none of the available CFAP52 antibodies could be used for protein immunoprecipitation experiments. Another reason is that only a few dozen proteins have been reported to regulate the head-tail coupling apparatus (HTCA) of sperm. Accordingly, we used Western blotting to examine the expression of ten acephalic sperm syndrome (ASS)-associated proteins and found that only SPATA6 expression was significantly reduced in the testis protein lysates of Cfap52-KO mice (Fig. 6A). We further carefully examined the regulation of SPATA6 stability by its binding partner CFAP52 (Fig. 6 and Figure 6—figure supplement 2).

      In addition to male infertility, Cfap52-KO mice suffered from hydrocephalus; ependymal cilia were sparse under SEM observation, and disrupted axonemal structures were identified by TEM analysis (Figure 4—figure supplement 2). However, no obvious abnormalities of tracheal cilia were identified by SEM and TEM analyses (Figure 4—figure supplement 2). Although flagella and motile cilia share a very similar “9+2” axoneme structure, each has its own unique proteins, and the requirement for some axonemal proteins may differ between them. For example, IQUB is expressed in tissues other than the testis, such as the lung and brain; however, IQUB deletion affects only the beating of sperm flagella and not that of respiratory cilia (Cell Rep, 2022). Cfap43-KO mice exhibited both sperm flagellar defects and early-onset hydrocephalus (Dev Biol, 2020), and CFAP206 is required for sperm motility, mucociliary clearance of the airways and brain development (Development, 2020).

      Reviewer #3 (Public Review):

      Summary:

      In this study, Jin et al. report the first evidence of CFAP52 mutations in human male infertility by identifying deleterious compound heterozygous mutations of CFAP52 in infertile human patients with acephalic and multiple morphological abnormalities in flagella (MMAF) phenotypes but without other abnormalities in motile cilia. They validated the pathogenicity of the mutations by an in vitro minigene assay and the absence of proteins in the patient's spermatozoa. Using a Cfap52 knockout mouse model they generated, the authors showed that the animals are hydrocephalic and the sperm have coupling defects, head decapitation, and axonemal structure disruption, supporting what was observed in human patients.

      Strengths:

      The major strengths of the study are the rigorous phenotypic and molecular analysis of normal and patient spermatozoa and the demonstration of infertility treatment by ICSI. The authors demonstrated the interaction between CFAP52 and SPATA6, a head-tail coupling regulator and structural protein, and showed that CFAP52 can interact with components of the microtubule inner protein (MIP), radial spoke, and outer dynein arm proteins.

      Thank you for your positive comments.

      Weaknesses:

      The weakness of the study is some inconsistency in the localization of the CFAP52 protein in human spermatozoa across the figures and the complete absence of such localization information for mouse spermatozoa. Putting their findings in the context of the newly available structural information from the recent series of studies that unambiguously identify CFAP52 as an MIP in the B tubule would not only greatly benefit the interpretation of the study, but also resolve the inconsistent sperm phenotypes reported by an independent study. Since the mouse model is not designed to exactly recapitulate the human mutations but is a complete knockout, and the knockout mice show a hydrocephaly phenotype as well, some of the claims of causality and of ICSI as a treatment need to be tempered. Discussing the frequency of acephaly and MMAF in primary male infertility would help justify CFAP52 as a practical diagnostic tool.

      Dear reviewer, thank you for your comments. We have tried to answer your questions as follows.

      By immunofluorescence staining, we showed that CFAP52 was localized at both the HTCA and the full-length flagella of spermatozoa from the normal control; in contrast, CFAP52 signals were barely detected in the patient’s spermatozoa (Figure 3F). Because CFAP52 staining is not shown in any other figure, there is no inconsistency in the localization of the CFAP52 protein in human spermatozoa across the figures. We did not perform CFAP52 staining in mouse spermatozoa; however, we have shown that CFAP52 protein was completely absent in Cfap52-KO testes compared with WT testes (Figure 4C).

      We appreciate the reviewer’s suggestion to put our findings on CFAP52 in the context of the newly available axoneme architecture. Given that these cryo-EM studies focus on doublet microtubules (DMTs), a broader expression pattern of CFAP52 in cilia/flagella cannot be excluded. In mammals, CFAP52 seems to interact with a broad range of axonemal proteins, including MIP (CFAP45), ODAs (DNAI1 and DNAH11), and DRC (DRC10) (Dougherty et al., 2020). We have mentioned that ‘a lack of FAP52 in Chlamydomonas causes instability of microtubules and detachment of the B-tubule from the A-tubule, and shortened flagella are observed in Chlamydomonas when both FAP52 and FAP20 are absent (Owa et al., 2019). Unlike the specific regulation of B-tubule stability by FAP52 in Chlamydomonas (Owa et al., 2019), Cfap52-KO mice and the CFAP52-mutant patient showed a serious disorder of the axoneme and its accessory structures.’

      Before our study, Cfap52-KO mice had not yet been generated. To explore the physiological roles of CFAP52, we decided to construct Cfap52-KO mice. While our manuscript was under preparation, an independent group also generated Cfap52-KO mice and explored their phenotype (Wu et al., 2023). We quite agree with this reviewer that Cfap52-mutant mice would be more exact models to recapitulate the human variants. Cfap52-mutant mice were not included in our current manuscript because i) the two identified variants are a nonsense variant and a frameshift variant, respectively, which are expected to impair CFAP52 expression and function; ii) the influence of the two variants on CFAP52 protein function has been well validated by in vitro cell experiments; and iii) our research funding is limited. The assisted reproductive technology (ART) outcomes were also reported for the CFAP52-mutant patient and Cfap52-KO mice, which may be useful for further clinical studies. However, these outcomes should not be over-interpreted because this is only a case study.

      Quantitative analyses showed that decapitated spermatozoa, spermatozoa with abnormal head-tail connections, and spermatozoa with deformed flagella accounted for approximately 40%, 25%, and 30% of the total spermatozoa in Cfap52-KO mice, respectively (Figure 4I). For the CFAP52-mutant patient, the frequencies of acephaly and MMAF were not counted, and unfortunately we do not have enough samples (repeats) to perform quantitative analyses.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      1. In lines 41-43, there seems to be some confusion about the terminology regarding "sporadic ALS". ALS is subdivided into familial and sporadic forms. Familial ALS simply indicates that the patient has a family history of ALS and presumably has a genetic predisposition for developing this disease. In many families, the identity of the mutation remains unknown. Sporadic ALS patients do not have a family history of this disease. However, this does not imply that they lack mutations that caused disease. In fact, 5-10% of these patients have the hexanucleotide repeat expansion in C9orf72. This mutation is also found in about 40% of familial ALS cases.

      We have now amended the manuscript to be more accurate in our description of the underlying genetics of ALS. The changes to this section are as follows:

      Lines 39-47:

      "...The median survival time in ALS, from initial onset of symptoms to death, typically as a result of respiratory complications, is only 20-48 months (Chiò et al., 2009), and ALS has an estimated global mortality of 30,000 patients per year (Mathis et al., 2019).

      ALS is typically classified into either familial (fALS) or sporadic (sALS) forms of the disease, based on whether or not patients have an identified family history of the disease; between 5-10% of total ALS cases fall into the former category, fALS, with the remaining 90-95% consisting of sALS cases (Mathis et al., 2019). To date, over 20 monogenic mutations that cause ALS have been identified; however, these still account for only 45% of fALS cases and only 7% of sALS cases (Mejzini et al., 2019)..."

      1. In Fig. 4-supplement 1, 7DD and 5DD are not defined. I assume one is the fast-firing and one is the slow-firing motor neurons. I am also a bit confused as to why the 5DD neurons produce greater muscle force than the 7DD neurons when electrically stimulated. It seems to suggest that there is some difference between the two types of neurons or the groups of mice used to test them.

      We have now defined these terms and the amended figure legend now reads as follows:

      "(A) Fast-firing motor neurons (produced using a 7-day differentiation protocol thus labelled as “7DD”) or slow-firing ChR2+ motor neurons (produced using a 5-day differentiation protocol thus labelled as “5DD”) were engrafted in age matched SOD1G93A mice… Our expectation was that fast-firing motor neurons, which normally innervate larger numbers (>100) of stronger fast-twitch muscle fibres per motor unit, would elicit significantly greater contractile force when optically stimulated, compared to slow-firing motor neurons that innervate small numbers (<10) of weaker, slow-twitch muscle fibres per motor unit. Surprisingly, our data did not show any difference when using grafts consisting of fast-firing motor neurons, versus slow-firing motor neurons, at least in response to optical stimulation. The factors underlying this surprising result, and the apparent discrepancy between electrically-evoked muscle contractions in nerves that had been engrafted with either fast or slow firing motor neurons, are likely to be highly complex; we hope to further explore this as part of a separate follow up study."

      1. Along those lines, do these two subpopulations of motor neurons innervate the same set of muscle fibers? More generally, are certain types of muscle fibers preferentially innervated by this approach? Answering these questions could point to additional ways to enhance the effectiveness of this treatment approach. This should be discussed.

      This point is partially addressed in our response to Point 2 above, but to elaborate further: certainly, the phenotype of individual muscle fibres is largely dictated by the firing properties of the motor neuron that innervates it. Slow-twitch muscle fibres tend to produce less contractile force but are more fatigue resistant, whereas fast-twitch muscle fibres produce more force but fatigue rapidly. There is evidence that expression of the chemorepellent molecule ephrin-A3 prevents the inappropriate innervation of slow-twitch muscle fibres by fast-firing motor neurons, which express the cognate receptor EphA8 [PMID: 26644518]. Importantly, fast-firing motor neurons are preferentially susceptible to disease mechanisms in ALS and the fast-twitch muscle fibres that they innervate are therefore more likely to undergo denervation and atrophy. Surprisingly, in this study we clearly show that grafts consisting of slow-firing motor neurons are able to innervate all regions of the triceps surae muscle group, including the normally exclusively fast-twitch superficial regions of the gastrocnemius and the exclusively slow-twitch soleus muscle. This finding strongly suggests that the normal developmental pairing of motor neuron and muscle fibre properties is not essential in this therapeutic context. Indeed, the use of more disease-resistant slow-firing motor neurons may provide some advantages. Again, we hope to be able to further explore this relationship in forthcoming follow-up studies.

      1. The authors state that exercise programs are likely to accelerate disease progression. This is not supported by the current body of clinical data. In fact, current guidelines are for moderate (not strenuous) exercise, and mouse studies have demonstrated a protective effect of moderate exercise on disease progression.

      We apologise for the lack of clarity on this point, as it was not our intention to imply that voluntary exercise accelerates disease progression. We have now amended the manuscript to specify “ENS-based exercise programs” to avoid any confusion.

      1. It is unclear what the experimental endpoint is. Page 25 defines it as 135 days of age, but ranges are given in the figure legends, suggesting that some other criteria were used. It is also unclear what determined the age at which each animal was treated, since the animals were not all treated at the same age.

      We hope that our response in the Public Reviews section above has fully addressed this point.

      1. I am a little confused by Figure 5 - figure supplement 5, panel D. Why do the authors give specific p-values here but not in the other panels? The sample sizes in D are very low, in some cases with only 1 animal in a group, and performing statistical tests under these conditions seems futile. The statistical power is nearly zero.

      For the purposes of consistency, we have now replaced the specific p-values in panel D with “ns”. The low n-values for the MUNE analysis data are due to the extreme difficulty of identifying the contribution of individual motor units to the total muscle contractile response when the maximal muscle force is very weak. In the absence of optical stimulation training, the extremely weak force elicited by acute optical stimulation often precluded our ability to separate out the contribution of individual motor units, and in animals where this was not possible, we did not always perform electrically-evoked MUNE analysis. Unfortunately, we are not currently in a position to increase the n-values for this component of the study. Our ongoing research to enhance the amplitude of the muscle response to optical stimulation will hopefully help to address this more clearly in the future.

      1. One concern about this approach is that the procedure could accelerate the denervation of the target muscle. Figure 5 - figure supplement 6, panel B, indicates a significant reduction in force on the ipsilateral side relative to the contralateral side, at least under electrical stimulation of the nerve. This would be consistent with the hypothesis that the procedure does enhance disease progression in the treated limb. Is there a reduction in voluntary motor activity in these animals, such as in grip strength or the position of the foot while walking?

      We hope that this important point has been satisfactorily addressed in the Public Reviews section. Unfortunately, we did not undertake any behavioural analysis relating to voluntary motor function of the engrafted (or contralateral) hindlimbs, which may have provided useful data to address this point. As described above, the most likely explanation for this finding is physical nerve damage caused by the intraneural injection procedure; in our efforts to refine our strategy and move it towards clinical translation, we will take this into consideration in our future research.

      1. Based on Fig. 6D, it seems that the vast majority of innervated NMJs at endpoint are innervated by cells from the graft. And yet, electrical stimulation evokes substantially greater muscle force. This may suggest that optical control of engrafted motor neurons will not yield enough force for routine tasks or that the few remaining endogenous motor neurons are much more effective at generating force. These potential limitations and ways to overcome them should be discussed.

      There appears to be a slight misunderstanding: our aim here was to sample a sufficiently large number of motor end-plates innervated by YFP+ terminals to power the statistical analysis. To do this, we specifically chose regions of interest containing at least 1 YFP+ NMJ, and the adjacent muscle fibres were included at random, whatever their innervation status. Had we sampled regions of interest at random, we would likely have captured very few YFP+ terminals, as they occupy a very small volume of the total muscle section and the maximum scanning area for each high-resolution confocal z-stack is relatively small, so we feel that this selection was warranted.

      Minor comments:

      1. The donor mouse strain should be described as 129S1/SvImJ.

      We have now corrected this.

      1. The first time the supplementary figures show up in the manuscript, they seem to have two titles each, such as "Figure 1-figure supplement 1. (Figure 4 - figure supplement 1)". The second seems to be the correct one.

      This was caused by an issue with the LaTeX template, which has now been resolved.

      1. PCB is not defined the first time it is used (page 8, line 332).

      We have now defined this term on first use: printed circuit board (PCB)

      1. CNI is not defined in the text (page 12, line 432).

      We have now defined this abbreviation at the first usage on Page 4, Line 158

      1. Some of the fonts on the graphs are very small, such as Fig. 5J.

      We have increased the font size as much as possible for Fig. 5.

      1. Figure 6 - figure supplement 1 does not include a key to indicate which antigens are stained and which color refers to which antigen. This is also needed for the videos.

      We have now included a key on this figure supplement to indicate the relevant antigens and stains, and we have done the same for the videos.

      1. Video 5 seems to indicate that there is a dead zone in the back of the chamber. Does this raise any concerns about the consistency of training from animal to animal?

      This is an extremely astute observation. However, the intermittent activation of the implantable LED devices is not due to a dead zone; rather, it is due to the orientation of the power receiving coil within the device and its alignment with the resonance frequency chamber that transmits the power to the device. As the animals move around, and particularly when they rear up, the power receiving coil occasionally becomes misaligned and fails to receive sufficient power to activate the LED. Since the pulses are delivered every 2 seconds, for 1 hour per day, we feel that the animals, on average, receive sufficient numbers of pulses to implement the training regimen. Indeed, we feel that the results speak for themselves.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the reviewers’ detailed corrections and insightful comments. We have revised our manuscript per reviewers’ recommendations by including new data and clarifications/expansion of the discussion on our findings. Please see below for details.

      Reviewer #1 (Recommendations For The Authors):

      1. The introduction notes that CD1d KO mice show reduced levels of Va3.2 T cells (Ruscher et al.), which is interesting because innate memory T cell development in the thymus often requires IL-4 production by NKT cells. Have the authors explored QFL T cells in CD1d KO and/or IL-4 KO mice? Since their QFL TCR Tg mice still develop QFL T cells (and these animals likely have very few thymic NKT cells), NKT cells may not be required for the intrathymic development of QFL T cells?

      Answer: We agree that investigation on the role of NKT cells or IL-4 in QFL T cell development will greatly further our understanding of these cells.

      We validated the finding that expression of the QFL TCR transgene largely repressed the expression of endogenous TCRα, as indicated by the low levels of endogenous Vα2 on mature CD8SP T cells in both thymus and spleen. However, the frequencies of Vα2 usage in CD4SP thymocytes and splenocytes from QFL transgenic mice were similar to those in non-transgenic mice, confirming that these cells underwent positive selection using endogenous TCRs rather than the QFL TCR. We thus do not exclude the possible presence of NKT cells in QFLTg mice and their potential involvement in QFL T cell development. Our manuscript is mainly focused on the peripheral phenotype of QFL T cells and their association with the gut microbiota environment. The role of CD1d/IL-4 will be best addressed in future studies.

      1. The finding that Qa-1 expression is not required for the development of QFL T cells raises questions about other MHC products that may be involved. In this context, it is interesting that TAP-deficient mice develop few QFL T cells, for reasons that are unclear, but the authors may speculate a bit. It may also be helpful for the authors to note whether TAP is required for QFL presentation to QFL T cells. Since Qa-1 is not required, and CD1d is still expressed in TAP KO mice, what then could be responsible for their defect in QFL T cell development?

      Answer: This is a great point. Figure 2 of Valerio et al. (2023), on the development of QFL T cells, tested whether the QFL TCR cross-reacts with other MHC I molecules.

      We assessed the activation of pre-selection QFLTg thymocytes in response to various MHC I deficient DC2.4 cell lines. While the QFL thymocytes showed partially reduced activation when stimulated with Qa-1b deficient APCs, triple knock-out (KO) of Qa-1b, Kb, and Db in DC2.4 cells reduced activation close to background levels. However, double knock-out of Qa-1b with either Kb or Db led to stimulation that was intermediate between the triple KO and Qa-1b-KO cell lines. These data suggest that Kb and Db may contribute to the positive selection of QFL T cells in Qa-1b-KO mice.

      TAP is required for FL9 peptide presentation and is very likely needed for presentation of the yet unidentified MHC Ia presented peptide(s) that are essential to QFL T positive selection. While CD1d/NKT cells/IL-4 may be involved in supporting the maturation of QFL T cells, we think in the TAP-KO mice the absence of TAP led to deletion/altered selection of the QFL T population at early developmental stage. We have added clarification on this point in the revised manuscript (line 412~418).

      1. It may be worthwhile for the authors to note that Qa-1 was also dispensable for the intrathymic selection of another Qa-1-restricted TCR (Doorduijn et al. 2018. Frontiers Immunol.), although this is presumably not the case for others (Sullivan et al. 2002. Immunity 17, 95).

      Answer: We appreciate this recommendation. We have noted this point in the resubmitted manuscript (line 412~418).

      1. Lines 122-124: The sentence "Interesting ..." seemed confusing to me; are the numbers (60 and 30%) correct?

      Answer: The numbers 60% and 30% referred to the highest percentages of Vα3.2+ cells we detected among QFL T cells and among CD8 T cells, respectively. In the revised version, we have replaced these numbers with the average percentages (20.1% and <10%) to avoid confusion (line 134).

      1. Qa-1/peptide complexes may also be recognized by CD94/NKG2 receptors, which may complicate the interpretation of the data (e.g., staining of the dextramers). From their previous work, it appears that Qa-1/QFL does not bind CD94/NKG2, which would be helpful to note in the text.

      Answer: We have noted this point in the revised manuscript (line 117~121).

      1. It would be helpful to add a few comments about the potential relevance to HLA-E.

      Answer: We have included discussion on this point (line 391~401).

      1. Figure legends: Most legends note the total number of replicates, which is usually quite high. It would also be helpful to indicate the total number of independent experiments performed and, when relevant, that the data are pooled from multiple independent experiments.

      Answer: Thank you for raising the concern. We have clarified the experimental repeats in figure legends.

      Reviewer #2 (Recommendations For The Authors):

      1. The work of Nilabh Shastri was the foundation of the present study. Unfortunately, he passed away in 2021. Since he can no longer assume the responsibilities of a senior author, I wonder if it would be more appropriate to dedicate this paper to him than to list him as a co-author.

      Answer: We have removed Dr. Shastri’s name as a co-senior author and have dedicated this work to his memory.

      1. The official symbol for ERAAP is Erap1.

      Answer: We have replaced ERAAP with ERAP1.

      1. Please refrain from editorializing. For example, "strikingly" appears eight times and "interestingly" nine times in the manuscript. Most readers do not need to be told when something is striking or interesting.

      Answer: We appreciate the Reviewer’s suggestion and have removed ‘strikingly’ and ‘interestingly’ from the manuscript.

      1. In WT mice, are there some cell types that express Qa-1b but not Erap1 and could therefore present the FL9 peptide?

      Answer: This is a great question. Using our highly sensitive QFL T cell hybridoma line BEko8Z (sensitivity shown in Fig. 6b), we have so far not been able to detect steady-state FL9 presentation by cells isolated from the spleen, lymph nodes, various gut-associated lymphoid tissues or intestinal epithelial cells (Supplementary Fig. 8a, left panel). However, we do not exclude the possibility that the FL9 peptide is transiently presented under certain conditions (e.g., ER stress or in transformed cells), at particular locations or within certain time windows. This would be of great importance for understanding the function of these cells but is beyond the scope of this study.

      1. Since you have not tested substitutions at other positions, could you explain your reasoning that P4 and P6 are the critical residues (lines 271-272)?

      Answer: Thank you for raising the concern. We have expanded the explanation of our strategy for determining peptide homology (line 272~313) in the revised manuscript. We have also included the structure of the QFL TCR:FL9-Qa-1b complex predicted by AlphaFold2, a conformational alignment of FL9 and Qdm (Figure 6a, b) and NetMHCpan predictions of Qa-1b binding for Qdm, FL9 and various FL9 mutant peptides (Supplementary Fig. 8c) to help readers follow the reasoning behind our strategy.

      1. Readers might appreciate having a Figure summarizing the differences between spleen and gut QFL T cells.

      Answer: This is a great suggestion. We have added a table summarizing the characteristic features of the splenic and IEL QFL T cells (Table 1).

      1. In the discussion, readers would like to know what plan you might have to elucidate the function of QFL T cells.

      Answer: We appreciate the recommendation. We have elaborated on our opinions and future directions in the resubmitted manuscript (line 393~401, 446~455).  

      Reviewer #3 (Public Review):

      1. For most of the report, the authors use a set of phenotypic traits to highlight the unique features of QFL-specific CD8+ T cells - specifically, CD44high, CD8aa+ve, CD8ab-ve. In Supp. Fig. 4, however, completely distinct phenotypic characteristics are presented, indicating that IEL QFL-specific T cells are CD5low, Thy-1low. No explanation is provided in the text about whether this is a previously reported phenotype, whether any elements of this phenotype are shared with splenic QFL T cells, what significance the authors ascribe to this phenotype (and to the fact that Qa1-deficiency leads to a more conventional Thy-1+ve, CD5+ve phenotype), and whether this altered phenotype is also seen in ERAAP-deficient mice. At least some explanation for this abrupt shift in focus and integration with prior published work is needed. On a related note, CD5 expression is measured in splenic QFL-specific CD8+ T cells from GF vs SPF mice (Supp. Fig. 9), to indicate that there is no phenotypic impact in the GF mice - but from Supp. Fig. 4, it would seem more appropriate to report CD5 expression in QFL-specific cells from the IEL, not the spleen.

      Answer: Expression of CD8αα and lack of CD4, CD8αβ, CD5 and CD90 expression have indeed been reported as the characteristic phenotype of natIELs. We have clarified this point in the resubmitted manuscript (line 80). The CD8αα+ IEL QFL T cells have consistently shown a CD5-CD90- phenotype. Because CD8αα expression was sufficient to describe their natIEL phenotype, we showed the CD5-CD90- data in the Supplementary figures only as additional evidence.

      The CD5 molecule by itself reflects TCR signaling strength, and high CD5 levels are associated with self-reactivity of T cells (Azzam et al., 2001; Fulton et al., 2015). The implications of CD5 expression by QFLTg cells are discussed in our companion manuscript on the development of these cells (Valerio et al., 2023). In Supplementary Fig. 9, because donor splenic QFLTg cells consistently showed comparable CD5 levels in the GF and SPF groups, we reasoned that CD5 would not interfere with our interpretation of CD44 expression.

      1. The authors suggest the finding that QFL-specific cells from ERAAP-deficient mice have a more "conventional" phenotype indicates some form of negative selection of high-affinity clones (this result being somewhat unexpected since ERAAP loss was previously shown to increase the presentation of Qa-1b loaded with FL9, confirmed in this report). It is not clear how this argument aligns with the data presented, however, since the authors convincingly show no significant reduction in the number of QFL-specific cells in ERAAP-knockout mice (Fig. 3a), and their own data (e.g. Fig. 2a) do not suggest that CD44 expression correlates with QFL-multimer staining (as a surrogate for TCR affinity/avidity). Is there some experimental basis for suggesting that ERAAP-deficient lacks a subset of high affinity QFL-specific cells?

      Answer: We think the presence of QFL T cells in ERAAP-KO mice reflects the unconventional developmental mechanism of these cells, which is better addressed in our complementary manuscript on the development of QFL T cells (Valerio et al., 2023). Valerio et al. found that the predominant QFL T clone, which expresses Vα3.2Jα21 and Vβ1Dβ1Jβ2-7, received relatively strong TCR signaling and underwent agonist selection during thymic development, indicating that the QFL ligand is involved in selection of the innate-like QFL T population.

      We agree that there is so far no direct evidence that the QFL T cells absent in ERAAP-KO mice are high-affinity clones, and we have removed ‘high-affinity’ from the manuscript (line 180). Although CD44 expression has been associated with the antigen-experienced phenotype of T cells, it is unclear whether its expression level directly reflects TCR affinity/avidity. Identifying clones of different affinities/avidities requires high-precision technologies that are not currently available to the research community. Our lab does have zMovi, a newly developed (and still developing) instrument claimed to measure the relative avidity/affinity of different cell types for ligands, but two years of working with it have taught us that the technology is not yet mature: it can only produce reliable data on extreme differences between single clones, i.e., large numbers of homogeneous cells expressing very high-affinity receptors.

      1. The rationale for designing FL9 mutants, and for using these data to screen the proteomes of various commensal bacteria needs further explanation. The authors propose P4 and P6 of FL9 are likely to be "critical" but do not explain whether they predict these to be TCR or Qa-1b contact sites. Published data (e.g., PMID: 10974028) suggest that multiple residues contribute to Qa-1b binding, so while the authors find that P4A completely lost the ability to stimulate a QFL-specific hybridoma, it is unclear whether this is due to the loss of a TCR- or a Qa-1-contact site (or, possibly, both). This could easily be tested - e.g., by determining whether P4A can act as a competitive inhibitor for FL9-induced stimulation of BEko8Z (and, ideally, other Qa-1b-restricted cells, specific for distinct peptides). Without such information, it is unclear exactly what is being selected in the authors' screening strategy of commensal bacterial proteomes. This, of course, does not lessen the importance of finding the peptide from P. pentosaceus that can (albeit weakly) stimulate QFL-specific cells, and the finding that association with this microbe can sustain IEL QFL cells.

      Answer: Thank you for raising the concern. We have expanded the explanation of our strategy for determining peptide homology (line 272~313) in the revised manuscript. We have also included the structure of the QFL TCR:FL9-Qa-1b complex predicted by AlphaFold2, a conformational alignment of FL9 and Qdm (Figure 6a, b) and NetMHCpan predictions of Qa-1b binding for Qdm, FL9 and various FL9 mutant peptides (Supplementary Fig. 8c) to help readers follow the reasoning behind our strategy.

      References

      Azzam, H.S., DeJarnette, J.B., Huang, K., Emmons, R., Park, C.S., Sommers, C.L., El-Khoury, D., Shores, E.W., and Love, P.E. (2001). Fine tuning of TCR signaling by CD5. J Immunol 166, 5464–5472. https://doi.org/10.4049/jimmunol.166.9.5464. PMID: 11313384

      Fulton, R.B., Hamilton, S.E., Xing, Y., Best, J.A., Goldrath, A.W., Hogquist, K.A., and Jameson, S.C. (2015). The TCR's sensitivity to self peptide-MHC dictates the ability of naive CD8(+) T cells to respond to foreign antigens. Nat Immunol 16, 107–117. https://doi.org/10.1038/ni.3043. PMID: 25419629

      Valerio, M.M., Arana, K., Guan, J., Chan, S.W., Yang, X., Kurd, N., Lee, A., Shastri, N., Coscoy, L., and Robey, E.A. (2023). The promiscuous development of an unconventional Qa1b-restricted T cell population. bioRxiv 2022.09.26.509583. https://doi.org/10.1101/2022.09.26.509583

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Public Review

      R1.1) Randomized clinical trials use experimental blinding and compare active and placebo conditions in their analyses. In this study, Fassi and colleagues explore how individual differences in subjective treatment (i.e., did the participant think they received the active or placebo treatment) influence symptoms and how this is related to objective treatment. The authors address this highly relevant and interesting question using a powerful method by (re-)analyzing data from four published neurostimulation studies and including subjective treatment in statistical models explaining treatment response. The major strengths include the innovative and important research question, the inclusion of four different studies with different techniques and populations to address this question, sound statistical analyses, and findings that are of high interest and relevance to the field.

      We thank the reviewer for this summary and the overall appreciation for our work.

      R1.2) My main suggestion is that the authors reconsider the description of the main conclusion to better integrate and balance all findings. Specifically, the authors conclude (e.g., in the abstract) that "individual differences in subjective treatment can explain variability in outcomes better than the actual treatment", which I believe is not a consistent conclusion across all four studies, as it does not appropriately consider important interactions with objective treatment observed in Studies 2 and 3. In Study 2, the greatest improvement was observed in the group that received TMS but believed they received sham. While subjective treatment was associated with improvement regardless of objective active or sham treatment, improvement in the objective active TMS group who believed they received sham suggests the importance of objective treatment regardless of subjective treatment. In Study 3, including objective treatment in the model predicted more treatment variance, further suggesting the predictive value of objective treatment.

      We thank the reviewer for this comment and agree that the interpretation of findings requires a more nuanced and balanced description. We, therefore, implemented changes in both the abstract and discussion of the manuscript, as reported below (additions are highlighted in grey and deletions are shown in strikethrough):

      Abstract

      “Our findings consistently show that the inclusion of subjective treatment can provide a better model fit when accounted for alone or in an interaction term with objective treatment (defined as the condition to which participants are assigned in the experiment). These results demonstrate the significant contribution of subjective experience in explaining the variability of clinical, cognitive and behavioural outcomes. Based on these findings, We advocate for existing and future studies in clinical and non-clinical research to start accounting for participants’ subjective beliefs and their interplay with objective treatment when assessing the efficacy of treatments. This approach will be crucial in providing a more accurate estimation of the treatment effect and its source, allowing the development of effective and reproducible interventions.” (p. 3)

      Discussion

      “We demonstrate that participants’ subjective beliefs about receiving the active vs control (sham) treatment are an important factor that can explain variability in the primary outcome and, in some cases, fits the observed data better than the actual treatment participants received during the experiment.” (p. 21)

      “We demonstrate that participants’ subjective beliefs about receiving the active vs control (sham) treatment are an important factor that can explain variability in the primary outcome and, in some cases, fits the observed data better than the actual treatment participants received during the experiment. Specifically, in Studies 1, 2 and 4, whether participants thought they were in the active or control condition explained variability in clinical and cognitive scores to a greater extent than the objective treatment alone. Notably, the same pattern of results emerged when we replaced subjective treatment with subjective dosage in the fourth experiment, showing that subjective beliefs about treatment intensity also explained variability in research results better than objective treatment. In contrast to Studies 1 and 4, Studies 2 and 3 showed a more complex pattern of results. Specifically, in Study 2 we observed an interaction effect, whereby the greatest improvement in depressive symptoms was observed in the group that received the active objective treatment but believed they received sham. In Study 3, by contrast, the inclusion of both subjective and objective treatment as main effects explained variability in symptoms of inattention. Overall, these findings suggest a complex interplay between objective and subjective treatment. The variability in the observed results could be explained by factors such as participants’ personality, type and severity of the disorder, prior treatments, knowledge base, experimental procedures, and views of the research team, all of which could be interesting avenues for future studies to explore.” (p. 22)
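To make "better model fit" concrete, the comparison of objective-only, subjective-only, main-effects and interaction models can be illustrated with the Akaike information criterion (AIC; lower is better). This is a minimal sketch, not the authors' analysis: it assumes a continuous outcome, ordinary least squares with Gaussian errors, and 0/1 coding of treatment (1 = active). The studies themselves may have used other model families (e.g., logistic models for binary outcomes), and the data below are simulated.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit of y on design matrix X.

    Assumes Gaussian errors; the error variance counts as one extra parameter.
    Lower AIC indicates a better trade-off between fit and model complexity.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximised Gaussian log-likelihood with sigma^2 = rss / n.
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * (k + 1) - 2 * loglik


def compare_models(y, objective, subjective):
    """Fit four candidate models and return {model name: AIC}.

    objective, subjective: 0/1 arrays (1 = active, 0 = sham), one entry per participant.
    """
    y = np.asarray(y, dtype=float)
    obj = np.asarray(objective, dtype=float)
    subj = np.asarray(subjective, dtype=float)
    ones = np.ones_like(y)
    designs = {
        "objective": np.column_stack([ones, obj]),
        "subjective": np.column_stack([ones, subj]),
        "both": np.column_stack([ones, obj, subj]),
        "interaction": np.column_stack([ones, obj, subj, obj * subj]),
    }
    return {name: gaussian_aic(y, X) for name, X in designs.items()}


if __name__ == "__main__":
    # Simulated data (not from any of the four studies): the outcome depends
    # only on what participants believe they received, plus noise.
    rng = np.random.default_rng(0)
    n = 200
    objective = rng.integers(0, 2, n)
    subjective = rng.integers(0, 2, n)
    y = 1.0 * subjective + rng.normal(0.0, 1.0, n)
    for name, aic in sorted(compare_models(y, objective, subjective).items(),
                            key=lambda item: item[1]):
        print(f"{name:12s} AIC = {aic:.1f}")
```

With real data, an interaction effect like the one described for Study 2 would show up as the "interaction" model having the lowest AIC.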

      R1.3) In addition to updating the conclusions to better reflect this interaction, I suggest authors include the proportion of participants in each subjective treatment group that actually received active or sham treatment to better understand how much of the subjective treatment is explained by objective treatment. I think it is particularly important to better integrate and more precisely communicate this finding, because the conclusions may otherwise be erroneously interpreted as improvements after treatment only being an effect of subjective treatment or sham.

      We thank the reviewer for this comment. The number of participants in each group is provided in each study's codebook, under the section “Count of Participants by Treatment Condition and Their Subjective Guess”, on the project’s OSF page (https://osf.io/rztxu/). We have also added these tables to the supplementary material (Tables S1, S8, S15 and S18) and referred to them throughout the Methods section. Further, we added this information to the Results, as follows:

      • “Further details on participant groupings based on objective treatment and their subjective treatment can be found in the codebook corresponding to each of the four studies as well as S1.” (p. 8).

      • “The breakdown of participants to objective treatment and subjective treatment in the sample can be found in S8.” (p. 13).

      • “The breakdown of participants to objective treatment and subjective treatment in the sample can be found in S15.” (p. 17).

      • “The breakdown of participants to objective treatment and subjective treatment in the sample can be found in S18.” (p. 19).
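The breakdown tables referred to above are contingency tables of assigned (objective) treatment against the participant's guess (subjective treatment). A minimal Python sketch of how such a table, and the share of correct guesses, could be computed; the "active"/"sham" coding and the records are invented for illustration and are not taken from the codebooks.

```python
from collections import Counter

LEVELS = ("active", "sham")  # assumed coding for both objective and subjective treatment


def crosstab(records, levels=LEVELS):
    """Count participants per (objective, subjective) cell.

    records: iterable of (objective, subjective) pairs.
    Returns {objective: {subjective: count}}, with empty cells reported as 0.
    """
    counts = Counter(records)
    return {obj: {subj: counts[(obj, subj)] for subj in levels} for obj in levels}


def guess_accuracy(table):
    """Share of participants whose guess matched the condition they were assigned."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[level][level] for level in table)
    return correct / total if total else float("nan")


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [("active", "active"), ("active", "sham"), ("sham", "sham"),
               ("sham", "active"), ("active", "active"), ("sham", "sham")]
    table = crosstab(records)
    for obj, row in table.items():
        print(f"objective={obj:6s}", row)
    print(f"guess accuracy: {guess_accuracy(table):.2f}")
```

The off-diagonal cells (e.g., objective active, subjective sham) are the groups whose outcomes distinguish the effect of belief from the effect of the treatment actually received.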

      R1.4) The paper will have significant impact on the field. It will promote further investigation of the effects of sham vs active treatment by the introduction of the terms subjective treatment vs objective treatment and subjective dosage that can be used consistently in the future. The suggestions to assess the expectation of sham vs active earlier on in clinical trials will advance the understanding of subjective treatment in future studies. Overall, I believe the data will substantially contribute to the design and interpretation of future clinical trials by underscoring the importance of subjective treatment.

      We thank the reviewer for this positive comment.

      Review for authors

      R1.4) Abstract

      "Here we show that individual differences in subjective treatment... can explain variability in outcomes better than the actual treatment". "Our findings consistently show that the inclusion of subjective treatment provides a better model fit than objective treatment alone": these two statements could be interpreted as two different conclusions, and the authors should be more consistent.

      We thank the reviewer for this comment and have now changed the abstract to be consistent, as also described in our response to R1.2:

      Abstract

      “Our findings consistently show that the inclusion of subjective treatment can provide a better model fit when accounted for alone or in an interaction term with objective treatment (defined as the condition to which participants are assigned in the experiment). These results demonstrate the significant contribution of subjective experience in explaining the variability of clinical, cognitive and behavioural outcomes. Based on these findings, We advocate for existing and future studies in clinical and non-clinical research to start accounting for participants’ subjective beliefs and their interplay with objective treatment when assessing the efficacy of treatments. This approach will be crucial in providing a more accurate estimation of the treatment effect and its source, allowing the development of effective and reproducible interventions.” (p. 3)

      R1.5) Introduction

      This is an odd sentence given it is 2023: "As a result, the global neuromodulation device industry is expected to grow to $13.3 billion in 2022 (Colangelo, 2020)."

      We have removed this sentence, as it is indeed no longer applicable, and instead added a reference for the preceding sentence:

      “In recent years, neuromodulation has been studied as one of the most promising treatment methods (De Ridder et al., 2021).”

      Reference

      De Ridder, D., Maciaczyk, J., & Vanneste, S. (2021). The future of neuromodulation: Smart neuromodulation. Expert Review of Medical Devices, 18(4), 307–317. https://doi.org/10.1080/17434440.2021.1909470

      R1.6) Figures

      • Lines of Figure 1 are vague.

      • Figure 5 color scheme is confusing. It would be better to use green/blue colors for one, (e.g.) sham in both subjective and objective treatment and orange/red colors for active treatment.

      • For Figure 6 it would be better to use the same color for sham as subjective dosage none.

      • Relatedly, it would be easier to keep color scheme consistent across the paper and for example use green/blue colors for sham throughout.

      We thank the reviewer for this comment. Following these suggestions, all figures in the paper have been remade for better clarity.

      • Figure 1: the individual lines are now drawn more boldly, and a line connecting the averages has been added.

      • Figure 5, sham is now on cold colours (blue and green), and active treatment on warm colours (red and orange)

      • Figure 6, the same colour for sham as subjective dosage none is now applied.

      Further, we also edited Figures 2 and 4 by removing the percentages between 0% and 100% on the y-axis. Given that the outcome variable was binary coded, we implemented this change to avoid confusion.

      Reviewer 2

      Public Review

      R2.1) This manuscript focuses on the clinical impact of subjective experience or treatment in transcranial magnetic stimulation and transcranial direct current stimulation studies, using retrospective analyses of 4 datasets. Subjective experience or treatment refers to the patient-level belief of receiving active or sham treatment. The analyses suggest that subjective treatment effects are an important and underappreciated factor in randomized controlled trials. The authors present compelling evidence that has significance in the context of other modalities of treatment, treatment for other diseases, and plans for future randomized controlled trials. Other strengths include a rigorous approach and analyses. Some aspects of the manuscript are underdeveloped and the findings are overinterpreted. Thank you for your efforts and the opportunity to review your work.

      We thank the reviewer for their overall appreciation of this work. We address the comment on the overinterpretation of findings in our response to Reviewer 1 (see R1.2) above, and we expand on the underdeveloped explanation of sham procedures in our response to R2.3 below.

      Review for authors

      R2.2) One concern is that the findings are consistently overinterpreted and presented within a polarizing framework. This is a complicated area of study with many variables that are not understood or captured. For example, subjective experience effects likely vary with personality dimensions, disease, prior treatments, knowledge base, view of the research team, and disease severity. Framing subjective experience in a more balanced tone, as an important consideration for future trial design and study execution, would enhance the impact of the paper.

      We thank the reviewer for this comment. We reframed our interpretation of results in both the manuscript abstract and discussion, as highlighted in response to reviewer 1 (see R1.2) above.

      R2.3) The discussion of sham approaches for transcranial magnetic stimulation and transcranial direct current stimulation is underdeveloped. There are approaches that are not discussed. The tilt method is seldom used for modern studies for example.

      We thank the reviewer for this comment. We have rewritten a paragraph in the Introduction to elaborate on the different practices used to apply sham procedures:

      “Participants who take part in TMS and tES studies consistently report various perceptual sensations, such as audible clicks, visual disturbances, and cutaneous sensations (Davis et al., 2013). Consequently, they can discern when they have received the active treatment, so subjective beliefs and demand characteristics may influence performance (Polanía et al., 2018). To account for such non-specific effects, sham (placebo) protocols have been employed. For transcranial direct current stimulation (tDCS), the most common form of tES, various sham protocols exist. A review by Fonteneau et al. (2019) showed that 84% of 173 studies used sham approaches similar to an early method by Gandiga et al. (2006). This initial protocol had a 10 s ramp-up followed by 30 s of active stimulation at 1 mA before cessation, whereas active stimulation typically lasts up to 20 minutes. However, this protocol has since been adapted in terms of current intensity and duration, ramp-in/out phases, and the number of ramps during stimulation. Similarly, in sham TMS, the coil may be tilted or replaced with a purpose-built sham coil equipped with a magnetic shield, which produces auditory effects but no brain stimulation (Duecker & Sack, 2015). Surface electrodes can additionally mimic the somatosensory effects of actual TMS. Overall, these types of sham stimulation aim to mimic the perceptual sensations associated with active stimulation without substantially affecting cortical excitability (Fritsch et al., 2010; Nitsche & Paulus, 2000). As a result, sham treatments should allow researchers to control for participants’ beliefs about the type of stimulation received.” (p.6)
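The size of the difference between the Gandiga-style sham and a typical active session can be made concrete by modelling current as a function of time. A minimal sketch, assuming a linear ramp-up, abrupt cessation (no ramp-down phase) and an active session that holds 1 mA for 20 minutes; as noted above, real protocols vary in ramp shape, intensity and duration. Integrating current over time gives delivered charge: 5 mC from the ramp plus 30 mC at 1 mA, about 35 mC for sham, versus about 1,205 mC for the assumed active session.

```python
from functools import partial


def current_ma(t_s, ramp_s=10.0, hold_s=30.0, peak_ma=1.0):
    """Stimulation current (mA) at time t_s (seconds) for a ramp-then-hold protocol.

    Linear ramp from 0 to peak_ma over ramp_s seconds, constant peak_ma for
    hold_s seconds, then 0. Assumes abrupt cessation (no ramp-down phase).
    """
    if t_s < 0:
        return 0.0
    if t_s < ramp_s:
        return peak_ma * t_s / ramp_s
    if t_s < ramp_s + hold_s:
        return peak_ma
    return 0.0


def charge_mc(protocol, duration_s, dt_s=0.1):
    """Approximate delivered charge in millicoulombs (mA x s) with a left Riemann sum."""
    steps = int(round(duration_s / dt_s))
    return sum(protocol(i * dt_s) * dt_s for i in range(steps))


# Sham: defaults reproduce a 10 s ramp-up and 30 s at 1 mA.
sham = current_ma
# Active: same ramp, but held for an assumed 20 minutes.
active = partial(current_ma, hold_s=20 * 60)

if __name__ == "__main__":
    print(f"sham:   ~{charge_mc(sham, 60):.0f} mC")
    print(f"active: ~{charge_mc(active, 1300):.0f} mC")
```

The short sham reproduces the onset sensations of the ramp while delivering only a small fraction of the active charge, which is the premise behind this blinding strategy.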

      References

      Fonteneau, C., Mondino, M., Arns, M., Baeken, C., Bikson, M., Brunoni, A. R., Burke, M. J., Neuvonen, T., Padberg, F., Pascual-Leone, A., Poulet, E., Ruffini, G., Santarnecchi, E., Sauvaget, A., Schellhorn, K., Suaud-Chagny, M.-F., Palm, U., & Brunelin, J. (2019). Sham tDCS: A hidden source of variability? Reflections for further blinded, controlled trials. Brain Stimulation, 12(3), 668–673. https://doi.org/10.1016/j.brs.2018.12.977

      Gandiga, P. C., Hummel, F. C., & Cohen, L. G. (2006). Transcranial DC stimulation (tDCS): A tool for double-blind sham-controlled clinical studies in brain stimulation. Clinical Neurophysiology, 117(4), 845–850. https://doi.org/10.1016/j.clinph.2005.12.003

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study reports a meta-analysis of published data to address an issue that is topical and potentially useful for understanding how the sites of initiation of DNA replication are specified in human chromosomes. The work focuses on the role of the Origin Recognition Complex (ORC) and the Mini-Chromosome Maintenance (MCM2-7) complex in localizing origins of DNA replication in human cells. While some aspects of the paper are of interest, the analysis of published data is in part inadequate to support the broad conclusion that, in contrast to multiple observations in other species, ORC and MCM2-7 binding sites in the human genome do not overlap extensively with origins of DNA replication.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the best genetically and biochemically understood model of eukaryotic DNA replication, the budding yeast Saccharomyces cerevisiae, the genomic locations at which DNA replication initiates are determined by a specific sequence motif. These motifs, or ARS elements, are bound by the origin recognition complex (ORC). ORC is required for loading of the initially inactive MCM helicase during origin licensing in G1. In human cells, ORC does not have a sequence-specific DNA-binding domain, and origins are not specified by a defined motif. There have thus been great efforts over many years to understand the determinants of DNA replication initiation in human cells using a variety of approaches, which have gradually become more refined over time.

      In this manuscript, Tian et al. combine data from multiple previous studies that used a range of techniques for identifying sites of replication initiation, in order to identify conserved features of replication origins and to examine the relationship between origins and sites of ORC binding in the human genome. The authors identify conserved features of replication origins, e.g., association with GC-rich sequences, open chromatin, promoters and CTCF binding sites. These associations have already been described in multiple earlier studies. They also examine the relationship between their determined origins and ORC binding sites and conclude that there is no relationship between sites of ORC binding and DNA replication initiation. While the conclusions concerning genomic features of origins are not novel, a clear lack of colocalization of ORC and origins, if true, would be a striking finding.

      Response: Thank you. That is where the novelty of the paper lies.

      However, the majority of the datasets used do not report replication origins, but rather broad zones in which replication origins fire. Rather than refining the localisation of origins, the approach of combining diverse methods that monitor different objects related to DNA replication leads to a base dataset that is highly flawed and cannot support the conclusions that are drawn, as explained in more detail below.

      Response: We are using the narrowly defined SNS-seq peaks as the gold-standard origins and focusing on those that fall within the initiation zones defined by other methods. The objective is to compile a list of the most reproducible origins. Contrary to the reviewer's statement, this refines the dataset, focusing it on the SNS-seq origins that are also supported by the other methods in multiple cell lines. We have changed the last box of Fig. 1A to make this clearer: Shared origins = reproducible SNS-seq origins that are contained in initiation zones defined by Repli-seq, OK-seq and Bubble-seq. This change, together with Fig. 2B (unchanged), should make our strategy clearer.
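The "shared origins" definition is an interval-containment filter. A minimal Python sketch under stated assumptions: intervals are half-open (chrom, start, end) tuples, the input peaks are already the reproducible SNS-seq set, a peak counts as contained only if it lies entirely inside one zone, and by default a zone from every method is required (`min_methods` relaxes this). The response does not spell out these details, and the function names are hypothetical. A production pipeline would more likely use an interval tool, e.g. `bedtools intersect -f 1.0 -u`.

```python
from bisect import bisect_right
from collections import defaultdict


def index_zones(zones):
    """Group (chrom, start, end) zones by chromosome, sorted by start coordinate."""
    by_chrom = defaultdict(list)
    for chrom, start, end in zones:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def is_contained(peak, zone_index):
    """True if the peak lies entirely inside at least one zone (half-open coordinates)."""
    chrom, start, end = peak
    intervals = zone_index.get(chrom, [])
    # Only zones starting at or before the peak start can contain it.
    i = bisect_right(intervals, (start, float("inf")))
    return any(zone_end >= end for _, zone_end in intervals[:i])


def shared_origins(peaks, zone_sets, min_methods=None):
    """Keep peaks contained in zones from at least `min_methods` methods (default: all)."""
    indexes = {method: index_zones(zones) for method, zones in zone_sets.items()}
    needed = len(indexes) if min_methods is None else min_methods
    return [peak for peak in peaks
            if sum(is_contained(peak, idx) for idx in indexes.values()) >= needed]


if __name__ == "__main__":
    # Toy coordinates, invented for illustration.
    peaks = [("chr1", 1000, 1500), ("chr1", 5000, 5600), ("chr2", 200, 300)]
    zone_sets = {
        "Repli-seq": [("chr1", 0, 10000)],
        "OK-seq": [("chr1", 900, 2000), ("chr2", 0, 1000)],
        "Bubble-seq": [("chr1", 800, 1600), ("chr1", 4000, 5500)],
    }
    print(shared_origins(peaks, zone_sets))                 # all three methods
    print(shared_origins(peaks, zone_sets, min_methods=1))  # any one method
```

Because only containment is tested, each retained origin keeps the coordinates and resolution of its SNS-seq peak; the coarser zones act purely as a filter.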

      Methods to determine sites at which DNA replication is initiated can be divided into two groups based on the genomic resolution at which they operate. Techniques such as Bubble-seq and OK-seq can localise zones of replication initiation in the range of ~50 kb. Such zones may contain many replication origins. Conversely, techniques such as SNS-seq and ini-seq can localise replication origins to less than 1 kb. Indeed, the application of these different approaches has led to a degree of controversy in the field about whether human replication does indeed initiate at discrete sites (origins), or whether it initiates randomly in large zones with no recurrent sites being used. However, more recent work has shown that elements of both models are correct, i.e., there are recurrent and efficient sites of replication initiation in the human genome, but these tend to be clustered and correspond to the demonstrated initiation zones (Guilbaud et al., 2022).

      These different scales and methodologies are important when considering the approach of Tian et al. The premise that combining all available data from five techniques will increase accuracy and confidence in identifying the most important origins is flawed for two principal reasons. First, as noted above, of the different techniques combined in this manuscript, only SNS-seq can actually identify origins rather than initiation zones. It is the former that matters when comparing sites of ORC binding with replication origin sites if a conclusion is to be drawn that the two do not co-localise.

      Response: We agree. It follows that our method of finding SNS-seq peaks that fall within initiation zones actually refines the origins to the most reproducible ones. We do not lose the spatial precision of the SNS-seq peaks.

      Second, the authors give equal weight to all datasets. Certainly, in the case of SNS-seq, this is not appropriate. The technique has evolved over the years and some earlier versions have significantly different technical designs that may impact the reliability and/or resolution of the results (e.g. in Foulk et al., 2015, lambda exonuclease was added to single-stranded DNA from a total genomic preparation rather than to purified nascent strands), which may lead to significantly different digestion patterns (i.e. underdigestion). Curiously, the authors do not make the best use of the largest SNS-seq dataset (Akerman et al., 2020), ignoring these authors' separation of core and stochastic origins. By blending all data together, any separation of signal and noise is lost. Further, I am surprised that the authors have chosen not to use data and analysis from a recent study that provides subsets of the most highly used and efficient origins in the human genome, at high resolution (Guilbaud et al., 2022).

      Response: 1) We are using the data from Akerman et al., 2020 (dataset GSE128477 in Supplemental Table 1). We have now separately examined the core origins defined by those authors to check their overlap with ORC binding (Supplementary Fig. S8b).

      2) To take into account the refinement of SNS-seq methods over the years, we included in our study only SNS-seq studies published after 2018, well after the lambda exonuclease method was introduced. Indeed, all 66 SNS-seq datasets we used were obtained after the lambda exonuclease digestion step. To reiterate, we recognize that there may be many false positives in the individual origin mapping datasets. Our focus is on the true positives: the SNS-seq peaks that have support from multiple SNS-seq studies AND fall within the initiation zones defined by independent means of origin mapping (described in Fig. 1A and 2B). These true positives are the most likely to be real and reproducible origins and would be expected to be near ORC binding sites.

      We have changed the last box of Fig. 1A to make this clearer: Shared origins = reproducible SNS-seq origins that are contained in initiation zones defined by Repli-seq, OK-seq or Bubble-seq.
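      As an illustration only, the "shared origins" filter can be sketched in Python as below. This is not the authors' pipeline: intervals are assumed to be half-open (chromosome, start, end) tuples, the function names are hypothetical, and whether a peak must sit inside an initiation zone from any one of the other techniques or from every one of them is exposed as the require_all parameter.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def contained(peak: Interval, zones: List[Interval]) -> bool:
    """True if the peak lies entirely inside at least one zone."""
    chrom, start, end = peak
    return any(c == chrom and zs <= start and end <= ze for c, zs, ze in zones)


def shared_origins(peaks: Dict[Interval, int],
                   zones_by_technique: Dict[str, List[Interval]],
                   min_occupancy: int = 20,
                   require_all: bool = False) -> List[Interval]:
    """Keep SNS-seq peaks detected in >= min_occupancy datasets that also fall
    inside initiation zones (IZs) from the other techniques."""
    test = all if require_all else any  # any-technique vs every-technique rule
    kept = []
    for peak, occupancy in peaks.items():
        if occupancy < min_occupancy:
            continue  # not reproducible enough across SNS-seq datasets
        if test(contained(peak, zones) for zones in zones_by_technique.values()):
            kept.append(peak)
    return sorted(kept)


# Toy example (hypothetical coordinates and occupancy scores).
peaks = {("chr1", 1000, 1300): 25, ("chr1", 5000, 5300): 30, ("chr1", 9000, 9300): 5}
zones = {
    "OK-seq": [("chr1", 0, 2000), ("chr1", 4000, 7000)],
    "Repli-seq": [("chr1", 0, 2000), ("chr1", 4000, 6000)],
    "Bubble-seq": [("chr1", 4500, 5500)],
}
```

      The linear scan over zones is fine for a toy example; at the scale of millions of union origins, a sorted sweep or an interval tree would be needed.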

      Ini-seq by Torsten Krude and co-workers (Guilbaud et al., 2022) does NOT use lambda exonuclease digestion. Using Ini-seq-defined origins is therefore at odds with the suggestion above that we focus only on SNS-seq datasets that use lambda exonuclease. However, Ini-seq identifies a much smaller subset of SNS-seq origins, so, as requested, we have also repeated the analysis with just that smaller set of origins. It does show better proximity to ORC binding sites, though even then the ORC-proximate origins account for only 30% of the Ini-seq2 origins (Supplementary Fig. S8d). Note that Ini-seq2 identifies DNA replication initiation sites detected in vitro in isolated nuclei.

      References:

      Akerman I, Kasaai B, Bazarova A, Sang PB, Peiffer I, Artufel M, Derelle R, Smith G, Rodriguez-Martinez M, Romano M, Kinet S, Tino P, Theillet C, Taylor N, Ballester B, Méchali M (2020) A predictable conserved DNA base composition signature defines human core DNA replication origins. Nat Commun, 11: 4826

      Foulk MS, Urban JM, Casella C, Gerbi SA (2015) Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res, 25: 725-735

      Guilbaud G, Murat P, Wilkes HS, Lerner LK, Sale JE, Krude T (2022) Determination of human DNA replication origin position and efficiency reveals principles of initiation zone organisation. Nucleic Acids Res, 50: 7436-7450

      Reviewer #2 (Public Review):

      Tian et al. perform a meta-analysis of 113 genome-wide origin profile datasets in humans to assess the reproducibility of experimental techniques and shared genomics features of origins. Techniques to map DNA replication sites have quickly evolved over the last decade, yet little is known about how these methods fare against each other (pros and cons), nor how consistent their maps are. The authors show that high-confidence origins recapitulate several known features of origins (e.g., correspondence with open chromatin, overlap with transcriptional promoters, CTCF binding sites). However, surprisingly, they find little overlap between ORC/MCM binding sites and origin locations.

      Overall, this meta-analysis provides the field with a good assessment of the current state of experimental techniques and their reproducibility, but I am worried about: (a) whether we've learned any new biology from this analysis; (b) how binding sites and origin locations can be so mismatched, in light of numerous studies that suggest otherwise; and (c) some methodological details described below.

      Major comments:

      • Line 26: "0.27% were reproducibly detected by four techniques" -- what does this mean? Does the fragment need to be detected by ALL FOUR techniques to be deemed reproducible?

      Response: If the reproducible SNS-seq peaks are included in the reproducible initiation zones found by the other methods, then we consider it reproducible across datasets. The strategy is to focus our analysis on the most reproducible SNS-seq peaks that happen to be in reproducible initiation zones. It is the best way to confidently identify a very small set of true positive origins. We have re-stated this in the abstract: “only 0.27% were reproducibly obtained in at least 20 independent SNS-seq datasets and contained in initiation zones identified by each of three other techniques (20,250 shared origins),...”

      And what if the technique detected the fragment is only 1 of N experiments conducted; does that count as "detected"?

      Response: A reproducible SNS-seq origin is one detected in at least 20 of the 66 SNS-seq datasets. This threshold gives an FDR of <0.1, as explained in Fig. 2a and Supplementary Fig. S2. For the initiation zones, we counted a zone even if it appeared in only 1 of N experiments, because N is usually small. This relaxed selection of initiation zones gives the best chance of finding SNS-seq peaks that are reproduced by the other methods.

      Later in Methods, the authors (line 512) say, "shared origins ... occur in sufficient number of samples" but what does sufficient mean?

      Response: “Sufficient” means that SNS-seq origin was reproducibly detected in ≥ 20 datasets and was included in any initiation zone defined by three other techniques.

      Then on line 522, they use a threshold of "20" samples, which seems arbitrary to me. How are these parameters set, and how robust are the conclusions to these settings? An alternative to setting these (arbitrary) thresholds and discretizing the data is to analyze the data continuously; i.e., associate with each fragment a continuous confidence score.

      Response: We explained Fig. 2a and Supplementary Fig. S2 on line 192 as follows: The occupancy score of each origin defined by SNS-seq (Supplementary Fig. S2a) counts the number of datasets under consideration in which a given origin is detected. For the random background, we assumed that the number of origins confirmed by increasing occupancy scores decreases exponentially (see Methods and Supplementary Table 2). Plotting the number of origins with various occupancy scores when all SNS-seq datasets published after 2018 are considered together (the union origins) shows that the experimental curve deviates from the random background at a given occupancy score (Fig. 2a). The threshold occupancy score of 20 is the point where the observed number of origins deviates from the expected background number (with an FDR < 0.1) (Fig. 2a).

      In the Methods, we have revised the section “Identification of shared origins” to better describe our strategy. The number of observed origins with an occupancy score greater than 20 (out of 66 datasets) is 10 times greater than expected from the background model. This approach is statistically sound and was described by us previously (Fang et al. 2020).
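      A minimal Python sketch of this kind of threshold search, for illustration only. Two details are assumptions not stated in the text: the exponential background is fitted by least squares to the log counts at the lowest occupancy scores (here 1 to 5), and the FDR at score k is taken as expected/observed origins with occupancy of at least k. Under that definition, "10 times more than expected" is the same statement as FDR < 0.1. The actual model is the one described in the Methods and in Fang et al. 2020.

```python
import math
from typing import Dict, Iterable, Optional, Tuple


def fit_exponential(counts: Dict[int, int], ks: Iterable[int]) -> Tuple[float, float]:
    """Least-squares fit of log(count) = a - b*k over the given scores."""
    xs = list(ks)
    ys = [math.log(counts[k]) for k in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, -slope  # (a, b)


def occupancy_threshold(counts: Dict[int, int],
                        fit_scores: Iterable[int] = range(1, 6),
                        max_fdr: float = 0.1) -> Tuple[Optional[int], Dict[int, float]]:
    """Smallest occupancy score whose cumulative FDR is below max_fdr.

    counts maps occupancy score k -> number of union origins seen in exactly
    k datasets. Expected background at k is exp(a - b*k).
    """
    a, b = fit_exponential(counts, fit_scores)
    scores = sorted(counts)
    fdr = {}
    for k in scores:
        observed = sum(counts[j] for j in scores if j >= k)
        expected = sum(math.exp(a - b * j) for j in scores if j >= k)
        fdr[k] = expected / observed if observed else float("inf")
    threshold = next((k for k in scores if fdr[k] < max_fdr), None)
    return threshold, fdr
```

      For example, with synthetic counts made of an exponential background plus a flat excess of reproducible origins, such as {k: round(1e6 * math.exp(-0.5 * k)) + 200 for k in range(1, 67)}, the returned threshold is the first score at which the excess dominates the background tail.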

      • Line 20: "50,000 origins" vs "7.5M 300bp chromosomal fragments" -- how do these two numbers relate? How many 300bp fragments would be expected given that there are ~50,000 origins? (i.e., how many fragments are there per origin, on average)? This is an important number to report because it gives some sense of how many of these fragments are likely nonsense/noise. The authors might consider eliminating those fragments significantly above the expected number, since their inclusion may muddle biological interpretation.

      Response: The wording of our abstract confused the reviewer. The 50,000 origins mentioned in the abstract is the hypothetical expected number of origins that must fire to replicate the whole 6x10^9 nt diploid genome, based on the average inter-origin distance of 100 kb (as determined by molecular combing). The 7.5M 300 bp fragments are the genomic regions where the 7.5M union SNS-seq-defined origins are located. Clearly, this set contains a lot of noise, partly technical and partly because origins fire stochastically. This is why our paper focuses on a smaller number of reproducible origins, the 20,250 shared origins. Our analysis is on the 20,250 shared origins and not on all 7.5M union origins, so the excess of non-reproducible (stochastic?) origins is not included in our analysis.

      The revised abstract in the revised paper will say: “Based on experimentally determined average inter-origin distances of ~100 kb, DNA replication initiates from ~50,000 origins on human chromosomes in each cell-cycle. The origins are believed to be specified by binding of factors like the Origin Recognition Complex (ORC) or CTCF or other features like G-quadruplexes. We have performed an integrative analysis of 113 genome-wide human origin profiles (from five different techniques) and 5 ORC-binding site datasets to critically evaluate whether the most reproducible origins are specified by these features. Out of ~7.5 million union origins identified by all the SNS-seq datasets, only 0.27% were reproducibly obtained in at least 20 independent SNS-seq datasets and contained in initiation zones identified by any of three other techniques (20,250 shared origins), suggesting extensive variability in origin usage and identification in different circumstances.”
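      For orientation, the two ratios implied by the numbers above, recomputed here, are the shared origins as a fraction of all union SNS-seq origins and the number of union SNS-seq origins per origin expected to fire in one cell cycle. The second ratio bears on the reviewer's question about how many fragments there are per origin.

```latex
\frac{20{,}250}{7.5\times 10^{6}} = 2.7\times 10^{-3} = 0.27\%,
\qquad
\frac{7.5\times 10^{6}}{5\times 10^{4}} = 150
```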

      • Line 143: I'm not terribly convinced by the PCA clustering analysis, since the variance explained by the first 2 PCs is only ~25%. A more robust analysis of whether origins cluster by cell type, year etc is to simply compute the distribution of pairwise correlations of origin profiles within the same group (cell type, year) vs the correlation distribution between groups. Relatedly, the authors should explain what an "origin profile" is (line 141). Is the matrix (to which PCA is applied) of size 7.5M x 113, with a "1" in the (i,j) position if the ith fragment was detected in the jth dataset?

      Response: The reviewer's description of our PCA is correct, and we have now included it in the Methods. We have also computed the pairwise correlations as the reviewer suggests, and each technique clearly correlates best with itself (though some datasets do not correlate as well as the others, even within the same technique) (Supp. Fig. S3). We have also performed the PCA by technique (Fig. 1c), by cell type for all techniques (Supp. Fig. S1c), by cell type for SNS-seq only (Supp. Fig. S1d), and by year of publication of SNS-seq data (Supp. Fig. S1e). Our conclusions remain the same: in general, origins defined in the same cell lineage are more similar to each other than to those from other lineages, and this similarity within a lineage is more pronounced when we focus on SNS-seq alone. However, even for SNS-seq alone, origins determined by different studies in the same lineage do not overlap perfectly. Finally, although we used only SNS-seq data published after 2018, by which time lambda exonuclease digestion had become the accepted way of performing SNS-seq, the datasets cluster surprisingly strongly by year of publication.
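      The within-group versus between-group comparison the reviewer proposed can be sketched as follows, assuming each dataset's origin profile is a binary vector over fragments (1 if the fragment is detected in that dataset), as in the matrix described above. Pearson correlation on 0/1 vectors is the phi coefficient. The function names and toy profiles are hypothetical, and this is not the code behind Supp. Fig. S3.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; on 0/1 vectors this is the phi coefficient."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # constant profile: correlation undefined
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def within_between(profiles: Dict[str, List[int]],
                   group: Dict[str, str]) -> Tuple[List[float], List[float]]:
    """Split all pairwise dataset correlations into same-group and cross-group lists."""
    within, between = [], []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        (within if group[a] == group[b] else between).append(r)
    return within, between


# Toy profiles over six fragments; groups could be technique, cell type or year.
profiles = {"A": [1, 1, 0, 0, 1, 0], "B": [1, 1, 0, 0, 1, 0],
            "C": [0, 0, 1, 1, 0, 1], "D": [0, 0, 1, 1, 0, 1]}
group = {"A": "SNS-seq", "B": "SNS-seq", "C": "OK-seq", "D": "OK-seq"}
```

      Comparing the two returned distributions, for example with a rank-sum test, avoids relying on the first two principal components alone. With ~7.5M fragments per profile, a sparse-matrix representation would be needed in practice.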

      • It's not clear to me what new biology (genomic features) has been learned from this meta-analysis. All the major genomic features analyzed have already been found to be associated with origin sites. For example, the correspondence with TSS has been reported before:

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6320713/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547456/

      So what new biology has been discovered from this meta-analysis?

      Response: The new biology can be summarized as: (a) We can identify a set of reproducible (in multiple datasets and in multiple cell lines) SNS-seq origins that also fall within initiation zones identified by completely independent methods. These may be the best origins to study in the midst of the noise created by stochastic origin firing. (b) The overlap of these Shared origins (True Positive Origins) with known ORC binding sites is tenuous. So either all the origin mapping data, or all the ORC binding data has to be discarded, or this is the new biological reality in mammalian cancer cells: on a genome-wide scale the most reproduced origins are not in close proximity to ORC binding sites, in contrast to the situation in yeast. (c) Several of the features reported to define origins (CTCF binding sites, G quadruplexes etc.) could simply be from the fact that those features also define transcription start sites (TSS), and the origins may prefer to locate to these parts of the genome because of the favorable chromatin state, instead of the sequence or the structural features of CTCF binding sites or G quadruplexes specifically locating the origins.

      • Line 250: The most surprising finding is that there is little overlap between ORC/MCM binding sites and origin locations. The authors speculate that the overlap between ORC1 and ORC2 could be low because they come from different cell types. Equally concerning is the lack of overlap with MCM. If true, these are potentially major discoveries that butt heads with numerous other studies that have suggested otherwise. More needs to be done to convince the reader that such a mismatch is true. Some ideas are below:

      Idea 1) One explanation given is that the ORC1 and ORC2 data come from different cell types. But there must be a dataset where both are mapped in the same cell type. Can the authors check the overlap here? In Fig S4A, I would expect the circles not only to overlap strongly but also to be of roughly the same size, since both ORC subunits are required in the complex. So something seems off here.

      Response: We agree with the reviewer that there is something “off here”. Either the techniques that report these sites are all wrong, or the biology does not fit the prevailing hypothesis. As shown in Supplementary Fig. S6C, we do not have ORC1 and ORC2 ChIP-seq data from the same cell type. We have ORC1 ChIP-seq and SNS-seq data from HeLa cells and ORC2 ChIP-seq and origin data from K562 cells, and so have now computed the overlap of the binding sites with the shared origins in the same cell type (new Figures S5e and S5f). Of the 9605 shared origins in K562 cells, 12.8% overlap with ORC2 and 5.4% overlap with MCM3-7 binding sites also defined in K562 cells. Of the 8305 shared origins in HeLa cells, 4.4% overlap with ORC1 binding sites defined in HeLa cells.

      Nothing in the literature shows that the various ORC subunits ChIP-seq to the same sites, and we have unpublished data showing very poor overlap between the ChIP-seq binding sites of different ORC subunits. The poor overlap between the binding sites of subunits of the same complex suggests either that the subunits do not always bind chromatin as a six-subunit complex or that all the ORC subunit ChIP-seq data in the literature is suspect. In Supplementary Fig. S6A we provide examples of true positive complexes (SMARCA4/ARID1A, SMC1A/SMC3, EZH2/SUZ12) whose subunits ChIP-seq to a large fraction of common sites.

      Idea 2) Another explanation given is that origins fire stochastically. One way to quantify the role of stochasticity is to quantify the overlap of origin locations performed by the same lab, in the same year, in the same experiment, in the same cell type -- i.e., across replicates -- and then compute the overlap of mapped origins. This would quantify how much mis-match is truly due to stochasticity, and how much may be due to other factors.

      Response: A given lab may show better reproducibility within its own results than the field as a whole, and the finding that origins published in the same year tend to cluster together could be because a given lab publishes several origin sets in a single paper in a given year. But the notion of stochasticity is well accepted in the field because of the following observation: the average inter-origin distance measured by single-molecule techniques like molecular combing is ~100 kb, but the average inter-origin distance measured on a population of cells (same cell line) is ~30 kb. The only explanation is that in a population of cells many origins can fire, but in a given cell on a given allele, only about one-third of those possible origins fire. This is why we did not worry about the lack of reproducibility between cell lines, labs, etc., but instead focused on those SNS-seq origins that are reproducible across multiple techniques and cell lines.
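      The one-third figure follows from a back-of-envelope ratio. Assume both averages refer to the same genomic stretch of length L. A single molecule then shows about L/d_single fired origins, while the population reveals about L/d_pop distinct potential origins, so the fraction f of potential origins used in any one cell is approximately:

```latex
f \approx \frac{L/d_{\text{single}}}{L/d_{\text{pop}}}
  = \frac{d_{\text{pop}}}{d_{\text{single}}}
  \approx \frac{30\ \text{kb}}{100\ \text{kb}} \approx 0.3
```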

      Idea 3) A third explanation is that MCMs are loaded further from origin sites in human than in yeast. Is there any evidence of this? How far away does the evidence suggest, and what if this distance is used to define proximity?

      Response: MCMs, of course, have to be located at an origin when that origin fires, because MCMs provide the core of the helicase that starts unwinding the DNA at the origin. Thus, the lack of proximity between MCM binding sites and origins may arise because the most frequently detected MCM sites (where MCM spends the most time in a cell population) do not correspond to where MCM first becomes active to initiate origin firing. This has been discussed. MCMs may be loaded far from the origin site, but because they can move along chromatin, they must move to the origin site at some point to fire the origin.

      Idea 4) How many individual datasets (i.e., those collected and published together) also demonstrate the feature that ORC/MCM binding locations do not correlate with origins? If there are few, then indeed, the integrative analysis performed here is consistent. But if there are many, then why would individual datasets reveal one thing, but integrative analysis reveal something else?

      Response: In the revised manuscript we now discuss Dellino, 2013; Kirstein, 2021; Wang, 2017; and Mas, 2023. None of these studies addressed the question we ask here, namely whether the small subset of the most reproducible origins is proximal to ORC or MCM binding sites, but discussing them is essential.

      Idea 5) What if you were much more restrictive when defining "high-confidence" origins / binding sites. Does the overlap between origins and binding sites go up with increasing restriction?

      Response: We have made the SNS-seq origins more restrictive by selecting those reproduced in 30, 40, or 50 datasets, in addition to the FDR-determined cutoff of 20. The number of origins falls, but we do not see any significant increase in the percentage of origins that overlap with, or are proximal to, all ORC or MCM binding sites or the shared ORC or MCM binding sites. This analysis is now included in Supp. Fig. S9 and discussed.
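      A sketch of this threshold sweep, for illustration only. It assumes each origin's occupancy score is known and that the set of origins lying near ORC or MCM binding sites has already been computed by some proximity criterion. Both inputs and the function name are hypothetical. The point is that raising the cutoff shrinks the denominator, so the percentage can stay flat even as the number of origins falls.

```python
from typing import Dict, Iterable, Set, Tuple


def percent_near_orc_by_threshold(occupancy: Dict[str, int],
                                  near_orc: Set[str],
                                  thresholds: Iterable[int] = (20, 30, 40, 50)
                                  ) -> Dict[int, Tuple[int, float]]:
    """For each occupancy cutoff, return (origins kept, % of kept origins near ORC)."""
    result = {}
    for t in thresholds:
        kept = [o for o, occ in occupancy.items() if occ >= t]
        n_near = sum(o in near_orc for o in kept)
        pct = 100 * n_near / len(kept) if kept else float("nan")
        result[t] = (len(kept), pct)
    return result


# Toy input: origin IDs with occupancy scores, and the subset near an ORC site.
occupancy = {"o1": 20, "o2": 25, "o3": 35, "o4": 45, "o5": 55}
near_orc = {"o2", "o5"}
```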

      Overall, I have the sense that these experimental techniques may be producing a lot of junk. If true, this would be useful for the field to know! But if not, and there are indeed "unexplored mechanisms of origin specification" that would be exciting. But I'm not convinced yet.

      • It would be nice in the Discussion for the authors to comment about the trade-offs of different techniques; what are their pros and cons, which should be used when, which should be avoided altogether, and why? This would be a valuable prescription for the field.

      Response: Thanks for the suggestion. We have done what the reviewer suggested in the new Supp. Fig. S4.

      Among the 20,250 high-confidence shared origins, 9,901 (48.9%) overlapped with SNS-seq origins in K562; 3,872 (19.1%) overlapped with OK-seq IZs; 1,163 (5.7%) overlapped with Repli-seq IZs.

      In the reciprocal direction, we asked which method best picks out the highly reproducible shared origins: 2.7% of SNS-seq origins, 17.2% of OK-seq initiation zones and 7.7% of Repli-seq initiation zones overlapped with the 20,250 shared origins.

      Thus SNS-seq identifies more of the reproducible origins, but it comes with a high false positive rate.
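      The two directions give different percentages because the denominators differ: the 20,250 shared origins in one case, and all intervals reported by a given method in the other. The sketch below shows this with a hypothetical fraction_overlapping helper. It assumes that any base-pair overlap counts as a hit, which may not be the criterion actually used. The block also recomputes the forward percentages quoted above from the reported counts.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base pair."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one reference interval."""
    if not query:
        return float("nan")
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


# Forward-direction percentages quoted in the response, recomputed from the
# reported counts (denominator: the 20,250 shared origins).
REPORTED = {"SNS-seq (K562)": 9_901, "OK-seq IZs": 3_872, "Repli-seq IZs": 1_163}
PERCENT = {name: round(100 * n / 20_250, 1) for name, n in REPORTED.items()}

# Toy intervals showing that the two directions need not agree.
shared = [("chr1", 100, 400), ("chr1", 1000, 1300)]
zones = [("chr1", 0, 500), ("chr2", 0, 10_000), ("chr1", 5_000, 9_000)]
```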

      ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus we have discussed why the ChIP-seq sites of these protein complexes should not be used to define origins.

      Reviewer #3 (Public Review):

      Summary: The authors present a thought-provoking and comprehensive re-analysis of previously published human cell genomics data that seeks to understand the relationship between the sites where the Origin Recognition Complex (ORC) binds chromatin, where the replicative helicase (Mcm2-7) is situated on chromatin, and where DNA replication actually begins (origins). The view that these should coincide is influenced by studies in yeast where ORC binds site-specifically to dedicated nucleosome-free origins where Mcm2-7 can be loaded and remains stably positioned for subsequent replication initiation. However, this is most certainly not the case in metazoans where it has already been reported that chromatin binding sites of ORC, Mcm2-7, and origins do not necessarily overlap, likely because ORC loads the helicase in transcriptionally active regions of the genome and, since Mcm2-7 retains linear mobility (i.e., it can slide), it is displaced from its original position by other chromatin-contextualized processes (for example, see Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, and Prioleau et al., 2016 G&D amongst others). This study reaches a very similar conclusion: in short, they find a high degree of discordance between ORC, Mcm2-7, and origin positions in human cells.

      Strengths: The strength of this work is its comprehensive and unbiased analysis of all relevant genomics datasets. To my knowledge, this is the first attempt to integrate these observations and the analyses employed were suited for the questions under consideration.

      Response: Thank you for recognizing the comprehensive and unbiased nature of our analysis. The major weakness identified, that the comprehensive view fails to move the field forward, is in our view actually a strength. It should be seen in light of the fact that we cannot find evidence to support the primary hypothesis: that the most reproducible origins must be near ORC and MCM binding sites. This finding will prevent the unwise adoption of ORC or MCM binding sites as surrogate markers of origins and will stimulate the field to improve methods of identifying ORC or MCM binding until the binding sites are found to be proximal to the most reproducible origins. The last possibility is that there are ORC- or MCM-independent modes of defining origins, but we have no evidence of that.

      Weaknesses: The major weakness of this paper is that this comprehensive view failed to move the field forward from what was already known. Further, a substantial body of relevant prior genomics literature on the subject was neither cited nor discussed. This omission is important given that this group reaches very similar conclusions as studies published a number of years ago. Further, their study seems to present a unique opportunity to evaluate and shape our confidence in the different genomics techniques compared in this study. This, however, was also not discussed.

      Response: We have done what the reviewer suggested: use K562 cell type-specific data where origins have been defined by three methods and reporting the percent of shared origins identified by each method (Supp. Fig. S4). Thanks for the suggestion. We have discussed now that SNS-seq identifies more of the reproducible origins, but it comes with a high false positive rate. ORC ChIP-seq and MCM ChIP-seq data do not define origins: they define the binding sites of these proteins. Thus, we have discussed that the ChIP-seq sites of these protein complexes as we now have them should not be used to define origins.

      We do not cite SNS-seq data from before 2018 because of the concerns discussed above about the earlier techniques needing improvement. We now also discuss other genomics data that we had previously failed to discuss.

      We have cited the papers the reviewer names:

      Gros, Mol Cell 2015 and Powell, EMBO J. 2015 discuss the movement of MCM2-7 away from ORC in yeast and flies, and are now cited. The binding of MCM2-7 to sites away from ORC, and its loading in vast excess over ORC, was reported earlier on Xenopus chromatin in PMC193934; this study is also now cited.

      Miotto, PNAS, 2016: publishes ORC2 ChIP-seq sites in HeLa (data we have used in our analysis), but does not measure ORC1 ChIP-seq sites. They say: “ORC1 and ORC2 recognize similar chromatin states and hence are likely to have similar binding profiles.” This conclusion is based on the fact that the ChIP-seq sites in the two studies are in areas with open chromatin; it is not a direct comparison of the binding sites of the two proteins.

      Prioleau, G&D, 2016: This is a review that compared different techniques of origin identification but has no primary data to say that ORC and MCM binding sites overlap with the most reproducible origins. It has now been referenced in the context of epigenetic marks and origins.

      Reviewing Editor:

      While there is some disagreement between the reviewers about the analysis performed, there are relevant concerns about the data analyzed (reviewers 1 and 2) and the biological significance of the observation (all three reviewers). There is also concern about the ORC ChIP-seq data and the lack of overlap between published data for ORC1 and ORC2: if the two were in a complex, their binding sites should overlap much better than reported.

      Given the high overlap of ChIP-seq data for subunits of three other complexes shown in Supp. Fig. S6A, the most likely explanation is that ORC1 and ORC2 do not necessarily bind to DNA only as part of a complex. In other words, other protein complexes that contain one subunit or the other also bind DNA. This is not entirely unexpected. Biochemically the ORC2-3-4-5 complex is more stable and more abundant than the six subunit ORC.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      • Line 44, missing spaces near references: "origins(Hu". Repeated issue throughout the manuscript.

      • Line 82: "Notably any technical biases are uniquely associated with each assay" -- how do you know the biases are unique to each assay and orthogonal to each other?

      • Line 135: typo: "using pipeline"

      • Line 136: "All the 113 datasets" -> "Each of the 113 datasets"?

      • Line 156: "differences among different techniques" -> "different" can be removed.

      • Figure 4F: I don't see any difference in 4F amongst shared *. What is the y-axis anyways?

      We have addressed these issues in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The most significant omission is a contextualization of the results in the discussion and an explanation of why these results matter for the biology of replication, disease, and/or our confidence in the genomic techniques reported on in this study. As written, the discussion simply restates the results without any interpretation towards novel insight. I suggest that the authors revise their discussion to fill this important gap.

      A second important, unresolved point is whether replication origins identified by the various methods differ due to technical reasons or because different cell types were analyzed. Given the correlation between TSS and origins (reported in this study but many others too), it is somewhat expected that origins will differ between cell types as each will have a distinct transcriptional program. This critique is partly addressed in Figure S1C. However, given the conclusion that the techniques are only rarely in agreement (only 0.27% origins reproducibly detected by the four techniques), a more in-depth analysis of cell type specific data is warranted. Specifically, I would suggest that cell type-specific data be reported wherever origins have been defined by at least two methods in the same cell type, specifically reporting the percent of shared origins amongst the datasets. This type of analysis may also inform on whether one or more techniques produces the highest (or lowest) quality list of true origins.

      We have done what was suggested: we used K562 cell type-specific data, because in this cell type origins have been defined by at least two methods, and reported the percent of shared origins amongst the datasets (Supp. Fig. S4).

      Other MINOR comments include:

      • Line 215: the authors show that shared origins overlap with TF binding hotspots more often than union origins, which they claim suggests "that they are more likely to interact with transcription factors." As written, it sounds like the authors are proposing that ORC may have some direct physical interaction with transcription factors. Is this intended? If so, what support is there for this claim?

      The reviewer is correct. We have rephrased the sentence because we have no experimental support for this claim.

      • In the text, Figure 3G is discussed before Figure 3F. I suggest switching the order of these panels in Figure 3.

      Done.

      • It's not clear what Figure 5H to Figure 6 accomplishes. What specifically is added to the story by including these data? Is there something unique about the high confidence origins? If there is nothing noteworthy, I would suggest removing these data.

      We want to keep them to highlight the small number of origins that satisfy the hypothesis that ORC and MCM must bind at or near reproducible origins. These are the origins on which the field can focus to test the hypothesis rigorously. They also show the danger of evaluating proximity between ORC or MCM binding sites and origins based on a few browser shots: had we shown only this figure, one could conclude that ORC and MCM binding sites are very close to reproducible origins.

      • Line 394: "Since ORC is an early factor for initiating DNA replication, we expected that shared human origins will be proximate to the reproducible ORC binding sites." This is only expected if one disbelieves the prior literature that shows that ORC and origins are not, in many cases, proximal. This statement should be revised, or the previous literature should be cited, and an explanation provided about why this prior work may have missed the mark.

      We do not know of any genome-wide study in mammalian cell lines in which ORC and MCM binding sites have been compared with highly reproducible origins, or that shows that these binding sites and highly reproducible origins are mostly not proximal to each other. Most studies cherry-pick a few origins and show by ChIP-PCR that ORC and/or MCM bind near those sites. Alternatively, studies sometimes show a selected browser shot, without a quantitative measure of genome-wide overlap and without a permutation test to determine whether the observed overlap or proximity is higher than expected at random for similar numbers of sites of similar lengths. In the revised manuscript we discuss Dellino, 2013; Kirstein, 2021; Wang, 2017; Mas, 2023. None of them addresses the question we ask here: is the small subset of the most reproducible origins proximal to ORC or MCM binding sites?
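The permutation test mentioned above can be sketched as follows. This is a simplified illustration, not the procedure used in the manuscript: it assumes all sites lie on a single chromosome of known length, reduces each origin and binding site to its midpoint (so site lengths are not preserved), and re-draws binding-site positions uniformly at random while keeping their number fixed. The function names and the distance cutoff `d` are hypothetical.

```python
import random
from bisect import bisect_left
from typing import Sequence, Tuple


def n_near(origins: Sequence[int], sites: Sequence[int], d: int) -> int:
    """Count origins whose midpoint lies within distance d of the nearest site midpoint."""
    s = sorted(sites)
    if not s:
        return 0
    count = 0
    for o in origins:
        i = bisect_left(s, o)
        # The nearest site is either the last one before o or the first one at/after o.
        nearest = min(abs(o - s[j]) for j in (i - 1, i) if 0 <= j < len(s))
        if nearest <= d:
            count += 1
    return count


def permutation_test(
    origins: Sequence[int],
    sites: Sequence[int],
    d: int,
    genome_len: int,
    n_perm: int = 999,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return the observed count and a one-sided empirical p-value for proximity."""
    rng = random.Random(seed)
    observed = n_near(origins, sites, d)
    as_extreme = 0
    for _ in range(n_perm):
        # Null model (assumption): same number of sites, placed uniformly at random.
        shuffled = [rng.randrange(genome_len) for _ in sites]
        if n_near(origins, shuffled, d) >= observed:
            as_extreme += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical toy case: every origin coincides with a binding site.
sites = [i * 50_000 + 1_000 for i in range(20)]
print(permutation_test(sites, sites, d=1_000, genome_len=1_000_000))
```

With 999 permutations the smallest attainable p-value is 0.001; a realistic null would also preserve site lengths and exclude unmappable regions.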

      • Line 402-404: given the lack of agreement between ORC binding sites and origins the authors suggest as an explanation that "MCM2-7 loaded at the ORC binding sites move much further away to initiate origins far from the ORC binding sites, or that there are as yet unexplored mechanisms of origin specification in human cancer cells". The first part of this statement has been shown to be true (Mcm2-7 movement) and should be cited. But what do the authors mean by the second suggestion of "unexplored mechanisms"? Please expand.

      We have addressed this point in the revised manuscript.

      • The authors should better reference and discuss the previous literature that relates to their work, some of these include Gros et al., 2015 Mol Cell, Powell et al., 2015 EMBO J, Miotto et al., 2016 PNAS, but likely there are many others.

      We have addressed this point in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful for your time and efforts spent on our manuscript. Your feedback has been very valuable. Please see below a point-by-point response to each suggestion and actions taken to address each point in the manuscript.

      eLife assessment

      In this fundamental study, the authors propose analytical methods for inferring evolutionary parameters of interest from sequencing data in healthy tissue relevant to hematopoiesis. By combining analyses of single cell and bulk sequencing data, the authors can use a stochastic process to inform different aspects of genetic heterogeneity. The strength of evidence in support of the authors' claim is thus compelling. The work will be of broad interest to cell biologists and theoretical biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Authors propose mathematical methods for inferring evolutionary parameters of interest from bulk/single cell sequencing data in healthy tissue and hematopoiesis. In general, the introduction is well-written and adequately references the relevant and important previous literature and findings in this field (e.g. the power laws for well-mixed exponentially growing populations). The authors consider 3 phases of human development: early development, growth and maintenance, and mature phase. In particular, the time-dependent mutation rate in Figure 2d is an intriguing and strong result, and the processes underlying Figures 3 and 4 are generally well-explained and convincing.

      Thank you for your positive comments.

      Notes & suggestions:

      1. The explanation of Figure 2 in Lines 101 - 111 should be expanded for clarity. First, is Figure 2a derived from stochastic simulation (as line 101 suggests) or some theoretical analysis? Second, the gradual transition from f⁻² to f⁻¹ is appreciated, but the shape of the intermediates is not addressed in detail. The power laws are straight lines, and the simulations provide curved lines; please explain in which range (low- or high-frequency variants) the power-law approximations apply.

      Figure 2a was obtained from a numerical solution of equation 1, which describes the time dynamics of the expected VAF distribution. This is indeed unclear from the text, and we thank the reviewer for pointing out this discrepancy.

      We thank the reviewer for this suggestion and have now adjusted this in the text (lines 102-110):

      “Numerical solutions of Eq. (1) show that the expected VAF distribution exhibits a gradual transition from the f⁻² (growing population) to the f⁻¹ (constant population) power law (Fig. 2). These transitional states themselves do not adhere to some intermediate power law (e.g. f⁻ᵅ with 1 < α < 2), but instead present a sigmoidal shape, with the low-frequency portion following f⁻¹ and the high frequencies f⁻². Over time the shape changes as a wavelike front traveling from low to high frequency, with the constant-size equilibrium establishing earliest at the lowest frequencies and moving to higher frequencies over time. Interestingly, the convergence towards equilibrium slows down over time (for evenly spaced observation times the solutions lie increasingly close together), further decreasing the speed at which the high-frequency portion of the spectrum approaches equilibrium.”

      We also changed the caption of Figure 2 to make this clearer as

      “(a) Expected VAF distributions from evolving Eq. (1) to different time points for a population with an initial exponential growth phase and subsequent constant population phase (mature size N = 10³). Once the population reaches the maximum carrying capacity, the distribution moves from a 1/f² growing population shape (purple) to a 1/f constant population shape (green). Note that the shift slows considerably at older age.”

      In addition, we have also added annotations to Figure 2a and 2b to further clarify which line (green or purple) is f⁻¹ and which is f⁻².

      Additionally, I do not understand the claim in line 108, that the transition is fast for low frequency variants, as the low frequency (on the left of the graph) lines are all close together, whereas the high frequency lines are far apart.

      The lines are closer together in the low-frequency portion (left of the plot) because they are already very close to the constant-size equilibrium (f⁻¹, green line); these frequencies approached equilibrium very quickly. In contrast, the high-frequency portion (right side of the plot) is still far from equilibrium and approaches it much more slowly.

      It would be helpful to reiterate in this paragraph that these power laws are derived based on exponentially growing populations and are expected to break down under homeostatic conditions.

      We have adjusted the relevant paragraph in the text to make the validity of the power laws clearer (lines 90-94):

      “For a well-mixed exponentially growing population without cell death the VAF spectrum v(f) is given by 2μ/(f + f²) (an f⁻² power law) and is independent of time. In contrast, for a population of constant size (i.e. where birth and death rates are equal) the spectrum obeys v(f) ∝ 2μ/f (an f⁻¹ power law; see also SI), though this solution is only valid at sufficiently long times.”

      2. The sample vs population (blue vs orange) in Figure 3 is under-explained. How is it that the mutational burden and inferred mutation rate in A and B roughly match, but the VAF distributions in C are so different? How was the sampled set chosen? Perhaps this is an unimportant distinction based on the particular sample set, but the divergence of the two in C may serve as a distraction, here.

      This is an important question, and the answer was perhaps underemphasized in the caption. Sampling was performed as uniform random sampling with replacement, and the same sample set was used for both the mutational burden and the VAF distribution. The reason for this stark contrast is that, while the expectation of the burden distribution is not affected by sampling (sampling only affects the resolution, i.e. the amount of stochasticity), the expectation of the VAF distribution changes with sampling. While this was discussed in the section "Sparse sampling, single cell derived VAF spectra and evolutionary inferences", we have now also noted this (indeed surprising) effect in the caption:

      “(b) Distribution of estimated mutation rates from 10,000 individual simulations, obtained from burden distributions of the complete populations (blue) as well as sampled sets of cells (orange). Because the expected mutational burden distribution is unaltered by sampling, the expected estimate of the mutation rate from Eq. (5) remains unchanged: E(μ̃_pop) = E(μ̃_sample). However, sampling increases the noise on the observed burden distribution, which results in a larger error margin of the estimate: σ(μ̃_pop) < σ(μ̃_sample).”

      “(c) VAF spectra measured in the complete population (blue) and a sampled set of cells (orange). In contrast with the mutational burden distribution, strong sampling changes the shape of the expected distribution. A single simulation result is shown (diamonds) alongside the theoretically predicted expected values for both the total and sampled populations (Eqs. (1) and (6); dashed line) and the average across 100 simulations (solid line).”

      3. The comparison of results herein to claims by Mitchell (ref. 12) are quite important results within the paper. I appreciate the note in the final paragraph of the discussion, and I suggest adding a sentence referencing the result noted in lines 248-249 to the abstract, as well.

      We agree with the reviewer. We have extended the abstract now to reference the result in more detail:

      “However, the single cell mutational burden distribution is over-dispersed compared to a model of Poisson distributed random mutations, suggesting that a time-associated model of mutation accumulation with a constant rate alone cannot generate such a pattern. At least one additional source of stochasticity would be needed. Possible candidates for these processes may be occasional bursts of stem cell divisions, potentially in response to injury, or non-constant mutation rates either through environmental exposures or cell intrinsic variation.”
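Over-dispersion relative to a Poisson model, as described in the revised abstract, is commonly summarized by the variance-to-mean ratio (index of dispersion), which is expected to be close to 1 for Poisson counts. The sketch below only illustrates that summary statistic; it is not the test used in the manuscript, and the burden values in the example are made up.

```python
from statistics import mean, variance
from typing import Sequence


def dispersion_index(burdens: Sequence[float]) -> float:
    """Variance-to-mean ratio of per-cell mutation counts (about 1 under a Poisson model)."""
    if len(burdens) < 2:
        raise ValueError("need at least two cells to estimate a variance")
    m = mean(burdens)
    if m <= 0:
        raise ValueError("mean burden must be positive")
    return variance(burdens) / m  # sample variance (n - 1 denominator)


def dispersion_statistic(burdens: Sequence[float]) -> float:
    """(n - 1) * D, approximately chi-square with n - 1 degrees of freedom under Poisson."""
    return (len(burdens) - 1) * dispersion_index(burdens)


# Hypothetical per-cell burdens, chosen only to illustrate the calculation.
example = [5, 15, 5, 15]
print(dispersion_index(example))  # about 3.33, well above the Poisson value of 1
```

Comparing the statistic against the chi-square distribution with n - 1 degrees of freedom gives a formal test; that step needs a chi-square CDF (e.g. from a statistics library) and is omitted here.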

      Reviewer #2 (Public Review):

      Summary: The authors provide a nice summary on the possibility to study genetic heterogeneity and how to measure the dynamics of stem cells. By combining single cell and bulk sequencing analyses, they aim to use a stochastic process and inform on different aspects of genetic heterogeneity.

      Strengths: Well designed study and strong methods

      Thank you for your positive comments.

      Weaknesses: Minor

      Further clarification to Figure 3 legend would be good to explain the 'no association' of number of samples and mutational burden estimate as per line 180-182 p.8.

      We have added a note to the caption of Figure 3b to explain more clearly how sampling affects the burden distribution and the mutation rate inferred from it (see also previous response to Reviewer 1):

      “Because the expected mutational burden distribution is unaltered by sampling, the expected estimate of the mutation rate from Eq. (5) remains unchanged: E(μ̃_pop) = E(μ̃_sample). However, sampling increases the noise on the observed burden distribution, which results in a larger error margin of the estimate: σ(μ̃_pop) < σ(μ̃_sample).”
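The caption's point, that sampling leaves the expected burden-based estimate unchanged while widening its spread, can be checked with a toy simulation. The sketch makes simplifying assumptions that are not from the manuscript: the per-cell burdens are a made-up list, the mean burden stands in as a hypothetical proxy for the estimator of Eq. (5), and cells are drawn uniformly at random with replacement, as in the sampling scheme described above.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence


def sampled_means(
    burdens: Sequence[int], sample_size: int, n_rep: int, seed: int = 1
) -> List[float]:
    """Mean burden of `n_rep` sets of cells drawn uniformly with replacement."""
    rng = random.Random(seed)
    return [mean(rng.choices(burdens, k=sample_size)) for _ in range(n_rep)]


# Hypothetical population of per-cell mutation burdens (mean exactly 1000).
population = list(range(800, 1201))
full_estimate = mean(population)  # uses every cell, so no sampling noise

small = sampled_means(population, sample_size=20, n_rep=2000)
large = sampled_means(population, sample_size=200, n_rep=2000)

# Both averages sit close to the population value; smaller samples spread more.
print(full_estimate, round(mean(small), 1), round(mean(large), 1))
print(round(pstdev(small), 1), round(pstdev(large), 1))
```

The same simulation applied to VAF spectra would not show this invariance, which is the contrast drawn between panels (b) and (c).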

      Reviewer #1 (Recommendations For The Authors):

      Minor/editorial suggestions:

      1. Equation 1, please define \partial_t and \partial_K, for clarity.

      These have now been defined in the text (between lines 85 and 86): “where κ = fN(t) denotes the number of cells sharing a variant (the variant frequency f times the total population size N), δ(x) is the Dirac impulse function, and ∂_t and ∂_κ are the partial derivatives with respect to time and variant size.”

      2. Figure 2: It would be helpful to label the green and purple lines with the corresponding 1/f and 1/f^2 rule, in addition to the growing/fixed label, for clarity.

      We agree and have now added the corresponding labels to each line.

      Reviewer #2 (Recommendations For The Authors):

      Minor suggestions are given below:

      It would be nice for the authors to comment on whether the results could be extended/modified to account for a possible fitness advantage of mutations, which would be clinically relevant, for instance in the case of CHIP mutations and the difference in time to transformation to myeloid malignancy between individuals with and without CHIP.

      This is an important point. We agree with the reviewer that CHIP mutations play an important role in shaping mutational diversity especially in older individuals. Evidence is now emerging that CHIP mutations are almost universally present in individuals 60+. Interestingly, in individuals younger than 60, a neutral model (as presented here) captures the observed effective dynamics well. For the purpose of the analysis underlying this manuscript, a neutral model seems reasonable.

      The techniques we use here can be adjusted to include selection. How the results extend or change will critically depend on the actual model of selection (rare or frequent CHIP mutations, strong vs weak selection, etc.) that is realized in human hematopoiesis. In our view, the underlying biology is currently largely unknown and is the subject of ongoing investigations (by others and in part by us) that extend beyond the scope of this manuscript.

      We now make note of this point in the manuscript and have added a short paragraph to the Discussion on page 11:

      “Another open question is the role of selection and how it shapes intra-tissue genetic heterogeneity. Evidence is emerging that positively selected variants in blood are almost universally present in individuals above 60, while the effective observable dynamics in younger individuals is well described by neutral dynamics. How the results presented here generalize or change will critically depend on the model of selection realized in human hematopoiesis, e.g. models of rare or frequent driver events. Details of the underlying biology are currently unknown.”

      It would be nice to see if any significant differences in parameter estimates occur between loci with/without linkage disequilibrium, for instance HLA region. Could the number of single-cell samples be 'more' relevant when studying the VAF distribution in HLA region?

      This is a good suggestion. We might be wrong or missing an important point, but somatic evolution as we use it in our modeling here is solely driven by asexual reproduction of cells. As such the entire genome of the cell is in linkage disequilibrium, independent of the precise genomic region (somatic evolution is in first approximation blind to germline mutations, as they are present in every single cell of the organism and therefore do not carry any information on the somatic evolutionary dynamics).

      We thank all editors and reviewers again for your constructive comments.

    1. Author Response

      I would like to express my sincere gratitude to the editors and reviewers for their helpful comments and valuable suggestions, which gave us an opportunity to further improve our research. Before submitting our final revision, we provide here our preliminary responses to the comments. Please find our detailed responses to the reviewers’ recommendations below.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how it reflects variation in dietary patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for the understanding of the diversification of bunodont proboscideans in Asia during Miocene, as well as explaining the cranial/jaw disparity of fossil lineages. This work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene using several methods. The authors included all Asian bunodont proboscideans with long mandibles and I suggest that they should use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Response: Thank you for your comprehensive review and positive feedback on our study of the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion and have decided to use the term "bunodont elephantiforms" instead of "gomphotheres" (for explicit clarification, we use elephantiforms to exclude some early proboscideans, such as Moeritherium); we will make this change in our revised manuscript. We also appreciate the potential weakness you mentioned regarding the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with the reviewer’s suggestion, and we are aware that gigantism and long limbs are potential factors in trunk development. Gigantism resulted in a loss of flexibility in elephantiforms, and long limbs made it more difficult for them to reach the ground; a long trunk compensates for these limitations. However, limb bones are rare in our material, especially those preserved in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, feeding behaviors, and co-evolution of the feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae), which are characterised by distinctive mandible and mandibular tusk morphologies. They also show different evolutionary stages of the food-acquisition organs, which may have co-evolved with extremely elongated mandibular symphyses and tusks. Although these three longirostrine gomphothere families were widely distributed in northern China in the Early-Middle Miocene, their relative abundances and distributions differed through time as a result of climatic and ecosystem changes.

      These three groups have different feeding behaviors indicated by different mandibular symphysis and tusk morphologies. Additionally, they have different evolutionary stages of trunks, which are reflected by the narial region morphology. To reconstruct the feeding behavior and the relationship between the mandible and the trunk of early elephantiforms, the authors examined the crania and mandibles of these three groups from the Early and Middle Miocene of northern China from three different museums and performed several analyses.

      The analyses made in the study are:

      1. Finite Element (FE) analysis: They conducted two kinds of tests, the distal forces test and the twig-cutting test. The distal forces test establishes the advantageous and disadvantageous mechanical performance of each group under distal vertical and horizontal external forces. In the twig-cutting test, a cylindrical twig model with orthotropic elastoplasticity was applied from three directions to the distal end of the mandibular tusk to calculate the sum of the equivalent plastic strain (SEPS). The results indicate that the three groups have different mandibular specializations for cutting plants.

      2. Phylogenetic reconstruction: These groups have different narial region morphologies and, accordingly, different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology, and the evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      3. Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations have led to the diverse mandibular morphology and that open-land grazing has driven the development of trunk-specific functions and the loss of the long mandible. This conclusion is supported by evidence from palaeoecological reconstruction, the reconstruction of feeding behaviors, and the detailed examination of mandibular and narial region morphology during the study.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      Response: We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from palaeoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this, they propose that an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with this residue needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.

      Strengths: The strengths of the work are the very careful biochemical analyses and the interesting result for wild-type LRRK2.

      Weaknesses:

      A major unexplained weakness is why the mutant T1343A starts out with so much lower activity--it should be the same as wild-type, non-phosphorylated protein. Also, if a monomer-dimer transition is involved, it should be either all or nothing. Other approaches would add confidence to the findings.

      We thank the reviewer for these suggestions. We are aware that T1343A generally has lower activity than the wild type. We would therefore like to emphasize that this mutant is the only one not showing an increase in Km after ATP treatment. Other mutants that also have lower kcat values, such as T1503A, still show this characteristic change in Km. Our favored explanation for the lower kcat of T1343A is that this mutation lies within a critical region of the Roc domain, the so-called P-loop, and is very likely not structurally neutral. Concerning the dimer-monomer transition, we are convinced that more than one factor is involved in this equilibrium, most likely including, but not limited to, other LRRK2 domains (e.g. the WD40 domain), binding of cofactors (e.g. Rab29/Rab32 or 14-3-3), and membrane binding. Consistent with this, even with stapled peptides targeting the Roc or COR domains we were not able to shift the equilibrium completely to the monomer (Helton et al., ACS Chem Biol. 2021, 16:2326-2338; Pathak et al., ACS Chem Neurosci. 2023, 14(11):1971-1980). We will address these points in a revised version of the manuscript.
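The Km and kcat changes discussed here combine into the catalytic efficiency kcat/Km. As a purely illustrative sketch (not the fitting procedure used by the authors), the function below recovers Km and kcat from initial rates using the Hanes-Woolf linearization, [S]/v = Km/Vmax + [S]/Vmax, with kcat = Vmax/[E]. The GTP concentrations, rates, and parameter values are hypothetical and noise-free; with real data, direct nonlinear least-squares fitting of the Michaelis-Menten equation is generally preferred over linearizations.

```python
from typing import Sequence, Tuple


def hanes_woolf_fit(
    s: Sequence[float], v: Sequence[float], enzyme: float
) -> Tuple[float, float, float]:
    """Return (Km, kcat, kcat/Km) from initial rates v at substrate concentrations s."""
    x = list(s)
    y = [si / vi for si, vi in zip(s, v)]  # Hanes-Woolf ordinate [S]/v
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx            # equals 1 / Vmax
    intercept = my - slope * mx  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    kcat = vmax / enzyme
    return km, kcat, kcat / km


def mm_rate(s: float, km: float, kcat: float, enzyme: float) -> float:
    """Michaelis-Menten initial rate v = kcat [E] [S] / (Km + [S])."""
    return kcat * enzyme * s / (km + s)


# Hypothetical, noise-free data: Km = 100, kcat = 0.2, [E] = 0.5 (arbitrary units).
gtp = [25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
rates = [mm_rate(g, km=100.0, kcat=0.2, enzyme=0.5) for g in gtp]
km, kcat, eff = hanes_woolf_fit(gtp, rates, enzyme=0.5)
print(km, kcat, eff)
```

Because a higher Km and a lower kcat both reduce kcat/Km, the two reported effects of ATP incubation point in the same direction: a less efficient GTPase.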

      Reviewer #2 (Public Review):

      This study addresses the catalytic activity of a Ras-like ROC GTPase domain of LRRK2 kinase, a Ser/Thr kinase linked to Parkinson's disease (PD). The enzyme is associated with gain-of-function variants that hyper-phosphorylate substrate Rab GTPases. However, the link between the regulatory ROC domain and activation of the kinase domain is not well understood. It is within this context that the authors detail the kinetics of the ROC GTPase domain of pathogenic variants of LRRK2, in comparison to the WT enzyme. Their data suggest that LRRK2 kinase activity negatively regulates the ROC GTPase activity and that PD variants of LRRK2 have differential effects on the Km and catalytic efficiency of GTP hydrolysis. Based on mutagenesis, kinetics, and biophysical experiments, the authors suggest a model in which autophosphorylation shifts the equilibrium toward monomeric LRRK2 (locked GTP state of ROC). The authors further conclude that T1343 is a crucial regulatory site, located in the P-loop of the ROC domain, which is necessary for the negative feedback mechanism. Unfortunately, the data do not support this hypothesis, and further experiments are required to confirm this model for the regulation of LRRK2 activity.

      Specific comments are below:

      • Although a couple of papers are cited, the rationale for focusing on the T1343 site is not evident to readers. It should be clarified that this locus, and perhaps other similar loci in the wider ROCO family, are likely important for direct interactions with the GTP molecule.

      To clarify this point: we have not focused only on this specific locus, but instead systematically mutated all known autophosphorylation sites within the RocCOR domain (see Supplemental Information). Furthermore, it has been shown that this site, at least in the RCKW (Roc to WD40) construct, is quantitatively phosphorylated (Deniston et al., Nature 2020, 588:344-349). We are aware that the T1343 residue is located within the P-loop and that this can affect nucleotide binding (see response to Reviewer 1). We will clarify and address these points in a revised version of the manuscript.

      • Similar to the above, readers are kept in the dark about auto-phosphorylation and its effects on the monomer/dimer equilibrium. This is a critical aspect of this manuscript and a major conceptual finding that the authors are making from their data. However, the idea that auto-phosphorylation is (likely) to shift the monomer/dimer equilibrium toward monomer, thereby inactivating the enzyme, is not presented until page 6, AFTER describing much of their kinetics data. This is very confusing to readers, as it is difficult to understand the meaning of the data without a conceptual framework. If the model for the LRRK2 function is that dimerization is necessary for the phosphorylation of substrates, then this idea should be presented early in the introduction, and perhaps also in the abstract. If there are caveats, then they should be discussed before data are presented. A clear literature trail and the current accepted (or consensus) mechanism for LRRK2 activity is necessary to better understand the context for these data.

      We agree with the reviewer. We will address this point in a revised version of the manuscript.

      • Following on the above concepts, I find it interesting that the authors mention monomeric cytosolic states, and kinase-active oligomers (dimers??), with citations. Again here, it would be useful to be more precise. Are dimers (oligomers?) only formed at the membrane? That would suggest mechanisms involving lipid or membrane-attached protein interactions. Also, what do the authors mean by oligomers? Are there more than dimers found localized to the membrane?

      There are multiple studies showing that LRRK2 is mainly monomeric in the cytosol, while it forms mainly dimers or higher-order oligomers at membranes (James et al., Biophys. J. 2012, 102, L41–L43; Berger et al., Biochemistry, 2010, 49, 5511–5523). However, we agree with the reviewer that it remains to be determined whether the dimeric form or a higher oligomeric state is the most active state at the membrane, especially since a recent study showed that LRRK2 can form active tetramers only when bound to Rab29 (Zhu et al., bioRxiv, 2022, DOI: 10.1101/2022.04.26.489605). We will clarify and address these points in the introduction of a revised version of the manuscript.

      • Fig 5 is a key part of their findings, regarding the auto-phosphorylation induced monomer formation of LRRK2. From these two bar graphs, the authors state unequivocally that the 'monomer/dimer equilibrium is abolished', and therefore, that the underlying mechanism might be increased monomerization (through maintenance of a GTP-locked state). My view is that the authors should temper these conclusions with caveats. One is that there are still plenty of dimers in the auto-phosphorylated WT, and also in the T1343A mutant. Why is that the case? Can the authors explain why only perhaps a 10% shift is sufficient? Secondly, the T1343A mutant appears to have fewer overall dimers to begin with, so it appears to readers that 'abolition' is mainly due to different levels prior to ATP treatment at 30 deg. I feel these various issues need to be clarified in a revised manuscript, with additional supporting data. Finally, on a minor note, I presume that there are no statistically significant differences between the two sets of bar graphs on the right panel. It would be wise to place 'n.s.' above the graphs for readers, and in the figure legend, so readers are not confused.

      Regarding the monomer-dimer equilibrium, we are convinced that more than the phosphorylation of T1343 is involved (see response to Reviewer 1). Therefore, the 10% shift in our assay most likely underestimates the effect in cells.
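Whether a monomer-dimer shift must be "all or nothing" depends on the underlying model. Under the simplest mass-action assumption (a single reversible equilibrium 2M ⇌ D with dissociation constant Kd = [M]²/[D] and total protomer concentration C = [M] + 2[D], ignoring higher oligomers, cofactors, and membranes, which the authors note also contribute), the dimer fraction changes gradually with Kd rather than switching. The sketch below only illustrates that textbook relationship with hypothetical concentrations; it is not a model fitted to the data in Fig. 5.

```python
import math


def dimer_fraction(total: float, kd: float) -> float:
    """Fraction of protomers in dimers for 2M <-> D, Kd = [M]^2/[D], total = [M] + 2[D]."""
    if total <= 0 or kd <= 0:
        raise ValueError("total and kd must be positive")
    # Mass balance gives 2[M]^2/Kd + [M] - total = 0; take the positive root for [M].
    monomer = kd * (-1.0 + math.sqrt(1.0 + 8.0 * total / kd)) / 4.0
    return 1.0 - monomer / total


# Hypothetical: fixed total concentration, Kd weakened step by step.
for kd in (0.5, 1.0, 2.0, 4.0):
    print(kd, round(dimer_fraction(1.0, kd), 3))
```

In this simple model, doubling Kd at a total concentration equal to the original Kd lowers the dimer fraction from 0.5 to about 0.38, a graded rather than all-or-nothing shift.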

      Consistent with this, the T1343A mutant shows an increase in Rab10 phosphorylation similar to that of the G2019S mutant, which shows that the identified feedback mechanism plays an important role in a cellular context. We will explain this in more detail in a revised version of the manuscript. Concerning the bar graphs, we will add the “n.s.” indication in a future version of the manuscript.

      • Figure 6B, Westerns of phosphorylation, the lanes are not identified and it is unclear what these data mean.

      We apologize for this mistake and will add the correct labeling in a revised version of the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Major concerns/weakness:

      1) All the results in Fig. 2 utilized two glioma lines SF188 and Res259. The authors should repeat all these experiments in a couple of H3.3K27M DMG lines by deleting the H3.3K27M mutation first.

      We thank the referee for his/her comments that will help us to strengthen our conclusions.

      The reviewer's proposal is interesting, but deleting the K27M mutation would instead address the role of the BMP pathway in maintaining the phenotype of DMG cells. Our aim in the first part of this article (with Res259 and SF188) is rather to study how the BMP pathway can participate in establishing a particular cellular state at the time the K27M mutation is expressed. In other words, the underlying idea is to define the phenotypic changes specifically associated with activation of the BMP pathway when epigenetic modifications are induced by expression of the K27M mutation. We have chosen the SF188 and Res259 models to remain in a glial context, but it would indeed be interesting to test the effect of this synergy in other models, closer to the cells of origin of DMG. In any case, these models should make it possible to answer the question of the cellular state transition at the moment of K27M expression, even if the reciprocal question of the reversibility of this state proposed by the reviewer is also of interest for understanding the oncogenic synergy between BMP and K27M.

      2) Fig. 3. The experiments of BMP2 treatment should be repeated in other H3.3K27M DMG lines using H3.1K27M ACVR1 mutant tumor lines as controls.

      We will provide the results of these experiments in a revised version. The use of mutant ACVR1 lines is interesting, but their suitability as controls seems questionable, as added BMPs could compound the effect of the mutation, notably by activating other receptors in the pathway.

      Minor concerns:

      Fig.2A. BMP2 expression increased in H3.3K27M SF188 cells. Therefore, the statement "whereas BMP2 and BMP4 expressions are not significantly modified (Figure 2A and Figure 2-figure supplement A-B)" is not accurate.

      The referee is absolutely right and we will correct this statement in the revised version.

      Reviewer #2 (Public Review):

      [...] The paper is well-written and easy to follow with a robust experimental plan and datasets supporting the claims. While previous work (acknowledged by the authors) indicated activation of BMP in H3K27M tumors wild type for the ACVR1 mutation, this paper is a nice addition and provides further mechanistic cues as to the importance of the BMP pathway and specific members in these deadly brain cancers. The effect of these BMPs in quiescence and invasion is of particular interest.

      We thank the referee for his/her supportive comments.

      A few suggestions to clarify the message are provided below:

      1- In thalamic diffuse midline gliomas, the BMP pathway should not be activated as it is in the pons. The authors should identify thalamic tumors in the datasets they explored and patients-derived cell lines from thalamic tumors available to investigate whether this pathway is active across all H3.3K27M mutants in the brain midline or specifically in tumors from the pons.

      The referee's question is an interesting one, and we will try to determine tumor location from the public data we used. In particular, we will try to determine whether the inter-patient variability observed in the level of BMP pathway activation may be due to different tumor locations.

      2 - There are ~20% H3.3K27M tumors that carry an ACVR1 mutation and similar numbers of H3.1K27M that are wild type for this gene. Can the authors identify these outliers in their datasets and assess the activation of BMP2 and 7 or other BMP pathway members in this context?

      Indeed, defining the level of activation of the pathway in this type of H3.3K27M ACVR1 mutant or H3.1K27M ACVR1 wt tumors would be extremely interesting, but no samples of this type are a priori included in the datasets analyzed. Instead, we will try to define the phenotype of cell lines of this type in response to BMP.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      1. The manuscript study would be improved by further discussion of the mechanistic relationship between this class of sex-biased DHS and the other 2/3 of liver DHS that also show male-biased accessibility but whose chromatin does not respond directly to GH-stimulated STAT5.

      Response: We added a new paragraph to the Discussion (lines 608-618) on our novel finding that sex-biased H3K36me3 marks uniquely distinguish Static sex-biased DHS from Dynamic sex-biased DHS (see Fig. 6C). We discuss this finding in light of a recent study in a different biological system showing that H3K36me3 marks constitute an important mechanism for maintaining cell type-specific identity by inhibiting the spread of H3K27me3 repressive marks at cell type-specific enhancers [Nat Cell Biol, 25 (2023) 1121-1134]. Further, we now discuss the potential mechanistic significance of this mark in ensuring sex-biased chromatin accessibility at Static sex-biased DHS:

      “Finally, we discovered that sex-biased H3K36me3 marks are a unique distinguishing feature of static sex-biased DHS, with male-biased H3K36me3 marks being highly enriched at static male-biased DHS but not at dynamic male-biased DHS, and female-biased H3K36me3 marks highly enriched at static female-biased DHS (Fig. 6C). H3K36me3 marks are classically associated with the demarcation of actively transcribed genes [50] but are also used to maintain cell type identity by inhibiting the spread of H3K27me3 repressive marks at cell type-specific enhancers [35, 51]. The enrichment of H3K36me3 marks at static male-biased DHS described here could thus be an important mechanism to maintain sex-dependent hepatocyte identity by keeping static male-biased enhancers constitutively open and free of H3K27me3 repressive marks in male liver, and similarly for H3K36me3 marks enriched at static female-biased DHS in female liver. Further study is needed to elucidate the underlying mechanisms whereby these and the other sex-specific histone marks discussed above are deposited on chromatin in a sex-dependent and site-specific manner and the roles that GH plays in regulating these epigenetic events”.

      1. Previous studies, including those in the Waxman lab (PMIDs: 26959237, 18974276, 35396276) suggest that castration of males or gonadectomy of both sexes eliminates most sex differences in mRNA expression in mouse liver, and/or that androgens such as DHT or testosterone administered in adulthood may reverse the effects of gonadectomy and/or masculinize liver gene expression. It is not clear from the present discussion whether the GH/STAT5 cyclic effects to masculinize chromatin status require the presence of androgens in adulthood to masculinize pituitary GH secretion. Are there analyses of the present (or past) data that might provide evidence about a dual role for GH and androgen acting on the same genes? For example, are sex-biased DHS bound by androgen-dependent factors or show other signs of androgen sensitivity? Are histone marks associated with DHS regulated by androgens? Moreover, it would help if the authors indicate whether they believe that the "constitutive" static sex differences in the larger 2/3 set of male-biased DHS are the result of "constitutive" (but variable) action of testicular androgens in adulthood. Although the present study is nicely focused on the GH pulse-sensitive DHS, is there mechanistic overlap in sex-biasing mechanisms with the larger static class of sex-biased liver DHS?

      Response: The Reviewer poses an intriguing set of questions regarding the potential role of androgens in directly regulating the set of Static male-biased DHS, perhaps by working together with GH or GH-activated STAT5 at the level of chromatin. We have now addressed these questions in full in a new Discussion paragraph entitled “Pituitary GH secretory patterns vs. gonadal steroids as regulators of sex-biased liver chromatin accessibility and gene expression” (lines 640-661), as follows:

      “While testosterone has a well-established role in programming hypothalamic control of pituitary GH secretory patterns [9-11], it is also possible that androgens and estrogens could regulate sex differences in hepatocytes directly at the epigenetic or transcriptional level. However, our findings support the proposal that plasma GH patterns, and not gonadal steroids, dominate epigenetic control of liver sex differences. First, the ability of a single exogenous plasma GH pulse to rapidly reopen dynamic male-biased DHS closed by hypophysectomy – in the face of ongoing ablation of pituitary-stimulated gonadal steroid production and secretion – implicates GH signaling per se in the direct regulation of chromatin accessibility for this class of male-biased DHS. Second, GH regulates the sex bias of static male-biased DHS as well, as evidenced by their widespread closure in male liver following continuous GH infusion (Table S2E). It is important to note, however, that hepatocyte-specific knockout of androgen receptor (AR) does, in fact, dysregulate ~15% of sex-biased genes, albeit with a much lower effect size than global AR knockout [52] due to the systemic disruption of the somatotropic axis and circulating GH secretory profiles [53, 54]. Conceivably, AR could regulate these genes by a direct binding mechanism, acting either alone or in concert with GH-activated STAT5 to keep chromatin open constitutively at a subset of static male-biased DHS, of which 32% undergo at least partial closure in male liver following hypophysectomy (Fig. 4C). Estrogen receptor (ERα) likely plays only a minor role in regulating sex-biased liver DHS enhancers, given the lack of effect of hepatocyte-specific ERα knockout on sex-biased liver gene expression [22] and our finding that only 12% of static female-biased DHS close in female liver following hypophysectomy, which decreases circulating estradiol levels [55].”

      Reviewer #2 (Public Review):

      The Reviewer did not raise any points of criticism.

      Reviewer #2 Recommendations:

      Line 121. "highly enriched for genes of the corresponding sex bias" is unclear. Does this mean that the genes near the DHS have the same bias in level of transcription as the bias in open chromatin? Please clarify.

      Response: Text was changed to: “were highly enriched for mapping to genes showing the corresponding sex bias in the level of transcription, but not for genes whose expression shows the opposite sex bias”.

      Line 161. "STAT5 activity-dependent patterns" seems not to be supported by the data. The patterns correlate with STAT5 activity, but the authors can't conclude that they depend on STAT5 activity based on these data alone.

      Response: Text was changed to: “patterns of DNase-released fragments that correlate with STAT5 activity”

      Line 171. "identify genomic regions where chromatin dynamically opens or closes in male mouse liver in response to GH pulse activation of STAT5" This statement assumes a causal relationship between STAT5 and the status of differential sites. The data do not support this assumption of causality, because the data correlate STAT5 with status of the differential sites.

      Response: Text was changed to: “identify genomic regions where chromatin dynamically opens or closes in male mouse liver in close association with GH pulse activation of STAT5”.

      Line 176. The "binary pattern" in figure 2D seems not to be as binary as the authors suggest. The blue and red samples overlap in their distribution, and the lower green samples are intermediate between most of the blue and red samples. The "arbitrary" dotted line suggests the binary status, but this line is less convincing because it is arbitrary and drawn by eye; some samples don't obey the binary dichotomy.

      Response: Text was changed to: “This pattern, where individual male mouse livers largely show either high or low DNase-seq read count distributions at the top differential genomic sites, was also seen…”.

      Line 224 "independent" also implies causality.

      Response: No changes were made.

      Line 284. The effects of hypophysectomy on liver chromatin accessibility is attributed here to the loss of GH secretions. Hypophysectomy will also reduce testicular androgen secretion. To what extent can the results of Hypox be attributed to STAT5-dependent mechanisms as opposed to the loss of androgens?

      Response: This question is now discussed in full in the new Discussion section entitled “Pituitary GH secretory patterns vs. gonadal steroids as regulators of sex-biased liver chromatin accessibility and gene expression” (lines 640-661), as noted above.

      Line 505. "euthanized between plasma GH pulses". The authors are making an inference here because I do not think they measured GH levels. It would be more accurate to say that the time of euthanasia is inferred to be between GH pulses based on the measurement of STAT5 which is GH-dependent.

      Response: Text was changed to: “a time inferred to be between plasma GH pulses”.

      Reviewer #3 Recommendations:

      In Figure 1A the differences between female-biased enhancers and sex-independent enhancers seem greater than those comparing female-biased insulators and sex-independent insulators, and yet only the latter are significant. Please could you clarify?

      Response: Figure legend was corrected to indicate that Enhancers + Weak Enhancers were analyzed as a single group. Furthermore, the location of the Enhancer asterisks above the bars on the figure was adjusted to reflect this.

      Line 257, I could not find Table S1B.

      Response: Text in Figure legend was corrected to specify Table S7A as the source of this data.

      Line 265 "BCL6 binding was also enriched at dynamic sex-independent DHS (Table S7B)." The p-value of this enrichment was particularly high. Could this have a biological correlation?

      Response: We cannot rule out that possibility.

      Line 277 "identified a Fox family factor as a close match for one of the top enriched motifs in the set of static but not in the set of dynamic male-biased DHS", Maybe authors could add that this holds true for FOXI1 and not for FOXD1.

      Response: Text was changed to specify FOXI1 as the factor.

      Line 368, please clarify the affirmation because in Table 1A we do not see the data of dynamic and static male-biased DHS, but only male-biased, female-biased, and sex-independent DHS subsets.

      Response: Text was corrected to read: “Our initial analyses revealed no major differences between dynamic and static male-biased DHS regarding the distribution of enhancer vs insulator vs promoter classifications (Fig. S7A) or their overall chromatin state distributions (Fig. S7B)”.

      Figure 7A and 7B. It would visually help the reader if in E1, E2, etc. you could include the short definitions (as in Figure 1B: Inactive, Inactive, Low signal, etc.)

      Response: We thank the reviewer for this suggestion, and have now added the X-axis labels suggested by the Reviewer.

      Line 570 The sentence was difficult to read "similar to E6, but unlike E6," Maybe removing the comma after "unlike E6" would help.

      Response: Text has been edited to avoid this cumbersome construct. It now reads: “…characterized by a high frequency of the same activating chromatin marks as chromatin state E6, i.e., H3K27ac and H3K4me1 (E9) or H3K27ac alone (E10), but unlike E6 they are both deficient in…”.

      Other changes include revisions to the Abstract to take into account the new discussion concerning the impact of sex-biased H3K36me3 marks along with related and other revisions to the Discussion, and a revision to the manuscript Title to better capture its main message.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and effort to review our manuscript. We have provided a response to their thoughtful questions below. In our revised manuscript, we have expanded the Discussion to comment on the significance of reversible modification of APC with polyubiquitin, and how the APC transport defect might be rescued (lines 335 to 346). A new Supplementary Figure 3 has been added to show a replicate DUB assay and the uncropped gel of Figure 1C in the main text.

      Reviewer #1 (Recommendations For The Authors):

      To address the weaknesses outlined below, I have the following comments and suggestions for experiments:

      1) Functional link between mouse phenotypes and proposed mechanism: could the authors rescue neuron/glia cell density or motor defects by restoring axonal trafficking of APC?

      We have shown that inhibition of glycogen synthase kinase 3 (GSK3) abolished APC ubiquitylation (PMID 22761442). Etienne-Manneville and Hall have reported that GSK3 inactivation promotes APC association with microtubule plus ends to drive polarised astrocyte migration (PMID 12610628). It is therefore conceivable that treating Trabid mutant neurons with a GSK3 inhibitor could suppress APC ubiquitylation, restore APC transport, and rescue the defective axon growth. GSK3 has multiple targets so there are caveats to using potent inhibitors of this kinase. But such an experiment is integral to a future study aimed at rescuing Trabid mutant mouse phenotypes by GSK3 inhibition.

      Does perturbation of APC trafficking phenocopy the defects of TRABID p.R438W and p.A451V knock in mice during neurodevelopment? I appreciate that these experiments might not be easily feasible.

      Presently we do not know how to directly perturb APC transport (besides generating a Trabid mutation). Speculatively, APC phosphosite mutants which mimic constitutive phosphorylation by GSK3 might accumulate polyubiquitin, aggregate, and exhibit disrupted axonal transport. We predict that such APC mutants will cause neurodevelopmental abnormalities in mouse models.

      Thus, alternatively, could the authors provide evidence from unbiased proteomic approaches that APC is a major substrate of TRABID- and STRIPAK-dependent deubiquitylation during neurodevelopment? E.g., what are the changes in the ubiquitylome of neural progenitor cells isolated from mouse embryos with TRABID mutant alleles and is APC amongst the top dysregulated hits? What are the changes in the interactome of TRABID p.A451V and is the STRIPAK complex a major interactor that is lost?

      We are generating antibodies capable of immunoprecipitating endogenous Trabid from mouse cells. This antibody tool will allow us to characterise the Trabid-STRIPAK complex using advanced ubiquitin proteomic approaches to determine interactors and changes to the ubiquitylome of Trabid mutant cells.

      2) Related to the point 1, given that TRABID has been reported to be a regulator of immune signaling pathways (PMID: 26808229, 37237031), can the authors exclude a contribution of this function to the observed phenotypes during neurodevelopment?

      We have not observed any cellular or tissue phenotypes in young or aged Trabid mutant mice indicative of immune system dysregulation. We and others have shown that Trabid deficiency has no impact on the transcription of interferon and NF-κB-stimulated genes or cytokine production in mouse and human cells (PMID 18281465; 17991829; unpublished). Nevertheless, a formal investigation is required to determine any changes to immune signalling pathways in our Trabid mutant mice.

      3) Based on previously published interactions, the authors propose that TRABID uses the STRIPAK complex to recruit its substrate APC. Could the authors provide experimental evidence for this by using their cellular model in Figure 4? Would depleting components of the STRIPAK complex in HEK 293T cells stably transfected with DOX-inducible WT-TRABID stabilize APC ubiquitylation upon dox induction?

      We have demonstrated that RNAi-mediated depletion of all 3 striatin proteins in HEK293T cells increased the levels of ubiquitin-modified APC (PMID 23277359). Moreover, depleting Trabid and the 3 Striatins together strongly increased the ubiquitin-modified APC pool, consistent with our model that Trabid and STRIPAK function together to deubiquitylate APC. In our inducible system, we would likely need to eliminate the expression of the STRIPAK component that directly recruits Trabid to achieve a null effect of Trabid overexpression on APC deubiquitylation. Experiments are in progress to determine which STRIPAK component binds directly to Trabid.

      4) Related to point 3, given that A451, the residue that mediates STRIPAK binding is in close proximity to the catalytic cysteine residue, how do the authors envision STRIPAK binding and OTU-dependent cleavage activity to work together at a structural level?

      A451 resides at the back of the active site in a pocket hypothesised to accommodate a short peptide from an interacting protein. The A451V mutant AnkOTU domain purified from bacteria retained full DUB activity, suggesting that Trabid’s ability to cleave polyubiquitin is independent of its ability to bind STRIPAK. Striatin proteins contain WD40 repeats, a protein fold that can bind ubiquitin (PMID 21070969). While the DUB- and STRIPAK-binding activities of Trabid might not be coupled structurally, it is plausible that Striatin could modulate Trabid’s ubiquitin linkage specificity in cells through allosteric interactions with the ubiquitin chain on the substrate.

      5) Is it known why APC needs to be reversibly modified with ubiquitin to be transported in axons and how increased APC ubiquitylation leads to impaired transport or could the authors speculate on this?

      We have shown that APC ubiquitin modification correlated with its binding to Axin in the β-catenin destruction complex (PMID 22761442). Conversely, non-ubiquitin-modified APC accumulates in membrane protrusions (PMID 23277359). From this we have proposed that ubiquitin regulates the distribution of APC between its two major functional pools in cells. Chronic APC ubiquitylation in Trabid deficient/mutant neurons might result in increased APC sequestration into Axin destruction complexes and/or promote spurious interactions with ubiquitin binding proteins that cause APC to aggregate, and therefore retard its transport in axons.

      Additional minor comments to consider:

      • Figure 1C: What are the protein smears in the in vitro assays of A541V 15min and CS 120min? I would assume that contaminants from the protein preparations should be the same across different conditions and in particular across different time points of the same Trabid mutant.

      In replicate DUB assays using the same AnkOTU protein preparations we did not detect any smears (Supplementary Figure 3A). It is unclear what caused the smears in Figure 1C, but it is plausible that contaminants in specific tubes/assays are contributing factors.

      • Figure 1D: why is the amount of AnkOTU protein reduced for WT, R438W, and A541 in a time-dependent manner?

      With increasing incubation time in DUB assays, adducts of various molecular weights may form between ubiquitin and the AnkOTU domain. It is plausible that some of these adducts are high molecular weight aggregates that are not resolved by the gel and that sequester some of the AnkOTU protein. These aggregates, which could have been retained in the loading wells, were presumably washed away during our silver staining procedure, which is why we do not see them in the full-length gel (Supplementary Figure 3B).

      Reviewer #2 (Recommendations For The Authors):

      • The partial penetrance of the mouse knockin phenotype is confusing, especially as this is evident on an apparently inbred background. Can authors explain the factors that contribute to these differences?

      Low mutant Trabid protein expression in distinct neural crest or progenitor populations could contribute to the reduced penetrance of the cell number phenotype. APC dysfunction in Trabid mutant cells might also impact its role as a negative regulator of the Wnt signalling pathway which regulates neuronal and glial cell fates in the developing brain (PMID 9845073). It is conceivable that in some Trabid mutant mice where APC dysfunction is mild (due to low levels of mutant Trabid protein expression), compensatory mechanisms overcome APC’s reduced function in Wnt signalling and cytoskeleton organization to permit normal brain development. A future study to investigate perturbations of Wnt signalling pathways in Trabid mutant mice is warranted.

      • The use of the term 'hemizygous' is confusing, as it typically refers to when one copy of a gene is present as in X-linked conditions. Might the authors mean 'heterozygous'?

      All instances of ‘hemizygous’ in the manuscript have been amended to ‘heterozygous’.

      • Fig. 3A y-axis units is confusing. Do the authors mean number of TH+ SNc neurons evident per section?

      We have amended the y-axis in Fig. 3A to indicate number of TH+ neurons evident per section.

      • Since the TH phenotype is one of the phenotypes that is partially penetrant, did authors include both penetrant and non-penetrant mice in Fig. 3 and other figures? Shouldn't there be error bars in Fig. 3A, since multiple mice were presumably used for analysis for each condition?

      Each data point in Fig. 3A represents one mouse in a set of littermate mice with the indicated age, sex, and genotype. Generating midbrain SNc sections at similar bregma positions across wild-type and mutant littermate brains for accurate IHC comparison proved challenging. Unanticipated technical issues limited the quantification of equivalent midbrain sections to 3 sets of littermate mice from each respective R438W or A451V mutant colony. The cell number reduction is more obvious in some mutants than others, but the effect is observed across all ages and in both sexes, providing confidence that the phenotype is robust. In Fig. 2 we have included only mutant mice with clearly fewer brain cells than wild-type littermates. We have not performed comprehensive IHC analysis of brains from all the mice used for the rotarod assay in Fig. 3E, but predict that mutant mice have a spectrum of neural/glial cell deficits in one or more brain areas that adversely impacted the motor circuitry causing their impaired motor function.

    1. Author Response

      We thank the Editors and the Reviewers for their comments on the importance of our work “showing a new role of caveolin-1 as an individual protein instead of the main molecular component of caveolae” in building membrane rigidity, and also for their constructive and thoughtful remarks, which will allow us to improve the manuscript.

      Indeed, we establish here that caveolin-1 contributes to membrane mechanics through a molecular mechanism that remains to be elucidated. In that respect, we thank the reviewers for suggesting ways to improve the presentation and discussion of our hypotheses, which are based on the results of a theoretical model and of independent biophysical measurements by tube pulling from plasma membrane spheres; both lines of evidence support the key role of caveolin-1 in building membrane rigidity.

      To address the reviewers' recommendations, we will amend the manuscript as discussed below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Because of the role of membrane tension in the process, and because caveolae regulate membrane tension, the authors looked at the formation of TEMs in cells depleted of Caveolin1 and Cavin1 (PTRF): They found a higher propensity to form TEMs, spontaneously (a rare event) and after toxin treatment, in both Caveolin 1- and Cavin 1-depleted cells. They show that in both siRNA-Caveolin1 and siRNA-Cavin1 cells, the cytoplasm is thinner. They show that in siCaveolin1 only, the dynamics of opening are different, with notably much larger TEMs. From the dynamic model of opening, they predict that this should be due to a lower bending rigidity of the membrane. They measure the bending rigidity from cell-generated giant liposomes and find that the bending rigidity is reduced by approx. 50%.

      Strengths:

      They also nicely show that caveolin1 KO mice are more susceptible to death from infections with pathogens that create TEMs.

      Overall, the paper is well-conducted and nicely written. There are however a few details that should be addressed.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Morel et al. aims to identify some potential mechano-regulators of transendothelial cell macro-aperture (TEM). Guided by the recognized role of caveolar invaginations in buffering the membrane tension of cells, the authors focused on caveolin-1 and associated regulator PTRF. They report a comprehensive in vitro work based on siRNA knockdown and optical imaging approach complemented with an in vivo work on mice, a biophysical assay allowing measurement of the mechanical properties of membranes, and a theoretical analysis inspired by soft matter physics.

      Strengths:

      The authors should be complimented for this multi-faceted and rigorous work. The accumulation of pieces of evidence collected from each type of approach makes the conclusion drawn by the authors very convincing, regarding the new role of caveolin-1 as an individual protein instead of the main molecular component of caveolae. On a personal note, I was very impressed by the quality of STORM images (Fig. 2) which are very illuminating and useful, in particular for validating some hypotheses of the theoretical analysis.

      Weaknesses:

      While this work pins down the key role of caveolin-1, its mechanism remains to be further investigated. The hypotheses proposed by the authors in the discussions about the link between caveolin and lipids/cholesterol are very plausible though challenging. Even though we may feel slightly frustrated by the absence of data in this direction, the quality and merit of this paper remain.

      In the current study, we could not find technical conditions allowing us to properly address the role of cholesterol in TEM dynamics, owing to the adverse effects of cholesterol depletion with methyl-beta-cyclodextrin on HUVEC morphology. To address the Reviewer's remark, we will mention our attempts to examine a role of cholesterol in TEM dynamics in the Results section. Moreover, in the section presenting the tube-pulling experiments from PMS, we will thoroughly discuss the possibility that caveolin-1, by controlling membrane lipid composition, may indirectly affect membrane rigidity (see comments below about the presence or absence of caveolin-1 in the tubes pulled from PMS and our hypotheses about a direct or indirect role of caveolin-1 in the control of membrane rigidity).

      The analogy with dewetting processes drawn to derive the theoretical model is very attractive. However, although part of the model has already been published several times by the same group of authors, the definition of the effective membrane rigidity of a plasma membrane including the underlying actin cortex, was very vague and confusing.

      In the revised manuscript, we will clearly define the membrane bending rigidity parameter, which was missing from the current version. The membrane bending rigidity is defined as the energy required to locally bend the membrane surface. For a liposome, a rigorous derivation leads to a relationship between the membrane tension and the variation of the projected area, which are linked through the bending rigidity: this relationship is known as the Helfrich law. This statistical physics approach is rigorously valid only for a liposome, whereas its application to a cell is questionable because of the cytoskeletal forces acting on the membrane. Nevertheless, applying the Helfrich law to cell membranes may be justified on short time scales, before active regulation of cell tension takes place (Sens P and Plastino J, 2015 J Phys Condens Matter), especially in cases where cytoskeletal forces play a modest role, such as red blood cells (Helfrich W 1973 Z Naturforsch C). The significant disruption of the cytoskeletal structure and of actomyosin contraction upon toxin-driven inhibition of the small GTPase RhoA supports the applicability of the Helfrich law to describe TEM opening. Because of the presence of proteins and carbohydrates, and the adhesion of the remaining actin meshwork after toxin treatment, we expect the Helfrich relationship to differ somewhat from that of a pure lipid membrane. We account for these effects through an “effective bending rigidity”, a term used in the detailed discussion of the model hypotheses, which corresponds to an effective value describing the relationship between membrane tension and projected area variation in our cells. These considerations will be included in the revised manuscript.
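      For reference, the standard low-tension (entropic) form of the Helfrich relation for a fluid vesicle is recalled below as a sketch. It is the textbook expression (as used, for example, to analyze micropipette aspiration of vesicles), not necessarily the exact form used in the model; here κ stands for the (effective) bending rigidity, k_B T for the thermal energy, ΔA/A_0 for the relative change in projected area, and σ_0 for a reference tension.

```latex
\frac{\Delta A}{A_0} \simeq \frac{k_B T}{8\pi\kappa}\,\ln\!\left(\frac{\sigma}{\sigma_0}\right)
```

      Read qualitatively, for a given change in tension the projected area exchanged with thermal undulations scales as 1/κ, which is consistent with the link drawn between a lower bending rigidity and larger TEMs.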

      Here, for the first time, thanks to the STORM analysis, the authors show that HUVECs intoxicated by ExoC3 exhibit a loose and defective cortex with a significantly increased mesh size. This argues in favor of the validity of Helfrich formalism in this context. Nonetheless, there remains a puzzle. Experimentally, several TEMs are visible within one cell. Theoretically, the authors consider a simultaneous opening of several pores and treat them in an additive manner. However, when one pore opens, the tension relaxes and should prevent the opening of subsequent pores. Yet, experimentally, as seen from the beautiful supplementary videos, several pores open one after the other. This would suggest that the tension is not homogeneous within an intoxicated cell or that equilibration times are long. One possibility is that some undegraded actin pieces of the actin cortex may form a barrier that somehow isolates one TEM from a neighboring one.

      As pointed out by the Reviewer, we expect that membrane tension is neither a purely global nor a purely local parameter. Opening of a TEM will relax membrane tension over a certain distance, not over the whole cell. Moreover, once the TEM closes, membrane tension will increase again. This spatial and temporal localization of membrane tension relaxation explains why the opening of a first TEM does not preclude the opening of a second one. On the other hand, membrane tension is not a purely local property. Indeed, we observe that when two TEMs enlarge next to each other, their shape becomes anisotropic, as their enlargement is mutually hampered in the region separating them. We account for this interaction by treating TEM membrane relaxation in an additive fashion. We emphasize that this simplified description is used to predict maximum TEM size, corresponding to the time at which TEM interaction is strongest. As the reviewer points out, it would be more questionable to use this additive treatment to predict the likelihood of nucleation of a new TEM, which we do not do here.

      Could the authors look back at their STORM data and check whether intoxicated cells do not exhibit a bimodal population of mesh sizes and possibly provide a mapping of mesh size at the scale of a cell?

      To address the question raised by the Reviewer, we plotted the whole distribution of mesh sizes in addition to the average value per cell. We did not observe a bimodal distribution but rather a very heterogeneous distribution of mesh sizes, reaching up to a few square microns, under all siRNA treatment conditions. Moreover, we did not observe a specific pattern in the distribution of mesh size at the scale of the cell: very large meshes were surrounded by small ones. We also did not observe any specific pattern in the localization of TEM opening, as described in the paper, which makes it difficult to correlate mesh size with TEM opening.

      In particular, it is quite striking that while bending rigidity of the lipid membrane is expected to set the maximal size of the aperture, most TEMs are well delimited with actin rings before closing. Is it because the surrounding loose actin is pushed back by the rim of the aperture? Could the authors better explain why they do not consider actin as a player in TEM opening?

      Actin ring assembly and stiffening is indeed a player in TEM opening, and it is included in our differential equation describing TEM opening dynamics (second term on the left-hand side of Eq. 3). In some cases, actin ring assembly is the dominant player, such as in TEM opening after laser ablation (ex novo TEM opening), as we previously reported (Stefani et al. 2017 Nat comm). In contrast, here we investigate de novo TEM opening, for which we expect that bending rigidity can be estimated without accounting for actin assembly, as we previously reported (Gonzalez-Rodriguez et al. 2012 Phys Rev Lett). Such a bending rigidity estimate (Eq. 5) is obtained by considering two different time scales: the time scale of membrane tension relaxation, governed by bending rigidity, and the time scale of cable assembly, governed by actin dynamics. We expect the first time scale to be shorter, and thus the maximum size of de novo TEMs to be mainly constrained by membrane tension relaxation. The discussion of these two different time scales will be added to the revised manuscript.

      Instead of delegating to the discussion the possible link between caveolin and lipids as a mechanism for the enhanced bending rigidity provided by caveolin-1, it could be of interest for the readership to insert the attempted (and failed) experiments in the result section. For instance, did the authors try treatment with methyl-beta-cyclodextrin that extracts cholesterol (and disrupts caveolar and clathrin pits) but supposedly keeps the majority of the pool of individual caveolins at the membrane?

      We will state in the results section that we could not find appropriate experimental conditions allowing us to deplete cholesterol with methyl-beta-cyclodextrin without altering the shape of HUVECs, which prevented proper analysis of TEM dynamics.

      Tether pulling experiments on plasma membrane spheres (PMS) are real tours de force and the results are quite convincing: a clear difference in bending rigidity is observed between control and caveolin knock-out PMS. However, one recurrent concern in these tether-pulling experiments is ensuring that the membrane pulled into the tether has the same composition as the membrane of the PMS body. The highly curved neck may impede or slow down the movement of membrane proteins into the tether by convective or diffusive motion. Could the authors propose an experiment to demonstrate that caveolin-1 proteins are not restricted to the body of the PMS and can access the nanometric tether?

      As pointed out by the reviewer, a concern with tube pulling experiments relates to how quickly membrane composition equilibrates between the nanotube and the rest of the membrane. In our experiments, we waited about 30 seconds after tube pulling and after each change in membrane tension. We checked that, after this time, the force remained constant, implying that the tube-pulling experiments on PMS were performed at equilibrium, with enough time for lipids and membrane proteins to reach the tether by convective or diffusive motion. We will add a representative force versus time plot in our revision. In principle, this could be further checked by generating PMS from cells expressing GFP-caveolin-1, as done in Sinha et al., 2011: a steady protein signal in the tube would further confirm equilibration, provided that caveolin is recruited to the nanotube for mechanical reasons. Indeed, since caveolin-1 is inserted in the cytosolic leaflet of the plasma membrane, when a nanotube is pulled towards the exterior of the cell as in our experiments, we can expect two situations depending on the ability of caveolin-1 to deform membranes, which remains unclear, in particular after the paper of Porta et al., Sci. Adv., 2022. i) If caveolin-1 (Cav1) does not bend membranes, it could be recruited to the nanotubes at a density similar to that in the PMS body. The tube force measurement would then reflect the bending rigidity of the PMS membrane, and Cav1 could stiffen the membrane as a stiff inclusion at high density and/or by affecting lipid composition, as suggested in our text. ii) If Cav1 bends the membrane (i.e. it has a non-zero spontaneous curvature), it should create a positive curvature, given the geometry of caveolae, opposite to the curvature of the nanotubes that we pull, and thus be excluded from the nanotubes. In this case, the force would reflect the bending rigidity of the Cav1-depleted membrane and should be the same in both types of experiments (WT and Cav1-depleted conditions) if the lipid composition remains unchanged upon Cav1 depletion. Our measurements again suggest that Cav1 depletion affects the plasma membrane composition, probably by reducing the amount of sphingomyelin and cholesterol. Note that a much lower concentration of Cav1 than in the plasma membrane has been reported in tunneling nanotubes (TNTs) connecting two neighboring cells (A. Li et al., Front. Cell Dev. Biol., 2022). These TNTs have diameters similar in scale to those of tubes pulled from PMS. Some of us have addressed these specific questions related to Cav1 spontaneous curvature and its effect on the lipid composition of the plasma membrane in two separate manuscripts (in preparation), which are comprehensive studies in their own right that clarify these points. We propose to add this discussion to the manuscript, with perspectives on future studies, while stressing that the presence of Cav1 stiffens plasma membranes and that the exact origin of this effect must be further investigated.
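      Bending rigidity is commonly extracted from tube-pulling data through the equilibrium tether relations f = 2π√(2κσ) and R = √(κ/(2σ)). These assume that the tube has the same composition as the reservoir and no spontaneous curvature, i.e. case i) above. A minimal Python sketch under those assumptions follows; forces are taken in pN and tensions in pN/nm (numerically equal to mN/m), and the function names and example values are hypothetical, not the authors' analysis.

```python
import math

# Assumed thermal energy at ~298 K, in pN·nm (about 4.28 pN·nm at 37 °C).
KBT_PN_NM = 4.11

def kappa_from_tether(force_pn: float, tension_pn_per_nm: float) -> float:
    """Bending rigidity (pN·nm) from f = 2*pi*sqrt(2*kappa*sigma),
    i.e. kappa = f^2 / (8*pi^2*sigma). Assumes no spontaneous curvature."""
    return force_pn ** 2 / (8 * math.pi ** 2 * tension_pn_per_nm)

def tether_radius_nm(kappa_pn_nm: float, tension_pn_per_nm: float) -> float:
    """Equilibrium tube radius R = sqrt(kappa / (2*sigma))."""
    return math.sqrt(kappa_pn_nm / (2 * tension_pn_per_nm))

def kappa_from_series(tensions, forces) -> float:
    """Least-squares slope of f^2 versus sigma through the origin.
    Under the same assumptions the slope equals 8*pi^2*kappa."""
    num = sum(s * f ** 2 for s, f in zip(tensions, forces))
    den = sum(s ** 2 for s in tensions)
    return num / den / (8 * math.pi ** 2)

# Hypothetical readout: a steady 20 pN force at 0.05 pN/nm (0.05 mN/m).
kappa = kappa_from_tether(20.0, 0.05)
print(f"kappa = {kappa:.0f} pN·nm = {kappa / KBT_PN_NM:.1f} kBT, "
      f"R = {tether_radius_nm(kappa, 0.05):.0f} nm")
```

      Fitting f² against σ over several imposed tensions, rather than relying on a single point, is one way to check that the data follow this κ-only relation. In case ii), the κ obtained this way would describe the Cav1-depleted tube membrane rather than the PMS body.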

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors characterize S. enterica WbaP biochemically and structurally. The enzyme catalyzes the initial step in O antigen biosynthesis by transferring a phospho-galactosyl unit from UDP-galactose to undecaprenyl-phosphate. This initial primer is then extended by other glycosyltransferases to form the O antigen repeat unit.

      To preserve the biologically functional unit of WbaP, the authors chose a 'detergent-free' purification method based on membrane extraction using SMALP polymers. The obtained material was characterized biochemically and by single-particle cryo-electron microscopy.

      Strengths:

      The authors were able to isolate WbaP in a catalytically active and oligomeric form and determined a low-resolution cryo-EM structure of the dimeric complex. Using a disulfide cross-linking approach and other biophysical methods, the authors validated an AlphaFold predicted WbaP model used to interpret the experimental cryo-EM map.

      Weaknesses:

      The rationale for using SMALP to extract WbaP from the membrane was to 'preserve' the native lipid bilayer surrounding the protein. However, the physical properties of the lipids co-purifying with the protein are unclear. The volume of the EM map assigned to the SMALP polymers suggests a more micellar character.

      Overall, the obtained cryo-EM map appears to be at fairly low resolution. Based on Figure 6, individual helices are not resolved, suggesting an overall resolution significantly below the stated 4.1 Å. Thus, the presented structure is essentially that of an AlphaFold WbaP model.

      I believe the UMP titration analysis could be improved. The authors assume that a 'domain of unknown function (DUF)' binds UMP and regulates the enzyme's activity. UMP, a reaction product of WbaP, may also inhibit the enzyme competitively. Therefore, deleting the DUF for the UMP inhibition studies could help with data interpretation.

      We appreciate the reviewer’s careful analysis of our manuscript, and their attention to detail regarding the structural data. In a revised version of this manuscript, we will modify the discussion section to include a brief section focused on the liponanoparticle itself, comparing it to other experimental structures in SMALP. Investigating the lipid microenvironment in SMALPs around both Lg- and Sm-PGTs is of great interest to our group. We have published initial data related to PglC from Campylobacter, but a systematic analysis of co-purified lipids from the growing number of SMALP-solubilized PGTs is an exciting future direction for this project. Expression and analysis of truncated constructs containing the catalytic domain of Lg-PGTs (including WbaP) have been attempted in our laboratory, without success. This limits our ability to decouple DUF-mediated modulation of activity from interactions in the catalytic domain. Efforts to address this challenge are underway but will be the focus of future publications. Regarding the overall resolution, and for transparency, we will add a new figure showing the local resolution throughout the experimental map.

      Reviewer #2 (Public Review):

      Summary:

      The authors focused on delivering a comprehensive structural characterization of WbaP, a membrane-bound phosphoglycosyl transferase from Salmonella that is instrumental in bacterial glycoconjugate synthesis. Notably, the authors employed SMALP-200, an amphipathic copolymer, to extract WbaP in the form of native lipid bilayer nanodiscs. They then determined its oligomerization state through cross-linking and procured higher-resolution structural data via cryo-electron microscopy (cryo-EM). While the authors successfully characterized WbaP in a native-like lipid bilayer setting, and their findings support this, the paper's claim of introducing a novel methodology is not robust. The real contribution of this work lies in the newfound insights about WbaP's structure.

      Strengths:

      The manuscript provides novel insights into WbaP's structure and oligomerization state, highlighting potentially significant interactions. The methodologies employed represent state-of-the-art practices in the field. Most of the drawn conclusions are well-supported by either experimental or computational data, with a few exceptions noted below.

      Weaknesses:

      • Organization: The manuscript's organization lacks clarity. The authors seem to describe their processes in the sequence they occurred rather than a logical flow, leading to potential confusion. For instance, the authors delve into a series of inconclusive experiments to determine the oligomerization state of WbaP, utilizing techniques like SEC, SEC-MALS, mass photometry, and mass spectrometry. They then transition to cryo-EM but subsequently return to address the oligomerization issue, which they conclusively resolve using cross-linking experiments. Following this, they shift their focus to interpreting and discussing the structural features obtained from the cryo-EM data.

      • Ambiguous and incorrect statements: There are instances of vague and at times inaccurate statements. Using more precise terminology such as "native nanodiscs" or "lipid bilayer nanodiscs" would enhance clarity compared to the term "liponanoparticles." The claim on page 8 concerning the refractive index increment of SMA polymers needs rectification. The real reason why SEC-MALS cannot provide absolute particle masses in this case is that, with two independent concentration detectors (typically absorbance and refractive index), the decomposition of elution profiles is necessarily limited to two chemical species of known molar or specific absorbance and refractive index. Thus, nanodiscs containing a protein, a polymer, and a chemically undefined mixture of native lipids clearly cannot be analyzed by this technique.

      • Overstating of technical aspects: The technical aspects seem overstated. While the extraction of membrane proteins into native lipid bilayer nanodiscs and their characterization by cross-linking and cryo-EM are standard (and were published before by the same authors in ref. 29), the authors appear to promote them as groundbreaking. The statement that this study presents a novel, universal strategy and toolkit for examining small membrane proteins within liponanoparticles seems overstated, especially given the previous existence of similar methods.
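      The detector-counting argument in the SEC-MALS point above can be illustrated with a toy linear system. In each elution slice, every concentration detector contributes one equation (signal = Σ coefficient × concentration), so two detectors can resolve at most two species, and only if their coefficients are known. The Python sketch below uses placeholder coefficients in arbitrary units; the function name is hypothetical and the sketch is not a model of any particular instrument software.

```python
import numpy as np

def decompose_slice(signals, coefficients):
    """Concentrations of each species in one elution slice.

    signals: one reading per detector (e.g. UV absorbance, refractive index).
    coefficients: rows = detectors, columns = species (e.g. extinction
    coefficient, dn/dc); every entry must be known in advance.
    """
    A = np.asarray(coefficients, dtype=float)
    y = np.asarray(signals, dtype=float)
    n_species = A.shape[1]
    # Fewer independent equations than unknowns: no unique solution.
    if np.linalg.matrix_rank(A) < n_species:
        raise ValueError(f"{A.shape[0]} detector(s) cannot resolve {n_species} species")
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Hypothetical coefficients in arbitrary consistent units.
two_species = [[1.00, 0.50],    # UV row: protein, polymer
               [0.185, 0.16]]   # RI row: protein, polymer
true_conc = np.array([0.20, 0.10])
recovered = decompose_slice(np.dot(two_species, true_conc), two_species)
print(recovered)

# Adding a third species (e.g. a lipid mixture) leaves 2 equations for 3 unknowns.
three_species = [[1.00, 0.50, 0.00],
                 [0.185, 0.16, 0.14]]
try:
    decompose_slice([0.25, 0.053], three_species)
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

      For native nanodiscs the situation is worse than the three-species case suggests, because a chemically undefined lipid mixture does not even have a single known coefficient per detector.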

      We appreciate the reviewer’s careful consideration of the steps that were taken and how they were presented. However, we wish to emphasize that although the initial biophysical experiments do not provide the exact oligomeric state of WbaP, they provide important new data. Together, these data support that the intact liponanoparticle is large enough to accommodate a higher-order oligomeric state along with native lipids and stabilizing SMA polymer. This was not known at the outset and led to Fig. 2D, the first demonstration of the dimer, which was then validated via XLMS and disulfide crosslinking. The process was logical and essential to this work. We recognize the reviewer’s point on the SEC-MALS experiment and will adjust the text accordingly.

      We sought to distinguish the stabilization method used here from canonical MSP nanodiscs by using the term styrene maleic acid liponanoparticle (SMALP). The term SMALP is widely used in literature utilizing this technology, thus the use of other terms may lead to confusion.

      Our manuscript in PExpPur focused on enabling expression of sufficient quality and quantity for sophisticated downstream biophysical applications; that manuscript was intended to be enabling for the broader membrane protein community and is highly recognized and appreciated in its own right. This work presents the first-in-class structure of the large monoPGTs. Furthermore, only a single experimental structure of the PGT domain itself has been solved and deposited in the PDB (also from our group); the present work addresses the enigmatic additional domains and their potential physiological relevance. It is also noteworthy that the Lg-monoPGTs dominate the superfamily. This is also the first time that any protein in SMALP has been characterized using direct mass technology, which provided the most accurate mass determination of the intact liponanoparticle/protein complex.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a detailed analysis of a set of molecular dynamics computer simulations of several variants of a T-cell receptor (TCR) in isolation and bound to a Major Histocompatibility Complex with peptide (pMHC), with the aim of improving our understanding of the mechanism of T cell activation in immunity. By analyzing simulations of peptide mutants and partially truncated TCRs, the authors find that native peptide agonists lead to a so-called catch-bond response, whereby tensile force applied in the direction of separation between TCR/pMHC appears to strengthen the TCR/pMHC interface, whereas mutated peptides exhibit the more common slip-bond response, in which applied force destabilizes the binding interface. Using various computational metrics and simulation statistics, the authors propose a model in which tensile force preferentially suppresses thermal fluctuations in the variable α domain of the TCR (vs the β domain) in a peptide-dependent manner, which orders and strengthens the binding interface by bringing together the complementarity-determining regions (CDRs) in the TCR variable chains, but only if the peptide is correctly matched to the TCR.

      R1-0. The study is detailed and written clearly, and conclusions appear convincing and are supported by the simulation data. However, the actual motions at the molecular or amino-acid level of how the catch-bond vs slip bond response originates remain somewhat unclear, and will probably warrant further investigations. Specific hypotheses that could be testable in experiments, such as predictions of which peptide (or TCR) mutations or which peptides could generate a catch-vs-slip response or activation, would have especially strengthened this study.

      Catch bonds have been observed in different αβ TCRs that differ in sequence when paired with their matching pMHC. Thus, there should be a general principle that applies irrespective of particular TCR sequences, as summarized in Fig. 8. The predictive capacity of this model for understanding experiments is explained in our reply R0-3. Here, we discuss designing specific point mutations to the TCR that have not been studied previously. In our simulations, we can identify high-occupancy contacts that are present mainly in the high-load case as targets for altering the catch bond behavior. An example is V7-G100 between the peptide and Vβ (Fig. 2C, bottom panel). The V7R mutant peptide is a modified agonist that we have already studied, where R7 forms hydrogen bonds and nonpolar contacts with residues other than βG100, albeit with lower occupancy (page 11, lines 280–282 and page 32, Fig. 5–figure supplement 2B). Instead of the V7R mutation to the peptide, mutating βG100 to other residues may lead to different effects. For example, compared to G100A, mutation to a bulkier residue such as G100F may cause opposing effects: it may induce a steric mismatch that destabilizes the interface, or, conversely, a stronger hydrophobic effect might increase the baseline bond lifetime. Also, mutating G100 to a polar residue may have an even greater effect, leading to a slip bond or the absence of measurable binding.

      As the reviewer suggested in R1-5, it will also be interesting to crosslink Vα and Cα by a disulfide bond to suppress its motion. Again, there are different possible outcomes. The lack of Vα-Cα motion could stabilize the interface with pMHC, resulting in a longer bond lifetime. Conversely, if the disulfide bond alters the V-C angle, it would have an opposite effect of destabilizing the interface by tilting it relative to the loading direction, similar to the dFG mutant in Appendix 1 (page 24).

      To make better predictions, simulations of such mutants should be performed under different conditions and analyzed, which is beyond the scope of the present study.

      Change made:

      • Page 14, Concluding Discussion, lines 395–402: We added a discussion about using simulations for designing and testing point mutants.

      Reviewer #2 (Public Review):

      In this work, Chang-Gonzalez and co-workers investigate the role of force in peptide recognition by T-cells using a model T-cell/peptide recognition complex. By applying forces through a harmonic restraint on distances, the authors probe the role of mechanical pulling on peptide binding specificity. They point to a role for force in distinguishing the different roles played by agonist and antagonist peptides for which the bound configuration is not clearly distinguishable. Overall, I would consider this work to be extensive and carefully done, and noteworthy for the number of mutant peptides and conditions probed. From the text, I’m not sure how specific these conclusions are to this particular complex, but I do not think this diminishes the specific studies.

      I have a couple of specific comments on the methodology and analysis that the authors could consider:

      R2-1. 1) It is not explained what the origin of the force on the peptide-MHC complex is. Although I do know a bit about this, it’s not clear to me how the force ends up applied across the complex (e.g. is it directional in any way, on what subdomains/residues do we expect it to be applied), and whether it is constant or stochastic. I think it would be important to add some discussion of this and how it translates into the way the force is applied here (on terminal residues of the complex).

      As explained in our reply R0-1, force on the TCRαβ-pMHC complex arises during immune surveillance as the T cell moves over the APC. Generated by cellular machinery such as actin retrograde flow and actomyosin motility, the applied force fluctuates, on top of spontaneous force fluctuations due to thermal motion. This has been directly measured for T cells using a pMHC-coated bead via optical tweezers (see Feng et al., 2017, Fig. 1) and with DNA tension sensors (Liu et al., 2016, Fig. 4; already cited in the manuscript). The direction of force also fluctuates, but is longitudinal on average (see R1-6). How force distributes across the molecule is a great question, which we plan to address by developing a computational method to quantify it.

      Changes made:

      • Pages 3–4, newly added Results section ‘Applying loads to TCRαβ-pMHC complexes:’ We included the origin of force and its fluctuating nature, and the question of how loads are distributed across the molecule.

      • The reference (Feng et al., 2017) has been added in the above section.

      R2-2. 2) In terms of application of the force, I find the use of a harmonic restraint and then determining a distance at which the force has a certain value to be indirect and a bit unphysical. As just mentioned, since the origin of the force is not a harmonic trap, it would be more straightforward to apply a pulling force which has the form -F*d, which would correspond to a constant force (see for example comment articles 10.1021/acs.jpcb.1c10715, 10.1021/acs.jpcb.1c06330). While application of a constant force will result in a new average distance, for small forces it does so in a way that does not change the variance of the distance, whereas a harmonic force pollutes the variance (see e.g. 10.1021/ct300112v in a different context). A constant force could also shift the system into a different state not commensurate with the original distance, so by applying a harmonic trap, one could be keeping oneself from exploring this, which could be important, as in the case of certain catch bond mechanisms. While I certainly wouldn’t expect the authors to redo these extensive simulations, I think they could at least acknowledge this caveat, and they may be interested in considering a comparison of the two ways of applying a force in the future.

      Thanks for the suggestions and references. The paper by Stirnemann (2022) is a review covering different computational methods of applying forces, mainly constant force and constant pulling velocity (steered molecular dynamics; SMD). The second one, by Gomez et al. (2021), is a rather broad review of mechanosensing in which the discussion of computer simulation focused mainly on SMD. In the third one, by Pitera and Chodera (2012), potential limitations of using harmonic potentials to sample a nonlinear potential of mean force (PMF) are discussed.

      In the above references, loads or restraints are used to study conformational transitions or to sample the PMF, which are different from the use of positional restraints in our work. As explained in R0-1, positional restraint better mimics reality where the terminal ends of TCR and pMHC are anchored on the membranes of respective cells. Also, the concern raised by the reviewer about ruling out different states would be applicable to the case when there are multiple conformational states with local free energy minima at different extensions. Here, we are probing changes in the conformational dynamics (deformation and conformational fluctuation), rather than transitions between well-defined states.

      In Pitera and Chodera (2012), and also in other approaches such as umbrella sampling, the spring constant of the harmonic potential should be chosen sufficiently soft that the neighborhood of the center of the potential can be sampled. On the other hand, if the harmonic potential is much stiffer than the local curvature of the PMF, sampling may suffer, but the local gradient of the PMF, i.e., the force about the center of the potential, can be measured. This was studied earlier by one of us in Hwang (2007), which forms the basis for using a stiff harmonic potential to measure the load on the TCRαβ-pMHC complex. The 1 kcal/(mol·Å²) spring constant used in our study (page 17, line 540) was selected such that the thermally driven positional fluctuation is on the order of 0.8 Å. Hence, it is sufficiently stiff considering the much larger size of the TCRαβ-pMHC complex and the flexible added strands.
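      The numbers in the paragraph above can be checked from equipartition: for a harmonic restraint of stiffness k, the RMS fluctuation along one Cartesian axis is √(k_BT/k), and in the stiff-spring limit the time-averaged offset ⟨Δx⟩ of the restrained atom from the restraint center gives the mean load k⟨Δx⟩. A minimal Python sketch follows; the 300 K temperature is an assumption, since the simulation temperature is not given in this passage, and the 0.2 Å offset is a hypothetical value.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872   # Boltzmann constant, kcal/(mol·K)
PN_PER_KCAL_PER_MOL_A = 69.48   # 1 kcal/(mol·Å) expressed in piconewtons

def rms_fluctuation_angstrom(k_spring: float, temperature_k: float) -> float:
    """Equipartition along one axis of a harmonic restraint: <dx^2> = kT / k."""
    return math.sqrt(KB_KCAL_PER_MOL_K * temperature_k / k_spring)

def mean_load_pn(k_spring: float, mean_offset_angstrom: float) -> float:
    """Stiff-spring estimate of the load: k times the time-averaged offset of
    the restrained atom from the restraint center, converted to pN."""
    return k_spring * mean_offset_angstrom * PN_PER_KCAL_PER_MOL_A

k = 1.0  # kcal/(mol·Å²), the spring constant quoted in the text
print(f"RMS fluctuation at 300 K (assumed): {rms_fluctuation_angstrom(k, 300.0):.2f} Å")
print(f"load for a hypothetical 0.2 Å mean offset: {mean_load_pn(k, 0.2):.1f} pN")
```

      The per-axis value of about 0.77 Å at 300 K is consistent with the "on the order of 0.8 Å" quoted above.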

      Changes made:

      • Page 4, lines 117–119, newly added Results section ‘Applying loads to TCRαβ-pMHC complexes:’ The above explanation about the use of stiff harmonic restraint for measuring forces is added.

      • The 4 references mentioned above have been added to the above section.

      R2-3. 3) For the PCA analysis, I believe the authors learn separate PC vectors from different simulations and then take the dot product of those two vectors. Although this might be justified based on the simplified coordinate upon which the PCA is applied, in general, I am not a big fan of running PCA on separate data sets and then comparing the outputs, as the meaning seems opaque to me. To compare the biggest differences between many simulations, it would make more sense to me to perform PCA on all of the data combined, and see if there are certain combinations of quantities that distinguish the different simulations. Alternatively and probably better, one could perform linear discriminant analysis, which is appropriate in this case because one already knows that different simulations are in different states, and hence the LDA will directly give the linear coordinate that best distinguishes classes.

      As explained in R0-2, triads and BOC models are assigned to the same TCR across different simulations in identical ways. For the purpose of examining the relative Vα-Vβ and V-C motions, we believe comparing them across different simulations is a valid approach. When the motions are very distinct, it would be possible to combine all data and perform PCA or LDA to classify them. However, when behaviors differ subtly, analysis on the combined data may not capture individual behaviors. By analogy, consider two sets of 2-dimensional data obtained for the same system under different conditions. If each set forms an elliptical shape with the major axis differing slightly in direction, performing PCA separately on the two sets and comparing the angle between the major axes informs the difference between the two sets. If PCA were performed on the combined data (superposition of two ellipses forming an angle), it will be difficult to find the difference. LDA would likewise be difficult to apply without a very clear separation of behaviors.
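      The two-ellipse analogy can be reproduced with synthetic data. The Python sketch below fits the major axis of each data set separately and compares them, then fits a single axis to the pooled data. It uses toy Gaussian clouds, not the authors' triad or BOC coordinates, and all function names are hypothetical.

```python
import numpy as np

def major_axis(points: np.ndarray) -> np.ndarray:
    """Unit vector along the first principal component (largest variance)."""
    cov = np.cov(points, rowvar=False)       # np.cov centers the data itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix, ascending eigenvalues
    return eigvecs[:, np.argmax(eigvals)]

def axis_angle_deg(u: np.ndarray, v: np.ndarray) -> float:
    """Angle between two principal axes; abs() because eigenvector signs are arbitrary."""
    cos = abs(float(np.dot(u, v))) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(min(cos, 1.0))))

def ellipse_samples(rng, n, angle_deg, sd_major=3.0, sd_minor=1.0):
    """Toy 2-D Gaussian cloud whose major axis points at angle_deg."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    pts = rng.normal(size=(n, 2)) * np.array([sd_major, sd_minor])
    return pts @ rot.T

rng = np.random.default_rng(0)
a = ellipse_samples(rng, 5000, angle_deg=0.0)
b = ellipse_samples(rng, 5000, angle_deg=15.0)

separate = axis_angle_deg(major_axis(a), major_axis(b))
pooled = major_axis(np.vstack([a, b]))
print(f"angle between separately fitted axes: {separate:.1f} deg")
print(f"pooled axis vs. axis of a: {axis_angle_deg(pooled, major_axis(a)):.1f} deg")
```

      With these settings, the separately fitted axes differ by roughly the imposed 15°, whereas the pooled axis falls roughly midway between them and, on its own, carries no information about the two underlying directions.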

      As also explained in R0-2, PCA is just one of multiple analyses we carried out to establish a coherent picture. The main use of PCA to this end was to compare directions of motion and relative amplitude of the motion among the subdomains.

      Changes made:

      • Page 6, lines 171–175 and page 8, lines 226–227: The rationale for applying PCA to triads and BOC models in different simulations is explained.

    1. Author Response

      Reviewer #1 (Public Review):

      In this exciting and well-written manuscript, Alvarez-Buylla and colleagues report a fascinating discovery of an alkaloid-binding protein in the plasma of poison frogs, which may help explain how these animals are able to sequester a diversity of alkaloids with different target sites. This work is a major advance in our knowledge of how poison frogs are able to sequester and even resist such a panoply of alkaloids. Their study also adds to our understanding of how toxic animals resist the effects of their own defenses. Although target site insensitivity and other mechanisms that prevent alkaloids from binding to their targets (often ion channels) are now well characterized in poison frogs, less is known about how these frogs regulate the movement of toxins throughout the body, and in the blood in particular. In the fugu (pufferfish), a protein binds saxitoxin and tetrodotoxin, and in some amphibians the protein saxiphilin has been proposed to act as a toxin sponge for saxitoxin. However, little is known about whether toxin-binding proteins are involved in the sequestration and auto-resistance mechanisms of poison frogs in particular.

      The authors use a clever approach wherein a fluorescently labeled probe of a pumiliotoxin analog (an alkaloid toxin sequestered by some poison frogs) can be crosslinked to the proteins to which it binds. The authors then use sophisticated mass spectrometry to identify the proteins and find an outlier 'hit' that is a serpin protein. A competition assay, as well as mutagenesis studies, revealed that this ~50-60 kDa plasma protein is responsible for binding much of the pumiliotoxin and a few other alkaloids known to be sequestered in the in vivo assay, but not nicotine, an alkaloid not sequestered by these frogs.

      In general, their results are convincing, their methods and analyses robust and the writing excellent. Their findings represent a major breakthrough in the study of toxin sequestration in poison frogs. Below, a more detailed summary and both major and minor constructive comments are given on the nature of the discoveries and some ways that the manuscript could be improved.

      Many thanks for this positive summary of our work! We greatly appreciate your time and thoroughness in giving us feedback.

      Detailed Summary

      The authors functionally characterize a serine-protease inhibitor protein in Oophaga sylvatica frog plasma, which they name O. sylvatica alkaloid-binding globulin (OsABG), that can bind toxic alkaloids. They show that OsABG is the most highly expressed serpin in O. sylvatica liver and that its expression is higher than that of albumin, a major small molecule carrier in vertebrates. Using a toxin photoprobe combined with competitive protein binding assays, their data suggest that OsABG is able to bind specific poison frog toxins including the two most abundant alkaloids in O. sylvatica skin. Their in vitro isolation of toxin-bound OsABG shows that the protein binds most free pumiliotoxin in solution and suggests that OsABG may play an important role in its sequestration. The authors further show that mutations in the binding pocket of OsABG remove its ability to bind toxins and that the binding pocket is structurally similar to that of other vertebrate serpins.

      These results are an exciting advance in understanding how poison frogs, which make and use alkaloids as chemical defenses, prevent self-intoxication. The authors provide convincing evidence that OsABG can function as a toxin sponge in O. sylvatica which sets a compelling precedent for future work needed to test the role of OsABG in vivo.

      The study could be improved by shifting the focus to O. sylvatica specifically rather than the convergent evolution of sequestration among different dendrobatid species. The reason for this is that most of the results (aside from some of the photoprobe binding results presented in Fig. 1 and Fig. 4) and the proteomics identification of OsABG itself are based on O. sylvatica. It's unclear whether ABG proteins are major toxin sponges in D. tinctorius or E. tricolor since these frogs may contain different toxin cocktails. The competitive binding results suggest that putative ABG proteins in D. tinctorius and E. tricolor have reduced binding affinity at higher toxin concentrations than ABG proteins in O. sylvatica. Although molecular convergence in toxin sponges may be at play in the dendrobatid poison frogs, more work is needed in non-O. sylvatica species to determine the extent of convergence.

      We understand and appreciate you raising this concern. As is partially described in the “essential revisions” section above, we have been more cautious throughout the results and discussion to not describe the plasma binding in E. tricolor and D. tinctorius as definitively due to ABG proteins, and to shift the overall focus to O. sylvatica.

      Major constructive comments:

      Although the protein gels in Figs. 1-2 clearly show the role of ABG, a ~50 kDa protein, it's unclear whether transferrin-like proteins, which are ~80 kDa, may also play a role because the gels show proteins between 39-64 kDa (Fig. 1). The gel in Fig. 2A is specific to one O. sylvatica and extends this range, but the gel does not appear to be labeled accordingly, making it unclear whether other larger proteins could have been detected in addition to ABG. Clarifying this issue would facilitate the interpretation of the results.

      Thank you for this suggestion, please see our response above in the “essential revisions” section.

      There is what seems to be a significant size difference between the O. sylvatica bands and bands from the other toxic frog species, namely D. tinctorius and E. tricolor. Could the photoprobe be binding to other non-ABG proteins of different sizes in different frog species? Given that O. sylvatica bands are bright and this species was the only one subject to proteomics quantification, a possible conclusion may be that the ABG toxin sponge is a lineage-specific adaptation of O. sylvatica rather than a common mechanism of toxin sequestration among multiple independent lineages of poison frogs. It would be helpful if the authors could address this observation of their binding data and the hypothesis flowing from that in the manuscript.

      Thank you for this suggestion, please see our response above in the “essential revisions” section.

      Figure 1B: The species names should be labeled alongside the images in the phylogeny. In addition, please include symbols indicating the number of times toxicity has evolved (for example, once in the ancestors of O. sylvatica and D. tinctorius frogs and once in the ancestors of E. tricolor frogs).

      These suggested changes have been added to Figure 1B. We were not able to fit the full species names into the figure, instead we added an abbreviated version that is spelled out completely in the figure caption.

      Figure 4B-C: Photoprobe binding results in the presence of epi and nicotine appear to be missing for D. tinctorius and those in the presence of PTX and nicotine are missing for E. tricolor. Adding these results would make for a more complete picture of alkaloid binding by ABG in non-O. sylvatica species.

      Thank you for this suggestion, please see our response above in the “essential revisions” section.

      Using recombinant proteins with mutations at residues forming the binding pocket of O. sylvatica ABG (as inferred from docking simulations), the authors found that all binding pocket mutations disrupted photoprobe binding completely in vitro (L221-222, Fig. 4E). However, there is no information presented on non-binding pocket mutations. Mutations outside of the binding pocket would presumably maintain photoprobe binding - barring any indirect structural changes that might disrupt binding pocket interactions with the photoprobe. This result is important for the conclusion that the binding pocket itself is the sole mediator of toxin interactions. The authors do show that one binding pocket mutation (D383A) results in some degree of photoprobe binding (Fig. 4E) but more detail on the mutations in the binding pocket per se being causal would be helpful.

      Thank you for this suggestion, please see our response above in the “essential revisions” section.

      Please include concentrations in the descriptions of gel lanes in the main figures. The relative concentrations of the photoprobe and other toxins (e.g., PTX, DHQ, epi, and nic) are essential for interpreting the competitive binding images. For example, this was done in Fig. S1 (e.g., PB + 10x PTX).

      The photoprobe and competitor concentrations have been added beneath the gels in Figures 1, 4, and 6 as suggested. Additionally, in the crosslinking experiments involving purified protein the amount of protein per well has been added to the top of the TAMRA gel.

      For clarity, the section "OsABG sequesters free PTX in solution with high affinity" could be presented directly after the section titled "Proteomic analysis identifies an alkaloid-binding globulin". The former highlights in vitro experiments confirming the binding affinity of the ABG protein identified in the latter.

      While we see how this rearrangement might work, we think that the current order of figures creates a more compelling story and provides the evidence in a more intuitive manner. For instance, it is necessary to show that recombinant protein recapitulates the plasma photoprobe results and that binding pocket mutants disrupt photoprobe binding (Figure 4), prior to showing the direct binding assays with the recombinant wild type and mutant proteins. For this reason, we believe that this rearrangement might cause confusion, and are leaving it as is.

      Fig. 6E-F should be included as part of Fig. 1 or 2. Although complementary to the RNA sequencing data, these protein results are more closely related to the results in the first two figures which show the degree of competitive binding affinity of PB in the presence of different toxins. The expanded competitive binding results for total skin alkaloids and the two most abundant skin alkaloids from wild samples are most appropriate here.

      We understand the reasoning behind this; however, we feel that including these results in Figure 6 is more appropriate and that moving them would disrupt the flow of the story. The identification of ABG and its binding activity happened before we fully understood the alkaloid profiles of wild-collected O. sylvatica; therefore, we did not think to test additional alkaloids like histrionicotoxin and indolizidines until we saw that these were very abundant on the skin of field-collected poison frogs. Furthermore, we would like to leave this section at the end because we feel it contributes important ecological relevance that we want to leave readers with.

    1. Author Response

      Reviewer #1 (Public Review):

      This work aims to evaluate the use of pressure insoles for measurements that are traditionally done using force platforms in the assessment of people with knee osteoarthritis and other arthropathies. This is vital for providing an affordable assessment that does not require a fully equipped gait lab as well as utilizing wearable technology for personalized healthcare.

      Towards these aims, the authors were able to demonstrate that individual subjects can be identified with high precision using raw sensor data from the insoles and a convolutional neural network model. The authors have done a great job creating the models, combining an already available public dataset of force platform signals, and using it to train models that transfer to data from pressure insoles. However, there are a few concerns regarding substantiating some of the goals that this manuscript is trying to achieve.

      In addressing these concerns, if the results are further corroborated using the suggestions provided to the authors, this provides an exciting tool for identifying an individual's gait patterns out of a cluster of data, which is extremely useful for providing identifiable labels for personalized healthcare using wearable technologies.

      Thank you for this enthusiasm for our work, and we hope that our responses adequately address what we can of these comments. Please note that we have made every effort to address the comments that we can, and we appreciate the detailed feedback you provided.

      Reviewer #2 (Public Review):

      The authors aimed to investigate whether digital insoles are an appropriate alternative to laboratory assessment with force plates when attempting to identify the knee injury status. The methods are rigorous and appropriate in the context of this research area. The results are impressive, and the figures are exceptional. The findings of this study can have a great impact on the field, showing that digital insoles can be accurately used for clinical purposes. The authors successfully achieved their aims.

      We thank the reviewer for this enthusiasm and hope our edits adequately address the points the reviewer made to strengthen the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors describe the development of a machine-learning model to be used for gait assessment using insole data. They first developed a machine learning model using an existing, large data set of ground reaction forces collected during walking with force plates in a lab, from healthy adults and a group of people with knee injuries. Subsequently, they tested this model on ground reaction forces derived from insoles worn by a group of 19 healthy adults and a group of n=44 people with knee osteoarthritis (OA). The model was able to accurately identify individuals belonging to the knee OA group or the healthy group using the ground reaction forces during walking. Note: I do not have expertise on machine learning and will therefore refrain from reviewing the ML methods that were applied in this paper.

      Strengths: The authors successfully externally validated the trained model for GRF on insole data. Insole data carries potentially rich information, including the path of the CoP during the stance phase. The additional value of insoles over force plates in itself is clear, as insoles can be used independently of laboratory facilities. Moreover, insoles provide information on the CoP path, which can have added value over other mobile assessment methods such as inertial sensors.

      Limitations: The second ML model, using only insole data to identify knee arthropathy from healthy subjects, was trained on a small sample of subjects. Although I have no background in ML, I can imagine that external validation in an independent and larger sample is needed to support the current findings.

      Gait speed has a major influence on the majority of gait-related outcomes. Slow or more cautious gait, due to pain or other causes, is reflected in vertical GRFs with less pronounced peaks. A difference in gait speed between people with pain in their knee (due to injury) and healthy subjects can be expected. This raises the question of what the added value of a model to estimate vertical GRF is over a simpler output (e.g. gait speed itself). Moreover, the paper does not elucidate what the added value of machine learning is over a simpler statistical model.

      This is a good point; however, clinically we are interested in weight bearing and differences in pressure-related metrics in this musculoskeletal group, which gait speed alone cannot provide. We are therefore looking at additional metrics.

      There are numerous publications suggesting that non-speed-related metrics are important to predict disease progression in a variety of conditions (e.g., D’Lima DD, Fregly BJ, Patil S, Steklov N, Colwell CW. Knee joint forces: prediction, measurement and significance. Proc Inst Mech Eng H. 2012;226:95–102. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3324308/). In medial knee OA, the ground reaction force vector (not only its vertical component) creates a torque at the knee, and this torque is correlated with disease progression. We have modified the text throughout to address these points.

      In line with this issue, the current analyses do not fully convince me that the model described resulted in the identification of a knee arthropathy-specific signature. Only knee arthropathy vs healthy (relatively young) subjects were compared, and we cannot rule out that this group only reflects general cautious, slow, or antalgic gait. As such, the data does not provide any evidence that the tool might be valuable to identify people with more or less severe symptoms, or that the tool can be used to discriminate knee osteoarthritis from hip or ankle osteoarthritis, or even to discriminate between people with musculoskeletal diseases and people with neurological gait disorders. This substantially limits the relevance for clinical (research) practice. In short, the output of the model seems to be restricted to "something is going on here", without further specification. Further development towards more specific aims using the insole data may substantially amplify clinical relevance.

      While no dataset (or model) is perfect, we feel that this is the first time that this model has been developed and applied in this cohort/clinical context, and of course acknowledge that future work is needed to further validate and examine how clinically meaningful this model is.

      We have broken out and added to a Study limitations section within the manuscript to reflect these caveats more clearly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Trebino et al. investigated the BRAF activation process by analysing the interactions of BRAF N-terminal regulatory regions (CRD, RBD, and BSR) with the C-terminal kinase domain and with the upstream regulators HRAS and KRAS. To this end, they generated four constructs comprising different combinations of N-terminal domains of BRAF and analysed their interaction with HRAS as well as conformational changes that occur. By HDX-MS they confirmed that the RBD is indeed the main mediator of interaction with HRAS. Moreover, they observed that HRAS binding leads to conformational changes exposing the BSR to the environment. Next, the authors used OpenSPR to determine the binding affinities of HRAS to the different BRAF constructs. While BSR+RBD, RBD+CRD, and RBD bound HRAS with nanomolar affinity, no binding was observed with the construct comprising all three domains. Based on these experiments, the authors concluded that BSR and CRD negatively regulate binding to HRAS and hypothesised that BSR may confer some RAS isoform specificity. They corroborated this notion by showing that KRAS bound to BRAF-NT1 (BSR+RBD+CRD) while HRAS did not. Next, the authors analysed the autoinhibitory interaction occurring between the N-terminal regions and the kinase domain. Through pulldown and OpenSPR experiments, they confirm that it is mainly the CRD that makes the necessary contacts with the kinase domain. In addition, they show that the BSR stabilizes these interactions and that the addition of HRAS abolishes them. Finally, the D594G mutation within the KD of BRAF is shown to destabilise these autoinhibitory interactions, which could explain its oncogenic potential.

      Overall, the in vitro study provides new insights into the regulation of BRAF and its interactions with HRAS and KRAS through a comprehensive in vitro analysis of the BRAF N-terminal region. Also, the authors report the first KD values for the N- and C-terminal interactions of BRAF and show that the BSR might provide isoform specificity towards KRAS. While these findings could be useful for the development of a new generation of inhibitors, the overall impact of the manuscript could probably be enhanced if the authors were to investigate in more detail how the BSR-mediated specificity of BRAF towards certain RAS isoforms is achieved. Moreover, though the very "clean" in vitro approach is appreciated, it also seems useful to examine whether the observed interactions and conformational changes occur in the full-length BRAF molecule and in more physiological contexts. Some of the results could be compared with studies including full-length constructs.

      Public Response: We would like to express our gratitude for your valuable feedback on our manuscript. Your insightful suggestions have significantly improved the quality and completeness of our research. In response to your comments, we have conducted additional experiments and incorporated new data into the revised manuscript.

      To gain a deeper understanding of how the BSR-mediated specificity of BRAF towards certain RAS isoforms is achieved, we performed HDX-MS to investigate the impact of KRAS interactions on the BSR. Our findings indicate that when KRAS is bound to BRAF NT2, there is no significant difference in hydrogen-deuterium exchange rates in the BSR compared to the apo-NT2 state (Figure 4). This observation contrasts with the effect of HRAS binding, where peptides from the BRAF-BSR exhibit an increased rate change, suggesting that HRAS induces a conformationally more dynamic state (Figure 2).

      Our results align with the conclusions of Terrell et al. in their 2019 publication, which propose that isoform preferences in the RAS-RAF interaction are driven by opposite charge attractions between BRAF-BSR and KRAS-HVR, promoting the interaction.1 Our data offer a potential mechanistic explanation, suggesting that HRAS disrupts the conformational stability of the BSR provided by the RBD, while KRAS-HVR restores stability and enhances interaction favorability. It is important to note that our results do not directly confirm a long-lasting interaction between the BRAF-BSR and KRAS-HVR, but they do not rule out the possibility of a transient, low-affinity interaction or close proximity between the two.

      Furthermore, our binding kinetics measurements conducted using OpenSPR support these findings. Particularly, in the case of NT1, when the CRD accompanies the BSR and RBD, no interactions with HRAS were observed. Additionally, we quantified the binding affinities between NT3:KRAS and NT4:KRAS, demonstrating that they are equally strong and that the presence of the BSR or CRD does not singularly affect the primary RBD interaction, consistent with HRAS. The BSR appears to exert an inhibitory effect on HRAS when the entire N-terminal region (BSR+RBD+CRD) is present. The BSR-mediated specificity is achieved through a coordinated interplay with the CRD.

      Moreover, we have addressed your concern regarding the physiological relevance of our conclusions. In response, we utilized active, full-length (FL) BRAF purified from HEK293F cells in OpenSPR experiments. Our findings indicate that FL-BRAF behaves similarly to BRAF-NT1, as it does not bind to HRAS but binds to KRAS with an affinity comparable to that of NT1. We have demonstrated that post-translational modifications or native intramolecular interactions do not alter our initial results. Several literature sources, employing cell systems or expressing proteins from insect or mammalian cells, further support the findings presented in our study.2–5

      Thank you once again for your constructive feedback, which has contributed significantly to the refinement of our work.

      For the author:

      Major points:

      1. Figure 1D: Negative control is missing.

      Response: We have incorporated the negative control into this figure as suggested.

      2. Figure 3F and G: negative controls (GST only) are missing.

      Response: We have incorporated the negative control into this figure as suggested.

      3. The authors demonstrate that BRAF NT1 (BSR+RBD+CRD) interacts with KRAS but not HRAS in SPR experiments (Figure 4). What about the conformational change that affects the positioning of BSR when NT2 (BSR+RBD) binds to HRAS (Figure 2)? Does it also occur with KRAS or not? When a rate change is observed between free protein and bound protein in HDX, particularly when this rate change results in a sigmoidal curve that closely parallels the reference curve, it signifies that all residues within the peptide share a uniform protection factor. This suggests that they collectively undergo conformational changes at the same rate, likely due to a concerted opening as a cohesive unit. In the context of our time plots, we observe this distinctive characteristic in the curves derived from the BSR peptides, indicating that HRAS binding perturbs this region, alters its flexibility, and induces a coordinated conformational shift. This compelling evidence strongly supports our assertion that HRAS instigates a reorientation of the BSR.

      Response: In response to the reviewer's comments, we conducted additional experiments to explore whether KRAS elicits any comparable alterations in the H-D exchange of the BSR within BRAF-NT2. Our findings indicate that KRAS does not induce a similar conformational change in the BSR. We have detailed these results in the Results section under the heading "BSR Differentiates the BRAF-KRAS Interaction from the BRAF-HRAS Interaction" and have included corresponding panels in Figure 4 to visually illustrate these observations.

      4. Related to point 3: The authors mention that the HVR domain is responsible for isoform-specific differences. Does the BSR interact with the HVR domain of KRAS (but not HRAS)?

      Response: It has been suggested by Terrell and colleagues1 that the BRAF-BSR and KRAS-HVR are directly responsible for the isoform-specific interactions. We have no direct evidence confirming an interaction between the HVR and BSR. However, we deduce the possibility of such an interaction based on previous research findings. Our HDX-MS experiments have demonstrated that the BRAF-BSR does not engage with HRAS. In our new HDX-MS experiments involving KRAS, we observed that the presence of KRAS does not lead to any discernible increase or decrease in the rate of deuterium exchange within the BRAF-BSR. It is important to emphasize that the absence of a rate change does not necessarily negate the occurrence of binding; rather, it might indicate a transient interaction with an affinity level below the detection threshold of HDX-MS.

      Given that the only major difference between H- and K-RAS isoforms is the HVR, we hypothesize that binding differences between BRAF and RAS isoforms can be attributed to the HVR. Notably, BRAF-NT3 resembles CRAF, which also behaves in line with the findings from Terrell et al. in which the BSR is not present to impact RAS-RAF association. We have updated some of the discussion section to include the new results and draw relevant conclusion.

      We mention in the text in the results section, “The HVR is an important region for regulating RAS isoform differences, like membrane anchoring, localization, RAS dimerization, and RAF interactions6… These results, combined with HDX-MS results, which showed that the BSR is exposed when bound to HRAS, suggest that the electrostatic forces surrounding the BSR promote BRAF autoinhibition and the specificity of RAF-RAS interactions.”

      We also write in the discussion, “However, BRET assays suggest that CRAF does not show preference for either H- or KRAS, while BRAF appears to prefer KRAS.1 This preference is suggested to result from the potential favorable interactions between the negatively charged BSR of BRAF and the positively charged, poly-lysine region of the HVR of KRAS1… Our binding data provide additional examples of isoform-specific activity. We speculate that diminished BRAF-NT1 binding to HRAS and increased BSR exposure upon HRAS binding may be due to electrostatic repulsion between HRAS and the BSR. Our full-length KRAS and its interaction with NT1 support the hypothesis that the BSR attenuates fast binding to HRAS but not to KRAS.”

      5. The authors might consider including NRAS in their study to give more weight to this interesting aspect.

      Response: While this suggestion is intriguing and could contribute to the expanding body of literature on RAS signaling, particularly in the context of NRAS-mutant tumors, we believe that delving into this topic would be beyond the scope of the present manuscript.

      6. Figure 6A: In this pulldown experiment the authors wish to demonstrate that binding of HRAS abolishes the autoinhibitory binding between NT1 and the kinase domain. However, the experimental design (i.e., pulldown of RAS) does not allow us to assess whether NT1 and KD are bound to each other in these conditions at all. The authors should rather pull down the KD and show that the interaction with NT1 is abolished when RAS is added.

      Response: We appreciate your suggestion. The experimental design for this study was intentionally structured to focus on the specific subset of NT1 that interacts with HRAS. The BRAF N-terminal region has the capacity to bind both HRAS and KD, resulting in two distinct populations within BRAF-NT1: NT1:KD and NT1:HRAS, although we believe the ratio between those two populations is not 1:1. If we were to design the experiment by isolating either the KD or NT1, it would lead to the observation of both populations simultaneously, making it challenging to distinguish between them. Our pulldown experiments are performed under the same conditions (i.e. all the proteins were maintained in a molar ratio of 1:1 and exposed to the same buffer components), and we rely on pulldown assays, such as those depicted in Figure 5, to clearly demonstrate the binding interactions between NT1 and KD.

      7. The authors have chosen a purely in vitro approach for their interaction studies, which initially makes sense for the addressed questions. However, since the BRAF constructs studied are only fragments and neither BRAF nor K/HRAS has any posttranslational modifications, the question arises to what extent the findings obtained hold up in vivo. Therefore, the manuscript would greatly benefit from monitoring the described interactions in full-length proteins and in cells or at least with proteins purified from cells.

      Response: Thank you for your valuable suggestion, which we take very seriously to enhance the quality of our manuscript. Upon carefully reviewing your comments, we conducted additional experiments involving full-length, wild-type BRAF (FL-BRAF) that was purified from mammalian cells, encompassing the post-translational modifications and scaffolding proteins such as 14-3-3 (Supplementary Fig 8A). We have incorporated the findings from these OpenSPR experiments into the revised manuscript within the Results Section titled "BSR Differentiates the BRAF-KRAS Interaction from the BRAF-HRAS Interaction" and Figure 4. In summary, our results with FL-BRAF affirm the extension of our initial observations. Both NT1 and FL-BRAF interact with KRAS with comparable affinities, and neither NT1 nor FL-BRAF demonstrates an interaction with HRAS using OpenSPR. These results underscore that BRAF fragments accurately represent active, fully processed BRAF, lending support to our in vitro approach.

      Moreover, the conserved interactions we report in this manuscript are supported by literature. The interaction between RAF-RBD and RAS has been extensively documented, spanning investigations conducted in both insect and mammalian cell lines. For instance, Tran et al. (2021) utilized mammalian expression systems to explore the role of RBD in mediating BRAF activation through RAS interaction, identifying the same binding surfaces that we highlighted using HDX-MS.2 They quantified the KRAS-CRAF interaction, yielding binding affinities in the low nanomolar range, similar to our findings for BRAF-NT:KRAS OpenSPR.2 In the manuscript text, we compared the binding affinity of BRAF residues 1-245 purified from insect cells3 to our BRAF 1-227 (NT2 from E. coli), noting that the published value falls within the standard deviation of our experimental value. Additionally, our results align with the autoinhibited FL-BRAF:MEK:14-3-3 structure, which was expressed in Sf9 insect cells and reveals the central role of the CRD in maintaining autoinhibition through interactions with KD.4 In 2005, Tran and colleagues revealed that specific domains within the BRAF N-terminal region are involved in binding to KD through Co-IP experiments conducted in mammalian cells.5

      While we are fully aware of the limitations of taking a purely in vitro approach to study the role of BRAF regulatory domains in RAS-RAF interactions and autoinhibition, as well as to quantify the affinity of these interactions, we emphasize that this approach enables us to dissect and examine the specific regions of RAF that are under investigation. As we write in the manuscript: “Our in vitro studies were conducted using proteins purified from E. coli, which lack the membrane, post-translational modifications, and regulatory, scaffolding, or chaperone proteins that are involved in BRAF regulation. Nonetheless, our study provides a direct characterization of the intra- and inter-molecular protein-protein interactions involved in BRAF regulation, without the complications that arise in cell-based assays.” We have added the following comment to clarify the advantages of our in vitro approach and the challenges associated with cell-based assays: “… without the complications and false-positives that can arise in cell-based assays, which often cannot distinguish between proximity and biochemical interactions.”

      Once again, we appreciate your insightful feedback, which has contributed significantly to the improvement of our manuscript.

      Minor:

      1. Page 7, paragraph 2, line 6: It should probably read "BRAF autoinhibition" not "BRAF autoinhibitory".

      Response: Thank you for bringing this to our attention. We have fixed this typo.

      2. Figure 3G: In the first lane (time point 0 min) there is no input band for His/MBP-NT1. Probably a mistake when cropping the image from the original photo.

      Response: We sincerely appreciate your diligence in identifying cropping errors, and we have taken comprehensive measures to review the manuscript and correct any such errors. Regarding this specific figure, it is important to note that NT1 was not added at the "0" minute time point, which explains the absence of an input band at that stage. To avoid any confusion, we have revised the notation from "0" to "-" for clarity.

      Reviewer #2 (Public Review):

      In the manuscript entitled 'Unveiling the Domain-Specific and RAS Isoform-Specific Details of BRAF Regulation', the authors conduct a series of in vitro experiments using N-terminal and C-terminal BRAF fragments (SPR, HDX-MS, pull-down assays) to interrogate BRAF domain-specific autoinhibitory interactions and engagement by H- and KRAS GTPases. Of the three RAF isoforms, BRAF contains an extended N-terminal domain that has yet to be detected in X-ray and cryo-EM reconstructions but has been proposed to interact with the KRAS hypervariable region. The investigators probe binding interactions between 4 N-terminal (NT) BRAF fragments (each containing one or more NT domains: BSR, RBD, and CRD) and full-length, bacterially expressed HRAS and KRAS, as well as two BRAF C-terminal kinase fragments, to tease out the underlying contribution of domain-specific binding events. They find, consistent with previous studies, that the BRAF BSR domain may negatively regulate RAS binding and propose that the presence of the BSR domain in BRAF provides an additional layer of autoinhibitory constraints that mediate BRAF activity in a RAS-isoform-specific manner. One of the fragments studied contains an oncogenic mutation in the kinase domain (BRAF-KDD594G). The investigators find that this mutant shows reduced interactions with an N-terminal regulatory fragment and postulate that this oncogenic BRAF mutant may promote BRAF activation by weakening autoinhibitory interactions between the N- and C-terminus.

      While this manuscript sheds light on BRAF-specific autoinhibitory interactions and the identification and partial characterization of an oncogenic kinase domain (KD) mutant, several concerns exist with the in vitro binding studies: they are performed using tagged, isolated, bacterially expressed fragments and 'dimerized' RAS constructs, and they lack relevant citations, controls, comparisons, and data/error analysis. Detailed concerns are listed below.

      1. Bacterially expressed truncated BRAF constructs are used to dissect the role of individual domains in BRAF autoinhibition. There is concern that bacterial expression of isolated domains or regions of BRAF could miss important post-translational modifications, intramolecular interactions, or conformational changes that may occur in the context of the full-length protein in mammalian cells. This concern is not addressed in the manuscript.

      Response: Reviewer 1 raised a similar concern, and we have duplicated our response below for your reference:

      Thank you for your valuable suggestion, which we take very seriously to enhance the quality of our manuscript. Upon carefully reviewing your comments, we conducted additional experiments involving full-length, wild-type BRAF (FL-BRAF) that was purified from mammalian cells, encompassing the post-translational modifications and scaffolding proteins such as 14-3-3 (Supplementary Fig 8A). We have incorporated the findings from these OpenSPR experiments into the revised manuscript within the Results Section titled "BSR Differentiates the BRAF-KRAS Interaction from the BRAF-HRAS Interaction" and Figure 4. In summary, our results with FL-BRAF affirm the extension of our initial observations. Both NT1 and FL-BRAF interact with KRAS with comparable affinities, and neither NT1 nor FL-BRAF demonstrates an interaction with HRAS using OpenSPR. These results underscore that BRAF fragments accurately represent active, fully processed BRAF, lending support to our in vitro approach.

      Moreover, the conserved interactions we report in this manuscript are supported by the literature. The interaction between the RAF-RBD and RAS has been extensively documented in studies using both insect and mammalian cell lines. For instance, Tran et al. (2021) used mammalian expression systems to explore the role of the RBD in mediating RAF activation through RAS interaction, identifying the same binding surfaces that we highlighted using HDX-MS (ref. 2). They quantified the KRAS-CRAF interaction, obtaining binding affinities in the low nanomolar range, similar to our BRAF-NT:KRAS OpenSPR findings (ref. 2). In the manuscript text, we compared the binding affinity of BRAF residues 1–245 purified from insect cells (ref. 3) with that of our BRAF 1–227 (NT2, from E. coli), noting that the published value falls within the standard deviation of our experimental value. Additionally, our results align with the autoinhibited FL-BRAF:MEK:14-3-3 structure, which was expressed in Sf9 insect cells and reveals the central role of the CRD in maintaining autoinhibition through interactions with the KD (ref. 4). In 2005, Tran and colleagues showed, through co-IP experiments in mammalian cells, that specific domains within the BRAF N-terminal region bind to the KD (ref. 5).

      While we are fully aware of the limitations of taking a purely in vitro approach to study the role of BRAF regulatory domains in RAS-RAF interactions and autoinhibition, as well as to quantify the affinity of these interactions, we emphasize that this approach enables us to dissect and examine the specific regions of RAF that are under investigation. As we write in the manuscript: “Our in vitro studies were conducted using proteins purified from E. coli, which lack the membrane, post-translational modifications, and regulatory, scaffolding, or chaperone proteins that are involved in BRAF regulation. Nonetheless, our study provides a direct characterization of the intra- and inter-molecular protein-protein interactions involved in BRAF regulation, without the complications that arise in cell-based assays.” We have added the following comment to clarify the advantages of our in vitro approach and the challenges associated with cell-based assays: “… without the complications and false-positives that can arise in cell-based assays, which often cannot distinguish between proximity and biochemical interactions.”

      Once again, we appreciate your insightful feedback, which has contributed significantly to the improvement of our manuscript.

      2. The experiments employ BRAF NT constructs that retain an MBP tag and RAS proteins with a GST tag. Have the investigators conducted control experiments to verify that the tags do not induce or perturb native interactions?

      Response: Thank you for highlighting this important issue. We have conducted control experiments whenever feasible, particularly in cases where tags were not required for visualization or immobilization, or where cleavage sites were present. We have included these control experiments in the supplementary figures and the accompanying text of the manuscript.

      It is essential to note that many of the techniques employed in this manuscript rely on tags, such as immobilizing proteins onto NTA OpenSPR sensors and employing various resins/beads for pulldown assays. Utilizing tags for protein immobilization in OpenSPR applications offers distinct advantages, including homogeneous and site-specific immobilization of the protein, ensuring that binding sites remain accessible for the study of protein-protein interactions (PPIs) of interest. Furthermore, in all BRAF-RAS SPR experiments, the MBP protein serves as the reference channel "blocking" protein. This reference channel is instrumental in mitigating any potential false-positive signals resulting from binding interactions with the MBP protein. Any such signal is subsequently subtracted out during data analysis.

      To provide a comprehensive understanding of these aspects, we have incorporated these details into the manuscript text for clarity:

      “Maltose-binding protein (MBP) is immobilized on the OpenSPR reference channel, which accounts for any non-specific binding or impacts on the native PPIs that may result from the presence of tags. Kinetic analysis is performed on the corrected binding curves, from which any response in the reference channel has been subtracted.”
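      As a rough illustration of what reference-channel correction followed by a 1:1 kinetic description involves, the Python sketch below simulates an association-phase trace, subtracts an identical non-specific offset recorded on an MBP-style reference channel, and reports KD as kd/ka. The rate constants, analyte concentration, Rmax, 5 RU bulk offset, and function names are hypothetical and chosen only for illustration; they are not OpenSPR data, and the OpenSPR analysis software may fit curves differently.

```python
import math

def langmuir_association(t, conc, ka, kd, rmax):
    """Association-phase response (RU) for a 1:1 (Langmuir) interaction.

    t: time (s); conc: analyte concentration (M);
    ka: association rate constant (M^-1 s^-1); kd: dissociation rate constant (s^-1).
    """
    k_obs = ka * conc + kd
    r_eq = rmax * conc / (conc + kd / ka)
    return r_eq * (1.0 - math.exp(-k_obs * t))

def reference_subtract(sample, reference):
    """Subtract the reference-channel trace (e.g. MBP-blocked) point by point."""
    return [s - r for s, r in zip(sample, reference)]

def dissociation_constant(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka (M) for a 1:1 model."""
    return kd / ka

# Hypothetical values for illustration only.
ka, kd, rmax = 2e5, 2e-3, 100.0
conc = 50e-9   # 50 nM analyte
bulk = 5.0     # non-specific/bulk signal seen on both channels (RU)
times = [0, 30, 60, 120, 240]

sample = [langmuir_association(t, conc, ka, kd, rmax) + bulk for t in times]
reference = [bulk for _ in times]
corrected = reference_subtract(sample, reference)

print(f"KD = {dissociation_constant(ka, kd) * 1e9:.1f} nM")
```

      Because the offset appears equally on both channels, subtraction leaves only the specific 1:1 signal, which is the trace a kinetic fit would then use.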

      We describe the control experiment examining whether the His/MBP-tag affects NT1 binding to BRAF-KD: “Similarly, we removed the His/MBP-tag from BRAF-NT1 through a TEV protease cleavage reaction and flowed over untagged NT1. Kinetic analysis confirmed that the interaction is preserved, with a KD of 13 nM (Supplemental Figure 6F).”

      We show in Supplemental Figure 6 that the GST-tag does not affect KRAS interactions with NTs. We purified full-length His/MBP-KRAS and subsequently removed the tag through TEV cleavage. BRAF-NT interactions are preserved with untagged KRAS. GST alone also does not interact with BRAF-NTs. We updated the text in the Results section “BSR differentiates the BRAF-KRAS interaction from the BRAF-HRAS interaction.”

      Additionally, Vojtek and colleagues used the same fusion-protein combinations (GST-RAS and MBP-RAF) in pulldown experiments and also found no perturbations from these tags.8

      3. The investigators state that the GST tag on the RAS constructs was used to promote RAS dimerization, as RAS dimerization is proposed to be key for RAF activation. However, recent findings argue against the role of RAS dimers in RAF dimerization and activation (Simanshu et al., Mol. Cell 2023). Moreover, while GST can dimerize, it is unclear whether this promotes RAS dimerization as suggested. In the methods for the OpenSPR experiments probing NT BRAF:RAS interactions, it is stated that "monomeric KRAS was flowed...". This terminology is a bit confusing. How was the monomeric state of KRAS determined, and what was the rationale behind the experiment? Is there a difference in binding interactions between "monomeric" and "dimeric" KRAS?

      Response: Thank you for conducting such a comprehensive review of our manuscript and for identifying the mention of "monomeric KRAS" in the experimental section, which was inadvertently included and should not have been present. This terminology originally referred to a series of experiments involving "monomeric" KRAS that were initially considered for inclusion in the main body of the manuscript but were subsequently removed before submission. Furthermore, we adjusted the terminology to prevent any confusion or unwarranted implications.

      To clarify, this "monomeric" construct refers to the tagless, full-length KRAS variant that was confirmed to exist in a monomeric state through Size Exclusion Chromatography, eluting at a volume equivalent to 21 kDa. We have incorporated the findings from experiments involving this untagged KRAS variant into the supplementary figures to provide supporting evidence, particularly in response to comment #2, that the GST-tag does not interfere with native interactions. Supplementary Figure 1 illustrates that both GST-HRAS (45 kDa) and GST-KRAS (45 kDa) elute as dimers in solution, at approximately 90 kDa. It is important to note that the main text figures primarily feature the GST-tagged, "dimeric" RAS constructs. Our research results do not suggest any significant differences between "monomeric," untagged KRAS and "dimeric" GST-tagged KRAS, indicating that the binding kinetics between RAS and RAF are not influenced by oligomerization state (Supplementary Fig 6). To mitigate any potential confusion, we have made the necessary distinctions in the text and have revised the methods description to accurately reflect these aspects.

      While the recent findings summarized by Simanshu and colleagues were published concurrently with our manuscript submission, we would like to address this comment in the following manner. The authors assert that RAS does not engage in dimerization through the G domain, a hypothesis that contrasts with certain prior research findings. Instead, they propose that the plasma membrane plays a pivotal role in the clustering of RAS. Furthermore, the authors mention the involvement of RAS "dimerization" in RAF dimerization and activation in the subsequent statements:

      “Recruitment of two RAF proteins by RAS proteins in close proximity facilitate RAF activation but are not required for RAF dimerization.”

      “However, the PM recruitment of two RAF proteins by two non-dimerized but co-localized RAS proteins would serve equally well to promote RAF dimerization. Moreover, recent work on the activation cycle of RAF dimers (ref 20–23) argues strongly against a role for RAS dimers while revealing regulation by the 14-3-3 and SHOC2-MRAS-PP1C complexes. (Ref 24)”

      The primary focus of our study centers on elucidating the intricate details of the RAS-RAF interaction and the mechanisms underlying RAF autoinhibition, rather than emphasizing RAF dimerization as the sole pathway to RAF activation. It is important to recognize that RAF activation encompasses multiple steps, including RAS-mediated relief of RAF autoinhibition.

      To mimic physiological conditions as closely as possible, we employed a GST-tag on RAS in our experiments. It's worth noting that GST has a dimerization property,9 which brings RAS molecules into close proximity to one another, effectively emulating conditions akin to the plasma membrane. Our primary objective is not solely to facilitate interactions by bringing RAS into close proximity. Instead, our aim is to replicate cellular conditions to the greatest extent feasible, especially within the predominantly in vitro framework of our studies. Furthermore, we have revised the sentence pertaining to HRAS as follows: “As verified by size exclusion chromatography (Supplementary Fig 1A), the GST-tag dimerizes and forces HRAS into close proximity to recapitulate physiological conditions. (ref. 35)”

      4. The investigators determine binding affinities between GST-HRAS and NT BRAF domains (NT2: 7.5 ± 3.5 nM; NT3: 22 ± 11 nM) by SPR, and propose that the BSR domain has an inhibitory role in HRAS interactions with the RAF NT. However, it is unclear whether these differences are statistically meaningful given the error.

      Response: Thank you for bringing up this matter for further discussion. We are fully aware that these distinctions (NT2 and NT3), considering the overlapping error, lack statistical significance. Our conclusion points toward the most notable differences occurring when comparing NT1 to either NT2 or NT3, highlighting that the presence of the BSR has an inhibitory effect, particularly when the CRD is also present. It's important to note that we did not directly compare NT2 and NT3 to each other. Our comparison primarily elucidates that BSR without the CRD, and conversely, CRD without the BSR, do not exhibit the inhibitory effect. This collective evidence leads to the conclusion that all three domains collaboratively play a role in negatively regulating BRAF against HRAS.
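      One way to gauge whether two affinities reported as mean ± error differ is a Welch t statistic computed from summary values. The Python sketch below applies it to the NT2 and NT3 values in comment 4, assuming that the ± values are standard deviations and that each KD comes from n = 3 independent measurements. The replicate number is an assumption, not a value stated here, so the result is illustrative only and does not reproduce any analysis in the manuscript.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from summary statistics (means, standard deviations, sample sizes)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# KD (nM) for GST-HRAS binding: NT2 = 7.5 +/- 3.5, NT3 = 22 +/- 11.
# n = 3 per construct is an assumed replicate number.
t, df = welch_t(7.5, 3.5, 3, 22.0, 11.0, 3)
print(f"t = {t:.2f}, Welch df = {df:.1f}")
```

      Under these assumptions the sketch gives t ≈ 2.18 on about 2.4 degrees of freedom, below the two-sided 5% critical values of 4.30 (df = 2) and 3.18 (df = 3), consistent with the statement that the NT2 versus NT3 difference on its own is not statistically significant.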

      5. It is unclear why NT1 (BSR+RBD+CRD) was not included in the HDX experiments, which makes it challenging to directly compare and determine specific contributions of each domain in the presence of HRAS. Including NT1 in the experimental design could provide a more comprehensive understanding of the interplay between the domains and their respective roles in the HRAS-BRAF interaction. Further, excluding certain domains from the constructs, such as the BSR or CRD, may overlook potential domain-domain interactions and their influence on the conformational changes induced by HRAS binding.

      Response: We acknowledge that incorporating NT1 into the HDX experiments would have provided clearer insights into the specific contributions of each domain. Originally, it was our intention to include NT1 in these experiments. Unfortunately, we encountered challenges with the HDX experiments when it came to BRAF-NT1, as it yielded a significantly low sequence coverage after MS/MS analysis. We made multiple attempts to address this issue, which included additional protein purifications involving reducing agents, increasing the concentration of reaction buffer components, and extending the incubation time with reducing agents before injection. Despite these efforts, we were unable to obtain the desired sequence coverage for NT1. Consequently, we switched our approach to analyze NT2 and NT3 as the next best alternative.

      6. The authors perform pulldown experiments with BRAF constructs (NT1: BSR+RBD+CRD, NT2: BSR+RBD, NT3: RBD+CRD, NT4: RBD alone), in which biotinylated BRAF-KD was captured on streptavidin beads and probed for bound His/MBP-tagged BRAF NTs. Western blot results suggest that only NT1 and NT3 bind to the KD (Figure 5). However, a pulldown experiment with an additional CRD-only construct would help to determine whether the CRD alone is sufficient for the interaction or whether the RBD is also required for higher-affinity binding. This additional experiment would strengthen the authors' arguments and provide further insights into the mechanism of BRAF autoinhibition.

      Response: We are grateful for this valuable suggestion, and in response, we have taken the initiative to clone and purify a CRD-only construct (NT5) to strengthen our arguments. Subsequently, we conducted OpenSPR experiments to measure the binding affinity between NT5 and KD. Our findings clearly indicate that the CRD alone is not sufficient to mediate the autoinhibitory interactions and that the presence of the RBD is indeed necessary. These results have been incorporated into Figure 5 and are described within the Results Section for enhanced clarity and support.

      7. While the investigators state that their findings indicate that H- and KRAS differentially interact with BRAF, most of the experiments are focused on HRAS, with only a subset on KRAS. As SPR and pull-down experiments are only conducted on NT1 and NT2, evidence for RAS isoform-specific interactions is weak. It is unclear why parallel experiments were not conducted with KRAS using the BRAF NT3 and NT4 constructs.

      Response: We sincerely appreciate your suggestion, which has contributed to enhancing the overall robustness of the evidence regarding isoform-specific differences between H- and K-RAS. In response, we performed additional experiments involving NT3 and NT4. The outcomes of these experiments have been integrated into Figure 4, and we have provided a comprehensive description of these results within the Results section “BSR differentiates the BRAF-KRAS interaction from the BRAF-HRAS interaction” of the manuscript.

      8. The investigators do not cite the AlphaFold prediction of full-length BRAF (AF-P15056-F1) or the known X-ray structure of the BRAF BSR domain. Hence, it is unclear how AlphaFold is used to gain new structural information, and whether it was used to predict the structure of the N-terminal regulatory region or of the full-length protein.

      Response: We greatly appreciate the reviewer’s commitment to upholding good scientific practices and ensuring the inclusion of relevant citations in publications. In our original manuscript, we employed the UniProt ID P15056 to reference the specific AlphaFold structure used in our study. This was clarified as follows: "Since the full-length structure of BRAF is still unresolved, we applied the AlphaFold Protein Structure Database for a model of BRAF to display the conformation of the N-terminal domains and the HDX-MS results.40,41” Additionally, we referenced AlphaFold using the two citations recommended on their website (references 35 and 36 in the original manuscript). To prevent any potential confusion in the future, we have incorporated "AF-P15056-F1," as suggested.

      We are sorry for any misunderstanding that may have arisen regarding the use of AlphaFold for gaining new structural insights. Our sole intention was to utilize AlphaFold as a tool for modeling HDX, as a full-length structure of BRAF, encompassing the entire N-terminal domain, remains unavailable. We have taken steps to clarify our objectives in the manuscript to ensure the purpose of our AlphaFold utilization is unambiguous.

      Furthermore, we wish to emphasize that our utilization of AlphaFold was never intended to exclude the known X-ray structure of the BRAF-BSR domain. In our revised text, we have added clarity to our purposes and cited the Lavoie et al. Nature publication from 2018, which provides alignment between the X-ray structure and the AlphaFold model, thereby enhancing the confidence in the latter.

      9. In HDX-MS experiments, it is unclear how the authors determine whether small differences in deuterium uptake observed for some of the peptide fragments are statistically significant, and why, for some of the labeling reaction times, the investigators state "± HRAS only" for only 3 time points.

      Response: First, in reference to the question about “± HRAS only” for only 3 time points, we write:

      “Both constructs were incubated with and without GMPPNP-HRAS in D2O buffer for set labeling reaction times (NT3: 2 sec [NT3 ± HRAS only], 6 sec [NT3 ± HRAS only], 20 sec, 30 sec [NT3 ± HRAS only], 60 sec, 5 min, 10 min, 30 min, 90 min, 4.5 h, 15 h, and 24 h)...”

      We realize how this can be confusing. To avoid such confusion, we revised the text to read instead:

      “Both constructs were incubated with and without GMPPNP-HRAS in D2O buffer for set labeling reaction times (NT3: 2 sec, 6 sec, 20 sec, 30 sec, 60 sec, 5 min, 10 min, 30 min, 90 min, 4.5 h, 15 h, 45 h and 24 h at RT; NT2: 20 sec, 60 sec, 5 min, 10 min, 30 min, 90 min, 4.5 h, 15 h, and 24 h at RT)...”

      Next, with regard to assessing significance, we determine it by closely examining a consistent trend in smooth time course plots. To establish this trend, we rely on the presence of more than four overlapping peptides, each with multiple charge states, within a specific sequence range. When we observe multiple peptides showing even a small difference in rate exchange, we can confidently infer that structural changes have taken place. This confidence stems from the inherent reliability and redundancy in the data analysis approach we have employed.11,12 It is noteworthy that our focus is primarily on reporting the binding or no binding, rather than quantifying the magnitude of exchange. As such, conducting multiple replicates or statistical testing is not deemed necessary.13,14 This is true for multiple reasons:

      1) Instead of small deuterium changes (y-axis), we focus on x-axis changes, which provide a slowing factor, i.e., how much the H-D exchange rate has changed.

      • In a publication investigating the ideal HDX-MS data set, the author explains, “with the availability of high resolution HDX-MS raw data, it may be the time to shift the data analysis paradigm from determination of centroid values and presentation of deuteration levels to deconvolution of isotope envelopes and presentation of exchange rates.” 15

      • Presenting data as rate changes provides a physical-chemistry measurement, as opposed to a relative measurement with percent deuteration. For example, a 10-fold slowing corresponds to a free-energy difference of about 1.4 kcal/mol (RT ln 10 at room temperature). By quick visual estimation, we see a slowing factor of about 2 when RAS is bound to the BRAF-RBD.

      • We made some changes to the text to clear up any confusion about measuring D uptake vs rate.

      2) Looking only at sigmoidal curves: the “smooth time course” shows that the time-dependent deuterium changes are not random, artifactual, or false positives/negatives. When parallel sigmoidal curves are present, any x-axis shift is a measure of a change in the H-D exchange rate. Only plots with a smooth time course are used to draw conclusions about BRAF’s conformational changes or binding interfaces.

      3) Wide time range: the extended time window also confirms that any observed difference is reliable and accurate. It provides coverage of peptide deuteration levels from 0 to 100%, and with this complete coverage a smooth time course is observed.

      • A narrow time window is a common flaw in HDX-MS studies.14,15

      4) The rate change is observed at multiple time points (at least 4 for each peptide), each of which is an independent reaction, showing that the change is reproducible.

      5) Many overlapping peptides show the same pattern: the exchange rate difference is observed in at least 4 peptide time plots without contradictory evidence within the sequence range.

      • We included the complete set of peptide time plots in the supplemental materials.

      6) The many other peptide time plots that show no difference with and without RAS are themselves a form of reproducibility: they show that the analysis does not produce spurious differences where no change occurs.
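      The link in point 1 between a slowing factor and an energy can be made concrete. Assuming EX2 exchange kinetics, where the observed rate equals the intrinsic rate divided by the protection factor, an n-fold slowing corresponds to ΔΔG = RT ln(n). The Python sketch below evaluates this at an assumed 25 °C; the function name and temperature are illustrative choices, not part of the HDX-MS workflow described here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def ddg_from_slowing(factor, temp_k=298.15):
    """Free-energy change (kcal/mol) implied by an n-fold slowing of H-D exchange,
    assuming EX2 kinetics so that k_obs = k_int / PF and ddG = RT ln(PF ratio)."""
    return R_KCAL * temp_k * math.log(factor)

for factor in (2, 10):
    print(f"{factor:>2}-fold slowing -> {ddg_from_slowing(factor):.2f} kcal/mol")
```

      Under these assumptions, the roughly 2-fold slowing seen when RAS binds the BRAF-RBD corresponds to about 0.41 kcal/mol, and a 10-fold slowing to about 1.36 kcal/mol.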

      10. The investigators find that KRAS binds NT1 in SPR experiments, whereas HRAS does not. However, the pull-down assays show NT1 binding to both KRAS and HRAS. SI Fig 5 attributes this to slow association, yet both SPR (on/off rates) and equilibrium binding measurements are conducted. These data should be able to 'tease' out differences in association.

      Response: Thank you for bringing up this important point. It is crucial to note that the experiments conducted at slow flow rates generated low responses, making it challenging to perform kinetic analyses effectively. Consequently, we are unable to provide accurate kinetic (on/off rates) or equilibrium binding measurements for NT1 and HRAS. Regrettably, comparing the association rates of KRAS and HRAS is not feasible because different flow rates were used. We have addressed this limitation in the manuscript as follows:

      “We therefore immobilized NT1 and flowed over HRAS at a much slower flow rate (5 µL/min), during which we saw minimal but consistent binding (Supplementary Fig 5A). The low response and long timeframe of each injection, however, makes the dissociation constant (KD) unmeasurable and incomparable to our other NT-HRAS OpenSPR results.”

      11. The model in Figure 7B highlights BSR interactions with KRAS; however, BSR interactions with the KRAS HVR (proximal to the membrane), as supported by Terrell et al. (2019), are not shown.

      Response: Thank you for the suggestion. We reoriented the BSR closer to the HVR of KRAS rather than the G-domain.

      12. The investigators state that 'These findings demonstrate that HRAS binding to BRAF directly relieves BRAF autoinhibition by disrupting the NT1-KD interaction, providing the first in vitro evidence of RAS-mediated relief of RAF autoinhibition, the central dogma of RAS-RAF regulation.' However, in Tran et al. (2005) JBC, they report pulldown experiments using N- and C-terminal fragments of BRAF and state that 'BRAF also contains an N-terminal autoinhibitory domain and that the interaction of this domain with the catalytic domain was inhibited by binding to active HRAS'. This reference is not cited.

      Response: We appreciate the concern raised regarding our statement. We want to clarify that it was never our intention to disregard this JBC publication, and we apologize for any misunderstanding caused by our phrasing. We recognize that our initial statement was contentious, and we have removed the word "first" from the phrase "first in vitro evidence." In the section of the discussion where we originally cited the Tran et al. (2005) publication, we have revised the language to eliminate "first" and have rephrased the sentence, as provided below:

      “Our in vitro binding studies align with previous implications, shown through cell-based co-IPs, that RAS relieves RAF autoinhibition.5”

      13. In Fig 2, panels A and C, it is unclear what the grey dotted line in each plot represents.

      Response: Thank you for drawing our attention to the additional explanation needed here. The gray dotted lines represent the maximum deuterium exchange. We added the following description to the figure 2 legend:

      “Gray dotted lines represent the theoretical exchange behavior of the specified peptide if it were fully unstructured (top) or if it had a uniform protection factor (the ratio of the intrinsic to the observed exchange rate) of 100 (lower).”
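      The two gray reference curves can be sketched with a single-exponential model, D(t) = 1 - exp(-k_int t / PF), where PF = 1 for a fully unstructured peptide and PF = 100 for the lower curve. The Python sketch below assumes one hypothetical intrinsic rate (10 s^-1) for every amide; real reference curves would combine residue-specific intrinsic rates that depend on sequence, pH, and temperature. It also shows why a protection factor reads as a shift along the log-time x-axis.

```python
import math

def deuterium_fraction(t, k_int, pf=1.0):
    """Fraction of amides deuterated after t seconds.

    Single-exponential model: k_obs = k_int / pf, so pf = 1 is a fully
    unstructured peptide and larger pf values slow exchange.
    """
    return 1.0 - math.exp(-k_int * t / pf)

K_INT = 10.0  # s^-1, hypothetical intrinsic exchange rate used for every amide

# Labeling times (s) from 2 s to 24 h.
times = [2, 20, 60, 300, 1800, 5400, 86400]
for t in times:
    unstructured = deuterium_fraction(t, K_INT)
    protected = deuterium_fraction(t, K_INT, pf=100.0)
    print(f"{t:>6} s  PF=1: {unstructured:.3f}  PF=100: {protected:.3f}")

# A 100-fold protection factor shifts the curve 100-fold (two log units) in time.
assert math.isclose(deuterium_fraction(200, K_INT, 100.0), deuterium_fraction(2, K_INT))
```

      The curve shape is unchanged by PF; only its position on the log-time axis moves, which is why rate (x-axis) comparisons rather than deuteration levels are emphasized in the response to comment 9.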

      14. In Fig 3, error analysis is not provided for panel E.

      Response: We added the standard deviation values to this panel. We additionally added these for Fig 4C and Fig 5B.

      15. How was RAS GMPPNP loading verified?

      Response: Ras loading follows a well-established protocol with a solid foundation in the literature.16–21 We followed this accepted method for nucleotide exchange. Our controls, as evident in pulldown and OpenSPR experiments (Fig 1C, 4E), unequivocally demonstrate that GMPPNP-loaded RAS is active, while unloaded RAS is inactive, as evidenced by the absence of binding. We also added Supplemental Figure 6E to show that inactive (unloaded) GST-KRAS does not bind to BRAF during OpenSPR analysis. To exemplify this, we included binding curves of 1 µM GST-KRAS-GMPPNP and GST-KRAS-GDP flowed over NTA-immobilized BRAF-NT2 at a flow rate of 30 µl/min.

      References

      (1) Terrell, E. M.; Durrant, D. E.; Ritt, D. A.; Sealover, N. E.; Sheffels, E.; Spencer-Smith, R.; Esposito, D.; Zhou, Y.; Hancock, J. F.; Kortum, R. L.; Morrison, D. K. Distinct Binding Preferences between Ras and Raf Family Members and the Impact on Oncogenic Ras Signaling. Mol. Cell 2019, 76 (6), 872-884.e5. https://doi.org/10.1016/j.molcel.2019.09.004.

      (2) Tran, T. H.; Chan, A. H.; Young, L. C.; Bindu, L.; Neale, C.; Messing, S.; Dharmaiah, S.; Taylor, T.; Denson, J. P.; Esposito, D.; Nissley, D. V.; Stephen, A. G.; McCormick, F.; Simanshu, D. K. KRAS Interaction with RAF1 RAS-Binding Domain and Cysteine-Rich Domain Provides Insights into RAS-Mediated RAF Activation. Nat. Commun. 2021, 12 (1176), 1–16. https://doi.org/10.1038/s41467-021-21422-x.

      (3) Fischer, A.; Hekman, M.; Kuhlmann, J.; Rubio, I.; Wiese, S.; Rapp, U. R. B- and C-RAF Display Essential Differences in Their Binding to Ras: The Isotype-Specific N Terminus of B-RAF Facilitates Ras Binding. J. Biol. Chem. 2007, 282 (36), 26503–26516. https://doi.org/10.1074/jbc.M607458200.

      (4) Park, E.; Rawson, S.; Li, K.; Kim, B. W.; Ficarro, S. B.; Pino, G. G. Del; Sharif, H.; Marto, J. A.; Jeon, H.; Eck, M. J. Architecture of Autoinhibited and Active BRAF–MEK1–14-3-3 Complexes. Nature 2019, 575 (7783), 545–550. https://doi.org/10.1038/s41586-019-1660-y.

      (5) Tran, N. H.; Wu, X.; Frost, J. A. B-Raf and Raf-1 Are Regulated by Distinct Autoregulatory Mechanisms. J. Biol. Chem. 2005, 280 (16), 16244–16253. https://doi.org/10.1074/jbc.M501185200.

      (6) Prior, I. A.; Hancock, J. F. Ras Trafficking, Localization and Compartmentalized Signalling. Semin. Cell Dev. Biol. 2012, 23 (2), 145–153.

      (7) Herrmann, C.; Martin, G. A.; Wittinghofer, A. Quantitative Analysis of the Complex between P21 and the Ras-Binding Domain of the Human Raf-1 Protein Kinase. J. Biol. Chem. 1995, 270 (7), 2901–2905. https://doi.org/10.1074/jbc.270.7.2901.

      (8) Vojtek, A. B.; Hollenberg, S. M.; Cooper, J. A. Mammalian Ras Interacts Directly with the Serine/Threonine Kinase Raf. Cell 1993, 74 (1), 205–214. https://doi.org/10.1016/0092-8674(93)90307-C.

      (9) Parker, M. W.; Bello, M. Lo; Federici, G. Crystallization of Glutathione S-Transferase from Human Placenta. J. Mol. Biol. 1990, 213 (2), 221–222. https://doi.org/10.1016/S0022-2836(05)80183-4.

      (10) Inouye, K.; Mizutani, S.; Koide, H.; Kaziro, Y. Formation of the Ras Dimer Is Essential for Raf-1 Activation. J. Biol. Chem. 2000, 275 (6), 3737–3740. https://doi.org/10.1074/JBC.275.6.3737.

      (11) Kan, Z. Y.; Ye, X.; Skinner, J. J.; Mayne, L.; Englander, S. W. ExMS2: An Integrated Solution for Hydrogen-Deuterium Exchange Mass Spectrometry Data Analysis. Anal. Chem. 2019, 91 (11), 7474–7481.

      (12) Mayne, L.; Kan, Z. Y.; Sevugan Chetty, P.; Ricciuti, A.; Walters, B. T.; Englander, S. W. Many Overlapping Peptides for Protein Hydrogen Exchange Experiments by the Fragment Separation-Mass Spectrometry Method. J. Am. Soc. Mass Spectrom. 2011, 22 (11), 1898–1905. https://doi.org/10.1007/S13361-011-0235-4.

      (13) Ye, X.; Lin, J.; Mayne, L.; Shorter, J.; Englander, S. W. Hydrogen Exchange Reveals Hsp104 Architecture, Structural Dynamics, and Energetics in Physiological Solution. Proc. Natl. Acad. Sci. 2019, 116 (15), 7333–7342. https://doi.org/10.1073/pnas.1816184116.

      (14) Ye, X.; Lin, J.; Mayne, L.; Shorter, J.; Englander, S. W. Structural and Kinetic Basis for the Regulation and Potentiation of Hsp104 Function. Proc. Natl. Acad. Sci. 2020, 117 (17), 9384–9392. https://doi.org/10.1073/pnas.1921968117.

      (15) Hamuro, Y. Determination of Equine Cytochrome c Backbone Amide Hydrogen/Deuterium Exchange Rates by Mass Spectrometry Using a Wider Time Window and Isotope Envelope. J. Am. Soc. Mass Spectrom. 2017, 28 (3), 486–497. https://doi.org/10.1007/s13361-016-1571-1.

      (16) Herrmann, C.; Horn, G.; Spaargaren, M.; Wittinghofer, A. Differential Interaction of the Ras Family GTP-Binding Proteins H-Ras, Rap1A, and R-Ras with the Putative Effector Molecules Raf Kinase and Ral-Guanine Nucleotide Exchange Factor. J. Biol. Chem. 1996, 271 (12), 6794–6800. https://doi.org/10.1074/jbc.271.12.6794.

      (17) Miller, A. F.; Halkides, C. J.; Redfield, A. G. An NMR Comparison of the Changes Produced by Different Guanosine 5’-Triphosphate Analogs in Wild-Type and Oncogenic Mutant P21ras. Biochemistry 1993, 32 (29), 7367–7376. https://doi.org/10.1021/bi00080a006.

      (18) Amendola, C. R.; Mahaffey, J. P.; Parker, S. J.; Ahearn, I. M.; Chen, W. C.; Zhou, M.; Court, H.; Shi, J.; Mendoza, S. L.; Morten, M. J.; Rothenberg, E.; Gottlieb, E.; Wadghiri, Y. Z.; Possemato, R.; Hubbard, S. R.; Balmain, A.; Kimmelman, A. C.; Philips, M. R. KRAS4A Directly Regulates Hexokinase 1. Nature 2019. https://doi.org/10.1038/s41586-019-1832-9.

      (19) John, J.; Sohmen, R.; Feuerstein, J.; Linke, R.; Wittinghofer, A.; Goody, R. S. Kinetics of Interaction of Nucleotides with Nucleotide-Free H-Ras P21. Biochemistry 1990, 29 (25), 6058–6065. https://doi.org/10.1021/bi00477a025.

      (20) Dharmaiah, S.; Tran, T. H.; Messing, S.; Agamasu, C.; Gillette, W. K.; Yan, W.; Waybright, T.; Alexander, P.; Esposito, D.; Nissley, D. V.; McCormick, F.; Stephen, A. G.; Simanshu, D. K. Structures of N-Terminally Processed KRAS Provide Insight into the Role of N-Acetylation. Sci. Rep. 2019, 9 (1), 1–15. https://doi.org/10.1038/s41598-019-46846-w.

      (21) Rathinaswamy, M. K.; Gaieb, Z.; Fleming, K. D.; Borsari, C.; Harris, N. J.; Moeller, B. E.; Wymann, M. P.; Amaro, R. E.; Burke, J. E. Disease-Related Mutations in PI3Kγ Disrupt Regulatory C-Terminal Dynamics and Reveal a Path to Selective Inhibitors. eLife 2021, 10, e64691. https://doi.org/10.7554/eLife.64691.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you again to the reviewers and editors for all constructive feedback. We have made several edits to the manuscript and data to address concerns raised during the initial review and strengthen the completeness of this study. Please find below our response to each, with referee comments in black and our responses in blue.

      eLIFE Assessment:

      “The authors report that Dbp5 functions in parallel with Los1 in tRNA export, in a manner dependent on Gle1 and requiring the ATPase cycle of Dbp5, but independent of Mex67, Dbp5's partner in mRNA export. The evidence for this conclusion is still incomplete, as is the biochemical evidence that Dbp5 interacts directly with tRNA in vitro with Gle1 and co-factor InsP6 triggering Dbp5 ATPase activity in the Dbp5-tRNA complex. The evidence that Dbp5 interacts with tRNA in cells independently of Los1, Msn5 and Mex67 is, however, solid.”

      Thank you for the constructive feedback and assessment of our article. We have made several improvements to the quality of data (Figure 1E, Figure 3C, Figure 4), added additional tRNA Northern Blot/FISH targets to further generalize observed phenotypes beyond pre-tRNAIleUAU (Supplement 1C/D/E/F), provided growth assays for los1Δ/msn5Δ/dbp5R423A (Supplement 1B), and added data showing that gle1-4/los1Δ double mutants phenocopy los1Δ/dbp5R423A to further support the involvement of Gle1 and the Dbp5 ATPase cycle in tRNA export (Figure 5D).

      Additionally, we added quantification to assess the extent of overexpression of Dbp5 mutants in Figure 3 and a discussion of how these mutants alter the localization of the protein to better assess how they may impact tRNA export (lines 211-226). Furthermore, several minor edits to the text/figures have been made to remove typos and improve readability (e.g., labels of FISH/Northern data in Figure 1). Additional edits include adjusting the text and the model presented in Figure 6 to improve conclusions drawn from our data. These include lines 106-107 and lines 366-371, which clarify that the Dbp5-mediated tRNA export pathway may not be entirely independent of Mex67.

      Reviewer #1 (Public Review):

      "At least one result suggests that the idea of these pathways in parallel may be too simplistic as deletion of the LOS1 gene, which is not essential decreases the interaction of tRNA export substrate with Dbp5 (Figure 2A). If the two pathways were working in parallel, one might have expected removing one pathway to lead to an increase in the use of the other pathway and hence the interaction with a receptor in that pathway…. The obvious missing experiment here with respect to genetics is the test of whether deletion of the MSN5 gene in the cells, which combines deletion of LOS1 and the dbp5_R423A allele, shown in Figure 1D would be lethal…. The authors provide evidence of a model where the helicase Dbp5 plays a role in tRNA export from the nucleus. Further evidence is required to determine whether Dbp5 could function in the same pathway as the previously defined tRNA export receptors, Los1 and Msn5. There are genetic tests that could be performed to explore this question. Some of the biochemistry presented would show when Los1 is absent that the interaction of Dbp5 with tRNA decreases, which could support a model where Dbp5 plays a role in coordination with Los1”

      Author Response: We thank the reviewers for this suggestion and consideration. We have added data showing growth phenotypes for the los1Δ/msn5Δ/dbp5R423A triple mutants. We discuss possible explanations and alternative hypotheses for why these triple mutants are viable and for the observed reduction in Dbp5-pre-tRNA interaction in the context of los1Δ (lines 128-131; lines 172-174).

      Reviewer #1 (Public Review):

      “While some of the binding assays show rather modest band shifts (Figure 4B for example), the data in Figure 4A showing that there is no binding detected unless a non-hydrolyzable ATP analogue is employed, argues for specificity in nucleic acid binding. The question that does arise is whether the binding is specific for tRNA.”

      Author Response: We have adjusted the brightness/contrast of the EMSAs in Figure 4 to allow for better visualization of band shifts. Additionally, a discussion of the specificity of Dbp5-nucleic acid binding and the observed tRNA binding has been added (lines 313-322).

      Reviewer #1 (Public Review):

      “With the exception of the binding studies, which also employ a mixture of yeast tRNAs, this study relies primarily on a single tRNA species to come to the conclusions drawn. Many other studies have used multiple tRNAs to explore whether pathways characterized are generalizable to other tRNAs.”

      Author Response: We have added additional tRNA targets for FISH/Northerns (Supplement 1C/D/E/F).

      Reviewer #2 (Public Review):

      “There are some pieces of data that are misinterpreted. (Figure 1A and B look the same; in Fig 1E, the DAPI staining is abnormal; in Fig 4 the bands can't be seen.)”

      Author Response: Thank you for your constructive feedback. We have replaced FISH images to improve DAPI staining (Figure 1E), adjusted EMSAs to allow for better visualization of band shifts (Figure 4), improved Northern Blots for quality (Figure 3C), and rearranged Figure 1A/B for readability. We maintain that the results from Figure 1A/B are not misinterpreted but agree that the readability of the figure was poor and have adjusted labels/formatting accordingly. The results of these experiments show that the deletion of Los1 does not alter Dbp5 localization and, conversely, loss of Dbp5 does not alter Los1 localization. As such, the localization patterns under loss-of-function conditions look the same as wild-type for each protein respectively.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their service and are pleased to see that they were positive about the overall study. The reviewers provided several very good suggestions that we feel have improved the revised manuscript. In response to their suggestions, we have added four new figures of additional data (Figure 1, Supplement 2; Figure 2, Supplement 2; Figure 3, Supplements 1 and 2) in this revision. We have addressed the specific review comments/suggestions point-by-point below. Text changes in the manuscript are indicated in red with line numbers indicated.

      Public Reviews:

      Reviewer #1 (Public Review):

      This important study from Jahncke et al. demonstrates inhibitory synaptic defects and elevated seizure susceptibility in multiple models of dystroglycanopathy. A strength of the paper is the use of a wide range of genetic models to disrupt different aspects of dystroglycan protein or glycosylation in forebrain neurons. The authors use a combination of immunohistochemistry and electrophysiology to identify cellular migration, lamination, axonal targeting, synapse formation/function, and seizure phenotypes in forebrain neurons. This is an elegant study with extensive data supporting the conclusions. The role of dystroglycan and the dystrophin glycoprotein complex (DGC) in cellular migration and synapse formation are of broad interest.

      • A strength of this paper is the use of several transgenic mouse lines with mutations in genes involved in glycosylation of dystroglycan. Knockout of POMT2 abolishes the majority of dystroglycan glycosylation, while point mutations in B4GAT and FKRP presumably produce more minor changes in glycosylation. This is a powerful approach to investigate the role of glycosylation in dystroglycan function. However, the authors do not address how mutations in these genes may affect glycosylation or expression of proteins other than dystroglycan. It is possible, even likely, that some of the phenotypes observed are due to changing glycosylation in any number of other proteins. The paper would be strengthened by addressing this possibility more directly.

      We are glad to see that the reviewer appreciated the range of transgenic models used to define the role of Dag1 glycosylation. It is certainly possible that glycosylation of proteins other than Dag1 is affected by deletion of Pomt2, B4Gat1 and/or FKRP. Indeed, Cadherin and Plexin proteins undergo O-mannosylation in the brain. However, recent work has shown that these proteins are not dependent on Pomt1/2 for their O-mannosylation, and use an alternative glycosylation pathway. Therefore, they are unlikely to contribute to the phenotypes we observed in our Pomt2, B4Gat1 and/or FKRP mutants. Furthermore, we did not observe any phenotypes in these models that were not also observed in the Dag1 conditional knockouts. We have clarified this point in the results section (lines 117-121) with additional references, and added the caveat that Pomt2, B4gat1, and Fkrp could play a role in the glycosylation of proteins other than Dag1.

      • It would be helpful to have a more clear description of how dystroglycan glycosylation is altered in B4GAT1M155T or FKRPP448L mice. For example, Figure 1 makes it appear that the distal sugar moieties are missing, however, the IIH6 antibody, which binds to terminal matriglycan repeats on the glycan chain, recognizes dystroglycan in these mutants.

      We apologize for the confusion caused by our schematic in Figure 1. We have adjusted the opacity of the schematic in Figure 1A to better illustrate that the matriglycan chain is still present, albeit at reduced levels, in the B4Gat1 and FKRP mutants. In addition, this is directly shown in the western blot in Figure 1B.

      • In Figure 1, the authors use the IIH6 antibody, which recognizes the terminal portion of the dystroglycan glycan chain, to label dystroglycan in the hippocampus. As expected, Emx1Cre,POMT2cKO mice, which lack glycosylation of dystroglycan, do not show any labelling. However, this experiment does not reveal anything about dystroglycan expression, only that the IIH6 antibody no longer recognizes dystroglycan. It would be very helpful in interpreting the later results to know whether the level and pattern of dystroglycan expression is normal or absent in the POMT2cKO mice, perhaps using another antibody that does not target the glycosylated region. For example, figure 3 shows reduced axon targeting to the cell body layer in POMT2cKO, however, it is unclear whether this is due to absence/mislocalization of dystroglycan at the cell surface, or if dystroglycan expression is normal, but glycosylation is directly required for axon targeting.

      Addressed in the “Recommendation for Authors” section below

      • In Figures 3 and 5, the authors use CB1R labelling to measure axon targeting and synapse formation. However, it is not clear how the authors measure axon targeting and synapse number separately using the same CB1R antibody. In addition, Figure 3 shows reduced CB1R labelling in the Dag1cyto pyramidal cell layer, but Figure 5 shows no change in CB1R labelling in the same mice. These results would appear to be contradictory.

      In Figure 3, the data reflects fluorescent intensity of CB1R+ axons measured across the entire hippocampal depth. In contrast, the synapse number in Figure 5 is measured as VGat+ and CB1R+ puncta (axonal swellings) within the pyramidal cell layer (SP). The discrepancy between these measurements in the Dag1Cyto mutants likely reflects a change in the distribution of the synaptic contacts in SP (ie: increased contacts in the upper portion of the SP relative to the bottom). This is clarified in the text, lines 315-319.

      • The authors measure spontaneous IPSCs (sIPSC) in CA1 pyramidal neurons to measure inhibitory synaptic function. This measure assesses inhibitory synaptic input from all sources, but dystroglycan mutations primarily impair synapses arising from CCK+/CB1R interneurons, leaving synapses arising from PV or other interneurons relatively unchanged. To assess changes in CCK+/CB1R interneurons the authors apply the cholinergic receptor agonist Carbachol (which selectively activates CCK+/CB1R interneurons) and measure the change in sIPSC amplitude and frequency. While this is an interesting and reasonable experiment, the observed effects could be due to altered carbachol sensitivity in the transgenic mice. Control experiments showing that the effect of Carbachol on excitability of CCK+/CB1R interneurons is similar across mouse lines are missing.

      The reviewer is correct that we did not show that CCK/CB1R+ interneurons have the same sensitivity to CCh in controls and the various mutants. Indeed, this is something we have struggled with over the course of the study, and is an inherent limitation of the current study. Unfortunately, these cells are relatively sparse in the CA1, and therefore patching onto presumptive CCK/CB1R+ INs at random to test this directly is not feasible. There are also no genetic or viral tools that we are aware of at this time to fluorescently label these cells for targeted recordings (this would need to be a Cre-independent transgenic mouse line since we are using Cre to delete Dag1 and Pomt2). We tried to assess this by measuring c-fos immunohistochemistry staining as a proxy for activity in response to CCh. Briefly, we incubated acute slices with NBQX, SR95531, and Kynurenic Acid to block synaptic activity, and added CCh in the bath for 30, 60, and 90 minutes to induce CCK/CB1R+ INs firing. Slices were then fixed and stained for c-fos and NECAB1 to identify the CCK/CB1R+ interneurons.

      Unfortunately, we had a very difficult time imaging these slices, and we were not confident in our ability to localize c-fos+/NECAB1+ cells. We have clarified that this is an inherent limitation to the study in the text, lines 394-396.

      • Earlier work has shown that selective deletion of dystroglycan from pyramidal neurons produces near complete loss of CCK+/CB1R interneurons and synapse formation, a more severe deficit than observed here using a more widespread Cre-driver. This finding is surprising, as generally more wide-spread gene deletion results in more severe, not less severe, phenotypes. The authors make the reasonable claim that more wide-spread gene deletion better mimics human pathologies. However, possible speculation on why this is the case for dystroglycan could provide insight into the nature of CNS deficits in different forms of dystroglycanopathies.

      The reviewer is correct that previous work from both our lab and others has shown that deletion of Dag1 from only pyramidal neurons with NEX-cre leads to a complete loss of CCK/CB1R+ INs, and is thus more severe than the phenotype seen with the broader deletion of Dag1 with Emx1-Cre. We were also surprised by this result, so we also generated Dag1;Nestin-Cre mice. These mice show an identical phenotype to the Dag1;Emx1-Cre mutants (new data; Figure 3, Supplement 1; text lines 226-233). This makes us confident in the validity of the Dag1;Emx1-Cre mutants with regard to modeling the human disease. We do not know why the NEX-Cre line shows a more severe phenotype; it is possible that this is due to an unknown epistatic interaction between Dag1 and NEX-Cre.

      Reviewer #2 (Public Review):

      The manuscript by Jahncke and colleagues is centered on the CCK+ synaptic defects that are a consequence of Dystroglycanopathy and/or impaired dystroglycan-related protein function. The authors use conditional mouse models for Dag1 and Pomt2 to ablate their function in mouse forebrain neurons and demonstrate significant impairment of CCK+/CB1R+ interneuron (IN) development in addition to being prone to seizures. Mice lacking the intracellular domain of Dystroglycan have milder defects, but impaired CCK+/CB1R+ IN axon targeting. The authors conclude that the milder dystroglycanopathy is due to the partially reduced glycosylation that occurs in the milder mouse models as opposed to the more severe Pomt2 models. Additionally, the authors postulate that inhibitory synaptic defects and elevated seizure susceptibility are hallmarks of severe dystroglycanopathy and are required for the organization of functional inhibitory synapse assembly.

      The manuscript is, overall, fairly well-written, and the description of the phenotypic impact of disruption of Dystroglycan in forebrain neurons (and of similar glycosyltransferase pathway proteins) demonstrates impairment in axon targeting and organization.

      There are some questions with regards to interpretation of some of the results from these conditional mouse models.

      • The study is mostly descriptive, and some validation of subunits of the dystroglycan-glycoprotein complex and laminin interactions would go towards defining the impact of disruption of dystroglycan's function in the brain.

      Addressed in the “Recommendation for Authors” section below

      • The statistics and basic analysis of the manuscript appear to be appropriate and within parameters for a study of this nature.

      • Some clarification between the discrepancies between the Walker Warburg Syndrome (WWS) patient phenotypes and those observed in these conditional mouse models is warranted. This manuscript has the potential to be impactful in the Dystroglycanopathy and general neurobiology fields.

      Addressed in the “Recommendation for Authors” section below

      Reviewer #3 (Public Review):

      The study presents a systematic analysis of how a range of dystroglycan mutations alter CCK/CB1 axonal targeting and inhibition in hippocampal CA1 and impact seizure susceptibility. The study follows up on prior literature identifying a role for dystroglycan in CCK/CB1 synapse formation. The careful assay includes comparison of 5 distinct dystroglycan mutation types known to be associated with varying degrees of muscular dystrophy phenotypes: a forebrain specific Dag1 knockout in excitatory neurons at 10.5, a forebrain specific knockout of the glycosyltransferase enzyme in excitatory neurons, mice with deletion of the intracellular domain of beta-Dag1 and 2 lines with missense mutations with milder phenotypes. They show that forebrain glutamatergic deletion of Dag1 or glycosyltransferase alters cortical lamination while lamination is preserved in mice with deletion of the intracellular domain or missense mutation.

      The study extends prior works by identifying that forebrain deletion of Dag1 or glycosyltransferase in excitatory neurons impairs CCK/CB1 and not PV axonal targeting and CB1 basket formation around CA1 pyramidal cells. Mice with deletion of the intracellular domain or missense mutation show limited reductions in CCK/CB1 fibers in CA1. Carbachol enhancement of CA1 IPSCs was reduced in both forebrain knockouts. Interestingly, carbachol enhancement of CA1 IPSCs was reduced when the intracellular domain of beta-Dag1 was deleted, but not in the missense mutations, suggesting a role of the intracellular domain in synapse maintenance. All lines except the missense mutations showed increased susceptibility to chemically induced behavioral seizures. Together, the study is carefully designed, well controlled and systematic. The results advance prior findings of the role for dystroglycans in CCK/CB1 innervations of PCs by demonstrating effects of more selective cellular deletions and site-specific mutations in extracellular and intracellular domains. The interesting finding that deletion of the intracellular domain reduces both CB1 terminals in CA1 and carbachol modulation of IPSCs warrants further analysis. Lack of EEG evaluation of seizure latency is a limitation.

      Specific comments

      • Whether CCK/CB1 cell numbers in the CA1 are differentially affected in the transgenic mice is not clarified.

      This is a good point; we have now addressed this in Figure 3, Supplement 2 (new data; text lines 234-245). In brief, using two different markers (NECAB1 and NECAB2), we see no change in the number of CCK+/CB1R+ INs in the mutant mice.

      • 2. Whether basal synaptic inhibition is altered by the changes in CCK innervation is not examined.

      We apologize for the confusion. This is addressed in the text, lines 371-375:

      “Notably, even baseline sIPSC frequency was reduced in Dag1cyto/- mutants (2.27±1.70 Hz) compared to WT controls (4.46±2.04 Hz, p = 0.002), whereas baseline sIPSC frequencies appeared normal in all other mutants when compared to their respective controls.”

      Reviewer #1 (Recommendations For The Authors):

      Line 321- CCH-mediated CHANGE in sIPSC amplitude...

      This has been corrected (now line 356)

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      • Disruption of dystroglycan (and subsequent glycosyltransferase proteins) in the brain would likely impact laminin localization and cytoskeletal stability of the dystroglycan-protein complex. The authors should assess (via immunolabeling) the disruption of laminin using laminin IF in the various conditional mouse model forebrain sections.

      We have stained brains from Dag1, Pomt2, and Dag1cyto mutants with an antibody to Laminin (new data; Figure 2, Supplement 2; text lines 191-205). Briefly, the data clearly shows that laminin staining is abnormal on the pial surface and in the blood vessels of the Dag1;Emx1-cre mutants. This is less severe in the Pomt2;Emx1 mutants, and normal in the Dag1cyto mutants. We also examined higher magnification of laminin staining in hippocampal SP around the pyramidal cells. Laminin in the region was diffuse (not synaptically localized) and there was no difference between any of the mutants and their respective controls (data not shown).

      • 2. The biggest question I have is whether the synaptic defects that were measured (Fig 6) in the spontaneous inhibitory post-synaptic currents (sIPSCs) could be rescued as a function of the glycosylation of dystroglycan. Ribitol/CDP-ribose has been shown to enhance alpha-dystroglycan glycosylation and total glycosylation, and might be appropriate here. Exogenous NAD+ supplementation (Ortiz-Cordero et al., eLife, 2021) has a faster-acting effect on glycosylation of dystroglycan and may work in this context. Can the authors add NAD+ prior to their CCK+/CB1R+ IN recordings and evaluate synaptic current effects to determine if glycosylation rescue can actually occur?

      We are very much interested in the potential to rescue synaptic defects in the various mutants, and this is an active area of study for us going forward. However, we do not think the suggested experiments involving ribitol/NAD+ supplementation are likely to work in our specific experiments with these models. In Dag1;Emx1-Cre and Pomt2;Emx1-Cre mice, which show the most dramatic phenotype, there is no O-mannosyl chain initiated for ribitol to act upon. In the Dag1Cyto mice, matriglycan is normal and therefore ribitol supplementation is unlikely to have an effect. In B4Gat1 and FKRP mutants, while matriglycan is reduced, there is no significant functional synaptic defect observed. Therefore, even if ribitol was able to increase matriglycan in these two mutants, we would be unable to detect a functional difference. As a side note, while NAD+ supplementation is an interesting idea, the previous study cited did these experiments in vitro in cell lines, so it is not clear if this would have the same effect in vivo. In addition, that study analyzed cells after 24-72 hours of supplementation in culture, which led to a ~10% increase in IIH6 at 24 hours. We are unable to incubate acute slices for that amount of time prior to our recordings.

      • 3. Minor point. Genetic abbreviation for POMT2 should be "Pomt2", unless some other justification is provided by the authors. I believe the other mutations introduced (e.g. FKRP P448L are humanized mutations).

      This has been corrected throughout

      • 4. While dystroglycan glycosylation detected using the IIH6 antibody is important for proper localization, the core DAG-6F4 monoclonal antibody (DSHB, Iowa) would inform you if there is actual disruption in the amount of dystroglycan protein translation and/or production in the forebrain. Can the authors address this question on total dystroglycan production?

      This is a great suggestion. We obtained both the DAG-6F4 monoclonal antibody from DSHB and a monoclonal antibody to alpha-Dag1 from Abcam (45-3) and tried using them for immunostaining, but they did not work with brain tissue. However, we were able to use an antibody to beta-Dag1 (Leica, B-DG-CE) for immunostaining. This new data is included in Figure 1, Supplement 2 (text lines 134-140) and shows that, as expected, beta-Dag1 is completely gone in Dag1;Emx1-Cre and Dag1Cyto mutants. In the Pomt2;Emx1-Cre mutants, beta-Dag1 is present but no longer has the punctate appearance consistent with synaptic localization. We have added a section in the discussion expanding on the interpretation of the data, lines 449-462.

      • 5. Please comment more on the structural changes in the forebrain and the presence or absence of cobblestone malformation (e.g. cobblestone lissencephaly) in the POMT2 mutant mice (and the other dystroglycanopathy models). There appears to be some discordance between that and the human Walker Warburg Syndrome (WWS) patients.

      The Pomt2;Emx1-cre mutants show a cobblestone phenotype (identical to the Dag1;Emx1-Cre mutants), see Figure 2. This is consistent with these two models having a complete loss of Dag1 function, and therefore modeling the most severe forms of dystroglycanopathy (WWS, MEB). In contrast, the B4Gat1 and FKRP mutants show relatively normal cortical migration because these mutants are hypomorphic and therefore retain some degree of functional Dag1. These two mice model a milder form of dystroglycanopathy. We have clarified this on lines 188-190 and 573-578.

      • 6. Line 577. Minor typo, statement ended in a comma, versus a period.

      Done

      • 7. Methods. Please report on the sex of the mice used in the experiments.

      Mice of both sexes were used throughout the study. This has been clarified in the methods section, and we have added information regarding how many mice of each sex were used in each experiment in Supplemental Table 1.

      Reviewer #3 (Recommendations For The Authors):

      Additional Specific Comments,

      • Although authors include n slice/animals and other details in the methodology, including data as % changes and n (slices/animals) in results will greatly improve the readability.

      We have clarified that only one cell per slice was used for physiological recordings (Figure 6) in the methods section, as CCh does not wash out.

      • 2. IPSCs are measured as inward currents in high chloride with AMPA blockers, which is appropriate. However, Mg appears to be low (1 mM) in the cutting solution. Was this also the case in the recording solution? If so, why were NMDA blockers not used?

      To clarify, 10 mM Mg was included in the cutting solution, and 1 mM Mg was included in the recording solution. When the cell is clamped at -70 mV, 1 mM Mg2+ is sufficient to block NMDA receptors: https://www.nature.com/articles/309261a0

    1. Author Response

      Reviewer 1:

      1. The missing mouse gender information will be incorporated into the revised manuscript. For flow cytometry, two male and two female mice of each genotype were used. For single cell RNA sequencing, two female and one male mouse of each genotype were used. For the bulk RNA sequencing four male cd47−/− mice and four male wildtype mice were used.

      2. The bulk RNA sequencing analysis identified elevated expression of erythropoietic genes in CD8+ spleen cells from cd47−/− versus wildtype mice that were obtained using magnetic bead depletion of all other lineages. Therefore, we used the same Miltenyi negative selection kit as the first step to prepare the cells for single cell RNA sequencing. These untouched cells were then depleted of most mature CD8 T cells using a Miltenyi CD8a(Ly2) antibody positive selection kit. An important consideration underlying this approach was recognizing that the commercial magnetic bead depletion kits used for preparing specific immune cell types are optimized to give relatively pure populations of the intended immune cells using wildtype mice. Our previous experience studying NK cell development in the cd47−/− mice taught us that NK precursors, which are rare in wildtype mouse spleens, accumulate in cd47−/− spleens and were not removed by the antibody cocktail optimized for wildtype spleen cells (Nath et al Front Immunol 2018). The present data indicate that erythroid precursors behave similarly.

      3. Anemia is a prevalent side effect of several CD47 therapeutic antibodies being developed for cancer therapy. Anemia would be expected to induce erythropoiesis in bone marrow and possibly at extramedullary sites. Human spleen cells are not accessible to directly evaluate extramedullary erythropoiesis in cancer patients, but analysis of circulating erythroid precursors or liquid biopsy methods could be useful to detect induction of extramedullary erythropoiesis by these therapeutics. We are currently investigating the ability of CD47 antibodies to directly induce erythropoiesis using a human in vitro model.

      Reviewer 2:

      1. The reviewer asked, “whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model.” Our data support both a direct role for CD47 and an indirect role resulting from the response to anemic stress. We cited our previous publications describing increased Sox2+ stem cells in spleens of Cd47 and Thbs1 knockout mice, but we neglected to emphasize another study where we found that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). Taken together, our published data demonstrate that loss of CD47 results in an intrinsic protection of hematopoietic stem cells from genotoxic stress. This function of CD47 is thrombospondin-1-dependent and is consistent with the up-regulation of early erythroid precursors in the spleens of both knockout mice, but cannot explain why the Thbs1−/− mice have fewer committed erythroid precursors than wildtype. We cited studies that documented increased red cell turnover in cd47−/− mice but less red cell turnover in Thbs1−/− mice compared to wildtype mice. Increased red cell clearance in cd47−/− mice is mediated by loss of the “don’t eat me” function of CD47 on red cells. In wildtype mice, clearance is augmented by thrombospondin-1 binding to the clustered CD47 on aging red cells (Wang, Aging Cell 2020). Thus, anemic stress in the mouse strains studied here decreases in the order cd47−/− > WT > Thbs1−/−. This is consistent with the increased committed erythroid progenitors reported here in cd47−/− spleens and decreased committed progenitors in the Thbs1−/− spleens.

      2. The cd47−/− mice used for the current study are the same strain as those reported by Lindberg et al. in 1996, with additional backcrossing onto a C57BL/6 background.

    1. Author Response

      We are grateful to the editor and the reviewers for recognizing the importance of our theoretical study on the mechanisms of centrosome size control. We appreciate their thoughtful critiques and suggested improvements, all of which we intend to address in the revised manuscript as outlined below. We acknowledge that the experimental evidence supporting the proposed theory is currently incomplete. We anticipate that our study will serve as inspiration for future experiments aimed at testing the proposed theory.

      As noted by both reviewers, our model assumes that the diffusion of molecular components is much faster than any reaction time scale. To explore the impact of diffusion on centrosome size regulation, we are presently developing a spatially extended model of centrosome growth. We will analyze the influence of diffusion and plan to integrate these findings into the revised manuscript.

      To address the concerns raised by both reviewers regarding the applicability of our model to various organisms, we plan to revise the manuscript to clearly delineate the parameter ranges within which our model could be relevant for different organisms, such as C. elegans or Drosophila. While centrosomal components may vary among organisms, the underlying interaction pathways are similar. Owing to its generality, our theory can capture diverse centrosome growth behaviors depending on the choice of parameters. We aim to emphasize these distinctions, illustrating how modulating growth cooperativity and enzyme concentration can influence size regulation and size scaling behaviors. Given the limited availability of quantitative experimental data across diverse organisms, we recognize that directly comparing our theory with data is challenging. Nevertheless, we will clearly motivate such comparisons to avoid confusion and improve readability.

      We acknowledge the reviewers' concerns regarding the limited details provided on the simulation methods and the rationale behind the choice of model parameters. To address this, we will explain the stochastic simulations and the calibration of the model parameters in detail, with appropriate references for the selected parameter values. We also thank reviewer 1 for the excellent suggestion to incorporate a linear stability analysis of the ordinary differential equations underlying the model. This analysis will offer valuable insight into how the physical parameters of the model influence the tendency to produce equal-sized centrosomes, and we will include it in the revised manuscript. In addition, we thank reviewer 2 for proposing the use of Polo pulse dynamics to more precisely constrain the parameter regime for centrosome growth dynamics in Drosophila. We will strive to incorporate this into the revised manuscript, while recognizing the challenge of quantitatively interpreting centrosome size or subunit concentration values from fluorescence intensity data. We also plan to discuss enzyme pulse dynamics in C. elegans in the revised manuscript, as they constitute a valuable prediction of our model.

      We disagree with reviewer 1's assertion that Reference 8 (Zwicker et al., PNAS 2014) effectively addresses the robustness of centrosome size equality in the presence of positive feedback. The linear stability analysis presented in Figure 5 of Reference 8 demonstrates stability of centrosome size around the fixed point, from which it is inferred that Ostwald ripening can be inhibited by the catalytic activity of the centriole. In our manuscript (see Supplementary Figure 3), we demonstrate that the existence of a stable fixed point does not necessarily give rise to equal-sized centrosomes, because the dynamics of the solution around the fixed point are slow. With an appreciable amount of positive feedback in the growth dynamics, the solution moves very slowly around the fixed point (similar to a line attractor) and cannot reach it within a biologically relevant timescale, leaving the centrosomes unequal in size. Therefore, we argue that the model in Reference 8 lacks a robust mechanism for size control in the presence of autocatalytic growth. We also emphasize that the choice of initial size difference in our model does not qualitatively alter the results for robustness in centrosome size equality, as shown in Supplementary Figure 3. Nevertheless, we acknowledge the need for a quantitative analysis of how size regulation depends on the initial discrepancy in centrosome size, and we will incorporate such an analysis into the revised manuscript to strengthen our conclusions. Reviewer 2 has questioned the dismissal of the non-cooperative growth model, suggesting that minor adjustments to that model, such as incorporating size-dependent addition or loss rates due to surface assembly/disassembly, could potentially maintain equally sized organelles with sigmoidal growth dynamics. However, this conclusion is inaccurate.

      Any auto-regulatory positive feedback would result in size inequality, unless the positive feedback is shared between the organelles. Introducing size-dependent addition rates due to surface-mediated assembly would create auto-regulatory positive feedback, leading to unequal sizes. We have explored a similar scenario of growth dynamics involving assembly and disassembly throughout the pericentriolar material volume in Supplementary Section II, demonstrating significant size inequality in that model and a lack of robustness in size control. We will provide a detailed response to this point in our reply, along with an explicit examination of the surface assembly model.
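      To make the line-attractor argument concrete, the sketch below integrates a deliberately minimal, hypothetical two-organelle model; it is not the model of the manuscript or of Reference 8. Two organelles of sizes s1 and s2 draw subunits from a shared pool, free = N - s1 - s2, and grow as ds_i/dt = (k0 + k1*s_i)*free - kd*s_i, where k0 is an assumed size-independent (catalytic) assembly rate, k1 an assumed autocatalytic positive-feedback rate, and kd a disassembly rate; all names and values are illustrative. Subtracting the two equations gives d(s1 - s2)/dt = (k1*free - kd)(s1 - s2), and at the symmetric fixed point this factor equals -k0*free*/s*, so in this toy model a size difference relaxes at a rate proportional to k0 and stops relaxing altogether when growth is purely autocatalytic (k0 = 0).

```python
def simulate_two_organelles(k0, k1, kd=1.0, n_total=100.0,
                            s1=10.0, s2=5.0, t_end=20.0, dt=1e-3):
    """Forward-Euler integration of a hypothetical two-organelle growth model.

    Both organelles share one subunit pool (free = n_total - s1 - s2).
    k0: size-independent assembly rate; k1: autocatalytic (positive-feedback)
    assembly rate; kd: disassembly rate. All parameter values are illustrative.
    """
    for _ in range(int(round(t_end / dt))):
        free = n_total - s1 - s2                    # shared subunit pool
        ds1 = (k0 + k1 * s1) * free - kd * s1       # growth of organelle 1
        ds2 = (k0 + k1 * s2) * free - kd * s2       # growth of organelle 2
        s1 += dt * ds1
        s2 += dt * ds2
    return s1, s2


# No positive feedback: the initial 2:1 size difference decays away.
print(simulate_two_organelles(k0=1.0, k1=0.0))
# Purely autocatalytic growth: the total size converges, but the 2:1 ratio persists.
print(simulate_two_organelles(k0=0.0, k1=0.1))
```

      With small but nonzero k0 and k1 > 0, the same toy model leaves a residual difference that decays only on the slow time scale s*/(k0*free*), which qualitatively mirrors the slow approach to the fixed point described in the response.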

      In addition to the aforementioned modifications, we will revise the section discussing the predictions of the proposed model so that its testable predictions are stated clearly and so that it shows explicitly how our predictions differ from those of previous models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the 3 reviewers and the editorial team for agreeing that our work is rigorous and valuable for the fields of olfaction and developmental biology. We provide a revised version of the manuscript that addresses major concerns raised by the reviewers and adheres to their suggestions.

      Specifically:

      - We clarify what is novel in this work and cover the appropriate literature.

      - We tone down the language and interpretation of our data.

      - We clarify the categorization of zones and improve readability to the best of our ability.

      We have also made every effort to address minor points raised by the 3 reviewers and made clarifications wherever requested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In order to find small molecules capable of enhancing regenerative repair, this study employed a high-throughput YAP-activity screen to query the ReFRAME library, identifying a CLK2 inhibitor as one of the hits. Further studies showed that CLK2 inhibition leads to AMOTL2 exon skipping, rendering AMOTL2 unable to suppress YAP.

      The novelty of the study is that it showed that inhibition of a kinase not previously associated with the HIPPO pathway can influence YAP activity through modification of mRNA splicing. The major arguments appear solid.

      We thank the Reviewer for their thoughtful assessment of this work. We have fully addressed each comment below in a point-by-point fashion.

      There are several noteworthy points when assessing the results. In Figure S1C, 100 nM drug was toxic to cells at 72 hours and 1 nM drug suppressed cell proliferation by 60%. Yet such concentrations were used in Figures 1B and C to argue that CLK2 inhibition liberates YAP activity (which one would expect to increase cellular proliferation). In Figure 1C, it appears that 1 nM drug treatment led to some kind of cellular stress, as cells are visibly enlarged. In Figure 1D, 1 nM drug, which would have suppressed cell growth by 60%, did not affect YAP phosphorylation. Taken together, even though the CLK2 inhibitor (at high concentrations) liberates YAP activity, its toxicity may override its potential use as a YAP activator to promote tissue regenerative repair, which was one of the goals hinted at in the background section.

      We do not claim that CLK2 inhibition is useful as a YAP activator, either as a precise pharmacological tool or as a therapeutic mechanism for inducing regenerative repair. Instead, the key finding of this work is a novel, unanticipated cellular mechanism for activating YAP, one that should be considered when optimizing pharmacological candidates that modulate alternative splicing for diseases in which proliferation is potentially undesirable.

      However, to address this point, we have included additional experiments. Specifically, we show that cytotoxicity after 24 hours of compound treatment, the time point at which we performed most evaluations of compound-induced alternative splicing, is considerably lower than that observed at 72 hours. This panel, now included as Figure S1C, shows that while the compound displays some cytotoxicity at ~1 nM at 72 hours, its half-maximal inhibitory concentration at 24 hours is ~300 nM. As such, we believe there is no incongruity between YAP activity, cellular proliferation, and SM04690-induced cytotoxicity. Rather, higher concentrations of compound, and thus increased engagement of CLK2 and other targets of the inhibitor, result in a cumulative cytotoxic effect over time.

      In Figure 2D, at 100 nM concentration, the drug did not appear to affect AMOTL2 splicing. Even though it did at higher concentrations, this potentially calls into question whether the YAP activity liberated by this drug at 1 nM (Fig 2A) and 10-50 nM (Fig 2C) is caused by altered AMOTL2 splicing. The difference in drug concentrations between these experiments should be discussed. Does the drug decay very fast, and is that why later studies required a higher dose?

      We believe this comment refers to Fig. 3D, and we argue that, while faint, AMOTL2 splicing is present with 100 nM SM04690 treatment, as indicated by a faint lower-molecular-weight band. However, to further understand the extent to which AMOTL2 is alternatively spliced in response to compound treatment, we performed RT-qPCR analysis of AMOTL2 splicing over an expanded concentration range. These results indicate that high-magnitude exon skipping of AMOTL2 occurs starting at 10 nM with 24-hour compound treatment (now included in the manuscript as Fig. S4A). This result matches our data in Fig. 2C, in which YAP phosphorylation begins decreasing at 10 nM SM04690.

      Likely impact of the work on the field: this study presented a high-throughput screening method for YAP activators and showed that such an approach works. The hit compound found in the ReFRAME library, a CLK2 inhibitor, may not actually be useful as a YAP activator, given its clear toxicity. Applying this screening method to other large compound libraries may help find a YAP activator that aids regenerative repair. The finding that CLK2 inhibition can alter AMOTL2 splicing to affect the HIPPO pathway could bring a new angle to understanding regulation of the HIPPO pathway.

      Reviewer #2 (Public Review):

      In this manuscript, the authors screened the ReFRAME library and identified candidate small molecules that can activate YAP. They found that SM04690, an inhibitor of the WNT signaling pathway, could efficiently activate YAP through CLK2, a kinase that has been shown to phosphorylate SR proteins to alter alternative splicing. They further demonstrated that SM04690 altered the alternative splicing of AMOTL2, preventing its localization to the membrane. Alternatively spliced AMOTL2 prevented YAP from anchoring to the cell membrane, resulting in decreased YAP phosphorylation and YAP activation. Previous findings showed that WNT signaling can, to some extent, activate YAP; the authors revealed that an inhibitor of WNT signaling could also activate YAP. Thus, these findings are potentially interesting and important. However, the present manuscript provides a lot of indirect data and lacks key experiments.

      We thank the Reviewer for their thorough review of this work. We have responded to each comment below.

      Major points:

      1. In Figure S3, since inhibition of CLK2 resulted in extensive changes in alternative splicing, why did the authors choose AMOTL2? How can other factors, such as EEF1A1 and HSPA5, be excluded, and do they affect YAP activation? Angiomotin-related AMOTL1 and AMOTL2 were identified as negative regulators of YAP and TAZ that prevent their nuclear translocation. It has been reported that high cell density promotes assembly of the Crumbs complex, which recruits AMOTL2 to tight junctions. Ubiquitination of AMOTL2 at K347 and K408 serves as a docking site for LATS2, which phosphorylates YAP to promote its cytoplasmic retention and degradation. How did the authors determine that alternative splicing, rather than ubiquitination of AMOTL2, affects YAP activity? Does AMOTL2 Δ5 affect the ubiquitination of AMOTL2? Does overexpression of AMOTL2 Δ5Δ9 cause YAP and puncta to co-localize?

      AMOTL2 is the relevant cellular target because, across the entire transcriptome, it was the third most alternatively spliced transcript in response to CLK2 inhibition (Fig. S3). No other targets relevant to the Hippo pathway were identified.

      We have shown that overexpression of exon-skipped AMOTL2 (Fig. 3F) recapitulates the effect of the compound, indicating that splicing per se drives the YAP activation phenotype. While AMOTL2 is ubiquitinated, the established ubiquitination sites do not lie within exons 5 or 9. Thus, we consider ubiquitination less likely to be a driving factor in the observed phenotype. The manuscript is written so as not to exclude this possibility, but ubiquitination is downstream of what we describe, and we believe further exploration is beyond the scope of this preliminary report.

      2. The authors proposed that the AMOTL2 splicing isoform forms biomolecular condensates. However, there were no relevant experimental data to support this conclusion. AMOTL2 is located not only at the cell membrane but also on recycling endosomes, and the puncta formed after AMOTL2 dissociates from the membrane are likely to reflect localization to recycling endosomes. The authors should co-stain AMOTL2 with markers of recycling endosomes or conduct experiments demonstrating the liquidity of the puncta to verify phase separation of the AMOTL2 splicing isoform.

      We do not claim AMOTL2 forms biomolecular condensates. Instead, we hypothesize in the Discussion section that AMOTL2 could possibly phase separate into biomolecular condensates based on its similarity to AMOT, which has been shown to phase separate and form cytoplasmic puncta (PMID: 36318920). AMOT has also been shown to colocalize with endosomes (PMID: 25995376), which also appear as puncta.

      3. The localization of YAP is regulated by cell density, and YAP usually translocates to the nucleus at low cell density. In Figure 2E, the cell densities of the DMSO- and SM04690-treated groups are inconsistent. In Figure 4A, the magnification of the DMSO- and SM04690-treated groups is inconsistent, and the SM04690-treated group seems to have a higher magnification.

      In immunofluorescence experiments, cells were plated at the same density and grown for the same amount of time before treatment. Additionally, within an experiment, images were taken at the same magnification. Any apparent differences in cell density are due to effects of the compound.

      4. There have been many reports that the WNT and Hippo signaling pathways can crosstalk with each other. The authors should exclude the influence of WNT pathway inhibition by SM04690.

      While the WNT pathway has been shown to influence Hippo pathway activity, we have shown a direct effect of CLK2 inhibition by SM04690. Any potential WNT pathway effects are in addition to the splicing-based mechanism we describe.

      Reviewer #3 (Public Review):

      This study on drug repurposing presents the identification of potent activators of the Hippo pathway. The authors successfully screen a drug library and identify two CLK kinase inhibitors as YAP activators, with SM04690 targeting specifically CLK2. They further investigate the molecular basis of SM04690-induced YAP activity and identify splicing events in AMOTL2 as strongly affected by CLK2 inhibition. Exon skipping within AMOTL2 decreases the interactions with membrane bound proteins and is sufficient to induce YAP target gene expression. Overall the study is well designed, the conclusions are supported by sufficient data and represent an exciting connection between alternative splicing and the HIPPO pathway. The specificity of the inhibitor towards CLK2 and the mode of action via AMOTL2 could be supported by further data:

      We thank the Reviewer for their close examination of our work. We respond below.

      1. The inconsistent inhibitor concentrations and varying results reported in the paper can be distracting. For instance, the response of endogenous targets to 100 nM concentration is described as a >5-fold increase in Figure 2B, whereas it is reported as a 1-1.5-fold response to 1000 nM in Figure 2D. This inconsistency should be addressed and clarified to provide a more accurate and reliable representation of the findings.

      In Figure 2D, we have transduced cells with lentivirus, which most likely suppresses their responsiveness to compound treatment. We have addressed the issue of varying inhibitor concentrations in response to Reviewer 1.

      2. In the absence of strong inhibitor-induced YAP target gene expression (Figure 2D), it is difficult to conclude dependency on YAP expression, as investigated by siRNA-mediated knockdown. In a similar experiment, the dependency of the inhibitor's effect on CLK2 expression could be confirmed.

      While the Scramble virus sample does not respond to the same extent as WT HEK293A cells (e.g., Fig. 2B), it still responds to compound. Likewise, YAP knockdown cells display statistically significant decreases in YAP-controlled transcripts. This decrease in transcripts is therefore sufficient evidence that SM04690 requires YAP for its activity. We have shown that multiple CLK2 inhibitors recapitulate the effect of SM04690, obviating the need to show dependency on CLK2.

      3. To further support the conclusion that CLK2 is the direct target of SM04690, it would be informative to investigate the effects of CLK1/4 inhibition on AMOTL2 exons (for example, within RNA-seq data). If CLK1/4 inhibitors do not induce changes in AMOTL2 exons, it would strengthen the evidence for CLK2's role as the direct target. Including the results in the discussion would enhance the comprehensiveness of the study.

      We showed that CLK1/4 inhibition with small molecules ML167 and TG003 does not affect YAP activity in our luciferase reporter assay (Fig. S2D), which we believe is sufficient evidence that CLK1/4 is neither the direct target of SM04690 nor relevant to the splicing mechanism we describe.

      4. It would be important to determine the specific dose of SM04690 required to induce changes in AMOTL2 splicing. The authors observe that AMOTL2 protein levels appear unaffected at doses below 50 nM in Figure 3D, while YAP target genes are already affected at 20 nM in Figure 3G. Although Western blotting may not be the most sensitive method to detect minor changes in splicing, performing PCR experiments at lower doses could provide more insight into the splicing changes. Therefore, it is suggested that the authors include PCR experiments at lower doses to determine if changes in splicing are visible and to better establish the relationship between splicing and gene expression changes.

      We agree with the Reviewer that this experiment is essential to better understand splicing changes with SM04690 treatment. Accordingly, we have added RT-qPCR-based analysis of AMOTL2 exon inclusion at lower concentrations between 10 nM and 100 nM (Fig. S4A). We included a similar discussion in response to a point from Reviewer 1.

      Reviewer #1 (Recommendations For The Authors):

      As stated in the public review section, it would be helpful to discuss the differences in drug concentration. Although no one should require or expect a perfect drug dose match throughout any study, in this study the drug dose clearly demarcated when the CLK2 inhibitor helped or hurt proliferation, when it was able to affect YAP phosphorylation, and when it was able to affect AMOTL2 splicing. This is not to challenge the major conclusions of the paper, but it is hard to ignore if no discussion is provided.

      Several suggestions on data presentation:

      1. Scale bar information is missing in Fig. 2E, 4A and 4B.

      We have corrected this mistake in the revised manuscript.

      2. For Fig. 3D and 3E, it would be better if molecular weight (kDa) markers were labeled alongside the AMOTL2 Western blot.

      Thank you for the suggestion; we have added the appropriate labeling.

      3. It would be better to label Figure 2D as shYAP-1 and shYAP-2, Figure 3A as shCLK2-1 and shCLK2-2, etc. Currently they are all labeled shRNA-1, shRNA-2, which can be confusing.

      We have altered the labeling for clarity as requested.

      Reviewer #3 (Recommendations For The Authors):

      1. The use of asterisks in Figure 2D is unclear, especially their placement on the "Scramble" sample.

      We have amended the asterisks and have also added more detail to the figure legend.

      2. When designing primers for splicing-sensitive PCR, it is recommended that the skipping isoform be larger than 100 bp. This will help to avoid quantitative issues with ethidium bromide staining. In the Results section, the text reads as if only the skipping isoform is present after SM04690 treatment.

      This experiment was performed to confirm the presence of exon skipping in the treated samples. Accordingly, we did not optimize the ethidium bromide staining of the lower bp bands. We will take the size of the isoform into consideration in any future experiments. We thank the reviewer for catching the textual error and have amended the text in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      My main request is to show the phylogeny in the main text, so the reader knows what nodes are being compared.

      The full phylogeny has been added to the main text as Fig. 2. Additionally, the phylogenetic tree in Newick format is provided as Supplementary file 2.

      I also suggest the authors check their figure legends carefully. At least in figure one, I think there is some mix-up with the letter labelling of the panels.

      This was our mistake, and the figure legend has been corrected. In this version of the manuscript, Figure 1 has been split into Fig. 1 and Fig. 3; the corrected version appears in the legend to Fig. 3.

      And lastly, I urge the authors to deposit the tree, alignment, and reconstructed sequences in a public repository.

      The alignment in FASTA format and the phylogenetic tree in Newick format have been added as supplementary files (Supplementary file 1 and Supplementary file 2, respectively). The reconstructed sequences (both maximum likelihood and AltAll variants) are shown as a figure supplement (Figure 3 – figure supplement 2). Posterior probabilities for all positions of the reconstructed sequences have been added as a supplementary file (Supplementary file 3).

      Reviewer #2 (Recommendations For The Authors):

      -I find the term "secondarily single sHsp" to be a little confusing, especially because it is often used in relation to IbpA/B, but it is just IbpA in another species. I think it would be more clear for the reader to consistently refer to it as Erwiniaceae IbpA vs Escherichia IbpA, or something similar.

      In the Introduction we clarified (page 4, lines 11-13) that the term “secondarily single” IbpA refers to IbpA that lacks a partner protein as a result of ibpB gene loss. This is in contrast to “single-protein” IbpA from clades in which the gene duplication leading to a two-protein sHsp system did not occur (such as Vibrionaceae or Aeromonadaceae); see Obuchowski et al., 2019.

      -Figure 1B. The labels are not defined. What is L? A and B refer to IbpA and IbpB but this should be made more clear to the reader. Why is this panel only referred to in the Introduction and not the Results? Why is there a second panel for E.amy, rather than including it in the same panel, as for other experiments? What are the error bars? (That goes for every error bar in the paper, none are defined).

      Labels in Fig. 1B were corrected; “L” referred to “luciferase alone” and has been changed to “no sHsp” for consistency. The sHsp activity measurements (obtained in the same experiment) were split into two separate panels corresponding to the two branches of the simplified tree in Fig. 1. The figure was modified to make it clearer and avoid confusion. Definitions of error bars were added to this and other figures.

      -"AncA0 exhibited sequestrase activity on the level comparable to IbpA from Escherichia coli (IbpAE.coli). AncA1 was moderately efficient in this process and IbpA from Erwinia amylovora (IbpAE.amyl) was the least efficient sequestrase (Fig. 1D)." - First, this should be referring to Fig. 1C. Second, the text doesn't quite match the panel. A0 appears to have the strongest sequestrase activity over most concentrations. Can the authors comment on in what concentration range these differences are most meaningful?

      The figure legend was corrected, and the descriptions of panels C and D were fixed. These data are now presented in panels A and B of a new Fig. 3. In our opinion, differences in sequestration are most meaningful at lower sHsp concentrations (in this case below 5 µM), as at sufficiently high sHsp concentrations even less effective sequestrases seem able to sequester aggregated proteins effectively. A comment to this effect was added to the main text (page 5, line 6).

      -"Ancestral proteins' interaction with the aggregated substrates was stronger than in the case of extant E. amylovora IbpA, but weaker than in the case of extant E. coli IbpA (Fig. 1C)." - Is this referring to Fig. 1C, or to the unlabelled panel on the bottom right panel of Fig 1 (that is not referred to in the legend)? Can the authors comment on why they think the 2 ancestral proteins are much more similar to each other than they are to either of the native IbpAs?

      Owing to our mistake, the descriptions of panels C and D were switched.

      Figure 1 was rearranged and split into Figures 1 and 3. Former Figure S1 (full phylogeny) was inserted into the main text as Fig. 2, per the request of reviewer #1. Former panel 1D (now 3B) was rearranged, as the graph did not clearly appear to be part of that panel and looked as if it were unlabeled.

      The fact that the two ancestral proteins are more similar to each other than to the extant E. coli and E. amylovora proteins in their interaction with model substrate might be caused by higher sequence identity between the two ancestral proteins than between ancestral and extant proteins (10 amino acid differences between AncA0 and AncA1 compared to 20 differences between AncA1 and IbpA from E. amylovora or 11 differences between AncA0 and IbpA from E. coli). One also has to remember that this property is only one aspect of sHsp activity – proteins AncA0 and AncA1 are much less similar to each other if other activities such as sequestrase activity are considered. Substrate affinity and sequestrase activity are connected to each other, but there isn’t a strict correlation, as can be seen in the case of free ACD domains, which strongly bind aggregated substrate while effectively lacking sequestrase activity (fig. 5 A, fig. 5 – figure supplement 4 A,B).

      -Figure 1E should have E. coli IbpA and IbpB, by themselves, included for comparison. Strangely, it seems, by comparison to Fig 1B, that the "inhibitory" activity of A0 is not present in the E. coli protein, and the authors should comment on this. Similarly, A1 disaggregation looks like it might not be significantly different than the E. coli protein. Can the authors comment on why disaggregation might be so low in A1 compared to E.amy?

      E. coli IbpA alone was added to Fig. 1E (Fig. 3C in the new version) as suggested.

      AncA1 indeed exhibits activity similar to that of extant IbpA from E. coli, which, under the conditions of the experiment, does not possess the inhibitory effect observed for AncA0. This suggests that:

      - There was an additional increase in the ability to stimulate luciferase disaggregation between AncA1 and extant IbpA from E. amylovora.

      - There was also an increase in the ability to stimulate luciferase refolding between AncA0 and extant E. coli IbpA, albeit to a significantly lesser degree than in the Erwiniaceae branch.

      It is quite likely that after the separation of the Erwiniaceae and Enterobacteriaceae sHsp systems, they underwent further optimization through evolution. This might have led to the observed higher effectiveness of modern IbpAs from both clades in stimulating refolding compared with the reconstructed ancestral proteins.

      Despite the above, the effects of substitutions at positions 66 and 109 on the activities of the extant E. coli and E. amylovora proteins suggest that these two positions still play a key role in differentiating extant IbpAs from Erwiniaceae and Enterobacteriaceae.

      Nevertheless, additional mutations that lead to increased ability to stimulate luciferase reactivation must have occurred in both Erwiniaceae and Enterobacteriaceae branches of the phylogeny during evolution. These substitutions would be a worthwhile subject of further study.

      -Fig 1D - lizate should be lysate.

      The typo was corrected.

      -What is the bottom right panel in Fig 1? It doesn't seem to be referred to in the legend.

      This panel was intended to be part of Figure 1D, but this was not clearly visible. The figure was rearranged to make it clearer; these data are now presented as Fig. 3B.

      -Sequences are provided for the ancestral proteins, but I don't see them anywhere for the alternative ancestral proteins. How similar are the Anc proteins to the AltAlls? If they are very similar, this may not tell us anything about "robustness".

      Sequences of the alternative proteins are added as a figure supplement (Fig. 3 - figure supplement 2). Full sequences of the ML and alternative ancestors, with posterior probabilities for each reconstructed position, are presented in Supplementary file 3.

      The test of robustness to statistical uncertainty was intended to assess the extent to which the properties of reconstructed ancestral proteins could be influenced by the uncertainty present in a given reconstruction due to the probabilistic nature of the process. Relatively high similarity between ML and AltAll sequences would indicate low uncertainty of the reconstruction (most likely due to high conservation during evolution). In such a case, similar properties of AltAll and ML proteins would simply indicate that they are robust to the level of uncertainty present in a given reconstruction (which may be low). This would not tell us much about “general” robustness to mutations, but that was not relevant to the research questions considered.
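      As an illustration of how ML and AltAll sequences relate, the sketch below builds both from per-site posterior probabilities and counts the positions at which they differ. It assumes the commonly used AltAll rule (at each site, substitute the second most probable state when its posterior probability exceeds 0.2); the threshold actually used in this study is not stated here, the function names are hypothetical, and the posterior values are toy numbers, not data from this reconstruction.

```python
from typing import Dict, List, Tuple


def ml_and_altall(posteriors: List[Dict[str, float]],
                  threshold: float = 0.2) -> Tuple[str, str]:
    """Return (ML, AltAll) sequences from per-site posterior probabilities.

    ML takes the most probable residue at every site. AltAll replaces it with
    the second most probable residue wherever that alternative's posterior
    exceeds `threshold` (0.2 is an assumed, commonly used cut-off).
    """
    ml, alt = [], []
    for site in posteriors:
        ranked = sorted(site.items(), key=lambda kv: kv[1], reverse=True)
        best = ranked[0][0]
        ml.append(best)
        if len(ranked) > 1 and ranked[1][1] > threshold:
            alt.append(ranked[1][0])   # plausible alternative state
        else:
            alt.append(best)           # site reconstructed with low uncertainty
    return "".join(ml), "".join(alt)


def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy posteriors for a four-site stretch (illustrative values only).
toy = [
    {"M": 0.98, "L": 0.02},
    {"A": 0.60, "S": 0.35, "T": 0.05},
    {"E": 0.75, "D": 0.22, "N": 0.03},
    {"K": 0.90, "R": 0.10},
]
ml_seq, altall_seq = ml_and_altall(toy)
print(ml_seq, altall_seq, count_differences(ml_seq, altall_seq))  # MAEK MSDK 2
```

      The same count_differences helper applies to any pair of aligned sequences, for example when tallying differences between ancestral nodes and extant proteins; it treats gap characters like any other symbol, so a real alignment would need an explicit gap policy.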

      -If the functional gain by IbpA comes down to only two amino acid substitutions, I'm not convinced this would be meaningfully reflected in any tests of positive selection.

      After considering Reviewer #1’s comments about the limitations of the models used for selection analysis, we added an acknowledgment to the Discussion (page 9, lines 9-13) that results indicating positive selection in our dataset should not be considered conclusive (see our answer to Reviewer #1’s public review below).

      -The full MSA should be provided as supplemental material.

      The full MSA in FASTA format is provided as Supplementary File 1.

      -For the aggregate binding panels in Figs 3 and 4, it would be helpful to show the native and ancestral proteins for comparison. I know this is a bit redundant, as they're present in Fig 1, but I find it hard to judge the scale of change. This is especially important because A0 and A1 are very similar in Fig 1, so I want to see what kind of difference the 2 mutations make.

      The data presented in Fig. 3C (Fig. 5C in the new version) refer to the binding of α-crystallin domains (A0ACD and A0ACD Q66H G109D), not full-length sHsps, to E. coli proteins aggregated on a BLI sensor. Our intention was to show the influence of the two crucial substitutions (Q66H G109D) on the properties of the ancestral A0 α-crystallin domain.

      Figure 4 (Fig. 6 in the new version) presents the effects of substitutions at the identified positions 66 and 109 on the properties of the extant IbpA orthologs from E. coli and E. amylovora, showing that these two positions play a key role in differentiating the properties of those extant proteins. The changes in binding to aggregated substrate caused by these substitutions, shown in Fig. 6B,C (new version), are indeed larger than those observed between AncA0 and AncA1 in Fig. 3B (new version).

      It should be noted, however, that the experiment shown in Fig. 3 (new version) reflects the effects of all 10 amino acid changes between nodes A0 and A1, not only the two analyzed substitutions, as was the case in the experiment shown in Fig. 6B,C (new version). Moreover, because of the relatively large number of differences between the ancestral and extant sequences (11 differences between AncA0 and E. coli IbpA, 20 between AncA1 and E. amylovora IbpA), the substitutions in the two experiments were introduced into different sequence contexts.

      For these reasons, we believe that a direct comparison of the results obtained for ancestral proteins with those obtained for substitutions introduced into extant proteins would not meaningfully help answer the question of the role of the analyzed substitutions in extant proteins, while reducing the clarity of the presented information.

      -Some of the luciferase plots show a time course, but others just show a single %. What is the time point used for the single % plots?

      We have added to the relevant figure legends that, for experiments showing a single time point, luciferase activity was measured after 1 h of refolding.

      Reviewer #3 (Recommendations For The Authors):

      1. In the Introduction, it would be beneficial to explore additional instances where this evolutionary simplification process has been observed in nature. Investigating the prevalence of this phenomenon and identifying other multi-protein systems that have undergone simplification could enhance the understanding of its significance and implications.

      The section of the Introduction concerning gene loss and differential paralog retention has been expanded with additional examples of gene loss that is considered adaptive (page 3, lines 1-12).

      2. I am intrigued by the reasons why certain organisms continue to maintain a two-protein system despite the viability of a single-protein system. This aspect is particularly relevant for bacteria, considering the fitness cost associated with maintaining extra gene copies. Do you have any hypotheses or theories that may shed light on this intriguing observation?

      Refolding of proteins from aggregates requires the functional cooperation of sHsps with chaperones of the Hsp70 system and the Hsp100 disaggregase. In the two-protein sHsp system, one sHsp (IbpA) is specialized in substrate binding, while the second (IbpB) has low substrate-binding potential and enhances sHsp dissociation from substrates (Obuchowski et al., 2019). Thus, the presence of IbpB reduces the amount of Hsp70-system chaperones required to outcompete sHsps from aggregated substrates and initiate the refolding process. The cost of maintaining an extra sHsp gene copy (ibpB) in bacteria might be compensated by a lower requirement for Hsp70 chaperones for efficient and fast protein refolding after stress.

      In this study we have demonstrated how such a system could have been simplified to a single-protein system capable of both efficient substrate sequestration and stimulation of reactivation. This raises the question of why such single-protein systems are not more prevalent in Enterobacterales.

      One possibility is that efficient reactivation by a single-protein sHsp system has very specific requirements. We have shown that the new, more efficient IbpA functionality observed in Erwiniaceae required at least two separate mutations. It is possible that such a combination of two substitutions simply did not occur in the Enterobacteriaceae clade, in which IbpA still requires a partner protein for efficient stimulation of reactivation.

      It should also be remembered that the experiments in this study were performed in vitro under a specific set of conditions, which most likely do not represent the whole spectrum of challenges faced by different bacteria. The two-protein system may have additional adaptive effects that counterbalance the cost of maintaining an extra gene. For example, it was recently shown (Miwa & Taguchi, PNAS, 120 (32) e2304841120) that bacterial sHsps play an important role in regulating the stress response, and a two-protein system could potentially allow for more complex regulation.

      3. Incorporating X-ray crystallization as an additional technique in the methodology would offer detailed molecular insights into the effects of Q66H and G109D substitutions on ACD-C-terminal peptide and ACD-substrate interactions. The inclusion of such data would further strengthen the results section and provide robust support for your findings. Since the x-ray data might be difficult to collect, the authors might think to get alphafold model or some rosetta score for the model to discuss the finding further.

      In response to the reviewer’s comment, we added a comparison of structural models of AncA0 and AncA0 Q66H G109D ACD dimers in complex with the C-terminal peptides (Fig. 5 – figure supplement 2). The models represent the middle structures of the largest clusters obtained from equilibrium molecular dynamics simulation trajectories based on the AlphaFold2 prediction and in silico mutagenesis. Neither the model comparison nor the analysis of C-terminal peptide–ACD contacts revealed any major changes in the mode of peptide binding or in α-crystallin domain conformation, although we acknowledge that the simulation timescale limits conformational sampling.

      Reviewer #1 (Public Review):

      The work in this paper is in general done carefully. Reconstructions are done appropriately and the effects of statistical uncertainty are quantified properly. My only slight complaint is that I couldn't find statistics about posterior probabilities anywhere and that the sequences and trees do not seem to be deposited.

      Posterior probabilities for all positions of the reconstructed proteins have been added as Supplementary File 3. The MSA of all sequences used for ancestral reconstruction and the phylogenetic tree in Newick format have been added as Supplementary Files 1 and 2, respectively.

      I would also have preferred to have the actual phylogeny in the main text. This is a crucial piece of data that the reader needs to see to understand what exactly is being reconstructed.

      The full phylogeny has been added to the main text as Fig. 2.

      The paper identifies which mutations are crucial for the functional differences between the ancestors tested. This is done quite carefully - the authors even show that the same substitutions also work in extant proteins. My only slight concern was the authors' explanation of what these substitutions do. They show that these substitutions lower the affinity of the C-terminal peptide to the alpha-crystallin domain - a key oligomeric interaction. But the difference is very small - from 4.5 to 7 uM. That seems so small that I find it a bit implausible that this effect alone explains the differences in hydrodynamic radius shown in Figure S8. From my visual inspection, it seems that there is also a noticeable change in the cooperativity of the binding interaction. The binding model the authors use is a fairly simple logarithmic curve that doesn't appear to consider the number of binding sites or potential cooperativity. I think this would have been nice to see here.

      The binding model we used is equivalent to the Hill equation, as it accounts for the variable slope of the sigmoid function through the input scaling factor k, which is equivalent to the Hill coefficient. Simple one-site and two-site binding models were also considered but fit the data worse than the model including binding cooperativity. Not reporting the fitted values of parameter k was our mistake, and this has been corrected (Fig. 5 and its legend). Additionally, the output scaling parameter L is unnecessary because the fraction bound takes values from 0 to 1, so we refitted the curves without this parameter; the new fitted values are very similar to the previous ones. To make the text more accessible to the reader, we now use the conventional form of the Hill equation.

      Indeed, AncA0 Q66H G109D ACD displays higher binding cooperativity than the more ancestral AncA0 ACD (Hill coefficient 2.3 for AncA0 vs 3.7 for AncA0 Q66H G109D). The fitted Hill coefficients are higher than one would expect for a two-site ACD dimer, which is probably caused by the BLI experimental setup: the C-terminal peptide is immobilized on the sensor and the ACD is present in solution as a bivalent analyte, giving rise to avidity effects. Both cooperativity and avidity are reflected in the value of the Hill coefficient; however, because the ligand density on the sensor is the same in all experiments, only a change in ACD binding cooperativity can account for the observed difference in Hill coefficients. A difference in C-terminal peptide binding cooperativity may influence sHsp oligomerization and assembly formation despite similar binding affinity, especially when the avidity of multiple binding sites within an oligomer is considered.
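      For readers less familiar with the Hill formalism, the sketch below (Python, standard library only) writes the conventional Hill equation, theta = [L]^n / (K^n + [L]^n), where K is the concentration giving half-maximal binding and n is the Hill coefficient, and recovers K and n from a binding curve. It uses the classic linearized Hill plot, log(theta / (1 - theta)) = n log[L] - n log K, rather than the direct nonlinear fit described above; the function names and the noise-free synthetic data are illustrative assumptions, not the authors' code or measurements.

```python
import math


def hill_fraction_bound(conc, k_half, n):
    """Conventional Hill equation: fraction bound at ligand concentration conc."""
    return conc ** n / (k_half ** n + conc ** n)


def fit_hill_linearized(concs, fractions):
    """Estimate (k_half, n) by least squares on the Hill plot:
    log(theta / (1 - theta)) = n * log(conc) - n * log(k_half).
    Points with theta of exactly 0 or 1 are skipped because the log-odds is undefined.
    """
    xs, ys = [], []
    for conc, theta in zip(concs, fractions):
        if conc > 0 and 0.0 < theta < 1.0:
            xs.append(math.log(conc))
            ys.append(math.log(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < theta < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                    # slope of the Hill plot = Hill coefficient
    intercept = mean_y - n * mean_x  # intercept = -n * log(k_half)
    return math.exp(-intercept / n), n


# Synthetic, noise-free curve with illustrative parameters (not measured data).
concs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
curve = [hill_fraction_bound(c, 4.5, 2.3) for c in concs]
k_fit, n_fit = fit_hill_linearized(concs, curve)
print(round(k_fit, 3), round(n_fit, 3))  # 4.5 2.3

# Same K, higher Hill coefficient: identical half-saturation point, steeper rise.
for n in (2.3, 3.7):
    print(n, round(hill_fraction_bound(9.0, 4.5, n), 3))
```

The linearization recovers the parameters exactly only for noise-free data; with real BLI data, a direct nonlinear least-squares fit is usually preferable because the log-odds transform distorts the error structure near theta = 0 and theta = 1. The final loop illustrates the point made above: two curves can share the same K (and thus similar affinity) while differing in steepness, i.e. in cooperativity.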

      In addition, we changed the legend of Figure S8 (now Fig. 5 – figure supplement 4A) to clarify that the differences in average hydrodynamic radius are in fact fairly small. To highlight the observation that AncA0 and AncA0 Q66H G109D measured at 25, 35 and 45 °C contain two populations of particles with different hydrodynamic diameters, we plotted the DLS data as % intensity. This representation shows the change in the hydrodynamic diameter distribution, which is relatively small. We recognize that this was not properly explained in the article and have added a clarification to the figure legend.

      Lastly, the authors use likelihood methods to test for signatures of selection. This reviewer is not a fan of these methods, as they are easily misled by common biological processes (see PMID 37395787 for a recent critique). Perhaps these pitfalls could simply be acknowledged, as I don't think the selection analysis is very important to the impact of the work.

      We thank the reviewer for pointing us to recent research on the limitations of the methods used for selection analysis in our work. As recommended, we added an acknowledgment of these limitations to the Discussion (page 9, lines 9-13) and modified the wording of our conclusions to de-emphasize the significance of the selection analysis results.

    1. Author Response:

      We thank the editors and reviewers for their time in reviewing our manuscript. We would like to post a brief response to the peer reviews at this stage, and we will revise the manuscript and re-post at a later time.

      The main concerns regarding our molecular dating approach are the limited number of marker genes used for phylogenetic reconstruction, the molecular clock model employed, and the calibrations used. Firstly, regarding the marker genes used in our phylogenetic reconstruction, we note that we extensively benchmarked these methods in a previous study (Martinez-Gutierrez and Aylward, 2021). We initially planned to present all of these results together in the same manuscript, but we decided that benchmarking phylogenetic marker genes across all Bacteria and Archaea together with an extensive molecular dating analysis was too much for a single study, and we therefore divided the results into two papers. In short, we agree with R1 that the use of different marker genes will lead to marked differences in the posterior ages of our Bayesian molecular dating analysis; however, we demonstrated that several of the few marker genes shared between Bacteria and Archaea lack a strong phylogenetic signal and therefore introduce topological biases into the final phylogeny (e.g., long branch attraction). Consequently, using poorly performing marker genes for molecular dating does not add valuable information to the overall analysis.

      Secondly, we believe that the autocorrelated log-normal model used in our study (-ln in PhyloBayes) is appropriate. Besides being biologically meaningful for our study, it represents a compromise between a relaxed model with rate variation across branches and the assumption of correlation between parent and descendant branches (Thorne et al., 1998). In contrast, a fully uncorrelated model that assumes rate independence across branches would make our analysis extremely time-consuming and intractable, given that our study encompasses all of Bacteria and Archaea. Nonetheless, we understand the concerns raised, and in a revised manuscript we will include age estimates from the CIR and UGAM models to explore the potential effect of model selection on posterior dates.
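      To illustrate the distinction between autocorrelated and uncorrelated rate models, here is a minimal Python sketch that simulates branch rates down a toy tree. The autocorrelated part follows one common log-normal parameterization: each descendant log-rate is drawn around its parent's, with variance proportional to branch duration and a -variance/2 shift so that the expected rate is inherited. The uncorrelated part draws each rate independently from a gamma distribution, in the spirit of UGAM. This is a conceptual illustration under those assumptions, not the PhyloBayes implementation, and the function names and toy tree are hypothetical.

```python
import math
import random


def autocorrelated_lognormal_rates(children, root, root_rate, sigma2, rng):
    """Autocorrelated log-normal rates (conceptual sketch).

    children: dict mapping node -> list of (child, branch_duration).
    Each child's log-rate ~ Normal(log(parent_rate) - sigma2*t/2, sigma2*t),
    so descendants inherit their parent's expected rate and can drift
    further from it along longer branches.
    """
    rates = {root: root_rate}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, duration in children.get(parent, []):
            var = sigma2 * duration
            log_rate = math.log(rates[parent]) - var / 2.0 + rng.gauss(0.0, math.sqrt(var))
            rates[child] = math.exp(log_rate)
            stack.append(child)
    return rates


def uncorrelated_gamma_rates(nodes, mean_rate, rel_var, rng):
    """Uncorrelated rates: each node drawn independently from a gamma with mean
    mean_rate and variance mean_rate**2 * rel_var (rel_var must be > 0)."""
    shape = 1.0 / rel_var
    scale = mean_rate * rel_var
    return {node: rng.gammavariate(shape, scale) for node in nodes}


# Toy tree: root -> (a, b); a -> c. Branch durations in arbitrary time units.
tree = {"root": [("a", 1.0), ("b", 2.0)], "a": [("c", 0.5)]}
rng = random.Random(0)
auto = autocorrelated_lognormal_rates(tree, "root", 1.0, 0.3, rng)
indep = uncorrelated_gamma_rates(["a", "b", "c"], 1.0, 0.3, rng)
print(auto)   # c's rate is drawn around a's rate
print(indep)  # no node's rate depends on any other node's rate
```

Setting `sigma2` to zero collapses the autocorrelated model to a strict clock (every rate equals the root rate), while the uncorrelated draws ignore the tree entirely; the log-normal autocorrelated model sits between these extremes, which is the compromise referred to above.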

      Thirdly and lastly, we note that calibrations for molecular dating of Bacteria and Archaea are always highly controversial, and there are essentially no calibrations for the early evolution of life on Earth that would not be contested to some degree. Researchers are therefore left to use their best judgment and provide a reasonable rationale, which we have done here. We understand that strong opinions abound in this area, and many researchers will disagree with our approach, but that alone does not invalidate our study. Moreover, the main novelty of our approach is the use of a large tree that combines Bacteria and Archaea; extensive benchmarking of different calibration points is not possible on such a large tree as it may be on a smaller one. One of the main concerns is the use of the age estimate of the Great Oxidation Event (GOE, 2.4 Ga) as a minimum constraint for oxygenic Cyanobacteria and as a maximum constraint for Ammonia Oxidizing Archaea and aerobic Marinimicrobia. We agree that oxygen may have existed before the GOE, as proposed previously (e.g., Ostrander et al., 2021); however, the strongest geochemical evidence so far (mass-independent fractionation of sulfur, MIFs; Farquhar et al., 2000) indicates a significant accumulation of oxygen around that time. We therefore feel that this is a reasonable calibration for microbial lineages whose physiology is tightly linked to the production or consumption of oxygen. Similar reasoning has been used in other molecular dating studies, so our logic is not out of step with much research in the field (Liao et al., 2022; Ren et al., 2019).

      Due to the limitations of molecular dating studies of microorganisms, we have been very careful to avoid strong conclusions based on the absolute dates we calculated, and the primary interest of readers will likely be the relative divergence times of the marine clades we study (i.e., the overall timeline of microbial diversification in the ocean). We will provide a more in-depth assessment of models and calibrations for Bacteria and Archaea in a future draft, but in the meantime we hope to convey that our study is not without merit despite the substantial challenges of research in this area.

      References:

      • Farquhar J, Bao H, Thiemens M. 2000. Atmospheric influence of Earth’s earliest sulfur cycle. Science 289:756–759.
      • Liao T, Wang S, Stüeken EE, Luo H. 2022. Phylogenomic Evidence for the Origin of Obligate Anaerobic Anammox Bacteria Around the Great Oxidation Event. Mol Biol Evol 39. doi:10.1093/molbev/msac170
      • Martinez-Gutierrez CA, Aylward FO. 2021. Phylogenetic Signal, Congruence, and Uncertainty across Bacteria and Archaea. Mol Biol Evol 38:5514–5527.
      • Ostrander CM, Johnson AC, Anbar AD. 2021. Earth's first redox revolution. Annu Rev Earth Planet Sci 49:337–366.
      • Ren M, Feng X, Huang Y, Wang H, Hu Z, Clingenpeel S, Swan BK, Fonseca MM, Posada D, Stepanauskas R, Hollibaugh JT, Foster PG, Woyke T, Luo H. 2019. Phylogenomics suggests oxygen availability as a driving force in Thaumarchaeota evolution. ISME J 13:2150–2161.
      • Thorne JL, Kishino H, Painter IS. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol 15:1647–1657.
    2. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your time and effort in handling and reviewing our manuscript. We have responded to all comments below.

      Reviewer #1 (Public Review):

      Martinez-Gutierrez and colleagues presented a timeline of important bacteria and archaea groups in the ocean and based on this they correlated the emergence of these microbes with GOE and NOE, the two most important geological events leading to the oxygen accumulation of the Earth. The whole study builds on molecular clock analysis, but unfortunately, the clock analysis contains important errors in the calibration information the study used, and is also oversimplified, leaving many alternative parameters that are known to affect the posterior age estimates untested. Therefore, the main conclusion that the oxygen availability and redox state of the ocean is the main driver of marine microbial diversification is not convincing.

      We do not conclude that “oxygen availability and redox state of the ocean is the main driver of marine microbial diversification”. Our conclusion is much more nuanced. We merely discuss our findings in light of the major oxygenation events and oxygen availability (among other things) given the important role this molecule has played in shaping the redox state of the ocean.

      To address the methodological concerns, we have provided additional analyses that account for different clock models and calibration points.

      Basically, what the molecular clock does is to propagate the temporal information of the nodes with time calibrations to the remaining nodes of the phylogenetic tree. So, the first and the most important step is to set the time constraints appropriately. But four of the six calibrations used in this study are debatable and even wrong.

      (1) The record for biogenic methane at 3460 Ma is not reliable. The authors cited Ueno et al. 2006, but that study was based on carbon isotope, which is insufficient to demonstrate biogenicity, as mentioned by Alleon and Summons 2019.

      Thank you for pointing out the limitations of using the geochemical evidence of methane as a calibration. Indeed, several commentaries have suggested that biotic and abiotic origins of the methane reported by Ueno et al. are equally plausible (Alleon and Summons, 2019; Lollar and McCollom, 2006); however, we used that calibration as a minimum age for the presence of life on Earth, not for methanogenesis. Despite the controversy regarding the origin of the methane, other lines of evidence suggest the presence of life around ~3.4 Ga, for example stromatolites from the Dresser Formation, Pilbara, Western Australia (Djokic et al., 2017; Walter et al., 1980; Buick and Dunlop, 1990) and, more recently, Hickman-Lewis et al. (2022). To avoid confusion, we have added a more extended explanation for the use of that calibration, and additional evidence of life around that time, in Table 1 and lines 100-104.

      (2) Three calibrations at Aerobic Nitrososphaerales, Aerobic Marinimicrobia, and Nitrite oxidizing bacteria have the same problem - they are all assumed to have evolved after the GOE where the Earth started to accumulate oxygen in the atmosphere, so they were all capped at 2320 Ma. This is an important mistake and will significantly affect the age estimates because maximum constraint was used (maximum constraint has a much greater effect on age estimates than minimum constraint), and this was used in three nodes involving both Bacteria and Archaea. The main problem is that the authors ignored the numerous evidence showing that oxygen can be produced far before GOE by degradation of abiotically-produced abundant H2O2 by catalases equipped in many anaerobes, also produced by oxygenic cyanobacteria evolved at least 500 Ma earlier than the onset of GOE (2500 Ma), and even accumulated locally (oxygen oasis). It is well possible that aerobic microbes could have evolved in the Archaean.

      We appreciate the suggestion to assess the validity of the calibrations used in our analyses. We initially evaluated the informative power of the priors used for the Bayesian molecular dating (Supplemental File 5) and found that the only calibration that lacked sufficient information for the purposes of our study was Ammonia Oxidizing Archaea (AOA). In contrast to previous evidence (Ren et al., 2019; Yang et al., 2021), we attribute this finding to a potentially earlier diversification of AOA. Because of the limitations of several of the calibrations used, we performed an additional molecular dating analysis on 1000 replicate trees using a Penalized Likelihood strategy. This analysis excluded the calibrations that assumed the presence of oxygen as a maximum constraint. It yields similar age estimates for the marine microbial clades regardless of whether these calibrations are excluded (Supplemental File 8; TreePL Priors set 2). Our findings thus suggest that the age estimates reported in our study are consistent regardless of whether the presence of oxygen is used to calibrate several nodes in the tree. We describe the results of this analysis in lines 490-499 and include the estimates in Supplemental File 8. Our results are therefore robust to the use of these somewhat controversial calibrations.

      Once the phylogenetic tree is appropriately calibrated with fossils and other time constraints, the next important step is to test different clock models and other factors that are known to significantly affect the posterior age estimates. For example, different genes vary in evolutionary history and evolutionary rate, which often give very different age estimates. So it is very important to demonstrate that these concerns are taken into account. These are done in many careful molecular dating studies but missing in this study.

      We agree that the selection of marker genes will have a profound impact on the final age estimates. First, it is important to understand that very few genes present in modern Bacteria and Archaea can be traced back to the Last Universal Common Ancestor, so there are very few genes to use for this purpose. Studies that focus on particular groups of Bacteria and Archaea may have larger selections of genes to choose from, but for our purposes there are only about 40 different genes, mostly encoding ribosomal proteins, RNA polymerase subunits, and tRNA synthetases, that can be used (Creevey et al., 2011; Wu and Scott, 2012). In a previous study we extensively benchmarked methods for the reconstruction of high-resolution phylogenetic trees of Bacteria and Archaea using these genes (Martinez-Gutierrez and Aylward, 2021). Our analyses demonstrated that some of these genes (mainly tRNA synthetases) have undergone ancient lateral gene transfer events and are not suitable for deep phylogenetics or molecular dating. In that study we also evaluated different sets of marker genes to determine which provide the most robust phylogenetic inference. We arrived at a set of ribosomal proteins and RNA polymerase subunits that performs best for phylogenetic reconstruction, and we have used that set in the current study.

      Furthermore, we tested the effect of molecular clock model selection on the final Bayesian estimates by running four independent chains under each of the UGAM and CIR models. Overall, the results did not vary substantially from the ages obtained using the log-normal model reported in our manuscript (Supplemental File 8). The additional results are described in lines 478-488 and shown in Supplemental File 8. The clades that showed the most variation across Bayesian models were SAR86, SAR11, and Crown Cyanobacteria (Supplemental File 8). Despite some differences in the age estimates under different clock models, the conclusion that the marine microbial clades presented in our study diversified during distinct periods of Earth’s history remains. Moreover, the main goal of our study is to provide a relative timeline of the diversification of abundant marine microbial clades rather than to focus on absolute dates.

      Reviewer #2 (Public Review):

      In this paper, Martinez-Gutierrez and colleagues present a dated, multidomain (= Archaea+Bacteria) phylogenetic tree, and use their analyses to directly compare the ages of various marine prokaryotic groups. They also perform ancestral gene content reconstruction using stochastic mapping to determine when particular types of genes evolved in marine groups.

      Overall, there are not very many papers that attempt to infer a dated tree of all prokaryotes, and this is a distinctive and up-to-date new contribution to that oeuvre. There are several particularly novel and interesting aspects - for example, using the GOE as a (soft) maximum age for certain groups of strictly aerobic Bacteria, and using gene content enrichment to try to understand why and how particular marine groups radiated.

      Thank you for your thorough evaluation and comments on our manuscript.

      Comments

      One overall feature of the results is that marine groups tend to be quite young, and there don't seem to be any modern marine groups that were in the ocean prior to the GOE. It might be interesting to study the evolution of the marine phenotype itself over time; presumably some of the earlier branches were marine? What was the criterion for picking out the major groups being discussed in the paper? My (limited) understanding is that the earliest prokaryotes, potentially including LUCA, LBCA and LACA, was likely marine, in the sense that there would not yet have been any land above sea level at such times. This might merit discussion in the paper. Might there have been earlier exclusively marine groups that went extinct at some point?

      Thank you for pointing this out; this is a very interesting idea.

      Firstly, the major marine lineages that we study here have largely been defined in previous studies and are known to account for a large fraction of the total diversity and biomass of prokaryotes in the ocean. For example, Giovannoni and Stingl described most of these groups when discussing cosmopolitan and abundant marine lineages (Giovannoni and Stingl, 2005). The main criteria for selecting the marine clades studied here are that 1) these groups have large impacts on marine biogeochemical cycles and represent a large fraction of the microbial biomass in the open ocean, 2) they are well represented in genomic databases, so they can be confidently included in a phylogenetic tree, and 3) the clades included can be confidently classified as marine, in the sense that their last common ancestor had a marine origin. This is explained in lines 83-86. We were primarily interested in lineages that encompass a broad phylogenetic breadth, and we therefore did not include many groups that can be found in the ocean but are also readily isolated from a range of other environments (e.g., Pseudomonas spp., some Actinomycetes).

      We agree that some of the earlier microbial branches in the Tree of Life were likely marine. Although the marine origin of LUCA, LBCA, and LACA is an interesting question, it is outside the scope of our study, and our results cannot offer any direct evidence of their habitat. We have therefore focused on the origins of extant marine lineages.

      What do the stochastic mapping analyses indicate about the respective ancestors of Gracilicutes and Terrabacteria? At least in the latter case, the original hypothesis for the group was that they possessed adaptations to life on land - which seems connected/relevant to the idea of radiating into the sea discussed here - so it might be interesting to discuss what your analyses say about that idea.

      Thank you for your recommendation to perform additional analyses characterizing the ancestors of the superphyla Gracilicutes and Terrabacteria. We agree that this analysis would be very interesting, but we wish to focus the manuscript primarily on the marine clades in question; the other supergroups are listed in Figure 2 mainly for context. Nevertheless, we checked the results of the stochastic mapping analysis and now report the lists of genes predicted to have been gained and lost at the ancestors of the Gracilicutes and Terrabacteria clades, although a detailed interpretation of these ancestors is outside the scope of this study.

      I very much appreciate that finding time calibrations for microbes is challenging, but I nonetheless have a couple of comments or concerns about the calibrations used here:

      The minimum age for LBCA and LACA (Nodes 1 and 2 in Fig. 1) was calibrated with the earliest evidence of biogenic methane ~3.4Ga. In the case of LACA, I suppose this reflects the view that LACA was a methanogen, which is certainly plausible although perhaps not established with certainty. However, I'm less clear about the logic of calibrating the minimum age of Bacteria using this evidence, as I am not aware that there is much evidence that LBCA was a methanogen. Perhaps the line of reasoning here could be stated more explicitly. An alternative, slightly younger minimum age for Bacteria could perhaps be obtained from isotope data ~3.2Ga consistent with Cyanobacteria (e.g., see https://pubmed.ncbi.nlm.nih.gov/30127539/).

      Thank you for pointing this out. We used the presence of methane as a minimum age for life on Earth, not as a minimum for methanogenesis. Although we used this calibration as a minimum for the root of Bacteria, a domain without methanogenic representatives, there are independent lines of evidence pointing to the presence of microbial life around the same time (~3.5 Ga), for example stromatolites from the Dresser Formation, Pilbara, Western Australia (Djokic et al., 2017; Walter et al., 1980; Buick and Dunlop, 1990) and, more recently, Hickman-Lewis et al. (2022). We added a rationale for the use of the evidence of methane as a minimum age for life on Earth to the manuscript (Table 1 and lines 100-104).

      I am also unclear about the rationale for setting the minimum age of the photosynthetic Cyanobacteria crown to the time of the GOE. Presumably, oxygen-generating photosynthesis evolved on the stem of (photosynthetic) Cyanobacteria, and it therefore seems possible that the GOE might have been initiated by these stem Cyanobacteria, with the crown radiating later? My confusion here might be a comprehension error on my part - it is possible that in fact one node "deeper" than the crown was being calibrated here, which was not entirely clear to me from Figure 1. Perhaps mapping the node numbers directly to the node, rather than a connected branch, would help? (I am assuming, based on nodes 1 and 2, that the labels are being placed on the branch directly antecedent to the node of interest)?

      Thank you so much for your suggestion. As pointed out, the calibrations were applied at the crown nodes of extant Cyanobacterial clades, not at the stem of photosynthetic Cyanobacteria. We agree that photosynthesis, and therefore the production of molecular oxygen, may have been present in more ancient Cyanobacterial lineages; however, these groups either have not yet been discovered or went extinct. We have revised Fig. 1 to avoid confusion, and the updated figure is included in the revised manuscript.

      Alleon J, Summons RE. 2019. Organic geochemical approaches to understanding early life. Free Radic Biol Med 140:103–112.

      Buick R, Dunlop JSR. 1990. Evaporitic sediments of Early Archaean age from the Warrawoona Group, North Pole, Western Australia. Sedimentology 37: 247-277.

      Creevey CJ, Doerks T, Fitzpatrick DA, Raes J, Bork P. 2011. Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS One 6:e22099.

      Djokic T, Van Kranendonk MJ, Campbell KA, Walter MR, Ward CR. 2017. Earliest signs of life on land preserved in ca. 3.5 Ga hot spring deposits. Nat Commun 8:15263.

      Giovannoni SJ, Stingl U. 2005. Molecular diversity and ecology of microbial plankton. Nature 437: 343-348.

      Hickman-Lewis K, Cavalazzi B, Giannoukos K, D'Amico L, Vrbaski S, Saccomano G, et al. 2023. Advanced two-and three-dimensional insights into Earth's oldest stromatolites (ca. 3.5 Ga): Prospects for the search for life on Mars. Geology 51: 33-38.

      Lollar BS, McCollom TM. 2006. Geochemistry: biosignatures and abiotic constraints on early life. Nature.

      Martinez-Gutierrez CA, Aylward FO. 2021. Phylogenetic Signal, Congruence, and Uncertainty across Bacteria and Archaea. Mol Biol Evol 38:5514–5527.

      Ren M, Feng X, Huang Y, Wang H, Hu Z, Clingenpeel S, Swan BK, Fonseca MM, Posada D, Stepanauskas R, Hollibaugh JT, Foster PG, Woyke T, Luo H. 2019. Phylogenomics suggests oxygen availability as a driving force in Thaumarchaeota evolution. ISME J 13:2150–2161.

      Walter MR, Buick R, Dunlop JSR. 1980. Stromatolites 3,400–3,500 Myr old from the North pole area, Western Australia. Nature 284: 443-445.

      Wu M, Scott AJ. 2012. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28:1033–1034.

      Yang Y, Zhang C, Lenton TM, Yan X, Zhu M, Zhou M, Tao J, Phelps TJ, Cao Z. 2021. The Evolution Pathway of Ammonia-Oxidizing Archaea Shaped by Major Geological Events. Mol Biol Evol 38:3637–3648.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This work successfully identified and validated TRLs in hepatic metastatic uveal melanoma, opening new avenues for enhanced immunotherapy. Uveal melanoma (UM) is a highly metastatic cancer that, unlike cutaneous melanoma, responds poorly to immune checkpoint inhibitors, and thus there is no established clinical treatment for metastatic UM. In this manuscript, the authors described the immune microenvironment of hepatic metastatic uveal melanoma using sc-RNAseq, TCR-seq, and PDX models. First, they identified and defined the phenotypes of tumor-reactive T lymphocytes (TRLs). Moreover, they validated the activity of TILs by in vivo PDX modeling as well as by in vitro co-culture of 3D tumorspheres with autologous TILs. Additionally, the authors found that TRLs are mainly derived from exhausted and late-activated T cells, which recognize melanoma antigens and tumor-specific antigens. Most importantly, they identified TRL-associated phenotypes, which provide new avenues for targeting expanded T cells to improve cellular and immune checkpoint immunotherapy.

      Strengths:

      Jonas A. Nilsson and colleagues have been working on new therapies for melanoma. The team has also previously performed the most comprehensive genome-wide analysis of uveal melanoma available, presenting the latest insights into metastatic disease. In this work, the authors performed paired sc-RNAseq and TCR-seq on 14 patients with metastatic UM, which is the largest single-cell map of metastatic UM available. This provides a substantial data resource for other studies of metastatic UM.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      Although the paper has strengths in principle, its weakness is that these strengths are not directly demonstrated. That is, the analyses performed are insufficient for the data presented to fully support the key claims of the manuscript. In particular:

      The authors' description of the overall results should follow a logical thread rather than merely describe the observed phenomena. For example, the presentation of the TRL results lacked logic. In addition, the title emphasizes three subtypes of TRLs in hepatic metastatic UM, but these three subtypes are not specifically discussed in the Results or Discussion sections. The title does not adequately summarize the article and should be carefully reconsidered by the authors.

      We thank the reviewer for the critical reading of our work. We agree that more discussion is needed and will add it in the revised version.

      The authors' claim that they are the first to use autologous TILs and sc-RNAseq to study immunotherapy needs to be supported by the corresponding literature to be more convincing. This can help the reader to understand the innovation and importance of the methodology.

      We will go through the manuscript and literature to see where there might be missing references.

      In addition, the authors argue that TILs from metastatic UM can kill tumor cells. This is the key point bridging the data to the main conclusion of the article, so the credibility of this conclusion should be considered. Metastatic UM1 and UM9 cells remain responsive to their autologous TILs in vitro.

      UM1 also responds in vivo in the subcutaneous model presented in the paper. We have also completed an experiment showing that this model responds in a liver metastasis model as well. These data will be added to the next version of the paper.

      In contrast, UM22, also a metastatic UM, did not respond to TIL treatment, despite the presence of MART1-responsive TILs. Because these results are based on a single case (the UM22 liver metastasis model), their reliability should be considered. The authors should likewise consider whether such a specific cell population might also exist in other patients with metastatic UM and produce an immune response to tumor cells. The results would be more comprehensive if supported by relevant data.

      The reviewer has interpreted the results correctly: the allogeneic and autologous MART1-specific TILs, while reactive against UM22 in vitro, cannot kill this tumor in either a subcutaneous or a liver metastasis model. We hypothesize that this is due to an immune-exclusion phenotype and show weak immunohistochemical evidence suggesting this. We hope that the additional UM1 data will be viewed as supporting tumor reactivity in vivo as well.

      In addition, the authors used previously frozen biopsy samples for TCR-seq, which may be associated with low-quality sequencing data, unreliable outcome measures, and limited recovery of immune-cell information. These potential problems and the reliability of the results should be considered. If special processing of TCR-seq data from frozen samples was performed, this should also be described.

      We agree with the reviewer and acknowledge that we never anticipated the development of single-cell sequencing techniques when we started the biobank in 2013. We performed dead-cell removal before the 10x Genomics experiment. We have also performed extensive quality control and believe that the data from the biopsies should be viewed as a whole and that quantitative intra-patient comparisons cannot be made.

      Reviewer #2 (Public Review):

      Summary:

      The study's goal is to characterize and validate tumor-reactive T cells in liver metastases of uveal melanoma (UM), which could contribute to enhancing immunotherapy for these patients. The authors used single-cell RNA and TCR sequencing to find potential tumor-reactive T cells and then used patient-derived xenograft (PDX) models and tumor sphere cultures for functional analysis. They discovered that tumor-reactive T cells exist in activated/exhausted T cell subsets and in cytotoxic effector cells. Functional experiments with isolated TILs show that they are capable of killing UM cells in vivo and ex vivo.

      Strengths:

      The study highlights the potential of using single-cell sequencing and functional analysis to identify T cells that can be useful for cell therapy and marker selection in UM treatment. This is important and novel as conventional immune checkpoint therapies are not highly effective in treating UM. Additionally, the study's strength lies in its validation of findings through functional assays, which underscores the clinical relevance of the research.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      The manuscript may pose challenges for individuals with limited knowledge of single-cell analysis and immunology markers, making it less accessible to a broader audience.

      The first draft of the manuscript (excluding methods) was written by a person (J.A.N.) who is not a bioinformatician. It has since been corrected to use the correct nomenclature where applicable, but overall it is written with the aim of being understandable to a broad audience. We will make an additional effort in the next version.

    1. Author Response

      eLife assessment

      This work describes new validated conditional double KO (cDKO) mice for LRRK1 and LRRK2 that will be useful for the field, given that LRRK2 is widely expressed in the brain and periphery, and many divergent phenotypes have been attributed previously to LRRK2 expression. The manuscript presents solid data demonstrating that it is the loss of LRRK1 and LRRK2 expression within the SNpc DA cells that is not well tolerated, as it was previously unclear from past work whether neurodegeneration in the LRRK double Knock Out (DKO) was cell autonomous or the result of loss of LRRK1/LRRK2 expression in other types of cells. Future studies may pursue the biochemical mechanisms underlying the reason for the apoptotic cells noted in this study, as here, the LRRK1/LRRK2 KO mice did not replicate the dramatic increase in the number of autophagic vacuoles previously noted in germline global LRRK1/LRRK2 KO mice.

      We thank the editors for handling our manuscript and for the succinct summary that recognizes the significance of our findings and points out interesting directions for future studies. We also thank the reviewers for their helpful comments and positive evaluation of our work. Below, we have provided point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Summary:

      This is an important work showing that loss of LRRK function causes late-onset dopaminergic neurodegeneration in a cell-autonomous manner. One of the LRRK members, LRRK2, is of significant translational importance, as mutations in LRRK2 cause late-onset autosomal dominant Parkinson's disease (PD). While many in the field assume that LRRK2 mutations cause PD via increased LRRK2 activity (i.e., kinase activity), this is not a settled issue, as not all disease-causing LRRK2 mutants exhibit increased activity. Further, while LRRK2 inhibitors are in clinical trials for PD, the consequences of chronic, long-term LRRK2 inhibition are unknown. Thus, studies evaluating the long-term impact of LRRK deficiency have important translational implications. Moreover, because LRRK proteins, particularly LRRK2, are known to modulate immune responses and intracellular membrane trafficking, the study's results and reagents will be valuable for others interested in LRRK function.

      Strengths:

      This report describes a mouse model in which the LRRK1 and LRRK2 genes are conditionally deleted in dopaminergic neurons. Previously, this group showed that while loss of LRRK2 expression does not cause a brain phenotype, loss of both LRRK1 and LRRK2 causes late-onset, progressive degeneration of catecholaminergic neurons: dopaminergic (DAergic) neurons in the substantia nigra (SN) and noradrenergic neurons in the locus coeruleus (LC). However, because LRRK genes are widely expressed and their loss produces some peripheral phenotypes, it was unknown whether the neurodegeneration in the LRRK double knockout (DKO) was cell autonomous. To rigorously test this question, the authors generated a double conditional (cDKO) allele in which both the LRRK1 and LRRK2 genes were targeted to contain loxP sites. In my view, this went beyond what is usually required, as most investigators might combine one KO allele with another floxed allele. The authors provide rigorous validation showing that the driver (DAT-Cre) is expressed in most DAergic neurons in the SN and that LRRK levels are decreased selectively in the ventral midbrain. Using these mice, the authors show that the number of DAergic neurons is normal at 15 months but significantly decreased at 20 months of age. Moreover, the authors show that the number of apoptotic neurons is increased ~2-fold in the aged SN, demonstrating increased ongoing cell death, as well as an increase in activated microglia. The degeneration is limited to DAergic neurons: LC neurons are not lost, because this population does not express DAT. Overall, the mouse genetics and experimental analysis were performed rigorously, and the results were statistically sound and compelling.

      Weaknesses:

      I only have a few minor comments. First, in PD and other degenerative conditions, loss of axons and terminals occurs prior to loss of cell bodies. It might be beneficial to show the status of DAergic markers in the striatum. Second, previous studies indicate that very little, if any, LRRK1 is expressed in SN DAergic neurons. This is also the case in the Allen Brain Atlas profile. Thus, the authors should discuss this discrepancy, as they seem to imply significant LRRK1 expression in DA neurons.

      We appreciate the reviewer’s recognition of the importance of the study as well as our rigorous experimental approaches and compelling results. Our responses to the reviewer's two minor comments are below.

      1) DAergic markers in the striatum:

      We performed TH immunostaining in the striatum and quantified TH+ DA terminals in the striatum of DA neuron-specific LRRK cDKO and littermate control mice at the ages of 15 and 24 months. We found similar levels of TH immunoreactivity in the striatum of LRRK cDKO and littermate control mice at the age of 15 months (p = 0.6565, unpaired Student’s t-test) and significantly reduced levels of TH immunoreactivity in the striatum of LRRK cDKO, compared to control mice at the age of 24 months (~19%, p = 0.0215), suggesting an age-dependent loss of dopaminergic terminals in the striatum of DA neuron-specific LRRK cDKO mice. These results are now included as Figure 5 of the revised manuscript.

      2) LRRK1 expression in the SNpc:

      It is shown in the Mouse brain RNA-seq dataset and the Allen Mouse brain ISH dataset (https://www.proteinatlas.org/ENSG00000154237-LRRK1/brain) that LRRK1 is broadly expressed in the mouse brain and is expressed at modest levels in the midbrain, comparable to the cerebral cortex. Indeed, our Western analysis also showed that levels of LRRK1 detected in the dissected ventral midbrain and the cerebral cortex of control mice are similar (40µg total protein loaded per lane; Figure 2E). Furthermore, we previously demonstrated that deletion of LRRK2 (or LRRK1) alone does not cause age-dependent loss of DA neurons in the SNpc, but deletions of both LRRK1 and LRRK2 result in age-dependent loss of DA neurons in LRRK DKO mice, indicating the functional importance of LRRK1 in the protection of DA neuron survival in the aging mouse brain (Tong et al., PNAS 2010, 107: 9879-9884, Giaime et al., Neuron 2017, 96: 796-807).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shen and collaborators described the generation of cDKO mice lacking LRRK1 and LRRK2 selectively in DAT-positive DAergic neurons. The Authors asked whether selective deletion of both LRRK isoforms could lead to a Parkinsonian phenotype, as previously reported by the same group in germline double LRRK1 and LRRK2 knockout mice (PMID: 29056298). Indeed, cDKO mice developed a late reduction of TH+ neurons in SNpc that partially correlated with the reduction of NeuN+ cells. This was associated with increased apoptotic cell and microglial cell numbers in SNpc.

      Unlike the constitutive DKO mice described earlier, however, cDKO mice did not replicate the dramatic increase in the number of autophagic vacuoles. The study supports the authors' hypothesis that loss of function rather than gain of function of LRRK2 leads to PD.

      Strengths:

      The study described for the first time a model where both the PD-associated gene LRRK2 and its homolog LRRK1 are deleted selectively in DAergic neurons, offering a new tool to understand the physiopathological role of LRRK2 and the compensating role of LRRK1 in modulating DAergic cell function.

      Weaknesses:

      The model has no construct validity since loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD. The evidence of a Parkinsonian phenotype in these cDKO mice is limited and should be considered preliminary.

      We thank the reviewer for commenting on the usefulness of this new PD mouse model.

      The reviewer did not include a reference citation for the statement "loss of function mutations of LRRK2 are well-tolerated in humans and do not lead to PD." It is possible that the reviewer was referring to a human population study (Whiffin et al., Nat Med 2020, 26: 869-877), entitled "The effect of LRRK2 loss-of-function variants in humans." In this study, the authors analyzed 141,456 individuals sequenced in the Genome Aggregation Database, 49,960 exome-sequenced individuals from the UK Biobank, and more than 4 million participants in the 23andMe genotyped dataset, and they looked for human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants). The reported findings were interesting, and the authors were careful in stating their conclusions. However, this is not a linkage study of large pedigrees carrying a single, clear-cut loss-of-function mutation (e.g. large deletions of most exons and coding sequences). Therefore, the experimental evidence is not compelling enough to conclude whether loss-of-function mutations in LRRK2 cause PD or do not cause PD.

      The current report is an unbiased genetic study in an effort to reveal the normal physiological role of LRRK in dopaminergic neurons. It was not intended to produce Parkinsonian phenotypes in LRRK cDKO mice, which would be a biased effort. However, the unequivocal discovery of the cell intrinsic role of LRRK in the protection of DA neurons from age-dependent degeneration and apoptotic cell death should be considered seriously, while we contemplate the disease mechanism and how LRRK2 mutations may cause DA neuron loss and PD.

      Reviewer #3 (Public Review):

      Kang, Huang, and colleagues investigated the impact of LRRK1 and LRRK2 deletion, specifically in dopaminergic neurons, using a novel cDKO mouse model. They observed a significant reduction in DAergic neurons in the substantia nigra in their conditional LRRK1 and LRRK2 KO mice and a corresponding increase in markers of apoptosis and gliosis. This work set out to address a longstanding question within the field around the role and importance of LRRK1 and LRRK2 in DAergic neurons and suggests that the loss of both proteins triggers some neurodegeneration and glial activation.

      The studies included in this work are carefully performed and clearly communicated, but additional studies are needed to strengthen further the authors' claims around the consequences of LRRK2 deletion in DAergic neurons.

      1. In Figures 2E and F, the authors assess the protein levels of LRRK1 and LRRK2 in their cDKO mouse model to confirm the deletion of both proteins. They observe a mild loss of LRRK1 and LRRK2 signals in the ventral midbrain compared to wild-type animals. While this is not surprising given other cell types that still express LRRK1 and LRRK2 would be present in their dissected ventral midbrain samples, it does not sufficiently confirm that LRRK1 and LRRK2 are not expressed in DAergic neurons. Additional data is needed to more directly demonstrate that LRRK1 and LRRK2 protein levels are reduced in DAergic neurons, including analysis of LRRK1 and LRRK2 protein levels via immunohistochemistry or FACS-based analysis of TH+ neurons.

      We thank the reviewer for highlighting this incredibly important but often overlooked issue. We agree that the data in Figure 2E, F alone would be inadequate to validate DA neuron-specific LRRK cDKO mice.

      Cell type-specific conditional knockouts are a mosaic with KO cells mixed with other cell types expressing the gene normally. DA neuron-specific cDKO is particularly challenging, as DA neurons are a subset of cells embedded in the ventral midbrain. Rather than using immunostaining, which relies upon specific, good LRRK1 and LRRK2 antibodies for IHC, or FACS sorting of TH+ neurons followed by Western blotting (few cells, mixed cell populations, etc.), we chose a clean genetic approach by generating germline mutant mice carrying the deleted LRRK1 and LRRK2 alleles in all cells from the floxed LRRK1 and LRRK2 alleles. This approach permits characterization of these deletion mutations in germline mutant mice using molecular approaches that yield unambiguous results.

      We crossed CMV-Cre deleter mice with floxed LRRK1 and LRRK2 mice to generate respective germline LRRK1 KO and LRRK2 KO mice, in which all cells carry the LRRK1 or LRRK2 deleted alleles that are identical to those in DA neurons of cDKO mice. We then performed Northern blotting, extensive RT-PCR followed by sequencing, and Western blotting to show the absence of full-length LRRK1 and LRRK2 mRNA (Figure 1G, H, Figure 1-figure supplement 8 and 10), the expected truncation of LRRK1 and LRRK2 mRNA (Figure 1-figure supplement 9 and 11), and the absence of LRRK1 and LRRK2 proteins (Figure 1I). These analyses together demonstrate that in the presence of Cre, either CMV-Cre expressed in all cells or DAT-Cre expressed selectively in DA neurons, the floxed LRRK1 and LRRK2 exons are deleted, resulting in null alleles. We further demonstrated the specificity of DAT-Cre-mediated recombination (deletion) by crossing DAT-Cre mice with a GFP reporter, showing that 99% of TH+ DA neurons in the SNpc are also GFP+ (Figure 2A, B), indicating that DAT-Cre-mediated recombination of the floxed alleles occurs in essentially all TH+ DA neurons in the SNpc.

      2. The authors observed a significant but modest effect of LRRK1 and LRRK2 deletion on the number of TH+ neurons in the substantia nigra (12-15% loss at 20-24 months of age). It is unclear whether this extent of neuron loss is functionally relevant. To strengthen the impact of these data, additional studies are warranted to determine whether this translates into any PD-relevant deficits in the mice, including motor deficits or alterations in alpha-synuclein accumulation/aggregation.

      Yes, the reduction of DA neurons in the SNpc of cDKO mice at the age of 20-24 months is modest. At 15 months of age, the number of TH+ DA neurons in the SNpc is similar between LRRK cDKO mice (10,000 ± 141) and littermate controls (10,077 ± 310, p > 0.9999). At 20 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,948 ± 273) is significantly reduced, by 12.7%, compared to control mice (10,244 ± 220; F(1,46) = 16.59, p = 0.0002, two-way ANOVA with Bonferroni's post hoc multiple comparisons, p = 0.0041). By 24 months of age, the number of DA neurons in the SNpc of LRRK cDKO mice (8,188 ± 452) relative to controls (9,675 ± 232, p = 0.0010) is further reduced, by 15.4%.

      Similar results were obtained in an independent quantification by another investigator, also performed blind to genotype, using the fractionator and optical disector method, in which TH+ cells were quantified in 25% of the area. These results are included as Figure 3-figure supplement 1 in the revised manuscript. Because of the more limited sampling, these quantification data are more variable than the quantification of TH+ cells in all areas of the SNpc shown in Figure 3. With both methods, we quantified TH+ cells in every 10th section encompassing the entire SNpc (3D structure), as sampling every 5th or every 10th section yielded similar results.

      We also performed behavioral analysis of LRRK cDKO mice and littermate controls at the ages of 10 and 25 months using the beam walk test (10 mm and 20 mm beam) and the pole test, which are sensitive to impairment of motor coordination. We found that LRRK cDKO mice at 10 months of age showed significantly more hindlimb errors (p = 0.0005, unpaired two-tailed Student's t-test) and longer traversal time (p = 0.0075) in the 10 mm beam walk test, compared to control mice, though their performance is similar in the 20 mm beam walk (hindlimb slips: p = 0.0733, traversal time: p = 0.9796) and in the pole test. At 22 months of age, the performance of LRRK cDKO mice and littermate controls is more variable and worse, compared to the younger mice, and is not significantly different between the genotypic groups. These results are now included as Figure 9 of the revised manuscript.

      3. The authors demonstrate that, unlike in the germline LRRK DKO mice, they do not observe any alterations in electron-dense vacuoles via EM. Given their data showing increased apoptosis and gliosis, it remains unclear how the loss of LRRK proteins leads to DAergic neuronal cell loss. Mechanistic studies would be insightful to understand better potential explanations for how the loss of LRRK1 and LRRK2 may impair cellular survival, and additional text should be added to the discussion to discuss potential hypotheses for how this might occur.

      We agree that this phenotypic difference between germline DKO and DA neuron-specific cDKO mice is intriguing, suggesting a non-cell autonomous contribution of LRRK in age-dependent accumulation of autophagic and lysosomal vacuoles in SNpc neurons of germline LRRK DKO mice. We will discuss the phenotypic difference further in the revised manuscript. We are generating microglial specific LRRK cDKO mice to investigate the role of LRRK in microglia and whether microglia contribute in a cell extrinsic manner to the regulation of the autophagy-lysosomal pathway in DA neurons.

      4. The authors discuss the potential implications of the neuronal cell loss observed in cDKO mice for LRRK1 and LRRK2 for therapeutic approaches targeting LRRK2 and suggest this argues that LRRK2 variants may exert their effects through a loss-of-protein function. However, all of the data generated in this work focus on a mouse in which both LRRK1 and LRRK2 have been deleted, and it is therefore difficult to make any definitive conclusions about the consequences of specifically targeting LRRK2. The authors note potential redundancy between the two LRRK proteins, and they should soften some of their conclusions in the discussion section around implications for the effects of LRRK2 variants. Human subjects that carry LRRK2 loss-of-function alleles do not have an increased risk for developing PD, which argues against the author's conclusions that LRRK2 variants associated with PD are loss-of-function. Additional text should be included in their discussion to better address these nuances and caution should be used in terms of extrapolating their data to effects observed with PD-linked variants in LRRK2.

      We will modify the discussion accordingly in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses: There appears to be a lack of basic knowledge of the process of spermatogenesis. For instance, the statement that "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." is inaccurate. The Aal cells are the spermatogonial chains; the two are not distinct from one another. In addition, the authors fail to mention spermatogonial stem cells, which form the basis for steady-state spermatogenesis. The authors also do not acknowledge the well-known fact that, in the mouse, the first wave of spermatogenesis is distinct from subsequent waves. Finally, the authors do not mention the presence of both undifferentiated spermatogonia (aka type A) and differentiating spermatogonia (aka type B). The premise for the study they present appears to be the implication that little is known about the dynamics of chromatin during the development of spermatogonia. However, there are published studies on this topic that have already provided much of the information that is presented in the current manuscript.

      We acknowledge the reviewer’s criticism about the inaccuracy and incompleteness of some of the statements about spermatogonial cells and spermatogenesis. We will improve the text accordingly in the revised manuscript. We will also clarify the premise of the study, which was to complement existing datasets on spermatogonial cells by providing parallel high-resolution transcriptomic and chromatin accessibility maps from the same cell populations at early postnatal, late postnatal and adult stages, collected from single individuals (for adults). These features make our datasets comprehensive and an important additional resource for the community. We will also revise the description of published studies to be more inclusive.

      It is not clear which spermatogonial subtype the authors intended to profile with their analyses. On the one hand, they used PLZF to FACS sort cells. This typically enriches for undifferentiated spermatogonia. On the other hand, they report detecting in the sorted population markers such as c-KIT, a well-known marker of differentiating spermatogonia, in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected. The authors cite multiple previously published studies of gene expression during spermatogenesis, including studies of gene expression in spermatogonia. It is not at all clear what the authors' data add to the previously available data on this subject.

      The authors analyzed cells recovered at PND 8 and 15 and compared those to cells recovered from the adult testis. The PND 8 and 15 cells would be from the initial wave of spermatogenesis whereas those from the adult testis would represent steady-state spermatogenesis. However, as noted above, there appears to be a lack of awareness of the well-established differences between spermatogenesis occurring at each of these stages.

      The reviewer correctly points out that our samples contain both undifferentiated spermatogonial stem cells and differentiated spermatogonia, which is expected from the chosen FACS strategy. We state explicitly that our populations are mixed and that our samples are 85-95% PLZF+. We also acknowledge, in the “Limitations” section, the possible presence of contaminating cells that may influence the results and data interpretation. We believe that this does not diminish the value of the datasets. However, to further increase their usefulness and improve their interpretation, we will conduct new analyses that apply computational methods to deconvolute our bulk RNA-seq datasets in silico (PMID: 37528411) using publicly available single-cell RNA-seq datasets. Such analyses should correct for cell-type heterogeneity and provide information about the cellular composition of our cell preparations, clarifying the representation of undifferentiated and differentiated spermatogonial cells and the possible presence of somatic cells.
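      As a rough illustration of what in silico deconvolution of a bulk sample involves (a generic sketch, not the method of the study cited above): the bulk expression profile is modelled as a non-negative mixture of cell-type reference profiles taken from single-cell data, and the mixture weights are estimated by non-negative least squares. The function name `deconvolve`, the three cell-type labels and all numbers below are hypothetical toy values chosen for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(bulk, signature):
    """Estimate cell-type proportions in one bulk RNA-seq sample (hypothetical helper).

    bulk:      (n_genes,) expression vector of the bulk sample.
    signature: (n_genes, n_cell_types) mean expression of each cell type,
               e.g. averaged from an annotated scRNA-seq reference.
    Returns non-negative proportions that sum to 1.
    """
    # Solve min ||signature @ w - bulk|| subject to w >= 0
    weights, _residual_norm = nnls(signature, bulk)
    total = weights.sum()
    return weights / total if total > 0 else weights


# Hypothetical toy reference: 3 marker genes x 3 cell types
# (columns: undifferentiated SPG, differentiating SPG, Sertoli cells).
signature = np.array([
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 0.5],
    [0.0, 0.5, 12.0],
])
true_props = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_props  # simulated bulk sample with known composition

props = deconvolve(bulk, signature)
print(np.round(props, 3))
```

      Dedicated deconvolution tools add steps this sketch omits, such as marker-gene selection, normalisation between bulk and single-cell data, and handling of cell types absent from the reference.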

      In general, the authors present observational data of the sort that is generated by RNA-seq and ATAC-seq analyses, and they speculate on the potential significance of several of these observations. However, they provide no definitive data to support any of their speculations. This further illustrates the fact that this study contributes little if any new information beyond that already available from the numerous previously published RNA-seq and ATAC-seq studies of spermatogenesis. In short, the study described in this manuscript does not advance the field.

      We acknowledge that RNA-seq and ATAC-seq datasets like ours are observational and that their interpretation can be speculative. Nevertheless, our datasets represent an additional useful resource for the community because they are comprehensive and high resolution, and can be exploited, for instance, in studies in environmental epigenetics and epigenetic inheritance examining the immediate and long-term effects of postnatal exposure and their dynamics. The depth of our RNA sequencing allowed us to detect transcripts across a high dynamic range, which has been limited in classical RNA sequencing analyses of spermatogonial cells and in single-cell analyses (which have comparatively low coverage). Further, our experimental pipeline is more affordable than single-cell sequencing approaches and, in the case of adults, provides data per animal, informing on the intrinsic variability in transcriptional and chromatin regulation across males. These points will be discussed in the revised manuscript.

      The phenomenon of epigenetic priming is discussed, but then it seems that there is some expression of surprise that the data demonstrate what this reviewer would argue are examples of that phenomenon. The authors discuss the "modest correspondence between transcription and chromatin accessibility in SCs." Chromatin accessibility is an example of an epigenetic parameter associated with the primed state. The primed state is not fully equivalent to the actively expressing state. It appears that certain histone modifications along with transcription factors are critical to the transition between the primed and actively expressing states (in either direction). The cell types that were investigated in this study are closely related spermatogenic, and predominantly spermatogonial cell types. It is very likely that the differentially expressed loci will be primed in both the early (PND 8 or 15) and adult stages, even though those genes are differentially expressed at those stages. Thus, it is not surprising that there is not a strict concordance between +/- chromatin accessibility and +/- active or elevated expression.

      The reviewer is right that a strict concordance between chromatin accessibility and transcription is not necessarily expected. The text of the revised manuscript will be modified accordingly. However, we would like to note that our data strengthen the observations made by others that in cells from the same lineage, the global landscape of chromatin accessibility is more stable than their transcriptional programs over developmental time.

      Reviewer #2 (Public Review):

      The objective of this study from Lazar-Contes et al. is to examine chromatin accessibility changes in "spermatogonial cells" (SCs) across testis development. Exactly what SCs are, however, remains a mystery. The authors mention in the abstract that SCs are undifferentiated male germ cells and have self-renewal and differentiation activity, which would be true for Spermatogonial STEM Cells (SSCs), a very small subset of total spermatogonia, but then the methods they use to retrieve such cells using antibodies that enrich for undifferentiated spermatogonia encompass both undifferentiated and differentiating spermatogonia. Data in Fig. 1B prove that most (85-95%) are PLZF+, but PLZF is known to be expressed both by undifferentiated and differentiating (KIT+) spermatogonia (Niedenberger et al., 2015; PMID: 25737569). Thus, the bulk RNA-seq and ATAC-seq data arising from these cells constitute the aggregate results comprising the phenotype of a highly heterogeneous mixture of spermatogonia (plus contaminating somatic cells), NOT SSCs. Indeed, Fig. 1C demonstrates this by showing the detection of Kit mRNA (a well-known marker of differentiating spermatogonia - which the authors claim on line 89 is a marker of SCs!), along with the detection of markers of various somatic cell populations (albeit at lower levels).

      The reviewer is correct that our spermatogonial cell populations are mixed and include undifferentiated and differentiated cells, hence our use of the general term spermatogonial cells (SCs), and that they probably also contain some somatic cells. We acknowledge that this is a limitation of our isolation approach. To address this limitation, we will conduct in silico deconvolution analyses using publicly available single-cell RNA sequencing datasets to obtain information about markers corresponding to undifferentiated and differentiated spermatogonial cells and to somatic cells. These additional analyses will provide information about the cellular composition of the samples and clarify the representation of undifferentiated and differentiated spermatogonial cells and other cells.

      This admixture problem influences the results - the authors show ATAC-seq accessibility traces for several genes in Fig. 2E (exhibiting differences between P15 and Adult), including Ihh, which is not expressed by spermatogenic cells, and Col6a1, which is expressed by peritubular myoid cells. Thus, the methods in this paper are fundamentally flawed, which precludes drawing any firm conclusions from the data about changes in chromatin accessibility among spermatogonia (SCs?) across postnatal testis development.

      The reviewer raises concern about the lack of correspondence between chromatin accessibility and expression observed for some genes, arguing that this precludes drawing firm conclusions. However, a dissociation between chromatin accessibility and gene expression is normal and expected. Chromatin accessibility is only a readout of protein deposition and occupancy at specific genomic loci, e.g. by transcription factors, chromatin regulators and nucleosomes; it does not indicate whether transcription is actually ongoing. A gene that is repressed or poised for expression can still show a clear chromatin accessibility signal at its regulatory elements. The dissociation between chromatin accessibility and transcription has been reported in many different cells and conditions (PMID: 36069349, PMID: 33098772), including in spermatogonial cells (PMID: 28985528) and in gonads of different species (PMID: 36323261). Therefore, the dissociation between accessibility and transcription is not a reason to conclude that our data are flawed.

      In addition, there already are numerous scRNA-seq datasets from mouse spermatogenic cells at the same developmental stages in question.

      This is true, but full transcriptomic profiling of cell populations like ours provides different, deeper and more comprehensive transcriptional information. Our datasets identified >17,000 genes, whereas scRNA-seq typically identifies a few thousand genes. Our analyses also identified full-length transcripts, variants, isoforms and low-abundance transcripts. These datasets are therefore a valuable addition to existing scRNA-seq data.

      Moreover, several groups have used bulk ATAC-seq to profile enriched populations of spermatogonia, including from synchronized spermatogenesis which reflects a high degree of purity (see Maezawa et al., 2018 PMID: 29126117 and Schlief et al., 2023 PMID: 36983846 and in cultured spermatogonia - Suen et al., 2022 PMID: 36509798) - so this topic has already begun to be examined. None of these papers was cited, so it appears the authors were unaware of this work.

      We apologize for not mentioning these studies in our manuscript; we will cite them in the revised version.

      The authors' methodological choice is even more surprising given the wealth of single-cell evidence in the literature since 2018 demonstrating the exceptional heterogeneity among spermatogonia at these developmental stages (the authors DID cite some of these papers, so they are aware). Indeed, it is currently possible to perform concurrent scATAC-seq and scRNA-seq (10x Genomics Multiome), which would have made these data quite useful and robust. As it stands, given the lack of novelty and critical methodological flaws, readers should be cautioned that there is little new information to be learned about spermatogenesis from this study, and in fact, the data in Figures 2-5 may lead readers astray because they do not reflect the biology of any one type of male germ cell. Indeed, not only do these data not add to our understanding of spermatogonial development, but they are damaging to the field if their source and identity are not properly understood. Here are some specific examples of the problems with these data:

      1. Fig. 2D - Gata4 and Lhcgr are not expressed by germ cells in the testis.

      2. Fig. 3A - WT1 is expressed by Sertoli cells, so the change in accessibility of regions containing a WT1 motif suggests differential contamination with Sertoli cells. Since Wt1 mRNA was differentially high in P15 (Fig. 3B) - this seems to be the most likely explanation for the results. How was this excluded?

      3. Fig. 3D - Since Dmrt1 is expressed by Sertoli cells, the "downregulation" likely represents a reduction in Sertoli cell contamination in the adult, like the point above. Did the authors consider this?

      We acknowledge that concurrent scATAC-seq and scRNA-seq analyses have been done by others, but our datasets add to these analyses by providing concurrent chromatin and expression profiles at high resolution in spermatogonial populations at two postnatal stages and in adulthood, and from individual males (for adult cells). This provides a set of information that adds to the current literature. Such analyses are not financially tractable in single cells, so we offer an economical alternative that delivers high-resolution datasets for these different time points. Our analyses were not meant to study spermatogenesis but to provide a thorough and comprehensive profiling of chromatin accessibility and transcription in postnatal and adult spermatogonial cells.

      Our data need careful interpretation to avoid misleading conclusions. Fig. 2D shows chromatin accessibility, not expression, and accessibility alone does not tell whether a particular locus or gene is expressed. Thus, candidates like Gata4 and Lhcgr shown in Fig. 2D are simply associated with differentially accessible regions (DARs), which does not mean that they are expressed. Likewise, the motifs in Fig. 3A refer to decreased accessibility, not to expression. Fig. 1C indicates that PND15 cells have low to no expression of three Sertoli cell markers (Vim, Tspan17 and Rhox), suggesting little contamination by Sertoli cells. The presence of WT1 in PND15 cells will, however, be examined more carefully and re-analysed with in silico deconvolution methods using single-cell datasets for the revised manuscript. In Fig. 3D, differential contamination by Sertoli cells is possible; this will also be examined by deconvolution methods.

      Reviewer #3 (Public Review):

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using a cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound but there are several concerns with the cell population isolated from testes and lack of acknowledgment for previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings.

      I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      We thank the reviewer for the suggestion and will rename SCs into SPGs in the revised manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice.

      We will provide a rationale for the use of postnatal 8 and 15 stages in the revised manuscript. Briefly, these stages are interesting to study because early to mid postnatal life is a critical window of development for germ cells during which environmental exposure can have strong and persistent effects. The possibility that changes in germ cells can happen during this period and persist until adulthood is an important area of research linked to disciplines like epigenetic toxicology and epigenetic inheritance.

      The FACS sorting approach used was based on cell surface proteins that are not germline-specific so there were undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ-cell specific and single-cell RNA-seq analyses of testicular tissue have revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ-cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for the interpretation of the RNA-seq and ATAC-seq data that was generated.

      The reviewer is right that our cell populations likely contain undifferentiated and differentiated spermatogonial cells and a small percentage of somatic cells, including Sertoli cells. As suggested, we examined the expression of the germ-cell marker Ddx4 in our datasets and observed that Ddx4 is highly expressed, indeed more highly than the SSC marker Id4 (average log2CPM of 8 vs 5, respectively). We will include this information in the revised manuscript. Further, the planned deconvolution analyses are expected to clarify the cellular composition of our cell populations.

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. In addition, there are several published studies on scATAC-seq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      We thank the reviewer for pointing out this study as well as other studies in human spermatogonia. We will cross-reference all of them in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aims to further resolve the history of speciation and introgression in Heliconius butterflies. The authors break the data into various partitions and test evolutionary hypotheses using the Bayesian software BPP, which is based on the multispecies coalescent model with introgression. By synthesizing these various analyses, the study pieces together an updated history of Heliconius, including a multitude of introgression events and the sharing of chromosomal inversions.

      Strengths:

      Full-likelihood methods for estimating introgression can be very computationally expensive, making them challenging to apply to datasets containing many species. This study provides a great example of how to apply these approaches by breaking the data down into a series of smaller inference problems and then piecing the results together. On the empirical side, it further resolves the history of a genus with a famously complex history of speciation and introgression, continuing its role as a great model system for studying the evolutionary consequences of introgression. This is highlighted by a nice Discussion section on the implications of the paper's findings for the evolution of pollen feeding.

      Weaknesses:

      The analyses in this study make use of a single method, BPP. The analyses are quite thorough so this is okay in my view from a methodological standpoint, but given this singularity, more attention should be paid to the weaknesses of this particular approach.

      In the Discussion, we have now added a discussion of the limitations of our approach in the section 'Approaches for estimating species phylogeny with introgression from whole-genome sequence data: advantages and limitations.'

      Additionally, little attention is paid to comparable methods such as PhyloNet and their strengths and weaknesses in the Introduction or Discussion.

      We have also mentioned other methods (PhyloNet and starBEAST) in our Discussion. Our attempts to obtain usable estimates from PhyloNet were unsuccessful. In another study, the full likelihood version of PhyloNet (comparable in intent to the BPP methodology used here) could run with only small datasets of ~100 loci; see Edelman et al. (2019).

      BPP reduces computational burden by fixing certain aspects of the parameter space, such as the species tree topology or set of proposed introgression events. While this approach is statistically powerful, it requires users to make informed choices about which models to test, and these choices can have downstream consequences for subsequent analyses. It also might not be as applicable to systems outside of Heliconius where less previous information is available about the history of speciation and introgression. In general, it is likely that most modelling decisions made in the study are justified, but more attention should be paid to how these decisions are made and what the consequences of them could be, including alternative models.

      We agree with the reviewer that inferring the species tree topology and placing introgression events on the species tree, although well justified here, may be challenging in many groups of organisms and may affect downstream analyses. We now discuss this as a limitation of our approach in the Discussion. In general, the initial MSC analysis without gene flow should provide information about possible species trees and introgression events. We can construct multiple introgression models and perform parameter estimation and model comparison to decide which best fits the data. This is summarized in the last paragraph of the section 'Approaches for estimating species phylogeny with introgression from whole-genome sequence data: advantages and limitations.' It would, of course, be nice to have a completely unsupervised method that could work with large phylogenies, but this is currently computationally impossible.

      • Co-estimating histories of speciation and introgression remains computationally challenging. To circumvent this in the study, the authors first estimate the history of speciation assuming no gene flow in BPP. While this approach should be robust to incomplete lineage sorting and gene tree estimation, it is still vulnerable to gene flow. This could result in a circular problem where gene flow causes the wrong species tree to be estimated, causing the true species tree to be estimated as a gene flow event.

      The goal of this initial analysis is to obtain a list of possible species trees with introgression events. We assume that gene flow results in a topology that is informative about the lineages involved. We also focus on common MAP trees with high posterior probabilities as less frequent trees or trees with low posterior probabilities reflect high uncertainty and are more likely to be erroneous. A difficulty is to decide which tree topology is most likely to be the true species tree. We summarize our approach in the Discussion.

      This is a flaw that this approach shares with summary-statistic approaches like the D-statistic, which also require an a-priori species tree.

      In a sense, this is true, but BPP is more flexible because it can be used to explore an arbitrary introgression model on any type of tree, while summary methods like the D-statistic assume a specific species phylogeny with a particular introgression between nonsister lineages as well as fixed sampling configurations. Furthermore, as shown in the paper, we can compare different assumed trees and test between them; we do this repeatedly in the paper for difficult branch placement issues. In contrast, summary methods such as the D-statistic work with species quartets only and cannot be applied to smaller or larger species trees.

      Enrichment of particular topologies on the Z chromosome helps resolve the true history in this particular case, but not all datasets will have sex chromosomes or chromosome-level assemblies to test against.

      Yes, we have the privilege of having chromosome-level assemblies available for Heliconius. In general, a spatial pattern of species tree estimates across genomic blocks can be informative about possible topologies that could represent the true species relationship. Then these candidate species trees can be tested by fitting different introgression models (as in Figure 1D,E) or by using the recombination rate argument (Figure 1F), which prefers trees common in low recombination rate regions of the genome, although this requires knowing a recombination rate map. In our case, we used a chromosome-level recombination rate per base pair, which is negatively correlated with the chromosome size. We have clarified this in the text. Ultimately, multiple lines of evidence should be examined before deciding on the most likely species tree. We now mention these potential difficulties with applying our methods to other datasets as limitations of our approach in the Discussion.

      • The a-priori specification of network models necessarily means that potentially better-fitting models to the data don't get explored. Models containing introgression events are proposed here based on parsimony to explain patterns in gene tree frequencies. This is a reasonable and common assumption, but parsimony is not always the best explanation for a dataset, as we often see with phylogenetic inference. In general, there are no rigorous approaches to estimating the best-fitting number of introgression events in a dataset.

      Joint inference of species topologies and possible introgression events remains computationally challenging. PhyloNet implements this joint inference but is limited to small datasets (<100 loci) and we found it to be unreliable.

      Likewise, the study estimates both pulse and continuous introgression models for certain partitions, though there is no rigorous way to assess which of these describes the data better.

      The Bayes factor can be used to compare different models fitted to the same data, for example, different MSC-I models with different introgression events, or MSC-I models with gene flow in pulses versus MSC-M models with continuous gene flow. We did not attempt this as it was clear to us that a better model would include both modes of gene flow, but such an option is not currently implemented in any software. Rather, we relied on our exploratory analysis (BPP MSC and 3s) and previous knowledge to inform a likely introgression model. For the groups to which we fitted MSC-M models, we chose to provide an intuitive justification for why they might be more realistic than the MSC-I model, without formally performing model selection.
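      For reference, the Bayes factor comparison mentioned above takes the standard form below. Here $M_1$ and $M_2$ stand for any two models fitted to the same loci $D$ (for example, an MSC-I model and an MSC-M model), and $\theta_k$ denotes the parameters of model $M_k$. This is a generic statement of the criterion, not a description of how any analysis in the paper was run; for models of this kind the marginal likelihoods have no closed form and must be estimated numerically.

```latex
\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}, \qquad
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k,

\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \mathrm{BF}_{12} \times \frac{P(M_1)}{P(M_2)}.
```

      With equal prior probabilities for the two models, the posterior odds equal the Bayes factor, so $\mathrm{BF}_{12} > 1$ favours $M_1$.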

      • Some aspects of the analyses involving inversions warrant additional consideration. Fewer loci were able to be identified in inverted regions, and such regions also often have reduced rates of recombination. I wonder if this might make inferences of the history of inverted regions vulnerable to the effects of incomplete lineage sorting, even when fitting the MSC model, due to a small # of truly genealogically independent loci.

      We agree with the reviewer that it is challenging to infer the history of a small region of the genome, such as the inversions studied here. Indeed, the presence of only a few loci in the 15b inversion means there is only limited information in the data for the species tree, as reflected in the low posterior probabilities for the MAP tree (Figure 3A). The effect of using tightly linked loci in the inversion should be increased uncertainty in the estimates, but not a systematic bias towards any particular species tree topology. Since major patterns of species relationships in each of the 15a, 15b and 15c regions are clear, we do not expect these effects to strongly influence our conclusions.

      Additionally, there are several models where introgression events are proposed to explain the loss of segregating inversions in certain species. It is not clear why these scenarios should be proposed over those in which the inversion is lost simply due to drift or selection.

      We know that the 15b inversion is absent in most species except for H. numata and H. pardalinus, at least, and that introgression of the inversion occurred between these two species, based on previous studies such as Jay et al (2018) and our own analysis. Polymorphism at this inversion forms a well-known “supergene” that affects mimicry, and is maintained by documented balancing selection in H. numata. Given this information, we propose a few possible scenarios of how the inversion might have originated, and when and where the introgression might have occurred, shown in Figure 3. In particular, the direction of introgression is something we test specifically. One way to test among these scenarios is to date the origin and introgression event of the inversion, but doing so properly is beyond the scope of this work. Nonetheless, we argue that it is at least likely that one difference between H. pardalinus and its sister species H. elevatus is the presence of the 15b inversion. Since other evidence shows that colour patterning loci in H. elevatus originated from an unrelated species, H. melpomene (i.e. the 15b and other non-inverted colour patterning loci), it is indeed likely that the inversion was “swapped out” by an uninverted sequence from H. melpomene during the formation of H. elevatus.

      We are aware that hypotheses such as these might appear highly elaborate and unparsimonious. But these are the conclusions to which the data lead us. In the melpomene-silvaniform clade, many speciation and introgression events occurred in short succession, and wild-caught hybrids demonstrate that occasional hybridization can occur across all 15 or so species in the group. We now detail how we have looked only for the major introgression patterns using a limited number of key species. We leave fuller analyses for future work.

      In the main text, we have revised our discussion of the four proposed scenarios for 15b to improve clarity. We have also updated the introgression model from the melpomene-cydno clade to H. elevatus to be unidirectional based on the BPP results in Figure S18.

      Reviewer #2 (Public Review):

      Thawornwattana et al. reconstruct a species tree of the genus Heliconius using the full-likelihood multispecies coalescent, an exciting approach for genera with a history of extensive gene flow and introgression. With this, they obtain a species tree with H. aoede as the earliest diverging lineage, in sync with ecological and morphological characters. They also add resolution to the species relationships of the melpomene-silvaniform clade and quantify introgression events. Finally, they trace the origins of an inversion on chromosome 15 that exists as a polymorphism in H. numata, but is fixed in other species. Overall, obtaining better species tree resolutions and estimates of gene flow in groups with extensive histories of hybridization and introgression is an exciting avenue. Being able to control for ILS and get estimates between sister species are excellent perks. One overall quibble is that the paper seems to be best suited to a Heliconius audience, where past trees are easily recalled, or members of the different clades are well known.

      We thank the reviewer for the accurate summary and positive comments. Although our data and some of the discussion are specific to Heliconius, we believe our analysis framework will be useful to study species phylogeny and introgression in other taxa as well.

      Overall, applying approaches such as these to gain greater insight into species relationships with extensive gene flow could be of interest to many researchers. However, the conclusions could be strengthened with a bit more clarity on a few points.

      1) The biggest point of concern was the choice of species to use for each analysis, in particular the omission of H. ismenius in the resolution of the BNM clade species tree. The analysis of the chromosome 15 inversion seems to rely on the knowledge that H. ismenius is sister to H. numata, so without that being demonstrated in the BNM section, the resulting conclusions about the origin of that inversion are less interpretable.

      The choice of species to be included was mainly based on available high-quality genome resequence data from Edelman et al (2019), which were chosen to cover most of the major lineages within the genus. We agree that inclusion of H. ismenius would strengthen the analysis of the melpomene-silvaniform clade. In particular, it would be interesting to know whether H. numata alone or H. numata+H. ismenius accounts for the main source of genealogical variation across the genome in this group in Figure 2. The reviewer is correct in saying that we do assume that H. ismenius and H. numata are sister species. This relationship is supported by our analysis (Figure 3A) and previous analyses of genomic data, e.g. Zhang et al (2016), Cicconardi et al. (2023) and Rougemont et al. (2023). We made this clearer in the text:

      "Although this conclusion assumes that H. numata and H. ismenius are sister species while H. ismenius was not included in our species tree analysis of the melpomene-silvaniform clade (Figure 2), this sister relationship agrees with previous genomic studies of the autosomes and the sex chromosome (Zhang et al. 2016; Cicconardi et al. 2023; Rougemont et al. 2023)."

      2) An argument they make in support of the branching scenario where H. aoede is the earliest diverging branch is based on which chromosomes support that scenario and the key observation that less introgression is detected in regions of low recombination. Yet, they go no further to understand the relationship between recombination rate and species trees produced.

      We believe Figure 1F does examine this relationship, showing that trees under scenario 2 are more common in regions of the genome with lower recombination rates (i.e. in longer chromosomes). We added more clarification in the text where Figure 1F is mentioned. The relationship between recombination and introgression in Heliconius was earlier discovered and shown using windowed estimated gene trees in Martin et al. (2019) and in Edelman et al. (2019), so we did not re-test this here.

      3) How the loci were defined could use more clarity. From the methods, it seems that each locus could vary quite a bit in total length (bp) and number of informative sites. Understanding the data processing would make this paper a better resource for others looking to apply similar approaches.

      We added a new supplemental figure, Figure S20, to illustrate how coding and noncoding loci were extracted from the genome.

      Reviewer #3 (Public Review):

      The authors use a full-likelihood multispecies coalescent (MSC) approach to identify major introgression events throughout the radiation of Heliconius butterflies, thereby improving estimates of the phylogeny. First, the authors conclude that H. aoede is the likely outgroup relative to other Heliconius species; Miocene introgression into the ancestor of H. aoede makes it appear to branch later. Topologies at most loci were not concordant with this scenario, though 'aoede-early' topologies were enriched in regions of the genome where interspecific introgression is expected to be reduced: the Z chromosome and larger autosomes. The revised phylogeny is interesting because it would mean that no extant Heliconius species has reverted to a non-pollen-feeding ancestral state. Second, the authors focus on a particularly challenging clade in which ancient and ongoing gene flow is extensive, concluding that silvaniform species are not monophyletic. Building on these results, a third set of analyses investigates the origin of the P1 inversion, which harbours multiple wing patterning loci, and which is maintained as a balanced polymorphism in H. numata. The authors present data supporting a new scenario in which P1 arises in H. numata or its ancestor and is introduced into the ancestor of H. pardalinus and H. elevatus, i.e. introgression in the opposite direction to what has previously been proposed using a smaller set of taxa and different methods.

      The analyses were extensive and methodologically sound. Care was taken to control for potential sources of error arising from incorrect genotype calls and the choice of a reference genome. The argument for H. aoede as the earliest-diverging Heliconius lineage was compelling, and analyses of the melpomene-silvaniform clade were thorough.

      The discussion is quite short in its current form. In my view, this is a missed opportunity to summarise the level of support and biological significance of key results. This applies to the revised melpomene-silvaniform phylogeny and, in particular, the proposed H. numata origin of P1. It would be useful to have a brief overview of the relationships that remain unclear, and which data (if any) might improve estimates.

      We added a paragraph in the Discussion to summarize our key findings in 'An updated phylogeny of Heliconius', and discuss issues that remain uncertain.

      It was good to see the authors reflect on the utility of full-likelihood approaches more generally, though the discussion of their feasibility and superiority was at times somewhat overstated and reductive. Alternative MSC-based methods that use gene tree frequencies or coalescence times can be used to infer the direction and extent of introgression with accuracy that is satisfactory for a wide variety of research questions. In practice, a combination of both approaches has often been successful. Although full-likelihood approaches can certainly provide richer information if specific parameter estimates are of interest, they quickly become intractable in large species complexes where there is extensive gene flow or significant shifts in population size. In such cases, there may be hundreds of potentially important parameters to estimate, and alternative introgression scenarios may be impossible to disentangle. This is particularly challenging in systems where, unlike in Heliconius, there is little a priori knowledge of reproductive isolation, genome evolution, and the unique life history traits of each species. It would be useful for the authors to expand on their discussion of strategies that can simplify inference problems in such systems, acknowledging the difficulties therein.

      We agree that approximate methods based on summary statistics (e.g. gene tree topologies) are computationally much cheaper and are sometimes useful. We now discuss limitations of our approach regarding strategies for constructing possible introgression models, computational cost and analysis of large phylogenies, and modeling assumptions in the MSC framework in the first section of the Discussion.

      Reviewer #1 (Recommendations For The Authors):

      In addition to the comments raised in the public review, I have some minor suggestions:

      • In the Introduction, "Those methods have limited statistical power" implies summary-statistic methods have a high false negative rate for inferring the presence of introgression, which I don't think is true.

      We removed 'statistical' as we used the term power loosely to mean ability to estimate more parameters in the model by making a better use of information in the sequence data and not in the sense of a true positive rate.

      • When discussing full-likelihood approaches in a general sense, please cite additional methods than just BPP, such as PhyloNet.

      We added references for PhyloNet (Wen & Nakhleh, 2018) and starBEAST (Zhang et al., 2018) in the Introduction and Discussion.

      • Consider explicitly labelling chromosomal region 21 as the Z chromosome in relevant Figures, for ease of interpretation.

      In the main figures, we changed the chromosome label from 21 to Z.

      • From reading the main text it's not clear what a "3s analysis" is

      The 3s analysis estimates pairwise migration rates between two species by fitting an MSC-with-migration (MSC-M) model, also known as the isolation-with-migration (IM) model, for three species, where gene flow is allowed between the two sister species while the outgroup is used to improve power but is not involved in gene flow. We changed the text from

      "We use estimates of migration rates between each pair of species with a 3s analysis under the IM model of species triplets ..."

      to

      "We use estimates of migration rates between each pair of species under the MSC-with-migration (MSC-M or IM) model of species triplets (3s analysis) ..."

      • "This agrees with the scenario in which H. elevatus is a result of hybrid speciation between H. pardalinus and the common ancestor of the cydno-melpomene clade [42, 43]." I don't think this model provides any support for hybrid speciation in particular, over a standard post-speciation introgression scenario.

      We took the finding that the introgression from the melpomene-cydno clade into H. elevatus occurs almost right after H. elevatus split off from H. pardalinus as evidence for hybrid speciation. We revised the text to make this clearer:

      "Our finding that divergence of H. elevatus and introgression from the cydno-melpomene clade occurred almost simultaneously provides evidence for a hybrid speciation origin of H. elevatus resulting from introgression between H. pardalinus and the common ancestor of the cydno-melpomene clade (Rosser et al. 2019; Rosser et al. 2023)."

      In particular, the Rosser et al. (2023) paper has now been submitted, and is the main paper to cite for the hybrid speciation hypothesis for H. elevatus.

      • "while clustering with H. elevatus would suggest the opposite direction of introgression" careful with terminology here; is this about direction (donor vs. recipient species) or taxa involved (which is not direction)?

      This is about the direction of introgression, not the taxa involved. We modified the text to make this clearer:

      "By including H. ismenius and H. elevatus, sister species of H. numata and H. pardalinus respectively, different directions of introgression should lead to different gene tree topologies. Clustering of (H. numata with the inversion, H. pardalinus) with H. numata without the inversion would suggest the introgression is H. numata → H. pardalinus while clustering of (H. numata with the inversion, H. pardalinus) with H. elevatus would suggest H. pardalinus → H. numata introgression."

      Reviewer #3 (Recommendations For The Authors):

      The work is methodologically sound and rigorous but could have been reported and discussed with greater clarity.

      It was difficult to assess the level of support for the proposed P1 introgression scenario without digging through the extensive supplementary materials. The discussion would ideally be used to clarify and summarise this.

      We have substantially revised the section on the P1 inversion. We also mention in the Results (in the final paragraph of the inversion section) and Discussion that our data provided robust evidence that the introgression of the inversion is from H. numata into H. pardalinus while its precise origin (in which lineage and when it originated) remains uncertain.

      The authors may also wish to compare their results to the recent work by Rougemont et al. on introgression between H. hecale and H. ismenius in the discussion.

      We now mention Rougemont et al. (2023) in the Discussion as an example of introgression of small regions of the genome involved in wing patterning. We also acknowledge that our updated phylogeny does not include this kind of local introgression.

      It was not initially obvious which number corresponded to the Z chromosome in any of the figures, even though this is critical to their interpretation.

      We changed the label for chromosome 21 to Z in the main figures.

      The supplementary tables should be described in more detail. For example, what is 'log_bf_check' and 'prefer_pred' in supplementary table S11?

      We added more details explaining the relevant quantities in the table heading, both in the SI file and in the spreadsheet.

      Minor comments:

      First paragraph of 'Complex introgression in the 15b inversion region (P locus):' Rephrase "This suggests another introgression between the common...".

      We modified the text as follows:

      "Another feature of this 15b region is that among the species without the inversion, the cydno-melpomene clade clusters with H. elevatus and is nested within the pardalinus-hecale clade (without H. pardalinus). This is contrary to the expectation based on the topologies in the rest of the genome (Figure 2A, scenarios a–c) that the cydno-melpomene clade would be sister to the pardalinus-hecale clade (without H. pardalinus). One explanation for this pattern is that introgression occurred between the common ancestor of the cydno-melpomene clade and either H. elevatus or the common ancestor of H. elevatus and H. pardalinus together with a total replacement of the non-inverted 15b in H. pardalinus by the P1 inversion from H. numata (Jay et al. 2018). We confirm and quantify this introgression below."

      Second paragraph of 'Major Introgression Patterns in the melpomene-silvaniform clade:' "cconclusion" should be "conclusion."

      Corrected.

      Paragraph preceding discussion: sentences toward the end of the paragraph should be rephrased for clarity. E.g. "different tree topologies are expected under different direction of introgression."

      We revised this paragraph. The sentence now says:

      "By including H. ismenius and H. elevatus, sister species of H. numata and H. pardalinus respectively, different directions of introgression should lead to different gene tree topologies. Clustering of (H. numata with the inversion, H. pardalinus) with H. numata without the inversion would suggest the introgression is H. numata → H. pardalinus while clustering of (H. numata with the inversion, H. pardalinus) with H. elevatus would suggest H. pardalinus → H. numata introgression."

      I enjoyed reading this paper and I am certain it will generate discussion and future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      While the manuscript was reasonably clearly written and the methodology and results sound, it is not clear what the real contribution of the work is. The authors' finding that ultrasonic stimulation is capable of altering intracellular Ca2+ to effect an increase in EV secretion from cells, as long as the irradiation does not affect cell viability, is well established (see, for example, Ambattu et al., Commun Biol 3, 553, 2020; Deng et al., Theranostics, 11, 9 2021; Li et al., Cell Mol Biol Lett 28, 9, 2023). Moreover, the authors' own work (Maeshige et al., Ultrasonics 110, 106243, 2021) using the exact same stimulation (including the same parameters, i.e., intensity and frequency) and cells (C2C12 skeletal myotubes) reported this. Similarly, the authors themselves reported that EV secretion from C2C12 myotubes has the ability to regulate macrophage inflammatory response (Yamaguchi et al., Front Immunol 14, 1099799, 2023). It would then stand to reason that a reasonable and logical deduction from both studies is that the ultrasonic stimulation would lead to the same attenuation of inflammatory response in macrophages through enhanced secretion of EVs from the myotubes.

      We appreciate your comments and suggestions. Ambattu et al. stated in their report that the high-frequency acoustic stimulation they used has less effect on cell membranes than the 1 MHz ultrasound that we used in this study. Deng et al. and Li et al. applied low-intensity pulsed ultrasound (LIPUS) (about 300 mW/cm2) in their studies. In this study, we hypothesized that ultrasound increases EV secretion via increased Ca2+ influx into the cell caused by enhanced cell membrane permeability, and since the ultrasound-induced enhancement of cell membrane permeability has been reported to increase in an intensity-dependent manner (Zeghimi et al., 2015), we applied intensities of 1-3 W/cm2. While previous studies using LIPUS used 15 minutes of irradiation, the high intensity employed in this study was capable of promoting EV release after 5 minutes of stimulation. We have added the above explanation to the introduction in the revised version of the manuscript. Furthermore, while the previous studies used other types of cells, the main purpose of this study was to determine the optimal ultrasound intensity to promote EV release from skeletal muscle and to determine whether ultrasound-induced EVs are qualitatively altered compared to those released under normal conditions, thereby validating the anti-inflammatory effects of ultrasound-induced muscle EVs. Our previous study (Maeshige et al. 2021) used the same muscle cells but did not investigate intensity dependence, so this is the first study to show that ultrasound irradiation promotes EV release from muscle in an intensity-dependent manner. In addition, we would like to emphasize that this study also goes beyond our previous study in the method of stimulation. Specifically, the present study used a more efficient 5-minute irradiation protocol, whereas the previous study adopted a 9-minute intervention.

      We understand that the results of this study are predictable from two of our previous studies, but since stimulus-induced EVs may be qualitatively different from EVs released under normal conditions (Kawanishi et al., 2023; Li et al., 2023), it is worthwhile to examine the effects of stimulus-induced EVs. This explanation has been added to the introduction of the revised version of the manuscript.

      The authors' claim that 'the role of Ca2+ in ultrasound-induced EV release and its intensity-dependency are still unclear', and that the aim of the present work is to clarify the mechanism, is somewhat overstated. That ultrasonic stimulation alters intracellular Ca2+ to lead to EV release, thereby establishing their interdependency and demonstrating the mechanism by which EV secretion is enhanced by ultrasonic stimulation, was detailed in Ambattu et al., Commun Biol 3, 553, 2020. While this was carried out at a slightly higher frequency (10 MHz) and with a slightly different form of ultrasonic stimulation, the same authors appear to have since established a universal mechanism of transduction across an entire range of frequencies and stimuli (Ambattu, Biophysics Rev 4, 021301, 2023).

      In this study, we showed that Ca2+ is involved in ultrasound-induced EV release using Ca2+-depleted culture medium, but since we did not examine the mechanism in more detail than that, we modified the introduction to avoid overstating.

      Similarly, the anti-inflammatory effects of EVs on macrophages have also been extensively reported (Li et al., J Nanobiotechnol 20, 38, 2022; Lo Sicco et al., Stem Cells Transl Med 6, 3, 2017; Hu et al., Acta Pharma Sin B 11, 6, 2021), including by the authors themselves in a recent study on the same C2C12 myotubes (Yamaguchi et al., Front Immunol 14, 1099799, 2023). Moreover, the authors' stated aim for the present work (clarifying the mechanism of the anti-inflammatory effects of ultrasound-induced skeletal muscle-derived EVs on macrophages) appears to be somewhat redundant given that they simply repeated the microRNA profiling study they carried out in Yamaguchi et al., Front Immunol 14, 1099799, 2023. The only difference was that a fraction of the EVs (from identical cells) that they tested were now a consequence of the ultrasound stimulation they imposed.

      That the authors have found that their specific type of ultrasonic stimulation maintains this EV content (i.e., microRNA profile) is novel, although this, in itself, appears to be of little consequence to the overall objective of the work, which was to show the suppression of the macrophage pro-inflammatory response due to enhanced EV secretion under ultrasonic irradiation, since the anti-inflammatory effects were attributed to the increase in EV concentration and not to their content.

      Our previous study examined only EVs secreted from muscle under normal conditions. In contrast, the purpose of the current study is to determine whether ultrasound treatment could enhance the effect of EVs and change the encapsulated miRNAs. Although we identified several miRNAs that are specifically induced by ultrasound, further studies are needed to demonstrate the effect of those miRNAs derived from ultrasound-treated muscles on macrophages. We have mentioned this limitation in the discussion of the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      This reviewer felt that there was a lack of novelty in the manuscript and that the results of the work confirm conclusions that could have been logically deduced from a combination of the authors' preceding work (Maeshige et al., Ultrasonics 110, 106243, 2021 and Yamaguchi et al., Front Immunol 14, 1099799, 2023). The contribution of the work could perhaps be elevated if the authors were to focus more on whether the 0.01% of altered miRNA has any impact on cellular activity.

      As mentioned above, the present study is novel compared to our previous studies for examining the effects of ultrasound-induced EVs. In addition, the fact that EV content is maintained after ultrasound stimulation rather indicates that ultrasound can be used as a highly stable and effective method of promoting EV release.

      A further, albeit more minor, recommendation is to omit lines 73-80 in the manuscript. The discussion on physical exercise for promoting EV secretion together with the non-invasive nature of ultrasound therapy is very misleading as it creates the impression that the authors' work can be applied as a direct intervention on a patient. This was not shown in the work, which was limited to irradiating cells ex vivo.

      We agree and have edited the introduction.

      Reviewer #2 (Public Review):

      1. The exploration of output parameters for US induction appears limited, with only three different output powers (intensities) tested, thus narrowing the scope of their findings.

      We appreciate your comments and suggestions. The intensity of LIPUS is typically around 0.3 W/cm2, and in clinical practice, up to about 2.5 W/cm2 is considered a safe intensity for irradiating the human body (Draper, 2014). Therefore, 3.0 W/cm2 is already a fairly high intensity for the human body, so 3.0 W/cm2 was set as the maximum intensity in this study.

      2. Their claim of elucidating mechanisms seems to be only partially met, with a predominant focus on the correlation between calcium responses and EV release.

      The focus of this study was to examine the effects of ultrasound-induced EVs on the inflammatory responses of macrophages and not on the detailed mechanism of calcium involvement. We revised the introduction to make the purpose of this study clearer.

      3. While the intracellular calcium response is a dynamic activity, the method used to measure it could risk a loss of kinetic information.

      Although we did not examine the kinetics of calcium, we believe that Ca2+ is at least shown to be involved in the EV-promoting effect of ultrasound on muscle, since the enhancement of EV release by ultrasound was abolished by eliminating calcium from the culture medium. Furthermore, real-time measurement of Ca2+ after ultrasound irradiation has shown that ultrasound promotes Ca2+ influx into cells immediately after irradiation (Fan et al., 2010).

      4. The inclusion of miRNA sequencing is commendable; however, the interpretation of this data fails to draw clear conclusions, diminishing the impact of this segment.

      Although we identified several miRNAs that are specifically induced by ultrasound, further studies are needed to demonstrate the effect of those miRNAs derived from US-treated muscles on macrophages. We have mentioned this limitation in the discussion of the revised version of the manuscript.

      While the authors have shown the anti-inflammatory effects of US-induced EVs on macrophages, there are gaps in the comprehensive understanding of the mechanisms underlying US-induced EV release. Certain aspects, like the calcium response and the utility of miRNA sequencing, were not fully explored to their potential. Therefore, while the study establishes some findings, it leaves other aspects only partially substantiated.

      As stated above, the main purpose of this study was to examine the effects of ultrasound-induced EVs on the inflammatory responses of macrophages. We consider a detailed investigation of the mechanism of ultrasound-induced EV release to be our next step, and we have revised the introduction and discussion of the manuscript to make the purpose and limitations of this study clearer.

      Reviewer #2 (Recommendations For The Authors):

      The author's exploration into the role of Ca2+ in the context of US-induced EV release is a timely endeavor, especially given the growing interest in understanding the cellular dynamics associated with external stimulants like ultrasound. Nevertheless, the manuscript does not unambiguously define the mechanism of action and its broader implications.

      Ca2+ has long been established as a versatile intracellular messenger, governing a myriad of cellular processes. There is a wealth of methodologies, from specific inhibitors to specialized assays, tailored to dissect the role of Ca2+ in diverse contexts. In the specific case of US-induced Ca2+ activity, the expectation would be for a clear, mechanistic delineation of how this ionic surge drives EV release. Yet, this study stops short of providing those details. It is imperative for the authors to dig deeper, employing a diverse set of tools at their disposal, to fill this knowledge gap.

      Recently, it was reported that increased Ca2+ influx causes an increase in EV secretion via the plasma membrane repair protein annexin A6 (Williams et al. 2023). However, the full mechanism by which an increase in intracellular Ca2+, let alone ultrasound-induced Ca2+ influx, promotes EV release is not yet understood, and elucidating this part of the mechanism is beyond the scope of this study.

      Furthermore, the paper raises another important question: Which specific proteins are pivotal in orchestrating the US-induced Ca2+ entry in myotubes? Addressing this would not only enhance the manuscript's novelty but would also contribute a vital piece to the puzzle of understanding US-cellular interactions.

      Ultrasound increases Ca2+ uptake by increasing cell membrane permeability through sonoporation, rather than via specific proteins (Fan et al., 2010). We added this explanation to the introduction in the revised version of the manuscript.

      Lastly, while the report touches upon the influence of varying US output power on EV concentrations, it piques curiosity about potential effects beyond the 3W/cm2 threshold. It's observed that cell viability isn't compromised at this intensity, suggesting room for further exploration. Would a higher intensity yield a proportionally increased EV release, or is there a saturation point? Conversely, could intensities beyond 3W/cm2 begin to have deleterious effects on the cells? These are crucial considerations that merit investigation to realize the full potential of US as a modulatory tool, both for research and therapeutic applications.

      As mentioned above, 3.0 W/cm2 was adopted as the maximum intensity in this study with reference to the intensities used in clinical practice. In addition, since the cytotoxicity and therapeutic effects of ultrasound depend not only on intensity but also on other parameters such as duty cycle, acoustic frequency, pulse repetition frequency and duration, a comprehensive analysis of the effects of ultrasound on cells across various parameter settings would be valuable as an independent study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate your positive assessment and the comments by the two reviewers on the previous version of our manuscript, all of which are very helpful and greatly improved our manuscript. We have incorporated all changes and corrections requested by these reviewers and we believe their suggestions have enhanced the overall quality of our manuscript.

      As for Reviewer #1.

      We thank Reviewer 1 very much for her/his very positive and detailed remarks, all of which have been introduced into the revised version of our manuscript.

      We have added information about the biological control on the development of phosphatic-shelled brachiopod columns to the introduction, so that our later narrative is more understandable. The Cambrian Explosion is the innovation of metazoan body plans and radiation of animals during a relatively short geological time. The expansion of new body plans in different groups of brachiopods in the early Cambrian was likely driven by the Cambrian Explosion. Columnar architectures do not occur in living lingulate brachiopods, and thus it is important to gain a better understanding of this extinct shell architecture from the fossil record on a global scale in order to study the evolutionary trends of shell architectures and compositions in brachiopods. We hope the current comparative study of columnar shell architectures from some of the oldest known brachiopods will help to pursue this goal. Furthermore, the adaptive innovation of biomineralized columnar architecture in early brachiopods is discussed in the revised manuscript.

      As for Reviewer #2.

      We thank Reviewer 2 very much for her/his very constructive and detailed remarks. All the comments have been thoroughly considered, and introduced into the revised version of the manuscript.

      Current knowledge of the shell structures of early linguliform brachiopods is limited, and this is now addressed in the revised manuscript and in the supplementary Appendix 1. We also state that more detailed studies of the complexity and diversity of linguliform brachiopod architectures (especially their early fossil representatives) require further investigation. As the shell structure and biomineralization process are crucial to unravelling the poorly resolved phylogeny and early evolution of Brachiopoda, in this paper we undertake a preliminary study of exquisitely well-preserved brachiopods from Cambrian Series 2. The shapes and sizes of microscopic cylindrical columns are described in detail in this research, and this work will be useful for further comparative studies of brachiopod shell architecture. The important reference paper on brachiopod shells by Butler et al. (2015) has been added to the revised manuscript. The structure and language of the manuscript have been revised based on the very helpful suggestions.

      Concerning the families Eoobolidae and Lingulellotretidae, we are aware of the current problematic situation of these families, and we have added more discussion about the detailed characters of Eoobolidae in the Systematic Palaeontology part of the manuscript. However, the revision of the families Eoobolidae and Lingulellotretidae falls outside the scope of this paper. We prefer to leave it now as it will be part of an upcoming publication based on more global materials from China, Australia, Sweden and Estonia that we are currently working on.

      On behalf of my co-authors, I thank you for taking the time to consider our manuscript for publication in eLife and I hope that with the changes we have made to our paper, it is now suitable for publication. If you have any further questions about our revised manuscript, please do not hesitate to get in contact. Thank you very much for your time and consideration.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The authors deeply appreciate the reviewers’ constructive criticism.

      Answers to the public review from Reviewer 1

      1. The pathogenesis of truncating LRRC23 in asthenozoospermia needs to be further considered. The molecular mechanism of LRRC23 demonstrated in mice should be tested in patients with the LRRC23 variant. As it may be difficult to determine the structures of RS3 in the infertile male sperm, the LRRC23 localization should be observed in the sperm from patients with the LRRC23 variant.

      We understand the reviewer’s point. Unfortunately, the patients declined to continue in the project after the initial clinical evaluation and blood draw, so we were unable to follow up.

      2) The absence of the RS3 head in LRRC23Δ/Δ mouse sperm is not sufficient to support the specific localization of LRRC23 in RS3 head. Although LRRC23 might bind to RS head protein RSPH9, the authors state that "RSPH9 is a head component of RS1 and RS2 like in C. reinhardtii (Gui et al, 2021), but not of RS3" as the protein level and the localization of RSPH9 is not altered in LRRC23Δ/Δ sperm. Thus, the specific localization of LRRC23 in RS3 head should be further confirmed.

      Thank you for your comment. We agree with the reviewer that the specific localization of LRRC23 within the RS3 head needs to be further confirmed, but this requires an atomic resolution structure of the RS3 head, which is beyond the scope of the current study. We will pursue this direction in future studies.

      3) The interaction between LRRC23 and RSPH9 needs to be defined. AlphaFold models could help determine the likelihood of a direct interaction. In addition, the structure of the 96-nm modular repeats of axonemes from the flagella of human respiratory cilia has been determined (PMID: 37258679), and the localization of LRRC23 in RS could be further predicted.

      We appreciate the comment. We are pursuing an atomic resolution structure of the RS3 head, and thus leave the prediction and detailed localization to future studies.

      4) The ortholog of RSP15 may also be predicted or confirmed by using the reported structure in human respiratory cilia (PMID: 37258679). Is the LRCC34 in RS2 meant to be LRRC34?

      Based on the amino acid sequence and AlphaFold predicted structure comparison, we proposed LRRC34 as the RSP15 orthologue. We agree that further clarification of whether the reported RSP15 structure in human respiratory cilia is LRRC34 is valuable, but we would like to focus the current study on re-annotating LRRC23 function to RS3 and male infertility.

      Answers to the public review from Reviewer 2

      1. While the author generated mutant mice expressing truncated LRRC23 proteins, the expression of these truncated proteins was not detected in sperm. This implies that, in terms of sperm structure, the mutant LRRC23 protein behaves similarly to the complete knockout of the LRRC23 protein, which has been previously reported and characterized (Zhang et al., 2021).

      We partially agree with the reviewer’s comments. Indeed, spermatozoa from mice carrying the truncated mutant LRRC23 may be similar to those from the complete knockout. However, the truncated LRRC23 in the testis could partly contribute to the assembly and structural organization of the RS3 head and/or bridge during spermatogenesis, so it is possible that complete absence of LRRC23 results in more severe structural defects in the RS3 and bridge structure. Therefore, concluding that the defects are identical would require a direct comparison.

      2. This reviewer questions the proposal that LRRC23 is an integral component of RS3, as the results indicate not only the loss of the RS3 head structure but also an incomplete RS2-RS3 junction structure. In addition, the interaction of LRRC23 with RSPH9 alone does not fully explain its involvement solely in RS3 assembly. Additional evidence is required to examine the influence of LRRC23 on the RS2-RS3 junction.

      We thank the reviewer for this point. In a previous study, LRRC23 was detected in tracheal cilia, which lack the bridge structure. Thus, we concluded that LRRC23 is a component of the RS3 head, but not necessarily of the RS2-RS3 bridge structure, although the bridge structure is also affected. Broad structural defects due to loss of function of a single protein are often observed in sperm flagella. For example, deficiency of RSPH6A, an RS head component, affects not only the RS structure but the entire flagellar structure in a non-uniform manner, resulting in multiple morphological flagellar abnormalities. We anticipate that our future study to determine the molecular architecture of the RS3 head and bridge structure will provide further insights into this question.

      3. The article does not explore how these mutations affect the flagella structure in human sperm, which needs further study. Expanding the study to include human sperm structure would undoubtedly enhance the quality of the article.

      We agree with the importance of further pursuing the effect of these mutations in human samples, but faced practical difficulties. As noted in our response to Reviewer 1, the patients not only dropped out of the project but also live in a remote region of Pakistan, making cryo-ET analysis of their sperm infeasible.

      Answers to the recommendations of Reviewer 1

      1. The statistics analysis should be performed in Figures 2E and 2F.

      We appreciate the reviewer’s recommendation. For 2E, since the standard deviations for both groups are 0, appropriate statistical analyses cannot be performed. For 2F, since the knockout males sire no litters, litter size cannot be determined for them, and statistical analyses are therefore not applicable.

      2. In Figure 3A, the human sperm RS structures (PMID: 36593309) should be provided.

      Thanks for the suggestion. We have included human sperm RS structures as suggested.

      3. The molecular weight markers should also be added in Figure 3F (left), EV4B, and EV5B (AKAP3, RSPH9, AcTub).

      In the original Figure 3F, the markers were shown as white lines in the blot images due to space limitations. Because these markers were not clearly visible, we have changed their color to yellow. The marker information in EV4B and EV5B has also been updated.

      Answers to the recommendations of Reviewer 2

      1. Line 119, Table S1 is incorrectly shown.

      We have corrected the Table nomenclature to Table EV1.

      2. Line 132, the author suggests that LRRC23 mutations do not affect female reproduction based on the fertility of the mother. However, this conclusion may lack rigor since it overlooks the sterility of IV-4. To address this, the author needs to examine the fertility of female mice more comprehensively. Additionally, considering the higher expression level of LRRC23 in the oviduct, the author should investigate any potential changes in the oviduct cilia.

      Thank you for the reviewer’s comment. As described in line 134, the mother of IV-4, who like IV-4 carries the homozygous mutant allele, was fertile. In addition, Lrrc23Δ/Δ female mice are fertile (now added in lines 173-174); in fact, we maintain the mouse line by crossing Lrrc23Δ/Δ females with heterozygous males. Thus, our initial conclusion that the LRRC23 mutation does not cause female infertility remains valid. However, whether LRRC23 functions in the regulation of oviductal cilia requires further study, so we have softened the corresponding sentence.

      3. In the article, the author mentions that there are some morphological differences observed in the sperm, which are not clearly depicted in Fig.1B. It is essential to specify the specific changes in sperm morphology that the author identified.

      Thank you for your comment. The morphological variations we mentioned (e.g., the sperm in the lower left corner of Fig. 1B has a more rounded head) fall within the normal range of abnormal sperm morphology seen in fertile men and are not necessarily caused by the LRRC23 mutation. To avoid confusion, we have rephrased the sentence (see lines 122-124).

      4. In Fig. 3F, the previous study confirmed an interaction between LRRC23 and RSPH3 (Zhang et al., 2021), but the current manuscript does not demonstrate such an interaction; the authors should address this in the text.

      We appreciate your point. This could be due to different in vitro interaction conditions, and we describe this possibility in the main text (see lines 200-201).

      5. In the case of the interaction between LRRC23 and RSPH9, the authors use human proteins to detect the interaction but verify the phenotype in mice. The relevance and potential limitations of extrapolating these findings from human protein interactions to phenotypic effects in mice should therefore be discussed.

      Thank you for the reviewer’s suggestion. We added discussion for that part (lines 336-341).

      6. The authors should examine changes in LRRC23 protein and mRNA levels at different stages of spermatogenesis.

      We agree that profiling LRRC23 protein levels in developing male germ cells would further our understanding of LRRC23 function in spermatogenesis, but we do not consider it critical for this study, as the LRRC23 mRNA expression profile from the scRNA database (Fig. EV4) hints at the protein profile.

      7. In Figure 4C of the article, the author should provide a clear and detailed explanation in the text of how they distinguish RS1, RS2, and RS3.

      We added the information in figure legends (lines 1034-1037).

      8. Zoom in on the RS structure in Fig. EV5D for precise observation.

      In TEM images with limited resolution, we could not tell which RS (RS1, 2, or 3) appears in the cross-section, and simply zooming in does not provide a better or more accurate observation (this is the main reason we moved forward with cryo-ET).

      9. By utilizing computational modeling and bioinformatics tools, the authors gain insights into the potential interactions, binding sites, and structural features of LRRC23 within the RS3 complex. This approach provides a deeper understanding of LRRC23's function and role in the assembly and stability of the RS3 complex. To enhance the clarity and visualization of the findings, the authors should generate a schematic diagram that illustrates the proposed interactions and structural organization of LRRC23 within the RS3 complex.

      We appreciate the reviewer’s suggestion to speculate on the molecular position and interactions of LRRC23 within the RS3 complex. For computational modeling and bioinformatics at that level, we believe that purification of the RS3 complex and an LRRC23 interactome study are required, which is one of our future directions. Given the limitations of our current data, we choose to remain conservative and not to propose detailed structural information for LRRC23 in the RS3 complex.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Re: Revised author response for eLife-RP-RA-2023-90135 (“The white-footed deermouse, an infection-tolerant reservoir for several zoonotic agents, tempers interferon responses to endotoxin in comparison to the mouse and rat” by Milovic, Duong, and Barbour)

      The revised manuscript has taken into account all the comments and questions of the two reviewers. Our responses to each of the comments are detailed below. In brief, each modification or addition in the revision specifically addresses a reviewer comment. These modifications and materials include the following:

      • a more in-depth consideration of sample sizes

      • a better explanation of what p values signify for a GO term analysis

      • a more detailed account of the selection of the normalization procedure for cross-species targeted RNA-seq (including a new supplemental figure)

      • several more box plots in supplementary materials to complement the scatterplots and linear regressions of the figures of the primary text

      • provision in a public access repository of the complete data for the RNA-seq analyses as well as primary data for figures and tables as new supplementary tables

      • an expanded description of the analysis of Borrelia hermsii infection of P. leucopus done for the revision. This included a new table (Table 10 of the revision)

      • development of the possible relevance of the findings for longevity studies by citing similarities between the findings in P. leucopus and those in the naked mole-rat

      • what we think is a better assessment of differences between female and male P. leucopus for this particular study, while still keeping focus on DEGs in common for females and males. This included a new figure (Figure 4 of the revision).

      • removal of reference to an “inverse” relationship between Nos2 and Arg1, while still retaining ratios of informative value

      We note that in the interval between uploading the original bioRxiv preprint and now we learned of the paper of Gozashti, Feschotte, and Hoekstra (reference 32), which supports our conception of the important place of endogenous retroviruses in the biology and ecology of deermice. This is the only addition or modification that was not a direct response to a reviewer comment or question, but it was germane to one of Reviewer #1’s comments (“Regarding..”).

      Reviewer #1:

      Supplemental Table 1 only lists genes that passed the authors’ statistical thresholds. The full list of genes detected in their analysis should be included with read counts, statistics, etc. as supplemental information.

      We agree that provision of the entire lists of reference transcripts and the RNA-seq results for each of the 40 animals is merited. These datasets are too large for what the journal’s supplementary materials resource was intended for, so we have deposited them at the Dryad public access repository.

      While P. leucopus is a critical reservoir for B. burgdorferi, caution should be taken in directly connecting the data presented here and the Lyme disease spirochete. While it's possible that P. leucopus have a universal mechanism for limiting inflammation in response to PAMPs, B. burgdorferi lack LPS and so it is also possible the mechanisms that enable LPS tolerance and B. burgdorferi tolerance may be highly divergent.

      The impetus for the study was the phenomenon of tolerance of infection of P. leucopus by a number of different kinds of pathogens, not just B. burgdorferi. We take the reviewer’s point, though. Certainly, the white-footed deermouse is probably most notable at large for its role as a reservoir for the Lyme disease agent. We doubt, though, that the species’ responses to LPS and to the principal agonists of B. burgdorferi are “highly divergent”. Other than the TLR itself (TLR4 for LPS vs. the TLR2/TLR1 heterodimer for the lipoproteins of these spirochetes), the downstream signaling is generally similar for amounts of comparable agonist potency.

      We had thought that we had addressed this distinction for B. burgdorferi and other Borreliaceae members by referring to the earlier study. But we agree with the reviewer that what was provided on this point was insufficient in the context of the present work. Accordingly, for the revision we have added a new analysis of the data on experimental infection of P. leucopus with Borrelia hermsii, which lacks LPS and for which the TLR agonists eliciting inflammation are lipoproteins. We do this in a format (new Table 6) that aids comparison with the LPS experimental data elsewhere in the article. As the manuscript references, B. burgdorferi infection of P. leucopus elicits comparatively little inflammation in blood even at the height of infection. While this phenomenon with the Lyme disease agent was part of the rationale driving these studies, the better comparison with LPS was 5 days into B. hermsii infection when the animals are spirochetemic.

      Statistical significance is binary and p-values should not be used as the primary comparator of groups (e.g. once a p-value crosses the designated threshold for significance, the magnitude of that p-value no longer provides biological information). For instance, in comparing GO terms, the reason for using such stringent p-value cutoffs ("None of these were up-regulated gene GO terms with p values < 10^-11 for M. musculus.") to compare species is unclear. If the authors wish to compare effect sizes, comparing enrichment between terms that pass a cutoff would likely be the better choice. Similarly, comparing DEG expression by p-value cutoff and effect size is more meaningful than analyses based exclusively on p-value: "Of the top 100 DEGs for each species by ascending FDR p value." The approach described in later figures (e.g. Figure 4) is preferable.

      Effect sizes (in this case, fold-changes) were taken into account for the GO term analysis and were specified in the settings that are described. So, any gene that was “counted” for consideration for a particular GO term would have passed that threshold and had a false-discovery-corrected p value below a specified threshold. There is no further scoring of the “hit” based upon the magnitude of the p value beyond that point. It is, as the reviewer writes, binary at that point. We are in agreement on those principles.

      As we understand the comment above, though, the p values referred to are those of the GO term analysis itself. The objective was discovery followed by inference. The situation was more like a genome-wide association study (GWAS). This is not, strictly speaking, a hypothesis test, because there was no hypothesis stated ahead of time or one driving the design. The “p value” for something like GO term analysis or GWAS provides an estimate of the strength of the association. It is not binary in that sense: the lower the p value, the greater the confidence about the association. In a GWAS of a human population, an association of a trait with a particular SNP or indel is usually not taken seriously unless the p value is less than 10^-7 or 10^-8. In the case of GO terms, the p value is related to (but not equivalent to) the proportion of the genes defining a GO cluster that are differentially expressed. The higher the proportion of the genes in the cluster that are associated with a treatment (LPS vs. saline), the lower the p value. Thus, the p value continues to provide information beyond the point at which, in many hypothesis-testing circumstances, it would rightly be deemed of little additional value.
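      To illustrate why an enrichment p value keeps carrying graded information, the sketch below computes a one-sided hypergeometric p value, one common formulation of GO term over-representation tests. That the software used for this study applied exactly this test is an assumption, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n_term: int, n_degs: int, n_total: int) -> float:
    """One-sided hypergeometric p value, P(X >= k).

    n_total: genes in the reference set
    n_term:  reference genes annotated to the GO term
    n_degs:  DEGs drawn from the reference set
    k:       DEGs that are annotated to the GO term
    """
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, i) * comb(n_total - n_term, n_degs - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic is kept until this final division.
    return tail / comb(n_total, n_degs)


if __name__ == "__main__":
    # Hypothetical: 10,000 reference genes, 1,000 DEGs, a 50-gene GO term.
    # About 5 DEGs would be expected in the term by chance alone.
    for overlap in (5, 10, 20):
        print(overlap, hypergeom_enrichment_p(overlap, 50, 1000, 10000))
```

      Holding the term size fixed, a larger share of the term's genes among the DEGs drives the p value down continuously, which is the sense in which the response treats it as a graded measure of association rather than a pass/fail flag.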

      That said, we agree that the original manuscript could have been clearer on this point and have for the revision expanded the description of the GO term analysis in the Methods, including some explanation for a reader on what the p value signifies here. We also refrain from specifying a certain p value for special attention and merely list 20 by ascending p value.

      The ability to use of CD45 to normalize data is unclear. Authors should elaborate both on the use of the method and provide some data how the data change when they are normalized. For instance, do correlations between untreated Mus and Peromyscus gene expression improve? The authors seem to imply this should be a standard for interspecies comparison and so it would be helpful to either provide data to support that or, if applicable, use of the technique in literature should be referenced.

      The reviewer brings up an important point that we considered addressing in more depth for the original manuscript but in the end deferred to considerations about length and left it out.

      But we are glad to address this here, as well as in the revised manuscript.

      We did not intend to imply either that this particular normalization approach had been done before by others or that it “should” be a standard. We are not aware of another report on this, and it would be up to others whether it would be useful or not for them. We made no claim about its utility in another model or circumstance. The challenge before us was to do a comparative analysis of transcription in the blood not just for animals of one species under different conditions but animals of two different genera under different conditions. A notable difference between the animals was in their white blood cell counts, as this study documents. White cells would be the source of a majority of transcripts of potential relevance here, but there would also be mRNA for globins, from reticulocytes, from megakaryocytes, and likely cell-free RNA with origins in various tissues. If the white cell numbers differed, but the non-white cell sources of RNA did not, then there could be unacknowledged biases.

      It would be like comparing two different kinds of tissues and assuming them to be the same in the types and numbers of cells they contained. Four hours after a dose of LPS, the liver cells (or brain cells) would certainly differ in their transcriptional profiles from those of untreated animals, but there would be little if any change in the numbers of different kinds of cells in the liver (or brain) within 4 hours. The blood can change a lot in composition within that time frame under these same conditions. Some sort of accounting for differing white cell numbers in the blood of outbred animals of two species seemed to be called for.

      The normalization that was done for the genome-wide analysis was not based on a particular transcript, but instead was based on the total number of reads, the lengths of the reference transcripts, and the distributions of reads matching the tens of thousands of references for each sample. This was done according to what are now standard procedures for bulk RNA-seq analysis. Because the reference transcript sets for P. leucopus and M. musculus differed in their numbers and completeness of annotation, we did not attempt any cross-species comparison for the same set of genes at that point. That would not have been possible because the sets were not entirely commensurate.
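      The response does not name the exact length- and depth-aware normalization its software applied. As an assumed example of that family of procedures, the sketch below computes transcripts per million (TPM); the counts and lengths are made up for illustration.

```python
from __future__ import annotations


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million for one sample.

    Reads are first divided by transcript length in kilobases (reads per
    kilobase), then rescaled so that the sample's values sum to one million.
    """
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    per_million = sum(rpk.values()) / 1_000_000
    return {t: v / per_million for t, v in rpk.items()}


if __name__ == "__main__":
    # Hypothetical sample: equal read counts, but transcript "b" is twice as long,
    # so it receives half the TPM of transcript "a".
    print(tpm({"a": 100, "b": 100}, {"a": 1000, "b": 2000}))
```

      Because every sample is rescaled to the same total, values become comparable across samples mapped to one reference set; as the response notes, this does not make two species' differently annotated reference sets commensurate.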

      The GO term analysis of those results provided the leads for the more targeted approach, which was roughly analogous to RT-qPCR. For a targeted assay of this sort, it is common to use a “housekeeping gene” or some other presumably stably transcribed gene for normalization. A commonly used one is Gapdh, but we had previously found that Gapdh was itself a DEG in the blood of P. leucopus and M. musculus at the four-hour mark after LPS. The aim was to provide some adjustment so that datasets for blood samples differing in white blood cell counts could be compared. Two options were the 12S ribosomal RNA of the mitochondria, which would be present in white cells but not in mature erythrocytes, and CD45 (encoded by Ptprc), which has served an approximately similar function in flow cytometry of the blood. As described in what has been added for the revision and the supplementary materials, we compared these different approaches to normalization. Ptprc and 12S rRNA were effectively interchangeable as the denominator for identifying DEGs of P. leucopus and M. musculus and for cross-species comparisons.
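      Using a reference transcript as the denominator can be sketched as below. The scale factor, the pseudocount, the log10 transform, and the read counts are all hypothetical choices for illustration, not settings reported in the study.

```python
import math


def normalize_to_reference(target_reads: int, ref_reads: int,
                           scale: float = 1000.0, pseudocount: float = 1.0) -> float:
    """log10 of target reads per `scale` reads of a reference transcript.

    The reference could be Ptprc (CD45) or mitochondrial 12S rRNA. The
    pseudocount keeps the log defined when a target has zero reads.
    """
    if ref_reads <= 0:
        raise ValueError("reference transcript must have at least one read")
    return math.log10((target_reads + pseudocount) / ref_reads * scale)


if __name__ == "__main__":
    # Hypothetical: two animals with identical raw reads for a target gene, but
    # the second has twice the Ptprc reads (e.g., more white cells in the sample).
    a = normalize_to_reference(500, 1000)
    b = normalize_to_reference(500, 2000)
    print(round(a - b, 3))  # prints 0.301, i.e. log10(2)
```

      With the same raw target reads, the animal with more reference reads receives the lower normalized value, which is the kind of adjustment for differing white cell counts that the response describes.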

      Regarding the ISG data: is a possible conclusion not that Peromyscus don't upregulate the antiviral response because it's already so high in untreated rodents? It seems untreated Peromyscus have ISG expression roughly equivalent to the LPS mice for some of the genes. This could be compared more clearly if genes were displayed as bar plots/box and whisker plots rather than in scatter plots. It is unclear why the linear regression is the key point here rather than normalized differences in expression.

      In answer to the question: yes, that is possible. In the interval between uploading of the manuscript and this revision, we became aware of a study by Gozashti and Hoekstra published this year in Molecular Biology and Evolution (reference 32) and reporting on the “massive invasion” of endogenous retroviruses in P. maniculatus and the defenses deployed in response to achieve silencing. We cite this work and discuss it, including related findings for P. leucopus, in the revision.

      We had originally intended to include box plots as well as scatterplots with regressions for the data, but thought it would be too much and possibly considered redundant. But with this encouragement from the reviewer we provide additional box plots in supplementary materials for the revision.

      Some sections of the discussion are under supported:

      The claim that low inflammation contributes to increased lifespan is stated both in the introduction and discussion. Is there justification to support this? Do aged pathogen-free mice show more inflammation than aged Peromyscus?

      We respectfully point out that there was no claim of this sort. We stated a fact about P. leucopus’ longevity. We made no statement connecting longevity and inflammation beyond the suggestion in the introduction that the explanation(s) for infection tolerance might have some bearing on studies of the determinants of life span.

      But the reviewer’s comment prompted further consideration of this aspect of Peromyscus biology. This led eventually to the literature on the naked mole-rat, which seems to be the rodent with the longest known life span and the subject of considerable study. The discussion section of the revision has an added paragraph on some of the similarities of P. leucopus and the naked mole-rat in terms of neutrophils, expression of nitric oxide synthase 2 in response to LPS, and type 1 interferon responses. While this is far from decisive, it does serve to connect some of the dots and, hopefully, is considered at least partially responsive to the reviewer’s question.

      The claim that reduced Peromyscus responsiveness could lead to increased susceptibility to infection is prominently proposed but not supported by any of the literature cited.

      There was not this claim. In fact, it was framed as a question, not a statement. Nevertheless, we think we understand what the comment is getting at and acknowledge in the revision that there may be unexamined circumstances in which P. leucopus may be more vulnerable.

      References to B. burgdorferi, which do not have LPS, in the discussion need to ensure that the reader understands this and the potential that responses could be very different.

      We think we addressed this comment in a response above.

      Reviewer #2:

      1. How were the number of animals for each experiment selected? Was a power analysis conducted?

      A power analysis of any meaning for bulk RNA-seq with tens of thousands of reference transcripts, each with its own variance, and a comparison of animals of two different genera is not straightforward. Furthermore, a specific hypothesis was not being tested; this was a broad, forward screen. But the question about sample sizes is one that deserves more attention than the original manuscript provided. This is now provided in added text in two places in the Methods (“RNA-seq” and “Genome-wide different gene expression”) in the revision.

      2. The authors conducted a cursory evaluation of sex differences of P. leucopus and reported no difference in response except for Il6 and Il10 expression being higher in the males than the females in the exposed group. The data was not presented in the manuscript. Nor was sex considered for the other two species. A further discussion of the role that sex could play and future studies would be appreciated.

      We agree that the limited analysis of sex differences and the undocumented remark about Il6 and Il10 expression in females and males warranted correction. For the revision we removed that analysis of targeted RNA-seq of P. leucopus from the two different studies. For this study we were looking for differences that applied to both species. This was the reason that there were equal numbers of females and males in the samples. We agree that further investigation of differences between sexes in their responses is of interest but is probably best left for “future studies”.

      But in the revision we do not entirely ignore the question of sex and provide an additional analysis of the bulk RNA-seq for P. leucopus with regard to differences between females and males. This demonstrated overall commensurability between the sexes, at least for the purposes of the GO term analysis and subsequent targeted RNA-seq, but did reveal some exceptions that are candidate genes for those future studies.

      In the revision, we also add for the discussion and its “study limitations” section a disclaimer about possibly missing sex associated differences because the groups were mixed sexes.

      3. The ratios of Nos2 and Arg1 copies for LPS-treated and control P. leucopus and M. musculus in Table 3 show that in P. leucopus there is not a significant difference but in M. musculus there is an increase in Nos2 copies with LPS treatment. The authors then used a targeted RNA-seq analysis to show that in P. leucopus the number of Arg1 reads after LPS treatment is significantly higher than in the controls. These results are oversimplified in the text as an inverse relationship for Nos2/Arg1 in the two species.

      We agree. In addition to providing box plots for Arg1 and Nos2, as suggested by Reviewer #1, we replaced “ratio of Nos2 to Arg1 expression” with “differences in Nos2 and Arg1 expression” at one place in commenting on Arg1 and Nos2. At another place we have removed “inverse” with regard to Nos2 and Arg1. But we respectfully decline to remove Nos2/Arg1 from Figure 5 (now Figure 6) or the inclusion of Nos2/Arg1 ratios elsewhere. In our understanding, there need not be an inverse relationship for a ratio to have informative value.

      Recommendations For the Authors

      We thank the two reviewers for their constructive recommendations and suggestions, in some cases pointing out errors we had entirely missed. For the great majority, the recommendations were followed. Where we decline or disagree, we explain this in the response.

      Reviewer #1 (Recommendations For The Authors):

      • How was the FDR < 0.003 cutoff chosen for DEG? All cutoffs are arbitrary but there should be some justification.

      We agree and have provided the rationale at that point in the paper (before Figure 3) in R2: "For GO term analysis the absolute fold-change criterion was ≥ 2. Because of the ~3-fold greater number of transcripts for the M. musculus reference set than the P. leucopus reference set, application of the same false-discovery rate (FDR) threshold for both datasets would favor the labeling of transcripts as DEGs in P. leucopus. Accordingly, the FDR p values were arbitrarily set at < 5 × 10^-5 for P. leucopus and < 3 × 10^-3 for M. musculus to provide approximately the same number of DEGs for P. leucopus (1154 DEGs) and M. musculus (1266 DEGs) for the GO term comparison."
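      The two-part DEG criterion in the quoted passage (absolute fold change of at least 2 plus a species-specific FDR threshold) can be sketched as follows. Benjamini-Hochberg is used here as an assumed FDR procedure, since the passage does not name one; the signed fold-change convention (negative for down-regulation) and all gene values are likewise hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, fold_changes, pvals, fdr_threshold, min_abs_fc=2.0):
    """Genes with |fold change| >= min_abs_fc and BH-adjusted p < fdr_threshold.

    Fold changes are assumed signed, e.g. -3.0 means 3-fold down-regulated.
    """
    fdr = bh_adjust(pvals)
    return [g for g, fc, q in zip(genes, fold_changes, fdr)
            if abs(fc) >= min_abs_fc and q < fdr_threshold]


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical genes
    fcs = [3.0, -2.5, 1.5, 4.0]
    ps = [0.01, 0.04, 0.03, 0.5]
    print(call_degs(genes, fcs, ps, fdr_threshold=0.05))
    print(call_degs(genes, fcs, ps, fdr_threshold=0.06))
```

      Lowering `fdr_threshold` shrinks the DEG list, which is the lever the passage describes using to bring the P. leucopus and M. musculus DEG counts to similar totals despite their differently sized reference sets.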

      • It would be helpful to include a figure demonstrating the correlation between CD45 and WBC ("Pearson's continuous and Spearman's ranked correlations between log-transformed total white blood cell counts and normalized reads for Ptprc across 40 animals representing both species, sexes, and treatments were 0.40 (p = 0.01) and 0.34 (p = 0.03), respectively.")

      In both the first version of the revision (R1) and in R2 we provide a fuller explanation of the choice of CD45 (Ptprc) for normalization, as detailed in the response to Reviewer #1's public comment. In the revision, only Pearson's correlation and p value are given. We did not think another figure was justified, given the additional space devoted to this in both R1 and R2.
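      The two correlation statistics quoted in this exchange can be computed as below: Pearson's coefficient on the log-transformed values, and Spearman's as Pearson's applied to ranks, with tied values given their average rank. The input values are hypothetical, and p values are left out of this sketch.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical white cell counts and normalized Ptprc reads for five animals.
    wbc = [3000, 5200, 4100, 8800, 6100]
    ptprc = [120, 210, 150, 260, 300]
    log_wbc = [math.log10(v) for v in wbc]
    log_ptprc = [math.log10(v) for v in ptprc]
    print("Pearson:", pearson(log_wbc, log_ptprc))
    print("Spearman:", spearman(log_wbc, log_ptprc))
```

      Because a log transformation preserves order, Spearman's value is the same with or without it; only Pearson's coefficient depends on the transform.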

      • It is unclear what the following paragraph refers to: is this from the previous paper? Was this experiment introduced somewhere? "Low transcription of Nos2 and high transcription of Arg1 both in controls and LPS-treated P. leucopus was also observed in the experiment where the dose of LPS was 1 µg/g body mass instead of 10 µg/g and the interval between injection and assessment was 12 h instead of 4 h (Table 4)."

      This experiment is described in the Methods in the original and subsequent versions, but we agree that it was not clear whether it was from the present study or a previous one. Here is the revised text for R2: "Low transcription of Nos2 in both controls and LPS-treated P. leucopus and an increase in Arg1 with LPS were also observed in another experiment for the present study, where the dose of LPS was 1 µg/g body mass instead of 10 µg/g and the interval between injection and assessment was 12 h instead of 4 h (Table 4)."

      • Regarding the differences in IFN-γ between outbred and BALB/c mice: are there any other RNA-seq datasets you can mine where other inbred mice (B/6, C3H, etc.) have been injected with LPS and probed roughly the same amount of time later? Do they look like BALB/c or the outbreds?

      In the original, R1, and R2 we cite two papers on the differences of BALB/c mice. While this is of interest for follow-up in the future, we did not think additional content on a subject that mainly pertains to M. musculus was warranted here, where the main focus is Peromyscus.

      • Figure 8 and its legend are difficult to follow. The top half of the figure is not well explained and it's unclear what species this is. Decreased use of abbreviations would help. Consider marking each R² value as Mus or Peromyscus (as done in Fig 9). There are some typographical errors in the legend ("gree," incomplete sentence missing the words LPS or treatment AND Mus: "Co-variation between transcripts for selected PRRs (yellow) and ISGs (gree) in the blood of P. leucopus (P) or (M) with (L") or without (C)."

      This is now Figure 9 in both R1 and R2. We revised it for R1 to include references to the box plots in supplementary materials, but agree with Reviewer #1's recommendation to correct the typos and make the legend less confusing. We did not think that further labeling of the R² values in the scatterplots with the species names was necessary. The data points differ not only in color but also in symbol, so it should be fairly easy for readers to distinguish the regression lines by species. For R2 this is the revised legend with additions in response to the recommendation underlined:

      "Figure 9. Co-variation between transcripts for selected PRRs and ISGs in the blood of P. leucopus (P) or M. musculus (M) with (L) or without (C) LPS treatment. Top panel: matrix of coefficients of determination (R²) for combined P. leucopus and M. musculus data. PRRs are indicated by yellow fill and ISGs by blue fill on horizontal and vertical axes. Shades of green of the matrix cells correspond to R² values, where cells with values less than 0.30 have white fill and those of 0.90-1.00 have deepest green fill. Bottom panels: scatter plots of log-transformed normalized Mx2 transcripts on Rigi (left), Ifih1 (center), and Gbp4 (right). The linear regression curves are for each species. For the lower-right graph the result from the General Linear Model (GLM) estimate is also given. Values for analysis are in Table S4; box plots for Gbp4, Irf7, Isg15, Mx2, and Oas1 are provided in Figure S6."

      • Discussion section could benefit from editing for clarity. Examples listed:

      o Unclear what effect is described here: "The bacterial infection experiment indicated that the observed effect in P. leucopus was not limited to a TLR4 agonist; the lipoproteins of B. hermsii are agonists for TLR2 (Salazar et al. 2009)."

      Both R1 and R2 include the new section on the B. hermsii infection model. This was added in response to Reviewer #1's public comment, so the expanded consideration of this aspect should address the reviewer's recommendation for more clarity and context here. For R2 we modified the text in the discussion of R1:

      "The analysis here of the B. hermsii infection experiment also indicated that the phenomenon observed in P. leucopus was not limited to a TLR4 agonist."

      o Unclear what the takeaway from this paragraph is: "Reducing the differences between P. leucopus and the murids M. musculus and R. norvegicus to a single all-embracing attribute may be fruitless. But from a perspective that also takes in the 2-3x longer life span of the white-footed deer mouse compared to the house mouse and the capacity of P. leucopus to serve as disease agent reservoir while maintaining if not increasing its distribution (Moscarella et al. 2019), the feature that seems to best distinguish the deer mouse from either the mouse or rat is its predominantly anti-inflammatory quality. The presentation of this trait likely has a complex, polygenic basis, with environmental (including microbiota) and epigenetic influences. An individual's placement is on a spectrum or, more likely, a landscape rather than in one or another binary or Mendelian category."

      We agree that modification, simplification, and clarification were called for. In response to a public comment of Reviewer #1 we had changed that section, leaving out reference to longevity here. Here is the revised text in both R1 and R2:

      "Reducing differences between P. leucopus and murids M. musculus and R. norvegicus to a single attribute, such as the documented inactivation of the Fcgr1 gene in P. leucopus (7), may be fruitless. But the feature that may best distinguish the deermouse from the mouse and rat is its predominantly anti-inflammatory quality. This characteristic likely has a complex, polygenic basis, with environmental (including microbiota) and epigenetic influences. An individual’s placement is on a spectrum or, more likely, a landscape rather than in one or another binary or Mendelian category."

      Minor comments:

      • Use of blue and red in figures as the -only- way to easily distinguish between groups is a poor choice, both in terms of inclusivity of color-blind researchers and enabling grayscale printing. Most detrimental in Figure 2, but also slightly problematic in Figure 1. Use of color and shape (as done in other figures) is a much better alternative.

      We agree. Both figures have been modified to include an additional characteristic for denoting the data point. For Figure 1 it is a black filling, and for Figure 2 it is the size of symbol in addition to the color. This should enable accurate visualization by color-blind individuals and printing in gray scale. We have added definitions for the symbols within the graph itself, so there is no need to refer to the legend to interpret what they mean.

      • Note the typo where it should read P. leucopus: "The differences between P. musculus and M. musculus in the ratios of Nos2/Arg1 and IL12/IL10 were reported before (Balderrama-Gutierrez et al. 2021),"

      We thank the reviewer for pointing this typo out, which also carried over to R1. It has been corrected for R2.

      • Optional: Can the relationship between the ratios in figure 5 and macrophage "types" be displayed graphically alongside the graphs? It's a little challenging to go back and forth between the text and the figure to try to understand the biological implication.

      We considered something like this but in the end decided that we were not yet comfortable assigning “types” in this fashion for Peromyscus.

      Reviewer #2 (Recommendations For The Authors):

      • Be consistent with nomenclature for your species/treatment groups in the text, figures, and tables. For example, you go back and forth between "P. leucopus" and "deermouse" in the text. And in figures you use "P," "Peromyscus", or "Pero".

      In the Methods section of the original and revisions R1 and R2 we indicate that "deermouse" is synonymous with "Peromyscus leucopus" and "mouse" is synonymous with "Mus musculus" in the context of this paper. We think that some alternation in the terms relieves the text of some of its repetitiveness and that readers should not have a problem with equating one with the other. The use of "deermouse" also reinforces for readers that Peromyscus is not a mouse. With regard to the abbreviations for P. leucopus, those were used to accommodate design and space issues of the figures or tables. In all cases, the abbreviations referred to are defined in the legends of the figures. So, we respectfully decline to follow this recommendation.

      • Often the sentence structure and/or word choice is irregular and makes quick/easy comprehension difficult. Several examples are:

      o The third paragraph of the introduction

      We agree that the first and second sentences are unclear. Here is the revision for R2:

      “As a species native to North America, P. leucopus is an advantageous alternative to the Eurasian-origin house mouse for study of natural variation in populations that are readily accessible (9, 53). A disadvantage for the study of any Peromyscus species is the limited reagents and genetic tools of the sorts that are applied for mouse studies.”

      o The first line after Figure 5 on page 9.

      We agree. The long sentence which we think the reviewer is referring to has been split into two sentences for R2.

      “An ortholog of Ly6C (13), a protein used for typing mouse monocytes and other white cells, has not been identified in Peromyscus or other Cricetidae family members. Therefore, for this study the comparison with Cd14 is with Cd16 or Fcgr3, which deermice and other cricetines do have.”

      o The sentence that starts "Our attention was drawn to..." on page 14.

      We agree that the sentence was awkward and have split it into two sentences.

      “Our attention was drawn to ERVs by a finding in the genome-wide RNA-seq of LPS-treated and control rats. Two of the three highest scoring DEGs by FDR p value and fold-change were a gagpol polyprotein of a leukemia virus with 131x fold-change from controls and a mouse leukemia virus (MLV) envelope (Env) protein with 62x fold-change (Dryad Table D5).”

      • For figures with multiple panels, use A), B) etc then indicate which panel you are discussing in your text. This is a very data heavy study and your readers can easily get lost.

      We agree and have added pointers in the text to the panels we are referring to. But we prefer to use easily understood descriptors like “left” and “upper” over assigned letters.

      • For all the figures, where are the stats from the t-tests? Why didn't you do a two-way ANOVA instead of multiple t-tests?

      Where we are not hypothesis testing and we are able to show all the data points in box-whisker plots with distributions fully revealed, our default position is not to apply significance tests in a post hoc fashion. If a reader or other investigator wants to do this for other purposes, e.g. a meta-analysis, the data are provided in a public repository for them to do this. We are not sure what the reviewer means by "multiple t-tests" for "all figures". Where we do 2-tailed t-tests for presentation of data for many genes in a table for the targeted RNA (where individual values cannot be shown in the table), there is always correction for multiple testing, as indicated in Methods. The p values shown as "FDR" are after correction.
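      To illustrate what a p value "after correction" for multiple testing means, here is a minimal sketch of one common false-discovery-rate procedure. The response does not name the method used; the Benjamini-Hochberg step-up adjustment shown here is an assumption, and `bh_adjust`, the raw p values, and the 0.05 threshold are hypothetical and for illustration only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down to the smallest so that the
    # adjusted values are non-decreasing in the raw p values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four genes (not data from the study).
raw = [0.01, 0.04, 0.03, 0.2]
adjusted = bh_adjust(raw)
# Genes passing an illustrative FDR threshold of 0.05.
passing = [i for i, q in enumerate(adjusted) if q < 0.05]
```

Here only the first gene passes: its adjusted value is 0.04, while the second and third genes share an adjusted value of about 0.053 because of the monotonicity step.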

      • Results paragraph "LPS experiment and hematology studies"

      o List the two species for the first description to orient the reader since you eventually include rat data.

      We agree that this is warranted and followed this recommendation for R2.

      o Not all the mice experienced tachypnea, but the text makes it seem like 100% did.

      We are not sure what the reviewer is referring to here. This is what is in the text on tachypnea: "By the experiment’s termination at 4 h, 8 of 10 M. musculus treated with LPS had tachypnea, while only one of ten LPS-treated P. leucopus displayed this sign of the sepsis state (p = 0.005)." The only other mention of "tachypnea" was in Methods.

      • Figure 1: Why was the M. musculus outlier excluded? Were any other outliers excluded?

      That data point for the mouse was not "excluded" from the graph. It is identified (MM17) for reference with Table 1, and there is the graph for all to see where it is. It was only excluded from the regression curve for control mice. There was no significance testing. There were no other outliers excluded.

      • Figure 3: explain the colors and make the scales the same for all the panels or at least for the upregulated DEGs and the downregulated DEGs.

      We have modified the legend for Figure 3 to include fuller definitions of the x-axes and a description of the color spectrum. We decline to make the x-axis scale the same for all the panels because the horizontal bars in “transcription down” panels would take up only a small fraction of the space. The x-axes are clearly defined and the colors of the bars also indicate the differences in p-values. We doubt that readers will be misled. Here is the revised legend: “Figure 3. Gene Ontology (GO) term clusters associated with up-regulated genes (upper panels) and down-regulated genes (lower panels) of P. leucopus (left panels) and M. musculus (right panels) treated with LPS in comparison with untreated controls of each species. The scale for the x-axes for the panels was determined by the highest -log10 p values in each of the 4 sets. The horizontal bar color, which ranges from white through shades of yellow and orange to dark brown, is a schematic representation of the -log10 p values.”

      • Results paragraph "Targeted RNA seq analysis"

      o In the third paragraph, an R² of 0.75 is not close enough to 1 to call it "~1"

      What the reviewer is referring to is no longer in either R1 or R2, as detailed in the authors' response to public comments.

      o In the 4th paragraph, where are your stats?

      We have replaced terms like "substantially" and "marginally" with simple descriptions of relationships in the graphs.

      "For the LPS-treated animals there was, as expected for this selected set, higher expression of the majority of genes and greater heterogeneity among P. leucopus and M. musculus animals in their responses for represented genes. In contrast to the findings with controls, Ifng and Nos2 had higher transcription in treated mice. In deermice the magnitude of difference in the transcription between controls and LPS-treated was less."

      • Figure 4: The colors are hard to see, I suggest making all the up regulated reads one color, the down regulated reads a different color, and the reads that aren't different black or gray.

      This is now Figure 5 in R1 and R2. The selected genes that are highlighted in the panels are denoted not only by color but also by type of symbol. We do not think that readers will have a problem telling one from another even if color blind. The purpose of this figure was to provide an overview and a visual representation with calling out of selected genes, some of which will be evaluated in more detail later. We thought that this was necessary before diving deeper into the data of Table 2. We do not think further discriminating between transcripts in the categorical way that the reviewer suggests is warranted at this point. So, we respectfully decline to follow this suggestion.

      • Results paragraph "Alternatively-activated macrophages...."

      o Include a brief description of Nos2 and Arg1

      We have defined what enzymes these are genes for in R2.

      o How do you explain the lack of a difference in P. leucopus Arg1? Your text says the RT-qPCR confirms the RNA-seq findings.

      There was a difference in P. leucopus Arg1 by RT-qPCR between control and LPS-treated animals of about 3-fold. By both RNA-seq and RT-qPCR, Arg1 transcription is higher in P. leucopus than in M. musculus under both conditions. But we have modified the sentence so that it does not imply more than what the data and analysis of the table reveal.

      "While we could not type single cells using protein markers, we could assess relative transcription of established indicators of different white cell subpopulations in whole blood. The present study, which incorporated outbred M. musculus instead of an inbred strain, confirmed the previous finding of differences in Nos2 and Arg1 expression between M. musculus and P. leucopus (Figure 5; Table 2). Results similar to the RNA-seq findings were obtained with specific RT-qPCR assays for Nos2 and Arg1 transcripts for P. leucopus and M. musculus (Table 3)."

      • Figure 5: reorganize the panels to match the text description and label them with letters. Where are the stats?

      We thought the figure (now Figure 6) was self-explanatory, but agree that further explanation in the legend was indicated. We prefer to use descriptions of locations (“upper left”) over labels, like “panel C”, which do not obviously indicate the location of the panel. Of course, if the journal’s style mandates the other format we will do so. Our response about “stats” for boxplot figures is the same as what we provided above.

      • Results paragraph "Interferon-gamma and interleukin-1 beta..."

      o Either add the numbers or direct the viewer to where Ifng is in Table 2. The table is very big and Ifng is all the way at the bottom!

      We agree that this table is large, but we thought it better to err on the side of inclusiveness by having a single table, rather than have some genes in the main article and other results in a supplementary table. We thought that it would make it easier for reviewers and readers to find a gene of interest, but we also acknowledge the challenge of locating the genes we highlight. For R2 we follow the reviewer's recommendation to provide some guidance for readers trying to locate a featured gene by pointing to its relative location in the table. While adding a column of numbers to an already complex table seems more than what is called for, we are depositing an Excel spreadsheet of the table at the Dryad repository to facilitate searching by an interested reader for a particular gene.

      • Figure 6: stats? The pink and red are hard to easily distinguish from each other. I also suggest not using red and green together for color blind readers.

      With regard to the box-plots and significance testing, please see response above to an earlier recommendation. We have removed an interpretative adjective (i.e. "marked") from the description of the graph. Different symbols as well as colors are used, so we do not think that this will pose a problem for readers, even those with complete red-green color blindness. For what it’s worth, with regard to the "red" and "pink" issue, according to the figure on our displays the colors of the two symbols appear to be red and purple. They are also applied to different species and different conditions for those species.

      • Figure 8: In the legend it says "... PRRs (yellow) and ISGs (gree)" which is a typo, but don't you mean blue not green anyways?

      See response above to Reviewer #1's recommendation. This has been corrected.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback. Our application of CRISPR/Cas9 genome editing tools coupled with complementary cellular and functional approaches sheds light on the importance of PfMORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were made on the basis of multiple, unbiased molecular and functional assays that point to the relevance of the PfMORC protein in maintaining the parasite’s chromatin landscape. Although we do not claim to have precise evidence on the step-by-step pathway in which PfMORC is involved, we bring forth first-hand evidence of its overall function in heterochromatin binding and gene regulation, its association with major TF regulatory players, and its essentiality for parasite survival. We do, however, agree with the comment regarding the lack of evidence for direct effects of PfMORC KD and will provide additional evidence by performing ChIP-seq experiments against additional histone marks in WT and PfMORC KD lines.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We do agree with the reviewer's comment. Validation of the identified interacting partners is critical and most likely essential to understanding their role in directing MORC to its targets. However, our protein pull-down experiments have been done using biological replicates, and several of the interacting partners have also been identified and published by other labs. A direct comparison of our work with previously published work will be incorporated in a revised version of the manuscript to further validate the identified interacting partners and the accuracy of the data we obtained in this manuscript. Molecular validation of all proteins identified in our pull-down experiments may take a few more years and will be submitted for publication in future manuscripts.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and intend to perform additional experiments to delve deeper into the multifaceted roles of PfMORC. We have begun to explore the effects of PfMORC depletion on heterochromatin marks using ChIP-seq experiments at distinct stages of parasite development. We hope our new results will shed light on the direct implications of PfMORC in heterochromatin regulation.

    1. Author Response:

      We would like to thank you very much for handling and reviewing our manuscript so carefully and for being so positive about our work. We are indeed grateful for these very concise and constructive reviews as well as for the Editorial Assessment. We basically agree with all reviewers' comments. Besides addressing all formal suggestions, we also decided to do some more experiments.

      The main concern, the role of the transcription factor NF-YA1 during rhizobial infections, is indeed an absolutely valid one. While the CDEL system has its beauties, it certainly has its limitations as well. Thus, we will try to assess the role of NF-YA1 during symbiotic infections in Medicago more specifically. We will place NF-YA1 expression under the control of infection-specific promoters to limit pleiotropic effects of ectopic over-expression and assess rhizobial infections as well as cell cycle patterns in transformed hairy roots producing the H3.1/H3.3 marker. Infection-inducible promoters will also be used to drive the ectopic expression of CYCD3;1 on the cortical infection thread trajectory to locally increase mitotic cycles, in order to test the functional importance of cell cycle exit on cortical infections.

      We hope that we will be able to conclude more firmly on NF-YA1 function prior to locking the version of record and to deliver these experiments in a time frame of about 4-6 months, which is the minimum time we need for cloning the respective constructs, doing all hairy root transformations in sufficient numbers and quantitative microscopy.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Overall the manuscript is well written, and the successful generation of the new endogenous Cac tags (Td-Tomato, Halo) and CaBeta, stj, and stolid genes with V5 tags will be powerful reagents for the field to enable new studies on calcium channels in synaptic structure, function, and plasticity. There are also some interesting, though not entirely unexpected, findings regarding how Brp and homeostatic plasticity modulate calcium channel abundance. However, a major concern is that the conclusions about how "molecular and organization diversity generate functional synaptic heterogeneity" are not really supported by the data presented in this study. In particular, the key fact that frames this study is that Cac levels are similar at Ib and Is active zones, but that Pr is higher at Is over Ib (which was previously known). While Pr can be influenced by myriad processes, the authors should have first assessed presynaptic calcium influx - if they had, they would have better framed the key questions in this study. As the authors reference from previous studies, calcium influx is at least two-fold higher per active zone at Is over Ib, and the authors likely know that this difference is more than sufficient to explain the difference in Pr at Is over Ib. Hence, there is no reason to invoke differences in "molecular and organization diversity" to explain the difference in Pr, and the authors offer no data to support that the differences in active zone structure at Is vs Ib are necessary for the differences in Pr. Indeed, the real question the authors should have investigated is why there are such differences in presynaptic calcium influx at Is over Ib despite having similar levels/abundance of Cac. This seems the real question, and is all that is needed to explain the Pr differences shown in Fig. 1. 
The other changes in active zone structure and organization at Is vs Ib may very well contribute to additional differences in Pr, but the authors have not shown this in the present study, and rely on other studies (such as calcium-SV coupling at Is vs Ib) to support an argument that is not necessitated by their data. At the end of this manuscript, the authors have found an interesting possibility that Stj levels are reduced at Is vs Ib, that might perhaps contribute to the difference in calcium influx. However, at present this remains speculative.

      Overall, the authors have generated powerful reagents for the field to study calcium channels and how they are regulated, but draw conclusions about active zone structure and organization contributing to functional heterogeneity that are not strongly supported by the data presented.

      Reviewer 1 raises an interesting question that we agree will form the basis of important studies. Here, we set out to address a different question, which we will work to better frame. While we and others had previously found a strong correlation between calcium channel abundance and synaptic release probability (Pr; Akbergenova et al., 2018; Gratz et al., 2019; Holderith et al., 2012; Nakamura et al., 2015; Sheng et al., 2012), more recent studies found that calcium channel abundance does not necessarily predict synaptic strength (Aldahabi et al., 2022; Rebola et al., 2019). Our study explores this paradox and presents findings that provide an explanation: calcium channel abundance predicts Pr among individual synapses of either low-Pr type-Ib or high-Pr type-Is inputs, where modulating channel number tunes synaptic strength, but does not predict Pr between the two inputs, indicating an input-specific role for calcium channel abundance in promoting synaptic strength. Thus, we propose that calcium channel abundance predictably modulates synaptic strength among individual synapses of a single input or synapse subtype, which share similar molecular and spatial organization, but not between distinct inputs where the underlying organization of active zones differs. Consistently, in the mouse, calcium channel abundance correlates strongly with release probability specifically when assessed among homogeneous populations of connections (Aldahabi et al., 2022; Holderith et al., 2012; Nakamura et al., 2015; Rebola et al., 2019; Sheng et al., 2012).

      As Reviewer 1 notes, the two-fold difference in calcium influx at type-Is synapses is certainly an important difference underlying three-fold higher Pr. However, growing evidence indicates that calcium influx alone, like calcium channel abundance, does not reliably predict synaptic strength between inputs. For example, Rebola et al. (2019) compared cerebellar synapses formed by granule and stellate cells and found that lower-Pr granule synapses exhibit both higher calcium channel abundance and calcium influx. In another example, Aldahabi et al. (2023) demonstrate that even when calcium influx is greater at high-Pr synapses, it does not necessarily explain differences in synaptic strength between inputs. Studying excitatory hippocampal CA1 synapses onto distinct interneuronal targets, they found that raising calcium entry at low-Pr inputs to high-Pr synapse levels is not sufficient to increase synaptic strength to high-Pr synapse levels. Similarly, at the Drosophila NMJ, the finding that type-Ib synapses exhibit loose calcium channel-synaptic vesicle coupling whereas type-Is synapses exhibit tight coupling suggests factors beyond calcium influx also contribute to differences in Pr between the two inputs (He et al., 2023). Consistently, a two-fold increase in external calcium does not induce a three-fold increase in release at low-Pr type-Ib synapses (He et al., 2023). Thus, upon finding that calcium channel abundance is similar at type-Ib and -Is synapses, we focused on identifying differences beyond calcium channel abundance and calcium influx that might contribute to their distinct synaptic strengths. We agree that these studies, ours included, cannot definitively determine the contribution of identified organizational differences to distinct release probabilities because it is not currently possible to specifically alter subsynaptic organization, and will ensure that our language is tempered accordingly. 
However, in addition to the studies cited above and our findings, recent work demonstrating that homeostatic potentiation of neurotransmitter release is accompanied by greater spatial compaction of multiple active zone proteins (Dannhauser et al., 2022; Mrestani et al., 2021) and decreased calcium channel mobility (Ghelani et al., 2023) provide support for the interpretation that subsynaptic organization is a key parameter for modulating Pr.

      Reviewer #2 (Public Review):

      The authors aim to investigate how voltage-gated calcium channel number, organization, and subunit composition lead to changes in synaptic activity at tonic and phasic motor neuron terminals (type Ib and Is motor neurons, respectively) in Drosophila. These neuron subtypes generate widely different physiological outputs, and many investigations have sought to understand the molecular underpinnings responsible for these differences. Beyond the static differences that exist during the third-instar larval stage of development, the authors also use a pharmacological approach to induce homeostatic plasticity and explore how these neuronal subtypes dynamically change the structural composition and organization of key synaptic proteins that contribute to physiological plasticity. The Drosophila neuromuscular junction (NMJ) is glutamatergic, and glutamate is the main excitatory neurotransmitter in the human brain, so these findings not only expand our understanding of the molecular and physiological mechanisms responsible for differences in motor neuron subtype activity but also contribute to our understanding of how the human brain and nervous system function.

      The authors employ state-of-the-art tools and techniques such as single-molecule localization microscopy (3D STORM) and create several novel transgenic animals using CRISPR that expand the molecular tools available for exploring synaptic biology and will be of wide interest to the field. Additionally, the authors use a robust set of experimental approaches, from active zone-level functional imaging in live preparations to electrophysiology and immunohistochemical analyses, to explore and test their hypotheses. All data appear to be robustly acquired and analyzed using appropriate methodology. The authors make important advances in our understanding of how the different motor neuron subtypes, phasic-like and tonic-like, exhibit widely varying electrical output despite their neuromuscular junctions having similar ultrastructural composition in the proteins of interest: the voltage-gated calcium channel Cacophony (cac) and the scaffold protein Bruchpilot (brp). The authors reveal that the brp:cac ratio appears to be a critical determinant of release probability (Pr), in particular the packing density of VGCCs and the availability of brp. Importantly, the authors demonstrate a brp-dependent increase in VGCC density following acute perfusion of philanthotoxin (a glutamate receptor antagonist). This VGCC increase appears to be largely responsible for the presynaptic homeostatic plasticity (PHP) observable at the Drosophila NMJ. Lastly, the authors created several novel CRISPR-tagged transgenic lines to visualize the spatial localization of VGCC subunits in Drosophila. Two of these lines, CaBV5-C and stjV5-N, are expressed in motor neurons and in the nervous system, localize at the NMJ, and, most strikingly, strongly correlate with Pr at tonic and phasic-like terminals.

      1) The few limitations in this study could be addressed with some commentary, a few minor follow-up analyses, or experiments. The authors use a postsynaptically expressed calcium indicator (mhcGal4>UAS-GCaMP) to calculate Pr, yet do not explore what glutamate receptors or other postsynaptic factors (e.g., components of the postsynaptic density, PSD) may contribute. A previous publication exploring tonic vs phasic-like activity at the Drosophila NMJ revealed a dynamic role for GluRII (Aponte-Santiago et al., 2020). Could the speed of GluR accumulation account for differences between neuron subtypes?

      We did observe that GCaMP signals are higher at type Is synapses, where synapses tend to form later but GluRs accumulate more rapidly upon innervation (Aponte-Santiago et al., 2020). However, because we are using our GCaMP indicator as a plus/minus readout of synaptic vesicle release at mature synapses, we do not expect differences in GluR accumulation to have a significant effect on our measures. Consistently, the difference in Pr we observe between type-Ib and -Is inputs (Fig. 1C) is similar to that previously reported (He et al., 2023; Lu et al., 2016; Newman et al., 2022).

      2) The observation that calcium channel density and the brp:cac ratio are critical determinants of Pr is an important one. However, it is surprising that this was not observed in previous investigations of cac intensity (of which there are many). Is this purely a technical limitation of other investigations, or are other explanations possible? Additionally, regarding VGCC-SV coupling, the authors conclude that this packing density increases VGCC proximity to SVs and contributes to the steeper relationship between VGCCs and Pr at phasic type-Is synapses. Is it possible that brp or other AZ components could account for these differences? The authors possess the tools to address this directly by labeling vesicles with Janelia Fluor 646; a stronger signal should be present at Is boutons. Additionally, many studies have used transmission electron microscopy to examine SV location relative to AZs (T-bars) at the Drosophila NMJ.

      To date, the molecular underpinnings of heterogeneity in synaptic strength have primarily been investigated among individual type-Ib synapses. However, a recent study investigating differences between type-Ib and -Is synapses also found that the Cac:Brp ratio is higher at type-Is synapses (He et al., 2023).

      At this point, we do not know which active zone components are responsible for the organizational (Figs. 1, 2) and coupling (now demonstrated by He et al., 2023) differences between type-Ib and -Is synapses or what establishes the differences in active zone protein levels we observe (Figs. 3,6), although Brp likely plays a local role. We find that Brp is required for dynamically regulating calcium channel levels during homeostatic plasticity and plays distinct roles at type-Ib and -Is synapses (Figs. 3, 4). Brp regulates a number of proteins critical for the distribution of docked synaptic vesicles near T bars of type Ib active zones, including Unc13 (Bohme et al., 2016). Extending these studies to type-Is synapses will be of great interest.

      3) Regarding the contradictory observations that VGCC intensity does not always correlate with, or determine, Pr: previous investigations have also found that other AZ proteins or interactors (e.g., as revealed by synaptotagmin mutants) critically control release, such that Pr drops dramatically even while the correlation between cac and release remains constant.

      This is an important point as a number of molecular and organizational differences between high- and low-Pr synapses certainly contribute to baseline functional differences. The other proteins we (Figs. 3,6) and others (Dannhauser et al., 2022; Ehmann et al., 2014; He et al., 2023; Jetti et al., 2023; Mrestani et al., 2021; Newman et al., 2022) have investigated are less abundant and/or more densely organized at type-Is synapses. Investigating additional active zone proteins, including synaptic proteins, and determining how these factors combine to yield increased synaptic strength are important next steps.

      4) The observation that lower brp levels result in a significantly higher cac:brp ratio at phasic-like synapses by organizing VGCCs could be strengthened by further analysis of the existing data. If the authors select a population of AZs in Ib boutons that endogenously express normal cac but lower brp levels, the Pr of these AZs should be higher than that of other AZs within that population and comparable to Is Pr. I believe the authors should also be able to correlate the cac:brp ratio with Pr across their data set generally, to determine whether a strong correlation exists beyond their observation for cac.

      We do not have simultaneous measures of Pr and Cac and Brp abundance. However, our findings suggest that distinct Cac:Brp ratios at type Ib and Is inputs reflect underlying organizational differences that contribute to distinct release probabilities between the two synaptic subtypes. In contrast, within either synaptic subtype, release probability is positively correlated with both Cac and Brp levels. Thus, the mechanisms driving functional differences between synaptic subtypes are distinct from those driving functional heterogeneity within a subtype, so we do not expect Cac:Brp ratio to correlate with Pr among individual type-Ib synapses. We will work to clarify this point in the revised text.

      5) Regarding the philanthotoxin-induced changes in cac and brp localization underlying PHP, why do the authors not show cac accumulation after PhTx in live dissected preparations (i.e., in real time)? This would also be an excellent opportunity to validate their brp:cac theory. Do the authors observe a dynamic change in brp:cac after 1 or 5 minutes, and do Is boutons potentiate more strongly due to proportional increases in cac and brp? Also regarding PhTx-induced PHP, the observation that stj and α2δ-3 are more abundant at Is synapses suggests that they may also play a role in PhTx-induced changes in cac. If either or both are overexpressed during PhTx, brp should increase while cac remains constant. These accessory proteins may determine cac incorporation at AZs.

      As we have previously followed Cac accumulation in live dissected preparations and found that levels increase proportionally across individual synapses (Gratz et al., 2019), we did not attempt to repeat these challenging experiments at the smaller type-Is synapses. We will reanalyze our data to investigate the Cac:Brp ratio at individual active zones after PhTx. However, as noted above, we do not expect changes in the Cac:Brp ratio to correlate with Pr among individual synapses of a single input, as this measure reflects organizational differences between inputs and PhTx induces an increase in the abundance of both proteins at both inputs.

      Determining the effect of PhTx on Stj levels at type-Ib and -Is active zones is an excellent idea and might provide insight into how lower Stj levels correlate with higher Pr at type-Is synapses. While prior studies have demonstrated critical roles for Stj in regulating Cac accumulation during development and in promoting presynaptic homeostatic potentiation (Cunningham et al., 2022; Dickman et al., 2008; Kurshan et al., 2009; Ly et al., 2008; Wang et al., 2016), its regulation during PHP has not been investigated.

      Taken together, this study generates important data-driven, conceptual, and theoretical advances in our understanding of the molecular underpinnings of different motor neurons, and of synaptic biology generally. The data are robust, thoroughly analyzed, and appropriately depicted. This study not only generates novel findings but also provides novel molecular tools that will help future investigators advance this field.

      References

      Akbergenova, Y., K.L. Cunningham, Y.V. Zhang, S. Weiss, and J.T. Littleton. 2018. Characterization of developmental and molecular factors underlying release heterogeneity at Drosophila synapses. eLife. 7.

      Aldahabi, M., F. Balint, N. Holderith, A. Lorincz, M. Reva, and Z. Nusser. 2022. Different priming states of synaptic vesicles underlie distinct release probabilities at hippocampal excitatory synapses. Neuron. 110:4144-4161.e7.

      Aponte-Santiago, N.A., K.G. Ormerod, Y. Akbergenova, and J.T. Littleton. 2020. Synaptic Plasticity Induced by Differential Manipulation of Tonic and Phasic Motoneurons in Drosophila. The Journal of Neuroscience. 40:6270-6288.

      Bohme, M.A., C. Beis, S. Reddy-Alla, E. Reynolds, M.M. Mampell, A.T. Grasskamp, J. Lutzkendorf, D.D. Bergeron, J.H. Driller, H. Babikir, F. Gottfert, I.M. Robinson, C.J. O'Kane, S.W. Hell, M.C. Wahl, U. Stelzl, B. Loll, A.M. Walter, and S.J. Sigrist. 2016. Active zone scaffolds differentially accumulate Unc13 isoforms to tune Ca2+ channel-vesicle coupling. Nature Neuroscience. 19:1311-1320.

      Cunningham, K.L., C.W. Sauvola, S. Tavana, and J.T. Littleton. 2022. Regulation of presynaptic Ca2+ channel abundance at active zones through a balance of delivery and turnover. eLife. 11.

      Dannhauser, S., A. Mrestani, F. Gundelach, M. Pauli, F. Komma, P. Kollmannsberger, M. Sauer, M. Heckmann, and M.M. Paul. 2022. Endogenous tagging of Unc-13 reveals nanoscale reorganization at active zones during presynaptic homeostatic potentiation. Front Cell Neurosci. 16:1074304.

      Dickman, D.K., P.T. Kurshan, and T.L. Schwarz. 2008. Mutations in a Drosophila alpha2delta voltage-gated calcium channel subunit reveal a crucial synaptic function. The Journal of Neuroscience. 28:31-38.

      Ehmann, N., S. Van De Linde, A. Alon, D. Ljaschenko, X.Z. Keung, T. Holm, A. Rings, A. Diantonio, S. Hallermann, U. Ashery, M. Heckmann, M. Sauer, and R.J. Kittel. 2014. Quantitative super-resolution imaging of Bruchpilot distinguishes active zone states. Nature Communications. 5.

      Ghelani, T., M. Escher, U. Thomas, K. Esch, J. Lützkendorf, H. Depner, M. Maglione, P. Parutto, S. Gratz, T. Matkovic-Rachid, S. Ryglewski, A.M. Walter, D. Holcman, K. O'Connor-Giles, M. Heine, and S.J. Sigrist. 2023. Interactive nanocluster compaction of the ELKS scaffold and Cacophony Ca2+ channels drives sustained active zone potentiation. Science Advances. 9:eade7804.

      Gratz, S.J., P. Goel, J.J. Bruckner, R.X. Hernandez, K. Khateeb, G.T. Macleod, D. Dickman, and K.M. O'Connor-Giles. 2019. Endogenous tagging reveals differential regulation of Ca2+ channels at single AZs during presynaptic homeostatic potentiation and depression. The Journal of Neuroscience:3068-3018.

      He, K., Y. Han, X. Li, R.X. Hernandez, D.V. Riboul, T. Feghhi, K.A. Justs, O. Mahneva, S. Perry, G.T. Macleod, and D. Dickman. 2023. Physiologic and Nanoscale Distinctions Define Glutamatergic Synapses in Tonic vs Phasic Neurons. The Journal of Neuroscience. 43:4598-4611.

      Holderith, N., A. Lorincz, G. Katona, B. Rózsa, A. Kulik, M. Watanabe, and Z. Nusser. 2012. Release probability of hippocampal glutamatergic terminals scales with the size of the active zone. Nature Neuroscience. 15:988-997.

      Jetti, S.K., A.B. Crane, Y. Akbergenova, N.A. Aponte-Santiago, K.L. Cunningham, C.A. Whittaker, and J.T. Littleton. 2023. Molecular Logic of Synaptic Diversity Between Drosophila Tonic and Phasic Motoneurons. bioRxiv:2023.01.17.524447.

      Kurshan, P.T., A. Oztan, and T.L. Schwarz. 2009. Presynaptic alpha2delta-3 is required for synaptic morphogenesis independent of its Ca2+-channel functions. Nature Neuroscience. 12:1415-1423.

      Lu, Z., A.K. Chouhan, J.A. Borycz, Z. Lu, A.J. Rossano, K.L. Brain, Y. Zhou, I.A. Meinertzhagen, and G.T. Macleod. 2016. High-Probability Neurotransmitter Release Sites Represent an Energy-Efficient Design. Current Biology. 26:2562-2571.

      Ly, C.V., C.-K. Yao, P. Verstreken, T. Ohyama, and H.J. Bellen. 2008. straightjacket is required for the synaptic stabilization of cacophony, a voltage-gated calcium channel α1 subunit. Journal of Cell Biology. 181:157-170.

      Mrestani, A., M. Pauli, P. Kollmannsberger, F. Repp, R.J. Kittel, J. Eilers, S. Doose, M. Sauer, A.-L. Sirén, M. Heckmann, and M.M. Paul. 2021. Active zone compaction correlates with presynaptic homeostatic potentiation. Cell Reports. 37:109770.

      Nakamura, Y., H. Harada, N. Kamasawa, K. Matsui, J.S. Rothman, R. Shigemoto, R.A. Silver, D.A. DiGregorio, and T. Takahashi. 2015. Nanoscale Distribution of Presynaptic Ca2+ Channels and Its Impact on Vesicular Release during Development. Neuron. 85:145-158.

      Newman, Z.L., D. Bakshinskaya, R. Schultz, S.J. Kenny, S. Moon, K. Aghi, C. Stanley, N. Marnani, R. Li, J. Bleier, K. Xu, and E.Y. Isacoff. 2022. Determinants of synapse diversity revealed by superresolution quantal transmission and active zone imaging. Nature Communications. 13:229.

      Rebola, N., M. Reva, T. Kirizs, M. Szoboszlay, A. Lőrincz, G. Moneron, Z. Nusser, and D.A. DiGregorio. 2019. Distinct Nanoscale Calcium Channel and Synaptic Vesicle Topographies Contribute to the Diversity of Synaptic Function. Neuron. 104:693-710.e9.

      Sheng, J., L. He, H. Zheng, L. Xue, F. Luo, W. Shin, T. Sun, T. Kuner, D.T. Yue, and L.-G. Wu. 2012. Calcium-channel number critically influences synaptic strength and plasticity at the active zone. Nature Neuroscience. 15:998-1006.

      Wang, T., R.T. Jones, J.M. Whippen, and G.W. Davis. 2016. alpha2delta-3 Is Required for Rapid Transsynaptic Homeostatic Signaling. Cell Rep. 16:2875-2888.

    1. Author Response:

      We sincerely appreciate the recognition from both reviewers regarding the innovative gradual activity-blocking design employing NBQX, as well as the robustness of our approach that integrates experimental and computational approaches to investigate the interplay between homeostatic functional and structural plasticity in response to activity deprivation.

      Acknowledging the concerns raised and the insightful advice shared by the reviewers, we provide the following provisional response:

      Why did we focus on activity silencing? Our decision to focus on chronic activity deprivation stems from a robust body of evidence, summarised in the recent review by Moulin and colleagues (2022), that highlights the consistent occurrence of homeostatic spine loss alongside synaptic downscaling in response to prolonged excitation. In contrast, chronic silencing studies, as outlined in the same review, exhibit inconsistencies and contradictions, with spine loss often manifesting as non-homeostatic. After carefully reviewing the available data, we formulated two hypotheses to account for this heterogeneity: (i) the non-linear nature of activity-dependent structural plasticity, and (ii) the intricate interplay between homeostatic synaptic scaling and structural plasticity, which may be influenced by factors such as the extent of activity deprivation, specific dendritic segments, cell phenotypes, brain regions, and even species. Exploring these hypotheses required a systematic approach through computational simulations (and suitable experiments). The present manuscript intentionally confines the discussion of heightened activity to a proof-of-concept computer simulation, underscoring our deliberate emphasis on the central theme of activity silencing. Nevertheless, we agree with the reviewers that extending the model to encompass homeostatic synaptic downscaling triggered by augmented activity is an intriguing avenue for future exploration.

      Why did we choose NBQX and why didn't we extensively characterise it? We used NBQX, a competitive AMPA receptor antagonist, because adjusting its dose allows finer modulation of network activity (Wrathall et al., 2007) than can be achieved with TTX. Although NBQX is not commonly used to study homeostatic synaptic plasticity, it effectively regulates network activity, as supported by our electrophysiological recordings as well as by in vivo and in vitro studies (Follett et al., 2000; Wrathall et al., 2007). However, it is worth noting that NBQX binds to AMPA receptors including GluA2-containing receptors, which are pivotal for TTX-triggered synaptic scaling (Gainey et al., 2009) and for glutamate-induced spine protrusion in the presence of TTX (Richards et al., 2005). Importantly, there is no conclusive evidence that NBQX, when applied alone (without TTX), hinders the synthesis or insertion of AMPA receptors. While we acknowledge the interest and value of characterising NBQX separately, such an endeavour extends beyond the scope of the current study.

      It is also pertinent to note that the models we employed, activity (calcium)-dependent homeostatic synaptic scaling and structural plasticity, are inherently phenomenological. In essence, these models do not address molecular mechanisms beyond the regulation of calcium concentration by firing rates. Given this phenomenological nature, introducing a detailed molecular characterization of NBQX, or extending the model to scenarios of chronically increased network activity that target different molecular pathways, could create misleading expectations among our readers, implying a level of molecular pathway implementation that is not our immediate focus.

      Did the model successfully replicate the experimental findings? Achieving a strong agreement between computer simulations and empirical data is often a sought-after outcome, particularly when both aspects are integrated within a single study. However, this congruence is not always the primary intent. In our present investigation, we introduced three distinct ways in which experimental data merged with computational studies: to provide informative input, to validate hypotheses, and to stimulate novel ideas.

      Our experiments primarily aimed to inform the computational model through an analysis of spine density. The computational framework was envisioned to yield insights that could be broadly applicable, extending beyond the mere replication of conducted experiments. In this context, our modelling outcomes effectively mirrored the heterogeneous alterations in synapse numbers observed in various in vivo and in vitro studies following activity deprivation—ranging from homeostatic increases to non-homeostatic synapse loss.

      Our model also proposed a plausible mechanism illustrating how synaptic scaling might propel the transition from non-homeostatic synapse loss to the restoration of synapse levels, achieved by maximising inputs from active spines. This supposition found partial confirmation when considering both our experimentally obtained spine sizes and those detailed in the existing literature—pointing to a reduction in spine numbers but a conservation of larger spine sizes during complete activity blockade.

      Moreover, our experimental observations unveiled certain aspects that, while not entirely encompassed by our model, have the potential to inspire future modelling studies. For instance, we observed size-dependent changes in spine sizes under complete activity blockade; we also observed inconsistent combinations of spine density and size changes across dendritic segments upon activity deprivation. The prospect of reconfiguring the interplay between structural plasticity and synaptic scaling rules to elucidate the observed heterogeneity in outcomes stands as an intriguing avenue worth revisiting, particularly as the modelling of structural plasticity within a network of intricately detailed neurons becomes feasible.

      In summary, while the aspiration to faithfully replicate experimental outcomes exists, achieving an exact correspondence between a purposefully simplified system, like the point neural network we employed in our study, and real-world data should be approached with caution. Striving for such a match carries the risk of overfitting and prematurely advancing conclusions that might not stand the test of broader applications.

      Why did we establish strict definitions for functional and structural plasticity? The term "structural plasticity" has historically encompassed a wide array of high-dimensional alterations in neural morphology throughout development and adulthood. This expansive interpretation contributed to the delayed development of computational models specifically targeting structural plasticity. Moreover, certain elements, such as spine size, blur the boundary with the functional facet of synapses, as the reviewers also noted. We hope the reviewers and readers will agree that implementing structural plasticity through the manipulation of synapse numbers, which effectively enables dynamic (re)wiring, provides a high degree of freedom and robustness, whereas synaptic size translates naturally into synaptic weight within the modelling framework. Although the distinction between synaptic weight and synapse number may seem stringent, it lays the groundwork for addressing a fundamental question: how do the gradual modification of synapse numbers and the fast modulation of synaptic weights interact within a continually evolving dynamic system? In this respect, our study surveys how distinct combinations of these two governing principles can produce divergent outcomes. It can thus serve as a benchmark for future structural plasticity models that incorporate continuous changes in both synapse size and number.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript describes an interesting experiment in which an animal had to judge a duration of an interval and press one of two levers depending on the duration. The Authors recorded activity of neurons in key areas of the basal ganglia (SNr and striatum), and noticed that they can be divided into 4 types.

      The data presented in the manuscript are very rich and interesting; however, I am not convinced by the interpretation of these data proposed in the paper. The Authors focus on neurons of types 1 & 2 and propose that their difference encodes the choice the animal makes. However, I would like to offer an alternative interpretation of the data. Looking at the description of the task and the animal movements shown in Figure 1, it seems to me that there are 4 main "actions" the animals may perform in the task: press the right lever, press the left lever, move left, and move right. The 4 neuron types the Authors observed may correspond to these actions. For example, Figure 1 shows that Type 1 neurons decrease their activity when the right lever becomes more likely to be correct, so this decrease may correspond to preparation for pressing the right lever, releasing this action from inhibition (analogously, Type 2 neurons may be related to pressing the left lever). Furthermore, comparing animal movements with the timing of activity of Type 3 and Type 4 neurons, it seems to me that Type 3 neurons decrease their activity when the animal moves left, and Type 4 neurons when the animal moves right.

      I suggest Authors analyse if this interpretation is valid, and if so, revise the interpretation in the paper and the model accordingly.

      We thank the reviewer for the general appreciation of the study. Regarding the interpretation of each SNr subtype, we compared the firing activity of the same SNr neurons in both the standard 2-8 s task and the reversed 2-8 s task (Figure 2G-R, Figure S4). Type 1 and Type 2 neurons are related to right and left choices, respectively, in the standard task (Figure 2G, M, N), and this is even more evident in the reversed 2-8 s task (Figure 2J). When the movement trajectories of the same mice in 8-s trials were reversed from left-then-right in the control task (Figure 2I) to right-then-left in the reversed task (Figure 2L), the Type 1 SNr neurons, which showed monotonically decreasing dynamics in the control 2-8 s task (Figure 2M), reversed their dynamics to a monotonic increase in the reversed 2-8 s task (Figure 2P). The same reversal of neuronal dynamics was also observed in Type 2 SNr neurons in the reversed task (Figure 2N vs Figure 2Q). Therefore, Type 1 and Type 2 neurons are related to action selection. Furthermore, Type 3 and Type 4 SNr neurons, which exhibit transient changes when mice switch either from left to right or from right to left, maintained the same neuronal dynamics in both the standard and the reversed 2-8 s tasks (Figure S4C-F), indicating that Type 3 and Type 4 neurons are related to the switch between choices rather than to the specific upcoming choice.

      Reviewer #1 (Recommendations For The Authors):

      Suggest clarifying whether SNr neurons were recorded from a single hemisphere or bilaterally.

      We have described the recording hemisphere in our Methods (page 46, lines 974-976) as follows “For striatum recording, we implanted 11 mice in the left hemisphere and 8 mice in the right hemisphere. For the SNr recording, we implanted 5 mice in the left hemisphere and 4 mice in the right hemisphere.”

      Suggest analysing whether type 1/2/3/4 neurons are preferentially located in the hemisphere contralateral or ipsilateral to a particular lever or movement.

      We have addressed this issue in Figure S3 and Figure S6. We implanted electrodes in both the left and right hemispheres with mirrored M-L coordinates. For striatum recording, we implanted 11 mice in the left hemisphere and 8 mice in the right hemisphere. For SNr recording, we implanted 5 mice in the left hemisphere and 4 mice in the right hemisphere. We analyzed striatal and SNr neuronal activity in the left vs. right hemisphere separately in relation to action selection. SNr neurons recorded in either hemisphere exhibited the same four types of neural dynamics in similar proportions (Fig. S3); specifically, Type 1 neurons are dominant in both hemispheres. Similarly, in the striatum, SPNs from the left and right hemispheres showed the same four types of neural dynamics in similar proportions (Fig. S6). Therefore, there is no significant difference between hemispheres in the proportions of neuron subtypes.

      To investigate whether type 1/2 neurons are involved in preparing the lever press, please examine whether these neurons also change their activity during the lever press.

      In Figure S1L, we showed the activity of example Type 1 and Type 2 SNr neurons during rewarded and non-rewarded lever presses. The Type 1 SNr neuron fires more when the mouse presses the left lever than the right lever, whereas the Type 2 SNr neuron fires more when the mouse presses the right lever than the left lever, indicating that the firing of Type 1 and Type 2 neurons depends on the action choice.

      To test whether Type 3/4 neurons control movement from one location to another, please analyse whether their activity correlates with the movement on a trial-by-trial basis.

      In Figure S2C-D, we showed the firing activity of example Type 3 and Type 4 neurons on a trial-by-trial basis. The Type 3 neuron showed increased firing between 3 and 4 s of the 8-s lever retraction period, when the animal switched from the left side to the right side, whereas the Type 4 neuron showed decreased firing between 3 and 4 s as the animal switched from left to right. We further showed in Figure S4C-F that Type 3 and Type 4 neurons are related to the switch between choices rather than to the specific upcoming choice.

      Suggest also performing analogous analyses for striatal neurons.

      We analysed the four types of SPNs on a trial-by-trial basis. Owing to limits on the number of figures, these data were not originally included in the manuscript; we have now included these results in Fig. S2E-H.

      Typo: l. 68: "can bidirectionally regulates" -> "can bidirectionally regulate"

      Thanks, we have now corrected the typos.

      Reviewer #2 (Public Review):

      In this valuable manuscript, Li & Jin record from the substantia nigra and dorsal striatum to identify subpopulations of neurons whose activity reflects different dynamics during action selection, and then use optogenetics in transgenic mice to selectively inhibit or excite D1- and D2-expressing spiny projection neurons in the striatum, demonstrating that each plays a causal role in action selection in an opposing manner. They argue that their findings cannot be explained by current models and instead propose a new 'triple control' model, with one direct and two indirect pathways. These findings will be of broad interest to neuroscientists, although the manuscript lacks some direct evidence for the proposed new model.

      Overall there are many strengths to this manuscript, including the fact that the empirical data are thorough and the experiments are well designed. The model is well thought through, but I do have some remaining questions and issues with it.

      Weaknesses:

      1) The nature of 'action selection' as described in this manuscript is a bit ambiguous and implies a level of cognition or choice which I'm not sure is there. It's not integral to the understanding of the paper really, but I would have liked to know whether the actions are under goal-directed/habitual or even Pavlovian control. This is not really possible to differentiate with this task as there are a number of Pavlovian cues (e.g. lever retraction interval, house light offset) that could be used to guide behavior.

      We apologize for the confusion in the task description. We appreciate the reviewer's deep understanding of the complexity of the 2-8 s task we designed. Indeed, the 2-8 s task cannot simply be categorized as a goal-directed/habitual or a Pavlovian task, because it has several behavioral components. Lever retraction serves as a Pavlovian cue for mice to start the left-then-right sequential movement, but once the levers are retracted, no cue is available during the lever retraction period, and mice have to decide when to switch choice based solely on their internal estimate of elapsed time, which we consider a cognitive process. The house light stays on for the entire training session (2-3 hours) and is turned off only when the session ends, so it cannot guide choice behavior. The behavior and neural activity during the lever retraction period are the main focus of this manuscript. The main advantage of this task design is that the animal engages in a self-determined, dynamic switch in action selection, which offers a unique opportunity to investigate the role of various neuronal populations in the basal ganglia pathways during action selection.

      2) In a similar manner, the part of the striatum that is being targeted (e.g. Figures 4E,I, and N) is dorsal, but is central with regards to the mediolateral extent. We know that the function of different striatal compartments is highly heterogeneous with regards to action selection (e.g. PMID: 16045504, 16153716, 11312310) so it would have been nice to have some data showing how specific these findings are to this particular part of dorsal striatum.

      We thank the reviewer for raising this point. We targeted the dorsal-central striatum, and Figure S5G-L shows the specific locations targeted. As specified in the Methods (lines 965-970), the craniotomies for electrode implantation in the dorsal striatum were made 0.5 mm rostral to bregma, 1.5 mm lateral, with the electrodes lowered ~2.2 mm from the brain surface. For virus injection and optic fiber implantation (lines 997-998), the craniotomies were made bilaterally at 0.5 mm rostral to bregma and 2 mm lateral, at a depth of ~2.2 mm from the brain surface.

      3) I'm not sure how I feel about the diagrams in Figure 4S. In particular, the co-activation model is shown with D2-SPNs represented as a + sign (which is described as "having a facilitatory effect to selection" in the caption), but the co-activation model still suggests that D2-SPNs are largely inhibitory, just of competing actions rather than directly inhibiting actions. Moreover, I am not sure about these diagrams because they appear to show that D2-SPNs far outnumber D1-SPNs, and we know that this isn't the case. I realize the diagrams are not proportionate, but it still looks a bit misrepresented to me.

      We appreciate the reviewer's comments about the diagram. We borrowed and extended the "center-surround" layout of receptive fields in the early visual system as an intuitive analogy for the functional interaction among striatal pathways (also see Mink 2003 Archives of Neurology). In the co-activation model, if D2-SPNs inhibit the competing action, the target action becomes more likely to be selected because competition is reduced; in other words, D2-SPNs facilitate the target action indirectly. This is why we describe the effect of D2-SPNs in the co-activation model as facilitatory. The area of each region does not represent the number of cells but only its qualitative functional role. To make this clearer, we have added more explanation to the manuscript (page 17, lines 338-341).

      4) There are a number of grammatical and syntax errors that made the manuscript difficult to understand in places.

      We have now gone through the text carefully and corrected the typos.

      5) I wondered if the authors had read PMID: 32001651 and 33215609 which propose a quite different interpretation of direct/indirect pathway neurons in striatum in action selection. I wonder if the authors considered how their findings might fit within this framework.

      We appreciate the reviewer's comments and suggestion. Miriam Matamales et al. (2020, PMID: 32001651) found a dynamic D2- to D1-SPN transmodulation across the striatum that is necessary for updating previously learned behavior, highlighting collateral modulation between D1- and D2-SPNs as an additional layer of behavioral control beyond the classic direct and indirect pathways. This finding is compatible with our "Triple-control" model, which emphasizes the influence of collateral modulation within the striatum on behavioral choice. James Peak et al. (2020, PMID: 33215609) demonstrated that D2-SPNs are critical for maintaining behavioral flexibility, which is reflected in our "Triple-control" model by the finding that activation of D2-SPNs can trigger a behavioral switch from the current action to another action. Although these two studies mainly investigate the roles of striatal D1- and D2-SPNs in action learning and behavioral strategy, the functions they describe generally fit within our new "Triple-control" model of basal ganglia pathways for action selection.

      6) There is no direct evidence of two indirect pathways, although perhaps this is beyond the scope of the current manuscript and is a prediction for future studies to test.

      As accumulating RNA-seq and physiological data point to heterogeneity among D2-SPNs, further investigation of the subtypes of D1- and D2-SPNs and their functions is likely a direction the field will continue to explore. In addition, we have discussed other possible anatomical circuits within the basal ganglia that could fulfill the functional role of a third pathway in our new "Triple-control" model, together with or independently of the second indirect pathway (pages 32-33, lines 689-700). We hope that our new model will inspire future work to identify and dissect additional functional pathways in the basal ganglia circuits for action control.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for authors:

      1) Consider how specific to the dorso-central striatum these findings are, possibly in the discussion.

      We have specified in the Discussion that the study targeted the dorsal-central part of the striatum (page 29, lines 609-612).

      2) Modify the diagrams in 4S to make them more representative of the model's features.

      We have responded to this comment above.

      3) Consider whether the findings here might fit within the role for direct pathway in excitatory action-outcome learning and the indirect pathway in response flexibility more generally.

      The current study mainly focuses on the selection and execution of actions. It will certainly be important to continue exploring the functions of the direct vs. indirect pathways in action learning.

      4) Correct typos and grammatical errors including (but not limited to):

      a) Line 62-64 - explain why this is controversial? Is it because we don't know which one applies?

      In the "Go/No-go" model, the indirect pathway inhibits the desired action and functions as a gain modulator, whereas in the "Co-activation" model, the indirect pathway inhibits competing actions and thereby facilitates the desired action indirectly. The two existing models therefore disagree on how the indirect pathway acts on its target action and on the net behavioral outcome.

      b) Line 68 - Regulates should be regulate.

      This has been corrected in the revised manuscript.

      c) Line 86 - should read "there are neuronal populations in either the direct or indirect pathway that are activated..."

      This has been corrected in the revised manuscript.

      d) Line 146-147 - "these types of neuronal dynamics in Snr only appeared in the correct but not incorrect trials" - It seems the authors are suggesting this only for Types 1 and 2 neurons, but this confused me the first time I read it and I suggest it is made clearer.

      Line 146-147 now reads “These four types of neuronal dynamics in SNr only appeared…”

      e) Line 346 - significant should be significantly.

      This has been corrected in the revised manuscript.

      f) Line 360 "contrast" should be "contrasting".

      This has been corrected in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive remarks. We have addressed the reviewers’ recommendations in the point-by-point response below to improve our revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. The authors carry out their HDX-MS work on Prestin (and SLC26A9) solubilized in glycol-diosgenin. The authors should carefully rationalize their choice of detergent and discuss how their key findings are also pertinent to the native state of Prestin when residing in an actual phospholipid bilayer. More native membrane mimetic models are available, for instance, nano-discs etc. While I am not insisting that the authors have to repeat their measurements in a more native membrane system, it would be a very nice control experiment, and in any case, a detailed discussion of the limitations of the approach taken and possible caveats should be included - possibly with additional references to other studies.

      Response: We have added a paragraph rationalizing the choice of detergent in lines 174-176. We have also added the requested HDX data comparing prestin reconstituted in nanodiscs with prestin solubilized in micelles (Fig 5). The HDX of prestin in these two membrane mimetics was indistinguishable, including at the anion-binding site, suggesting that our major findings are likely pertinent to prestin residing in a lipid bilayer. The only major HDX difference we observed was that the lipid-facing helix TM6 is more dynamic in nanodiscs than in micelles. In our previous structural studies, we identified TM6 as the "electromotile elbow" that is important for prestin's mechanical expansion (Bavi et al., Nature, 2021). We are currently conducting a more thorough investigation of the role of TM6 in prestin's electromotility.

      1. As far as I understand, the HEPES state represents the apo-state and thus assumes that HEPES does not bind to Prestin - the authors should support this assumption or include a discussion of the possible effect of HEPES on Prestin. Also, the HEPES state has fewer time-points - this should also be discussed.

      Response: We have included a discussion of the possible effects of HEPES in lines 331-345. To test our assumption that HEPES does not bind to prestin, we determined the structure of prestin in the HEPES-based buffer using single-particle cryo-EM and found no evidence that HEPES binds to prestin. Details are discussed in lines 331-345 and Supporting Information Text 3.

      We employed a denser sampling of HDX labeling times for prestin in Cl- because it is critical for fitting and ∆G calculation. The earlier time points are used mainly to evaluate the dynamics of the less stable cytosolic domain. Since the cytosolic domain does not directly participate in prestin’s voltage-sensing mechanism and electromotility, we only measured the HEPES states with longer time points which mainly probe the dynamics of the transmembrane domain.

      1. Overall, the HDX-MS data provided and the statistical analysis done are in my view sufficiently detailed and well done. The authors are advised to refer to and include an HDX Summary Table and an HDX Data Table according to the HDX-MS community guidelines (Masson et al. Nature Methods 2019).

      Response: An HDX summary table is provided in Table S1 and referred to in lines 81 and 388. We have included a reference to Masson et al., Nature Methods, 2019, in line 389.

      1. Figure 5: I like the detailed analysis of the helix folding, but in my experience many HDX curves can be fit very well to a 4-term exponential function. I think the authors would need more time points to make a more convincing case. It does, however, provide a compelling theory, even if the data do not strictly prove it. The authors should discuss this in more detail, including its limitations.

      Response: We presented a statistical analysis describing the accuracy of the fitting in Fig 6A. We acknowledge that the values of the exponentials may not be precisely determined, but the fundamental result is robust – TM3 exchanges through fraying from the N-terminal end of the helix while TM6 exchanges much more cooperatively. Collecting additional time points may reduce the error on the rates but would not contribute to additional mechanistic insights.

      Reviewer #2 (Recommendations For The Authors):

      1. I suggest toning down more speculative/ hypothetical aspects. Specifically, I believe that the following sentence should not be in the abstract in its present form: "This event shortens the TM3-TM10 electrostatic gap, thereby connecting the two helices such that TM3-anion-TM10 is pushed upwards by forces from the electric field, resulting in reduced cross-sectional area."

      Response: The sentence has been rephrased.

      1. The "nuance" between helix fraying and helix unfolding is an important aspect of the authors' hypothesis, but it should be explained better. In that regard, have the authors performed HDX-MS analysis of the P136T mutant? That would nicely support their claim that helix fraying is foundational for electromotility.

      Response: We have added more explanation of helix fraying and unfolding to the main text. We have not performed HDX-MS analysis of the P136T mutant. However, molecular dynamics simulations using Upside consistently showed that the P136T mutation results in a highly stabilized TM3 (Fig. S4B).

      1. Why do measurements at two pDs? Did the authors observe any differences?

      Response: Using two pDs extends the effective dynamic range of the HDX measurement by two orders of magnitude, because the intrinsic exchange rate scales with pD and temperature. This allows us to determine the stability of both the most and the least stable regions of the protein. We have rephrased lines 83-87 to better explain this choice of pDs. With the time points used in this study, we did not observe noticeable differences between HDX performed at the two pDs once the changes in intrinsic rates were corrected for (Fig. S7A).
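      The rate correction described above can be sketched as follows. This is an illustration under textbook assumptions, not the authors' analysis pipeline: it assumes exchange is purely base-catalysed (so the intrinsic rate rises roughly tenfold per pD unit at fixed temperature) and that the EX2 regime holds, where the opening free energy is ΔG = RT ln(k_int/k_obs). The function names and example values are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987e-3


def equivalent_time(t_s, pd, pd_ref):
    """Hypothetical helper: labeling time at pd_ref that matches t_s at pd.

    Assumes purely base-catalysed exchange (k_int proportional to 10**pD)
    and identical temperature, so k_int * t is conserved.
    """
    return t_s * 10 ** (pd - pd_ref)


def delta_g_op(k_int, k_obs, temp_k=298.15):
    """EX2 opening free energy, dG = RT ln(k_int / k_obs), in kcal/mol."""
    return R_KCAL * temp_k * math.log(k_int / k_obs)


# 10 s of labeling one pD unit higher corresponds to ~100 s at the lower pD.
print(equivalent_time(10, 8.0, 7.0))
```

      Under these assumptions, a pair of labeling pDs two units apart shifts the accessible window of k_int * t by about 100-fold, consistent with the two orders of magnitude mentioned in the response.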

      1. I can't help but wonder what the value is of performing HDX-MS measurements after 27 h of incubation. Membrane proteins are known to be unstable once purified, and a few odd HDX profiles at that specific time point (especially in the region of residues 80-100) raise the question of whether local unfolding preceding aggregation could occur. This would weaken the authors' claims about cooperative unfolding and localized, directional helix fraying. Could they provide some evidence (CD, thermostability measurements such as Trp fluorescence quenching, or SEC analysis) that prestin is still folded after 27 h in GDN?

      Response: We appreciate the reviewer's point that membrane proteins can be unstable once purified. In our system, we did not observe evidence of unfolding or aggregation caused by long-term incubation after purification. This is supported mainly by the fact that our HDX reactions were initiated and injected into the MS in random order, yet remained highly reproducible across biological and technical replicates. For example, HDX on freshly purified SLC26A9 gave the same deuteration levels as SLC26A9 kept in GDN for 4 days after purification. For prestin, we lack a direct comparison between fresh and older samples (24-27 h post-purification) because of limited sample, but 30 s of HDX in SO42- performed 24 h post-purification gave a %D that fell between those of 10 s and 90 s of labeling on fresh sample. Additionally, HDX on freshly purified prestin in Cl- gave the same %D as prestin in the presence of 1 M urea labeled 24-48 h after purification, suggesting that prestin is relatively resistant to aggregation for at least 48 h after purification, even in 1 M urea (data not shown).

      Furthermore, the HDX for prestin in nanodisc are essentially identical to prestin in micelles except for a functionally important helix (TM6), suggesting minimal aggregation or misfolding.

      We think the "few odd HDX profiles" at the 27 h time points for residues 80-100 have two causes. First, TM1 unfolds cooperatively, and its stability in HEPES falls within the detection range only when long labeling times are used (within one log unit of 27 h). Second, we observed two non-interconverting, structurally distinct populations of TM1 (Supporting Information Text 1 & Fig. S8); at long labeling times, the two isotope distributions merge and can sometimes skew the %D calculation. Nevertheless, the HDX differences we observed across conditions are clear, and any such skewing should be minimal and does not change our main conclusions.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins the authors first show that ClpL possesses a robust disaggregase activity that does not further require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and in some cases, supported by in vivo data performed in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL to ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of such a domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD domain was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative staining EM experiments combined with conditional covalent linkages and disaggregation assays the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a drawing model showing how ClpL performs protein disaggregation based on their new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully done study of the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids, shows high thermal stability, and gives the food-borne pathogen Listeria monocytogenes a substantial increase in heat resistance. The authors show that ClpL interacts with aggregated proteins through aromatic residues in its N-terminal domain and subsequently unfolds proteins from aggregates, translocating polypeptide chains through the central pore of its oligomeric ring structure. The structure of ClpL oligomers was also investigated. The results suggest that the single-ring structure, and not the dimers or trimers of rings also observed by EM, is the active disaggregase species.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with autonomous ClpG disaggregase present in selected Gram-negative bacteria and well-studied E. coli system consisting of ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent, DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this activity appears to be greater than that of the canonical DnaK/ClpB co-chaperone system. Furthermore, Lm ClpL appears to be more thermostable than Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress. Interestingly, Lm ClpL can provide thermotolerance to E. coli cells that have been genetically depleted of ClpB or that express the mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying the N-terminal domain of ClpL as essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimers or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent ring dimer formation, thus promoting an active, substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in Gram-negative bacteria that has been reported previously, demonstrating that these two enzymes share homologous activities and properties. Taken together, this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report amounts to a substantial body of novel work that will be of interest to the protein chaperone community. Furthermore, by showing that ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri, the authors have expanded the significance of this work and provided new insight into potential mechanisms of thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labeling is a bit misleading or confusing and may warrant revision. While I do feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating ClpL as a novel disaggregase, interpretation of the data is hindered as no statistical tests are provided throughout the manuscript. Because of this only qualitative analysis can be made, and as such many of the concluding statements involving pairwise comparisons need to be revisited or quantitative data with stats needs to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised axis labeling to increase clarity.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Anderson, Henikoff, Ahmad et al. performed a series of genomics assays to study Drosophila spermatogenesis. Their main approaches include (1) using two genetic mutants that arrest male germ cell differentiation at distinct stages, bam and aly, to perform CUT&Tag for H3K4me2, a histone modification marking active promoters and enhancers; and (2) using FACS-sorted pure spermatocytes to perform CUT&Tag with antibodies against RNA Pol II phosphorylated at Ser 2, H4K16ac, H3K9me2, H3K27me3, and ubH2AK118. They also compare these chromatin profiling results with published single-cell and single-nucleus RNA-seq data. Their analyses span the genome, but the major conclusions concern the chromatin features of the sex chromosomes. For example, the X chromosome lacks both dosage compensation and inactivation in spermatocytes, while the Y chromosome is activated but enriched in ubH2A in spermatocytes. Overall, this work provides high-quality epigenome data from testes and purified germ cells. The analyses are very informative for understanding the dramatic changes in chromatin structure during Drosophila spermatogenesis. Some new analyses and a few new experiments are suggested here, which would hopefully take further advantage of these data sets and make some results more conclusive.

      Major comments: 1. The step-wise accumulation of H3K4me2 in bam, aly and wt testes are interesting. Is it possible to analyse the cis-acting sequences of different groups of genes with distinct H3K4me2 features, in order to examine whether there is any shared motif(s), suggesting common trans-factors that potentially set up the chromatin state for activating gene expression in a sequential manner?

      While the histone H3K4me2 mark is low and more widespread at genes active in late spermatocytes and in spermatids (shown in Figure 2C and some examples in Figure 1C-D), we suggest that this may be due to a general decrease in the importance of this modification in late spermatogenesis rather than a specific feature of those genes. We point this out in lines 146-152. This idea is supported by the widespread change in RNAPII distribution in all genes in the germline, shown in Figure 3F and supplementary Figure 2.

      1. Pg. 4, line 141-142: "we cannot measure H3K4me2 modification at the bam promoter in bam mutant testes or at the aly promoter in aly mutant testes", what are the allelic features of the bam mutant and aly mutant? Are the molecular features of these mutations preventing the detection of H3K4me2 at the endogenous genes' promoters? Also, the references cited (Chen et al., 2011) and (Laktionov et al., 2018) are not the original research papers where these two mutants were characterized.

      We have corrected these citations to the original papers. We clarified in the text that the bamΔ86 allele is a deletion of almost all of the coding sequence (reported in Bopp, D., Horabin, J.I., Lersch, R.A., Cline, T.W., Schedl, P. (1993). Expression of the Sex-lethal gene is controlled at multiple levels during Drosophila oogenesis. Development 118(3): 797-812.). The aly1 allele is also a P element-induced mutation; it is not molecularly characterized (it was first described in Lin, T.Y., Viswanathan, S., Wood, C., Wilson, P.G., Wolf, N., Fuller, M.T. (1996). Coordinate developmental control of the meiotic cell cycle and spermatid differentiation in Drosophila males. Development 122(4): 1331-1341.) We noticed a lack of reads for various histone modifications over part of the aly gene in aly mutants, suggesting that the deletion is limited to the promoter and the first exon. Signal for the H3K4me2 modification is at background levels over the distal portion of aly, suggesting that the deletion inactivates the gene.

      1. The original paper that reported the Pc-GFP line and its localization is: Chromosoma 108, 83 (1999).

      We are citing the first published description of this marker in the male germline (lines 291-293).

      Pc-GFP is ubiquitously expressed and present in almost all cell types. In Figure 6B, there is no Pc-GFP signal in bam and aly mutant cells.

      We apologize, our labeling of the figure was easily overlooked - the bam and aly genotypes do not carry the PcGFP marker, since we didn’t need it for staging the germline nuclei. We have clarified this in the figure.

      According to the Method "one testis was dissected", does it mean that only one testis was prepared for immunostaining and imaging? If so, definitely more samples should be used for a more confident conclusion.

      We corrected the text to make it clear that all cytological examinations were repeated at least times (lines 438-439).

      Also, why use 3rd instar larval testes instead of adult testes?

      Generally, we find that immunostaining of the larval testes is cleaner, and we now mention this in the Methods (lines 439-440). We have immunostained both larval and adult testes for these markers with consistent results.

      Finally, it is better to compare fixed tissue and live tissue, as the Pc-GFP signal could be lost during fixation and washing steps. Please refer to the above paper [Chromosoma 108, 83 (1999)] for Pc-GFP in spermatogonial cells and Development 138, 2441-2450 (2011) for Pc-GFP localization in aly mutant.

      We are using PcGFP staining for staging with antibody detection of other chromatin features, which requires fixed material, although we have compared PcGFP signal in both live and fixed tissue. We have added the 1999 reference for nuclear staging in the male germline.

      1. Ubiquitinylation of histone H2A is typically associated with gene silencing; here it has been hypothesized that ubH2A contributes to the activation of the Y chromosome. This conclusion is tenuous, as it depends entirely on correlative results.

      We agree that this is a correlation. We cite in the text examples where uH2A is associated with gene activation. We have added a comment to clarify that this is a correlation (lines 318-320), and now present an alternative that uH2A on the Y chromosome may be moderating expression from these highly active genes (lines 405-407).

      For example, the lack of co-localization of ubH2A immunostaining and Pc-GFP are not convincing evidence that ubH2A is not resulting from PRC1 dRing activity. It would be a lot stronger conclusion by using genetic tools to show this. For example, if dRing is knocked down (using RNAi driven by a late-stage germline driver such as bam-Gal4) or mutated in spermatocytes (using mitotic clonal analysis), would they detect changes of ubH2A levels?

      We have tested multiple constructs to knockdown dRING using the bam-GAL4 driver although we have not reported it in the manuscript. These knockdowns have no effect on uH2A staining in the testis, on motile sperm production, or on male fertility, although these RNAi constructs do produce Polycomb phenotypes when expressed in somatic cells from an en-GAL4 driver. This is the reason why we point out in the text that there are multiple alternative candidates for an H2A ubiquitin ligase in the Drosophila genome and that in other species RING1 is not responsible for sex body uH2A in the male germline (lines 394-396).

      1. Regarding "X chromosome of males is thought to be upregulated in early germline cells", it has been shown that male-biased genes are depleted from the X chromosome [Science 299:697-700 (2003); Genome Biol 5:R40 (2004); Nature 450:238-241 (2007)], as are the differentiation genes of spermatogenesis [Cell Research 20:763-783 (2010)]. It would be informative to discuss the X chromatin features identified in this work in light of these previous findings.

      We now mention that the Drosophila X chromosome is moderately depleted of male germline-expressed genes (lines 362-363).

      For example, the lack of RNAPII on X chromosome in spermatocytes could be due to a few differentiation genes expressed in spermatocytes located on the X chromosome.

      We show in Figure 3B that there is a minor non-significant reduction in RNAPII on the X chromosome in spermatocytes. This small reduction might be due to the moderate paucity of male germline-expressed genes on this chromosome, but since it is non-significant we have not discussed it.

      Reviewer #2 (Public Review):

      Anderson et al profiled chromatin features, including active chromatin marks, RNA polymerase II distribution, and histone modifications in the sex chromosomes of spermatogenic cells in Drosophila. The results are new and the experiments and analyses look well done, including with appropriate numbers of replicates. Results were parsed by comparing them among two arrest mutants and wildtype, as well as in FACS-sorted spermatocytes. The authors also profiled larval wing discs to serve as reference-somatic cells, which allowed them to focus only on features in their testis data that were associated with germ cells. Their results were further refined by categorizing the genes of interest based on available single nucleus RNA seq expression profiles. The authors document interesting phenomena, such as differences in the distribution of RNAPIIS2p on some genes in germ cells vs somatic cells, the presence of a uH2A body beginning in early spermatocytes, and high levels of uH2A on the Y chromosome and little or none on the X. The former is intriguing because this modification is usually associated with silencing, yet the Y chromosome is active in spermatogenic cells. The authors interpret some of their data as implying a lack of dosage compensation of the X chromosome in spermatocytes.

      The data are believable and new, but it is not fully clear how to interpret them. The paper's interpretations rely on subtractive logic to parse results from mixtures of cells down to cell type, extracting spermatogonia, spermatocyte, etc. features by comparing bam mutants (only spermatogonia) to aly mutants (spermatogonia and early spermatocytes but no later stages) to wildtype (all spermatogenic stages), and extracting testis germline data by comparison to wing disc soma; their FACS sorted spermatocytes also have heterogeneity. I recognize that the present paper was a lot of work and am not suggesting that the authors redo their study using methods that give more purity and precision of stage (https://doi.org/10.1126/science.aal3096, https://doi.org/10.1101/gad.335331.119), but they should be aware of them and of their results.

      The pulse-release system that the reviewer points to is an interesting system, but more limited in material and in usable markers than the systems we used here. We have expanded our discussion of the limitations of subtractive comparisons between arrest genotypes, both in regard to using mutants that may alter gene expression programs, and to how subtractive comparisons may limit our detection of differences between cell types (lines 143-147).

      The conclusions about dosage compensation are indirect, but are consistent with the current model documented in the studies cited by the authors, as well as earlier studies (doi: 10.1186/jbiol30).

      We disagree; our data directly speaks to the molecular mechanisms at play. Our profiling of the H4K16acetylation mark and RNAPII in isolated spermatocytes (Figure 4) demonstrates that current models are correct, and so are useful for settling this point in the literature.

      Reviewer #1 (Recommendations For The Authors):

      Throughout the manuscript, it is better to cite the original research papers.

      We have added citations for the original characterizations of the bam and aly alleles used, for the descriptions of PcGFP in spermatocytes, and for issues raised by reviewer comments.

      Minor comments:

      Pg.2, line 70-71: "Germline stem cells at the apical tip of the testis asymmetrically divide to birth spermatogonia", should be gonialblast.

      Fixed (line 71).

      Pg.2, line 71: "four rapid mitotic divisions", the spermatogonial cell cycle lasts several hours-- "rapid" is subjective and relative, better to leave this word out.

      Fixed (line 71).

      Reviewer #2 (Recommendations For The Authors):

      Other than the major issue raised in the public review this paper only needs a few minor modifications, listed by line number below. The first one would be considered essential by this reviewer.

      27: In the sentence that ends on this line, please add the word testis after Drosophila.

      Fixed (line 27).

      119: It must be known from the Fly Cell Atlas data whether these genes do begin to express in spermatogonia.

      Collated expression values from the FCA are provided in Supplementary Table 2. In many cases there is detectable expression of these genes in spermatogonia, although transcript abundance peaks in early spermatocytes.

      198: remove "distribution of".

      Fixed (line 200).

      311: enrichment relative to what?

      Fixed (line 313). It is relative to signal in wing discs.

      344: other aspects could be regulated such as elongation, termination.

      We have added caveats to our speculations in this sentence (lines 340-356). The increased signal we see in gene bodies could be due to slower RNAPII elongation, but we don’t see a way that changes in termination would produce this pattern.

      369: This part of the paper seems overly speculative, given the many molecular differences between dosage compensation mechanisms of Drosophila vs mammals, and studies that indicate that MSCI does occur in Drosophila (DOI: 10.3390/genes12111796).

      We disagree, and this is a central point in our manuscript. The paper referred to here does not directly assess MSCI in Drosophila; instead, it argues that MSCI could be the force driving the evolutionary depletion of male-germline-expressed genes it describes. This and many other studies in the literature have conflated the effects of a lack of X dosage compensation and of MSCI in the male germline. Our direct measurements of RNAPII in spermatocytes demonstrate that there is neither dosage compensation nor MSCI. Further, profiling of histone modifications associated with Drosophila somatic dosage compensation (H4K16ac) or with mammalian MSCI (uH2A, H3K9me2) shows that the molecular mechanisms found in these other settings are not in play in the Drosophila male germline. As we have established these biological differences between mammals and Drosophila, it is appropriate to now speculate on why these differences may exist, which we do on lines 374-384.

      (several lines): Can the authors justify their assumption that chromatin features of larval wing disc cells will match those of somatic cells of adult testes?

      We don’t only compare germline features to somatic cells of the wing disc, but also to genes with somatic expression in the testes annotated by FCA expression data (H3K4me2 in Figure 2C, RNAPII in Figure 3F). Note in Supplementary Figure 2 the distribution of RNAPII in whole testes (which includes somatic cells) is similar to that of larval wing discs, confirming that the differences we describe are specific to germline cells.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Point-to-Point Responses to Reviewers’ Comments

      We are a bit surprised by the comments of Reviewer 1, but hope that our further responses can help communication with Reviewer 1. We have also responded to the comments of Reviewers 2 and 3.

      Public Reviews:

      Reviewer #1 (Public Review):

      The overall tone of the rebuttal and the lack of responses to several questions were surprising. Clearly, the authors took umbrage at the phrase 'no smoking gun' and provided a lengthy repetition of the fair argument about 'ticking boxes' on the classic list of criteria. They also make repeated historical references to the fact that descriptions of neurotransmitters include many papers, typically over decades, e.g. in the case of ACh and its discovery by Sir Henry Dale. While I empathize with the authors' apparent frustration (I quote: '...accept the reality that Rome was not built in a single day and that no transmitter was proven by a one single paper'), I am a bit surprised at the complete brushing away of the argument, and in fact the discussion. In the original paper, the notion of a receptor was mentioned only in a single sentence, and all three reviewers brought up this rather obvious question. The historical comparisons are difficult: of course many papers contribute to the identification of a neurotransmitter, but there is a much higher burden of proof in 2023 compared to the work by Otto Loewi and Sir Henry Dale: most, if not all, currently accepted neurotransmitters have a clear biological function at the level of the brain and animal behavior or function, and were in fact first proposed to exist based on a functional biological experiment (e.g. Loewi's heart rate change). This, and the isolation of the chemical that does the job, were clear, unquestionable 'smoking guns' a hundred years ago. Fast forward to 2023: creatine has been carefully studied by the authors to tick many of the boxes for neurotransmitters, but there is no clear role for its function in an animal. The authors show convincing effects upon K+ stimulation and electrophysiological recordings that show altered neuronal activity using the slc6a8 and agat mutants as well as Cr application, but, as has been pointed out by other reviewers, these effects are not a clear-cut demonstration of a chemical transmitter function, however many boxes are ticked. The identification of a role of a neurotransmitter for brain function and animal behavior has reasonably more advanced possibilities in 2023 than a hundred years ago, and e.g. a discussion of approaches for possible receptor candidates should be possible.

      Again, I reviewed this positively and agree that a lot of cumulative data are great to be put out there and allow the discovery to be more broadly discussed and tested. But I have to note that the authors simply respond with the 'Rome was not built in a single day' statement to my suggestions to at least 'have some lead' on how to approach the question of a receptor, e.g. through agonists or antagonists (while clearly stating 'I do not think the publication of this manuscript should not be made dependent' on this). Similarly, in response to reviewer 2's concerns about a missing receptor, the authors' only (may I say snarky) response is 'We have deleted this sentence, though what could mediate postsynaptic responses other than receptors?' The bullet point by reviewer 3 ' • No candidate receptor for creatine has been identified postsynaptically.' is the one point by that reviewer that is simply ignored by the authors completely. Finally, I note that the response to my review question on the K stimulation issues (e.g. 35 neurons that simply did not respond at all) was: 'Response: To avoid the disadvantage of K stimulation, we also performed optogenetic experiments recently and obtained encouraging preliminary results.' No details, no data, no response really.

      In sum, I find this all a bit strange and the rebuttal surprising: all three reviewers were supportive and carefully listed points of discussion that I found valid and thoughtful. In response, the authors responded scientifically to some experimental questions, but otherwise rather non-scientifically dismissed questions with 'Rome was not built in a day'-type answers, or less. In my view, the authors have disregarded the review process and the effort of three supportive reviewers, which should be part of the permanent record of this paper.

      Response:

      We were very surprised by the tone of Reviewer 1 in the second round of reviewing. The corresponding author has spent some time, including a long holiday, to cool down and re-read our earlier responses. The following is written entirely by the corresponding author.

      I have finally checked the term “smoking gun”, and found out that I had interpreted it wrongly, while I had thought that Reviewer 1 was wrong. This goes back to a long story: I was lectured by a native speaker on my English when submitting the first paper from my own lab. In that case, the Reviewer was wrong (in arguing that only adjectives, but not nouns, can be used to modify nouns); I was quite offended and remembered it vividly. In the case of “smoking gun”, I wrongly believed that it meant a hint (while definite evidence would be “the final nail in the coffin”). By interpreting it as a hint, I was then rebutting Reviewer 1 for dismissing all our experimental results as “not a single piece of suggestive evidence”.

      For the above, I apologize.

      I have another disagreement about “smoking gun”. For a transmitter, multiple criteria have to be met. For example, finding a receptor for a small molecule would not be definitive for a transmitter, because if the molecule is not present in the SVs, it is unlikely to be a typical transmitter. If a molecule has a receptor but neither is present in the nervous system, it is definitely not a transmitter.

      The title of our paper is “Evidence suggesting creatine as a new central neurotransmitter”, not “Evidence proving creatine as a new central neurotransmitter”. In the Abstract, after “Our biochemical, chemical, genetic and electrophysiological results are consistent with the possibility of Cr as a neurotransmitter”, we are adding “though not yet reaching the level of proof for the now classic transmitters”. In the last sentence of the introduction, we have now added “though the discovery of a receptor for Cr would prove it”.

      I do, however, believe that, however strong the wording is, criticisms and rebuttals in science are normal and should be conducted even when emotions are involved.

      One of my major points of difference with at least two of the reviewers is that the criteria for neurotransmitters should be those listed in major textbooks. While everyone can have their own opinions, the textbooks, especially those accepted by readers in the field for more than 40 years, should be the standard. Kandel et al. listed the 4 criteria not only 40 years ago but also just 2 years ago in their latest 6th edition. The reviewers have asked for more, while discounting Kandel et al. (2021). So, in essence, the Reviewer is not shy in scientific criticism when stating “The identification of a role of a neurotransmitter for brain function and animal behavior has reasonably more advanced possibilities in 2023 than a hundred years ago”.

      Reviewer 1 raised another new criterion, brain function and behavior, although this is not in any textbook list. However, lack of Cr caused behavioral problems, as cited by us in the introduction: both humans and mice with loss-of-function mutations in the gene for the specific Cr transporter SLC6A8 were defective in brain function. If the reviewer meant behavioral abnormalities caused by Cr injection, that was unclear. But that criterion may not be met by other transmitters, which is the likely reason that it is not a criterion in any textbook.

      Reviewer #2 (Public Review):

      Summary:

      Bian et al studied creatine (Cr) in the context of central nervous system (CNS) function. They detected Cr in synaptic vesicles purified from mouse brains with anti-Synaptophysin using capillary electrophoresis-mass spectrometry. Cr levels in the synaptic vesicle fraction were reduced in mice lacking the Cr synthetase AGAT, or the Cr transporter SLC6A8. They provide evidence for Cr release within several minutes after treating brain slices with KCl. This KCl-induced Cr release was partially calcium dependent and was attenuated in slices obtained from AGAT and SLC6A8 mutant mice. Cr application also decreased the excitability of cortical pyramidal cells in one third of the cells tested. Finally, they provide evidence for SLC6A8-dependent Cr uptake into synaptosomes, and ATP-dependent Cr loading into synaptic vesicles. Based on these data, the authors propose that Cr may act as a neurotransmitter in the CNS.

      Strengths: 1. A major strength of the paper is the broad spectrum of tools used to investigate Cr. 2. The study provides evidence that Cr is present in/loaded into synaptic vesicles.

      Weaknesses: 1. There is no significant decrease in Cr content pulled down by anti-Syp in AGAT-/- mice when normalized to IgG controls. Hence, blocking AGAT activity/Cr synthesis does not affect Cr levels in the synaptic vesicle fraction, arguing against a Cr enrichment.

      Response: Evidence for Cr enrichment in the SVs was obtained robustly with wild-type mice. When brain Cr is very low, as in AGAT-/- mutant mice, there is also little Cr in the SVs. We do not regard enrichment in the mutant as a criterion: whether or not the much-reduced Cr in AGAT-/- mice is enriched in SVs, this does not argue against normal levels of Cr being transported into SVs.

      1. There is no difference in KCl-induced Cr release between SLC6A8-/Y and SLC6A8+/Y when normalizing the data to the respective controls. Thus, the data are not consistent with the idea that depolarization-induced Cr release requires SLC6A8.

      Response: This comment of Reviewer 2 was based on Figure 5D. But if one carefully examines Figure 5G, it was clear that the Ca++ dependent component of KCl -induced Cr release was lower in SLC6A8-/Y than that in SLC6A8+/Y.

      1. The rationale of grouping the excitability data into responders and non-responders is not convincing because the threshold of 10% decrease in AP rate is arbitrary. The data do therefore not support the conclusion that Cr reduces neuronal excitability.

      Response: Comparison of the same neuron before and after Cr application did show effects on neuronal excitability, though no statistics could be obtained without grouping multiple cells into categories.

      Reviewer #3 (Public Review):

      SUMMARY:

      The manuscript by Bian et al. promotes the idea that creatine is a new neurotransmitter. The authors conduct an impressive combination of mass spectrometry (Fig. 1), genetics (Figs. 2, 3, 6), biochemistry (Figs. 2, 3, 8), immunostaining (Fig. 4), electrophysiology (Figs. 5, 6, 7), and EM (Fig. 8) in order to offer support for the hypothesis that creatine is a CNS neurotransmitter.

      STRENGTHS:

      There are many strengths to this study. • The combinatorial approach is a strength. There is no shortage of data in this study. • The careful consideration of specific criteria that creatine would need to meet in order to be considered a neurotransmitter is a strength. • The comparison studies that the authors have done in parallel with classical neurotransmitters is helpful. • Demonstration that creatine has inhibitory effects is another strength. • The new genetic mutations for Slc6a8 and AGAT are strengths and potentially incredibly helpful for downstream work.

      WEAKNESSES: • Some data are indirect. Even though Slc6a8 and AGAT are helpful sentinels for the presence of creatine, they are not creatine themselves. Of note, these molecules themselves are not essential for making the case that creatine is a neurotransmitter.

      Response: We agree, but those data are not inconsistent with the possibility.

      • Regarding Slc6a8, it seems to work only as a reuptake transporter, not as a transporter into SVs. Therefore, we do not know what the transporter into the SVs is.

      Response: SLC6A8 is not the transporter on the SVs, but is an excellent candidate for the transporter on the presynaptic cytoplasmic membrane for uptake of Cr into the presynaptic structure.

      • Puzzlingly, Slc6a8 and AGAT are in different cells, setting up the complicated model that creatine is created in one cell type and then processed as a neurotransmitter in another. This matter will likely need to be resolved in future studies.

      Response: We agree.

      • No candidate receptor for creatine has been identified postsynaptically. This will likely need to be resolved in future studies.

      Response: We agree.

      • Because no candidate receptor has been identified, it is important to fully consider possible roles of creatine, other than its being a neurotransmitter, that would explain these observations. There is some attention to this in the Discussion.

      Response: We agree.

      There are several criteria that define a neurotransmitter. The authors nicely delineated many criteria in their discussion, but it is worth it for readers to do the same with their own understanding of the data.

      By this reviewer's understanding (and combining some textbook definitions together) a neurotransmitter: 1) must be present within the presynaptic neuron and stored in vesicles; 2) must be released by depolarization of the presynaptic terminal; 3) must require Ca2+ influx upon depolarization prior to release; 4) must bind specific receptors present on the postsynaptic cell; 5) exogenous transmitter can mimic presynaptic release; 6) there exists a mechanism of removal of the neurotransmitter from the synaptic cleft.

      Response: While any of us can come up with a list according to our own understanding, the paper copies lists from textbooks, especially from Kandel et al. (2021), which lists the same 4 criteria as Kandel et al. (1983), providing consistency and consensus.

      For a paper to claim that the published work has identified a new neurotransmitter, several of these criteria should be met, and the paper should acknowledge in the discussion which ones have not been met. For this particular paper, this reviewer finds that condition 1 is clearly met.

      Conditions 2 and 3 seem to be met by electrophysiology, but there are caveats here. High KCl stimulation is a blunt instrument that will depolarize absolutely everything in the prep all at once and could result in any number of non-specific biological reactions as a result of K+ rushing into all neurons in the prep. Moreover, the results in 0 Ca2+ are puzzling. For creatine (and for the other neurotransmitters), why is there such a massive uptick in release, even when the extracellular saline is devoid of calcium?

      Response: Classic transmitters are released in a Ca++ dependent manner when stimulated by KCl, though they also had a Ca++ independent component as also shown in our Figure 5 E and F.

      Condition 4 is not discussed in detail at all. In the discussion, the authors elide the criterion of receptors specified by Purves by inferring that the existence of postsynaptic responses implies the existence of receptors. True, but does it specifically imply the existence of creatinergic receptors? This reviewer does not think that is necessarily the case. The authors should be appropriately circumspect and consider other modes of inhibition that are induced by activation or potentiation of other receptors (e.g., GABAergic or glycinergic).

      Response: Kandel et al. did not list this.

      Condition 5 may be met, because the authors applied exogenous creatine and observed inhibition. However, this is tough to know without understanding the effects of endogenous release of creatine. If they were to test whether the absence of creatine caused excess excitation (at putative creatinergic synapses), that would be supportive of the same. Nicely, the Ghirardini et al., 2023 study cited by the reviewers does provide support for this exact notion in pyramidal neurons.

      Response: For most commonly accepted transmitters, this criterion has never been met. For example, the simplest case would be ACh at the neuromuscular junction. However, we have now found that choline is clearly present in SVs. So how can anyone be sure that only ACh is released, or rule out effects of choline on postsynaptic cells when cholinergic neurons are stimulated?

      Many synapses are now known to release more than one transmitter, making it difficult to define the effect of one transmitter released endogenously.

      These are perhaps reasons why some textbooks do not emphasize similarities of endogenously released vs exogenously applied molecules.

      For condition 6, the authors made a great effort with Slc6a8. This is a very tough criterion to understand or prove for many synapses and neurotransmitters.

      Response: SLC6A8 is a transporter on the cytoplasmic membrane, thus a good candidate for removal of Cr from the synaptic cleft.

      In terms of fundamental neuroscience, the story should be impactful. There are certainly more neurotransmitters out there than currently identified and by textbook criteria, creatine seems to be one of them taking all of the data in this study and others into account.

      Response: We hope that more will join our lonely efforts in trying to discover more transmitters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Since the authors largely disregarded questions in the review process, I do not see a point in listing recommendations for the authors again.

      Reviewer #2 (Recommendations For The Authors):

      1. The different sections of the manuscript are not separated by headers.

      Response: We do have separate subheadings.

      1. The beginning of the results section either does not reference the underlying literature or refers to unpublished data.

      Response: We have a very long introduction, which was criticized for being too long and for containing too many historical citations. We therefore refrained from repeating citations at the beginning of the Results section.

      1. The text contains many opinions and historical information that are not required (e.g., "It has never been easy to discover a new neurotransmitter, especially one in the central nervous system (CNS). We have been searching for new neurotransmitters for 12 years."; l. 17).

      Response: We would like to keep these because most readers are young and do not know the history and difficulties of discovering transmitters.

      1. Almeida et al. (2008; doi: 10.1002/syn.20280) provided evidence for electrical activity-, and Ca2+-dependent Cr release from rat brain slices. This paper should be introduced in the introduction.

      Response: Done.

      1. Fig. 7: A Y-scale for the stimulation protocol is missing.

      Response: Done.

      Reviewer #3 (Recommendations For The Authors):

      The main suggestion by this reviewer (beyond the details in the public review) was to consider the full spectrum of biology that is consistent with these results. By my reading, creatine could be a neurotransmitter, but other possibilities also exist. The authors have highlighted some of those for their Discussion.

    1. Author Response

      The following is the authors’ response to the previous reviews

      eLife assessment

      The manuscript offers important findings on the potential influence of maternally derived extracellular vesicles on embryo metabolism. However, while the content is convincing, the title appears to overstate the study's conclusions due to its speculative nature on the DNA transmission and embryo bioenergetics connection. A more measured title would better represent the evidence presented.

      We want to extend our heartfelt appreciation to the editors and reviewers for their invaluable comments on our research. Their feedback has played a crucial role in improving the quality of our manuscript.

      We acknowledge the concern regarding the manuscript's title and are fully open to making modifications. Following the recommendation of Reviewer 2, the proposed new title of the manuscript will be “Vertical transmission of maternal DNA through extracellular vesicles associates with altered embryo bioenergetics during the periconception period.”

      Reviewer #1 (Public Review):

      Q1. Bolumar et al. isolated and characterized EV subpopulations, apoptotic bodies (AB), microvesicles (MV), and exosomes (EXO), from endometrial fluid throughout the female menstrual cycle. By performing DNA sequencing, they found that MVs contain more specific DNA sequences than other EVs; specifically, more mtDNA was encapsulated in MVs. They also found a reduction of mtDNA content in the human endometrium during the receptive and post-receptive periods that is associated with an increase in mitophagy activity in the cells, and a higher mtDNA content in the secreted MVs at the same time. Last, they demonstrated that endometrial Ishikawa cell-derived EVs could be taken up by mouse embryos, resulting in altered embryo metabolism.

      This is a very interesting study and is the first one demonstrating the direct transmission of maternal mtDNA to embryos through EVs.

      A1. Thank you for your kind comments.

      Reviewer #2 (Public Review):

      Q2. In Bolumar, Moncayo-Arlandi et al. the authors explore whether endometrium-derived extracellular vesicles contribute DNA to embryos and therefore influence embryo metabolism and respiration. The manuscript combines techniques for isolating different populations of extracellular vesicles, DNA sequencing, embryo culture, and respiration assays performed on human endometrial samples and mouse embryos.

      Vesicle isolation is technically difficult and therefore collection from human samples is commendable. Also, the influence of maternally derived DNA on the bioenergetics of embryos is unknown and therefore novel. However, several experiments presented in the manuscript fail to reach statistical significance, likely due to the small sample sizes. This manuscript is a good but incomplete start as to the potential function of maternal DNA transfer via vesicles.

      In my opinion the manuscript supports the following of the authors' claims:

      1. Different amounts of nDNA and mtDNA are shed in human endometrial extracellular vesicles during different phases of the menstrual cycle.
      2. Endometrial microvesicles are more enriched for mitochondrial DNA sequences compared to other types of vesicles present in the human samples.
      3. Fluorescently labelled DNA from extracellular vesicles derived from an endometrial adenocarcinoma cell line can be incorporated into hatched mouse embryos.
      4. Culture of mouse embryos with endometrial extracellular vesicles can influence embryo respiration and the effect is greater when cultured with isolated exosomes compared to other isolated microvesicles.

      My main concerns with the manuscript:

      1. Several experiments presented fail to reach statistical significance or are qualitative.
      2. The definitive experiments presented in the manuscript are limited to the transfer of DNA in general not mtDNA. Therefore a strong connection with metabolism is missing, diminishing the significance of the findings.

      A2. We thank you for your detailed feedback. While we acknowledge the reviewer's concerns regarding sample sizes, we emphasize that this study was intentionally designed as a pilot study and was approved by the IRB with a specific sample size to serve as proof of concept. We fully agree that further research is essential for a more comprehensive understanding of the novel biological process described in this manuscript. Once this manuscript is accepted, we can submit a new IRB application to obtain a larger sample size, allowing us to delve deeper into demonstrating the connection with metabolism.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Q3. The authors have made significant improvements, and the manuscript now is appropriate for eLife.

      A3. Thank you for your consideration.

      Reviewer #2 (Recommendations For The Authors):

      The authors have made several changes that have improved the manuscript. However, I still have some concerns.

      Q4. The title is still too definitive. Something like "Vertical transmission of maternal DNA through extracellular vesicles is associated with changes in embryo bioenergetics during the periconception period" would be more appropriate.

      A4. As mentioned earlier in the response to the editors, we acknowledge the concerns regarding the manuscript's title.

      Following your recommendation, the proposed new title of the manuscript is “Vertical transmission of maternal DNA through extracellular vesicles associates with altered embryo bioenergetics during the periconception period.”

      Q5. I am confused by the incorporation of the new experiment (supplementary figure 7) where embryos are cultured in free-floating synthesized mtDNA. If these sequences were not encapsulated in vesicles I don't think the experiment is relevant. If they were similarly prepared as in the section "Tagged-DNA production and EV internalization by murine embryos" I stand corrected but please clarify or omit. Otherwise, the new data/figure in response to Q11 showing co-localization of mitochondria and EdU-tagged DNA from MVs from Ishikawa cells is more compelling. However, this doesn't separate the uptake of mtDNA alone from the potential uptake of mitochondria, which this manuscript is not focused on.

      A5. We apologize for any confusion that may have arisen for the reviewer. We conducted this experiment in response to question Q4 posed by the same reviewer, which specifically inquired about the detection of internalized mtDNA by the embryos.

      As previously stated in the revised manuscript, the EdU system does not selectively label mtDNA; instead, it labels any newly synthesized DNA, both nuclear and mitochondrial. We have not found a system that specifically labels mtDNA for subsequent tracing inside EVs or for encapsulation within artificial EVs (which falls outside our expertise). Therefore, we employed labeled mtDNA that we could trace after the embryos' internalization.

      While we acknowledge that this approach is not perfect, it does demonstrate the internalization of mtDNA sequences within the embryo. We have revised the manuscript to eliminate any potential sources of confusion. If the reviewer or editors still have concerns about the experiment's suitability, we are open to removing it from the final version of the manuscript. Please refer to page 9 and lines 234-238 for more details.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General comments:

      To reviewer 1 and 3: The following sentences below were added at the beginning of the result section to clarify that the Gr gene expression analysis was performed using bimodal expression systems and to provide a reference that these expression profiles can generally be expected to represent endogenous Gr expression.

      "Note that this and all previous Gr expression studies were performed using bimodal expression systems, mostly GAL4/UAS, whereby Gr promotors driving GAL4 are assumed to faithfully reproduce expression of the respective Gr genes. Importantly, we analyzed two or more Gr28-GAL4 insertion lines for each transgene, and at least two generated the same expression profiles (Mishra et al., 2018; Thorne and Amrein, 2008) providing evidence that the drivers reflect a fairly accurate expression profile of respective endogenous genes."

      Specific comments:

      Reviewer #1 (Recommendations For The Authors):

      The important chemogenetic behavioral data would benefit from a clearer presentation including a cartoon to explain what the behavior is and how it is scored. Figure 2 is the key figure in this paper and it would be helpful if the figure were reorganized to guide the non-expert reader to the key result. I recommend labeling the positive controls Gr43a as "sweet" and Gr66a as "bitter" and perhaps organize the presentation to have the negative control at the left, then Gr28ba that had no effect, then group Gr28a with Gr43a for positive valence and Gr28bc with Gr66a for negative valence. I'm not sure what the value is of showing both 0.1 mM and 0.5 mM capsaicin, the text does not explain. The experiment in Figure 2B is important but non-experts will not understand what is being done here - can the authors please provide a cartoon like those in Figure 1 showing what cells are being subjected to chemogenetics and how this differs from Figure 2A?

      The reviewer is correct that much can be improved, which we hope to have accomplished with the modifications in Figure 2. We reorganized it to deliver the key result to non-expert readers in an easy way. We added cartoons both explaining how the two-choice preference assays were conducted and indicating which cells express UAS-VR1. The cartoons in Figure 1E and Figure 2A are now directly relatable and should clarify which cells express VR1 (in Figure 2). Positive and negative control experiments using Gr43aGAL4 (a GAL4 knock-in; Miyamoto et al., 2013) and Gr66a-GAL4 are highlighted in the Figure and mentioned upfront in the text to make clear what the experimental larvae can be compared to. We also excluded larval responses to 0.5 mM capsaicin.

      1. The AlphaFold ligand docking in Figure 8 is conducted with Gr28bc monomers, which are unlikely to be the in vivo relevant structure, given that the related OR/ORCO ancestor structures are tetramers. I recommend that this component of the paper either be removed entirely or that the authors redo the in silico work using the AlphaFold-Multimer package reported by Hassabis and Jumper in 2022 https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2. It will be interesting to see what a tetramer structure looks like with the ligand.

      We tried but were unable to use the recommended package. Even if we had been able to, the problem is that we do not know the partner of Gr28b.c. And while it is not clear whether and how extensively the ligand binding pockets change when using the monomer prediction vs. a multimer package, we followed the reviewer’s suggestion and removed the modeling from the manuscript.

      Minor points:

      1. Line 80: I do not think it is biophysically or biochemically plausible that GRs and IRs would assemble into functional heteromeric channels and suggest that the authors either explain how that would work or remove this speculative comment.

      We have removed this sentence.

      1. Line 246-248: I would tone down the speculation about GR subunit composition - it's still too early days to understand the stoichiometry or the extent that any of the broadly expressed GRs is a co-receptor.

      We did not speculate about the possible stoichiometry of Gr complexes, but merely mention that they are generally composed of two or more Gr subunits, for which clear genetic evidence exists: up to three different putative bitter Gr genes are necessary to elicit responses to bitter compounds, and at least two putative sugar Gr genes are necessary to restore behavioral responses to any sweet-tasting chemicals (sugars). Regardless, we have toned down the language, now stating:

      “Given the multimeric nature of bitter taste receptors (Sung et al., 2017), one possibility is that the absence of a Gr subunit not required for the detection of denatonium (Gr66a) could favor formation of multimeric complexes containing Gr subunits that recognize this compound (Gr28b.a and/or Gr28b.c).”

      1. Line 284: I don't think that co-expression necessarily means that GRs form heteromultimeric channels. It's equally possible that the cell controls subunit assembly to avoid mixing and matching ligand-selective subunits at will. I would tone this down - it's still speculative at this stage. We don't even know yet how this works for OR-Orco, where we do have structures. There is not yet an OR-Orco Cryo-EM structure, so we do not know what the subunit stoichiometry is.

      We are not sure what the reviewer’s concern is. While direct biochemical or biophysical evidence is currently lacking, there is strong genetic evidence for heteromeric composition of Gr complexes, both from studies of bitter and sweet receptors/neurons (see response above). It is likely that intrinsic properties facilitate assembly of certain Grs within a taste receptor complex. We have refrained from any speculation about stoichiometry, though given the relatedness of Grs and Ors, it would not be far-fetched to propose that taste receptor complexes are also tetrameric in nature, which was recently proposed for a homomeric channel of the Bombyx mori homolog of Gr43a, BmGr9 (Morinaga et al., 2022).

      1. Line 305: the work of Emily Troemel and Cori Bargmann PMID: 9346234 should be cited in the Discussion. Theirs was the first experiment to show that valence was a feature of the neuron and not the receptor(s) it expresses.

      We have now cited this work in the discussion to acknowledge this important discovery.

      1. Figure 1 - the clarity of the organization of the figure could be improved for non-experts. For instance, can the key for the abbreviations be written out at the right of Figure 1A? Second, it is confusing to talk about DOG/TOG neurons "projecting" to the DO/TO - I think the authors mean dendritic innervation, not axons projecting. Maybe having a diagram that cartoons a closeup of the DOG/TOG neurons and how they innervate the cuticular structures would make this clearer. I struggled to go from the pretty staining at the left of B and C to the schematics at the right that colored in which neurons express which receptors.

      We appreciate these comments regarding clarity and have amended Figure 1 and made necessary changes in the text and the Figure legend.

      1. Figure 3 would benefit from a summary cartoon relating back to the cartoons in Figure 1 to summarize what neurons the authors think are necessary for bitter avoidance.

      We very much appreciate this suggestion and have increased clarity by referring to the cartoons in Figures 1 and 2.

      1. Figure 4B - the lowercase letters indicating Gr28 subunits that are being expressed under UAS control (bottom row of table "UAS-Gr28") are easily confused for the lowercase letters a, b used throughout to signify significant differences. I recommend that the authors write out the gene names in this figure to clarify the genes in the rescue experiment.

      We changed the text in the Figure accordingly.

      1. For non-experts it would be helpful to have a map of the Gr28 gene locus so that people understand the arrangement of the genes and how the Gal4 driver lines map onto the locus.

      We have now included such a map in Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      1. In the title and multiple times in the text (e.g. lines 121-122), the authors make the claim that different Gr28 genes mediate opposing behaviors. At first, I was not convinced of this claim, but I now believe it may be warranted if integrating the present results with results from Mishra et al., 2018. In the present study, the authors show that different neurons drive opposing behaviors, but they did not show that the genes themselves mediate opposing behaviors. They show evidence for the role of Gr28bc and Gr28ba in aversion, but not the role of Gr28a in attraction. I was thinking that there could be other receptors in Gr28a-expressing neurons that mediate attraction. However, Mishra et al. showed that mutation of all Gr28 genes abolishes preference for RNA/ribose as well as detection of these compounds by Gr28a+ neurons of the terminal organ, an impairment that could be rescued by expressing Gr28a (although Gr28b genes seem to have similar functions), and the present study shows that the other Gr28 genes are not co-expressed with Gr28a in the terminal organ. Is this the line of reasoning that we must take to come to the conclusion in the title? If so, I don't believe it comes through clearly in the paper.

      We appreciate this observation. We have modified language in the abstract and the introduction to reflect previous reports of Gr28a as an RNA/ribose receptor (Mishra et al., 2018) and its conservation across dipteran insects (Fujii et al., 2023), where we showed that appetitive behavior for RNA can be mediated via the mosquito homologs in transgenic Drosophila larvae. The reviewer is correct that there are other appetitive neurons, namely those expressing Gr43a, which define a set distinct from and non-overlapping with Gr28a neurons (Mishra et al., 2018). This additional information is included in Figure 1, summarizing expression of the Gr28 genes, Gr66a and Gr43a.

      1. The Figure 6 schematic does not show Gr66a+ Gr28- cells as being connected to avoidance behavior. This seems misleading because it seems likely that these cells do promote avoidance (based on known functions of other Gr66a cells). Also, it is not clear what the red dashed line represents.

      The Gr66a neurons are indeed also avoidance mediating, but it is not clear which subgroup of these neurons is necessary. Our analysis in Figure 2 using Gr28b.c driving Kir2.1 suggests that a small subset of Gr66a neurons is sufficient to mediate avoidance. It is, however, possible that other subsets not including Gr28b.c can also mediate avoidance. The figure has been modified accordingly, as has the model in Figure 7.

      1. I would suggest including the description of Figures 7-8 in the Results instead of the Discussion. In Figure 8, it would be helpful to superimpose labels for the transmembrane domains and extracellular/intracellular sides to better interpret the models.

      The modeling was removed from the manuscript (see response above to reviewer 1).

      1. The finding that Gr66a mutants show increased denatonium and quinine avoidance (Figure 4 - figure supplement 1) seems like a non sequitur, as it does not relate to the analysis of Gr28 genes. I support the inclusion of these interesting results, but perhaps it could be stated why this experiment was conducted (e.g. as a positive control).

      We have reworded this section to make clear why Gr66a mutants were tested (possibly being part of a denatonium receptor complex).

      1. An introduction to the nomenclature and gene structure for the Gr28 genes would be helpful. It's not clear how they're all related, e.g. that the Gr28b genes share some exons whereas Gr28a is separate. The Results section alludes to "the high level of similarity between these receptors", and some sort of reference or quantification for this statement would be useful. I also think naming the Gr28b genes with a period (e.g. "Gr28b.c") may be more consistent with the literature.

      We have added the structure of the Gr28 genes in the Figure 1B, which was also a suggestion by reviewer 1, and we have amended the naming of the genes.

      1. Lines 79-80 state "some GRNs express members of both families", but no citation is provided.

      As this sentence was deleted, based on a comment by reviewer 1, this point becomes moot.

      1. There are several typos or grammatical mistakes that the authors may wish to correct (e.g. lines 73, 75, 91, 232, 334, 780, 788).

      We appreciate the reviewer pointing these errors out to us. The mistakes were corrected.

      Reviewer #3 (Recommendations For The Authors):

      • Silencing experiments suggest a role for Gr28bc in the avoidance of quinine (Figure 3), while imaging experiments do not support this role (Figure 5G). An explanation is needed to reconcile these findings.

      The imaging experiments do support a role for Gr28b proteins in quinine detection in the specific TOG GRN used for all live imaging (Figure 5). This GRN in ΔGr28 larvae shows a significantly lower Ca2+ response to quinine compared to controls. However, the Ca2+ response could not be rescued to wild-type levels by expressing single Gr28b subunits, suggesting that multiple Gr28b proteins are present in a quinine-specific receptor complex in this GRN. Also note that the Ca2+ response of ΔGr28 larvae to quinine is not completely abolished, suggesting some redundancy, possibly via Gr33a (Apostolopoulou et al., 2014); this is also supported by ΔGr28 larvae, which still show robust avoidance of quinine. We are confident that we now argue this point more clearly, in both the Results and especially the Discussion section.

      • Silencing experiments specifically targeted neurons expressing Gr28bc and Gr28be (Figure 3). It is important to note why other neurons expressing different members of the Gr28 family were not included in this analysis.

      • Inconsistency is observed in the use of different reagents across the experiments. Specifically, all six Gal4 lines were utilized in the Chemical Activation experiments, while only two lines were employed in the silencing experiments.

      The silencing experiments addressed the specific question of which neurons are necessary for avoidance of bitter chemicals. Gr28a-GAL4 and Gr28b.a-GAL4 neurons were omitted because the former mediate feeding preference and not avoidance, and the latter is expressed in the same neurons as Gr28b.e (Figure 1). The remaining two Gr28b genes, Gr28b.b-GAL4 and Gr28b.d-GAL4, are not expressed in the larval taste system (Mishra et al., 2018), as we stated in the introduction/results section, and they were therefore not included in the chemogenetic or Kir2.1 inactivation experiments. We included these genes in rescue experiments simply to test whether or not they can restore function for sensing denatonium.

      As for the chemogenetic activation experiments: two of the GAL4 lines are controls (Gr66a-GAL4 and Gr43aGAL4), which were needed to show what can be expected from these experiments.

      • The authors did not acknowledge that neurons expressing members of the GR28 family also express other Gr family members, which could potentially contribute to the detection and behavioral responses to the tested bitter compounds.

      We believe we did, but we have made that much more explicit in the revised manuscript.

      • Gal4 lines from various studies exhibit varying expression patterns, highlighting the necessity for improved reagents. These findings also suggest the importance of employing different Gal4 lines for each receptor to validate the results of the current study.

      See response at the beginning of our rebuttal.

      • Activating or silencing neurons pertains to the function of the neurons rather than the receptors.

      We agree and nothing in the manuscript states otherwise.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REPLIES TO REVIEWERS

      For instance, the DynaMut2 and thermal shift assays point towards less stable variants than wild type, with Tm values slightly lower. On the other hand, the Kd values of the variants indicated stronger binding of NSP10 to NSP16. How do the authors explain this, as the change due to point mutation may not fall within the error range?

      Concerning the lower Tm values for the mutants compared to wild-type NSP10, the errors of the measurements, conducted in triplicate, are very low (0.1 degrees), indicating that the differences do not fall within the error range, in particular as the changes in Tm are significant, at up to 4 degrees. This is consistent with the DynaMut2 calculations (3). Furthermore, the differences in Kd values between wild type and mutants are partially significant: whereas one of the mutants did not display any change in Kd compared to wild-type NSP10 for either NSP14 or NSP16, the others show a 2- to 3-fold better Kd with reasonable errors, and we consider these differences small but significant, and not within the error range.

      For instance, the conformational ensemble could be utilized for docking with NSP16 and NSP14. There could be a potential alternative pathway for explaining the above changes in Kd. This should be attempted for understanding the role in its functional activity.

      We agree with the reviewer. We are working on a follow-up manuscript exclusively looking into the NSP10-NSP14/16 interfacial interactions. Our preliminary results from biophysical and biochemical analysis suggest a range of Kd values between the mutants and NSP14/NSP16. We are also investigating changes in the interfacial interactions via crystallography.

      Therefore, more quantitative analysis is required to explain structural changes. The free energy landscape reported in the paper may not capture rare transition events or slight rearrangements in side chain dynamics, both these could offer better understanding of mutations.

      We agree with the point raised by the reviewer. As mentioned above, we are exclusively looking into these interfacial interactions and binding between different partners, which will be reported in a follow up manuscript.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      1. Line 206, V104 need to be corrected to A104.

      done

      1. Line 333: does this mean that the Kd value of NSP10 binding to NSP16 is similar to the Kd value of binding to NSP14?

      Yes. Overall, they are in about the same range with a Kd value of around 1 µM for the NSP10-NSP16 complex and 4 µM for the NSP10-NSP14 complex.

      1. Figure 3, the colors corresponding to different variants or native NSP10 could be consistent for easier reading and understanding.

      The colors have been edited.

      1. The data presented in Figure 3d are not clear enough to draw conclusions about the Kd value in the main text. (Values of variants are smaller than that of wild-type NSP10, indicating slightly stronger binding to NSP16.)

      The measured differences are small (2- to 3-fold) but significant, and they are not within the error range, as can be derived from the data and the calculated Kd values and their errors.

      1. Are there other mutations in the sequence with the top 3 mutations? If yes, is it possible to do the same experiments with that protein? Why not choose the NSP10 of the popular strain for the determination of the binding ability to NSP14 and NSP16.

      No, the top three were single point mutations.

      1. Enzyme activity assays, such as detection of NSP14 ExoN activity and in vitro detection of NSP16 2′-O-MTase activity, could be performed to characterize the effect of these three mutations on biological function.

      Yes, it would be good to consider these. We are considering these assays in the follow up manuscript as mentioned above.

      1. More details on image acquisition and writing errors need to be clarified and corrected.

      Done.

      1. Typo in Results section T12, T102, V104 should be A104

      Done.

      1. DynaMut analysis is extrapolated to explain that "Mutation to a hydrophobic side chain such as Ile, results in a loss of this interaction." There is no data to support this as complexes have not been studied. Perhaps this is speculative at best.

      We have changed this sentence to “Mutation to a hydrophobic side chain such as Ile, is predicted to result in the loss of this interaction”, since this was a prediction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: Hansen et al. dissect the molecular mechanisms of bacterial ice-nucleating proteins by systematically mutating the protein. They assay the ice-nucleating ability of variants changing the R-coils as well as the coil-capping motifs. The ice nucleation mechanism depends on the integrity of the R-coils, without which multimerization and fibril formation are disrupted.

      Strengths: The effects of mutations are really dramatic, so there is no doubt about the effect. The variants tested are logical and progressively advance the story. The authors identify an underlying mechanism involving multimerization, which is plausible and compatible with EM data. The model is further shown to work in cells by tomography.

      Weaknesses: The theoretical model presented for how the proteins assemble into fibrils is simple, but not supported by much data.

      Agreed. This theoretical INP multimer model was introduced to promote discussion and elicit ideas on how to prove or disprove it. The length and width of the fibres are defined by cryo-ET results, in which the narrow width is just sufficient to accommodate a dimer of the INPs, and the long length requires that several INPs are joined end to end. Their antiparallel arrangement produces identical ends to the dimer and avoids steric clash of the C-terminal cap structures as well as the C-terminal GFP tag. This model can accommodate the wide range of INP lengths seen in nature (due to different numbers of water-organizing coils) and introduced in mutagenesis experiments (Forbes et al. 2022). It defines a critical role for the R-coil subdomain in joining the dimers together and explains why this region cannot be shortened by more than a few coils either in nature or by experimentation.

      In response to specific criticisms of the model (Fig. 9), we have redesigned this to be less schematic and to incorporate several copies of the AlphaFold-predicted structure.

      Reviewer #2 (Public Review):

      Summary:

      This paper further investigates the role of self-assembly of ice-binding bacterial proteins in promoting ice nucleation. For the P. borealis Ice Nucleating Protein (PbINP) studied here, earlier work had already determined clearly distinct roles for different subdomains of the protein in determining activity. Key players are the water-organizing loops (WO-loops) of the central beta-solenoid structure and a set of non-water-organizing C-terminal loops, called the R-loops in view of characteristically located arginines. Previous mutation studies (using nucleation activity as a read-out) had already suggested that the R-loops interact with the WO-loops to cause self-assembly of PbINP, which in turn was thought to lead to enhanced ice-nucleating activity. In this paper, the activities of additional mutants are studied, a bioinformatics analysis on the statistics of the number of WO- and R-loops is presented for a wide range of bacterial ice-nucleating proteins, and additional electron-microscopy results are presented on fibrils formed by the non-mutated PbINP in E. coli lysates.

      Strengths:

      -A very complete set of additional mutants is investigated to further strengthen the earlier hypothesis.

      -A nice bioinformatics analysis that underscores that the hypothesis should apply not only to PbINP but to a wide range of (related) bacterial ice-nucleating proteins.

      -Convincing data that PbINP overexpressed in E. coli forms fibrils (electron microscopy on E. coli lysates).

      Weaknesses:

      -The new data are interesting and further strengthen the hypotheses put forward in the earlier work. However, just as in the earlier work, the proof for the link between self-assembly and ice nucleation remains indirect. Assembly into fibrils is shown for E. coli lysates expressing non-mutated PbINP, hence it is indeed clear that PbINP self-associates. It is not shown, however, that the mutations that lead to loss of ice-nucleating activity also lead to loss of self-assembly. A more quantitative or additional self-assembly assay could shed light on this, either in the present or in future studies.

      The control cryo-ET experiment where the R-coils were deleted and INP fibres were not seen is consistent with a link between the loss of ice-nucleating activity and the loss of self-assembly. However, we agree that a more direct measurement of the physical state of INP molecules is needed to prove the link.

      -Also the "working model" for the self-assembly of the fibers remains not more than that, just as in the earlier papers, since the mutation-activity relationship does not contain enough information to build a good structural model. Again, a better model would require different kinds of experiments, that yield more detailed structural data on the fibrils.

      Reviewer #1 also raised these criticisms of the model, which we have responded to (above). Testing the model is a focus of our continuing experiments on INPs.

      Reviewer #3 (Public Review):

      Summary: in this manuscript, Hansen and co-authors investigated the role of R-coils in the multimerization and ice nucleation activity of PbINP, an ice nucleation protein identified in Pseudomonas borealis. The results of this work suggest that the length, localization, and amino acid composition of R-coils are crucial for the formation of PbINP multimers.

      Strengths: The authors use a rational mutagenesis approach to identify the role of the length, localisation, and amino acid composition of R-coils in ice nucleation activity. Based on these results, the authors hypothesize a multimerization model. Overall, this is a multidisciplinary work that provides new insights into the molecular mechanisms underlying ice nucleation activity.

      Weaknesses: Several parts of the work appear cryptic and unsuitable for non-expert readers. The results of this work should be better described and presented.

      In revising the manuscript for reposting we have rewritten sections to make it more accessible to the non-expert. Incorporating the detailed recommendations of the reviewers has been helpful in this effort.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Introduction: Curiously, there is no mention at all in the introduction of what the biological function of these ice-nucleating proteins is.

      We added the following text to the first paragraph of the Introduction: “INP-producing bacteria are widespread in the environment where they are responsible for initiating frost (4) and atmospheric precipitation (5). As such, these bacteria play a significant role in the Earth’s hydrological cycle and in agricultural productivity.”

      Line 70: TXT, SLT, and Y motifs are mentioned, but only the first is described. Also, TXT name alternates between TXT and TxT in the manuscript. (I think the latter is more correct).

      These putative water-organizing motifs are introduced in the preceding paper (new ref 8). We now use TxT consistently throughout the manuscript and have converted SLT to SxT because L is an inward-pointing residue that is not directly involved in water organization.

      Line 236: A construct with repeats deleted is tested for thermostability, but it is not really explained what hypothesis this experiment is supposed to test.

      This is an observation that adds information about the stability of the INP multimers and will need to be explained by the structure.

      Line 267: The authors test a mutant where the N-terminal coil is disrupted and find a big effect. Nevertheless, no conclusion is drawn. What does this result mean?

      On the contrary, INP activity is not appreciably affected by N-terminal deletion.

      Line 269: The CryoEM begins rather abruptly with technical details. Consider introducing the paragraph with a brief statement about what you want to investigate. Also, the analysis seems a little half-hearted.

      Given that the authors describe other EM studies of fibrils of the same protein, it would be nice to have a clear statement about what is new in their study and how it compares to previous studies.

      We have added this statement about why we used Cryo-EM: “The idea that INPs must assemble into larger structures to be effective at ice nucleation has persisted since their discovery (6). In the interim the resolving power of cryo-EM has immensely improved. Here we elected to use cryo-electron tomography to view the INP multimers in situ and avoid any perturbation of their superstructure during isolation.”

      Fig. 7B: Single-letter amino acid codes are always capitalized.

      We have revised this figure to use capital letters for the amino acids.

      Fig. 9: This figure is really hard to read even though it is very simplistic. I would consider making a figure with several copies of the AlphaFold model instead. In particular, I do not know what panel D is supposed to show.

      We have followed this advice and have completely revised the figure using copies of the AlphaFold model. Panel D (now C) shows two cross-sections through the AlphaFold model.

      Line 355 onwards: The model of the INP is the weakest part of the manuscript. This reviewer considers that the model is crude and it is unclear what information the model is supported by. The authors might want to consider running an AlphaFold multimer to get a better model of at least the dimer.

      Our objective now is to validate or disprove the model by experimentation using protein-protein cross-linking in conjunction with mass spectrometry, and higher resolution cryo-EM methods.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest more frankly discussing the weaknesses mentioned in my public review, as well as approaches that could be used in the future to address these.

      In the cryo-ET analysis, INP mutations of the R-coils that lead to loss of ice-nucleating activity fail to show fibres in the bacteria (Fig. S4), which is consistent with the loss of self-assembly. We are working on physical methods that can assess the degree of assembly of the different INP constructs and mutations. We are working to validate and improve the working model of INP multimers.

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      Line 18. Below 0 Celsius should be < 0 °C.

      Done

      Line 25. E. coli should be Escherichia coli

      Done

      Line 29. E. coli should be in italics.

      Done

      Introduction

      The introduction is weak and not suitable for non-expert readers. Moreover, in some parts it is cryptic and it is not clear whether the authors are describing INP in general or PbINP. The introduction should be reorganized to highlight the novelty of this paper compared to Forbes et al. 2022.

      The changes we have made to the Introduction can be seen in the ‘documents compared’ version where the changes are tracked.

      Line 45. It is unclear whether this paragraph is a result reported in the literature or the result of this work. Please clarify.

      These are results reported in the literature as indicated by the references cited in the paragraph.

      Line 54. It is not clear whether this paragraph describes PbINP or INP in general.

      This paragraph begins with INPs in general and then focuses on PbINP.

      Results

      Line 109. This section would benefit from a paragraph in which the authors describe the rationale for this bioinformatic analysis.

      We added the following statement: “A bioinformatic analysis of bacterial INPs was undertaken to identify their variations in size and sequence to understand what is common to all that could guide experiments to probe higher order structure and help develop a collective model of the INP multimer.”

      Some information is needed on the selected sequences, such as sequence identity. What do the authors mean by "nr database"?

      The abbreviation nr has been replaced by ‘non-redundant’. As explained in that same paragraph the sequences selected were those from long-read sequences that could be relied on to accurately count the number of solenoid coils.

      Line 144. The standard deviation is necessary to understand whether these differences are statistically significant.

      These have been added as p values.

      Figure 2. I noticed that the authors used GFP-tagged PbINP. Why? In addition, panel C is never mentioned in the manuscript.

      The GFP tag was used to confirm expression of the PbINP in E. coli. We have added this sentence: “As previously described these constructs were tagged with GFP as an internal control for INP production, and its addition had no measured effect on ice nucleation activity (8).” The GFP tag was also useful as an internal control for the heat denaturation experiments featured in Fig. 6, where it lost its fluorescence between 65 and 75 °C.

      Fig. 2C is now cited alongside Fig. 2B.

      Figure 3. In my opinion, the results of the R-coil deletion should also be shown in Figure 2.

      Line 171. This section is cryptic. A logo sequence or an alignment of WO-coils and R-coils of PbINP could be helpful for the reader. Instead of the architecture of the whole protein, it would be useful to have the sequence of the R-coils with the residues that the authors mutagenised.

      The logo sequences are available in Fig. 1.

      Line 202. Here, the authors describe a new experimental setup. As the Materials and Methods section follows the Discussion, the authors should state in the first paragraph of the Results section that IN activity was measured on whole cells.

      We have now modified the introductory sentence to read: “Ice nucleation assays were performed on intact E. coli expressing PbINP to assess the activity of the incremental replacement mutants.”

      Line 202. The authors investigated the effects of pH and temperature (Line 223) on the IN activity. The authors should better introduce the rationale for these experiments and how they fit within the work.

      We have now modified the following sentence to provide the rationale: “To see how important electrostatic interactions were in the multimerization of PbINP as reflected by its ice nucleation activity, it was necessary to lyse the E. coli to change the pH surrounding the INP multimers.”

      Line 245. This work is supported by a model provided by AlphaFold. I wonder how reliable this model is; the authors should indicate the quality of the model and provide the per-residue accuracy values.

      This information is now provided in Figure S1.

      Line 259. Typically in mutagenesis studies, a key residue is substituted with alanine to create a loss of function variant. In this case, the authors have made the following substitutions F1204D, D1208L, and Y1230D, it is not clear to me why the authors have replaced an aromatic residue with one of aspartic acid that is negatively charged.

      We have justified these more extreme changes as follows: “For an enhanced effect of the mutations hydrophobic residues were replaced with charged ones and vice versa.”

      Line 269. This paragraph seems completely unrelated to the section entitled: The β-solenoid of INPs is stabilized by a capping structure at the C terminus, but not at the N terminus.

      We had omitted the sub-heading “Cryo-electron tomography reveals INP multimers form bundled fibres in recombinant cells”, which is now in place.

      Discussion

      Overall, the discussion is too long and some parts appear cryptic; this section should be reorganized.

      The changes we have made to the Discussion can be seen in the ‘documents compared’ version where the changes are tracked.

      Line 354. It is not clear what experimental evidence supports this model. In the results, this model is never mentioned and it is not clear whether it was obtained by computational analysis or not.

      The model is presented in the Discussion because it was not arrived at by experimentation but is an attempt to integrate the observations made in the Results section. The experimental evidence that supports this model is reviewed in the Discussion section: “Working model of the INP multimer is consistent with the properties of INPs and their multimers.”

      Line 354. The authors used GFP-tagged PbINP. The authors should discuss the role of GFP in this model and in IN activity. A measurement of IN activity on PbINP without GFP would be useful.

      We have previously shown in Ref 8 that the GFP tag has no detrimental effect on ice nucleation activity. Our model for the INP multimer can accommodate this C-terminal tag without any steric hindrance.

      Line 364. The authors hypothesize that electrostatic interactions stabilize end-to-end dimer associations. To test this hypothesis, the authors should measure IN activity at increasing concentrations of NaCl. It is known that high salt concentrations shield charges and thereby prevent the formation of electrostatic intermolecular interactions.

      We have added this sentence to the Discussion: “Another useful test of the electrostatic component to the multimer model would be to study the effects of increasing salt concentration on ice nucleation activity of the E. coli extracts.”

      Line 439. Conclusions should be useful for the reader.

      Material and Methods

      In several sections, the authors refer to what has already been published in Forbes et al. However, the minimum necessary information should also be described in this work. In addition, the authors should indicate the number of replicates.

      The ice nucleation assays on whole cells were done on the WISDOM apparatus, which integrates hundreds of individual measurements to obtain a T50 value. These T50 values were confirmed by assays on the nanoliter osmometer apparatus. The numbers of replicates used on the nanoliter osmometer apparatus are indicated by the box-and-whisker plots in Figs. 5 & 6, with boxes and bars showing quartiles and medians indicated by a centre line.

      Line 500. This paragraph should be removed as the results are not described in the manuscript.

      This is a Methods section that describes how the INPs were expressed in E. coli. It has details that are important for researchers who want to repeat our findings, such as the use of the Arctic Express strain for producing INP.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the e-mail of 27th September that includes the eLife assessment and reviewers’ comments on manuscript eLife-RP-RA-2023-91861. We have considered these, added additional data and made various changes to the text as detailed below. We now submit a modified version that we would be happy to view as the ‘Version of Record’.

      We are very pleased to note the highly positive reports from the reviewers. The major change we have made is to alter the Introduction to include further consideration of the development of the ‘bar-code’ hypothesis. As highlighted by Reviewer 2, the Lefkowitz/Duke University group have been major proponents of this concept. However, as with many topics, their views did not emerge in isolation. Indeed, we (specifically Tobin) were developing similar ideas in the same period (see Tobin et al., (2008) Trends Pharmacol Sci 29, 413-420). Moreover, other groups, particularly that of Clark and collaborators at the University of Texas, were developing similar ideas using the beta2-adrenoceptor as a model at least as early as this (e.g. Tran et al., (2004) Mol Pharmacol 65, 196-206). As such, we have rewritten parts of the Introduction to reflect these early studies whilst retaining information on more recent studies that have greatly expanded such early work. This has resulted in the addition of extra references and renumbering of the Reference section. We have also provided statistical analysis of agonist-induced arrestin interactions with the receptor, as requested by a reviewer, and performed additional studies to assess the effect of the GRK2/3 inhibitor on agonist regulation of phosphorylation of the hFFA2-DREADD receptor. This has led to an additional author (Aisha M. Abdelmalik) being added to the paper.

      To address first the ‘public reviews’

      Reviewer 1

      1. We agree that we do not at this point explore the implications of the tissue specific barcoding we observe and report. However, as noted by the reviewer these will be studies for the future.

      2. The question of why there are only two widely expressed arrestins and very many GPCRs is not one we attempt to address here, and groups using various arrestin ‘conformation’ sensors are probably much better placed to do so than we are.

      Reviewer 2

      1. It is difficult to address the potential low level of ‘background’ staining in some of the immunocytochemical images versus the ‘cleaner’ background in some of the immunoblotting images. The methods and techniques used are very distinct. However, it should be apparent that the immunoblotting studies are performed (both using cell lines and tissues) post-immunoprecipitation, and this is likely to reduce such background to a minimum. This is obviously not the case in the immunocytochemical studies. It is also likely that, even though the antisera are immune-selected against the peptide target, there may be some level of immune-recognition that is not limited to the phosphorylated residues.

      2. Whilst this reviewer has commented in detail in the ‘recommendations’ section on the use of English, the other reviewers have not, and we do not find the manuscript challenging to follow or read.

      Reviewer 3

      1. We agree that the mass-spectrometry presented is not quantitative. The intention was for the mass spec to be a guide for the development of the antisera used in the study. We have re-written the initial part of the Results section (page 7) to state that phosphorylation of Ser297 was evident in the basal and agonist-stimulated receptor whilst phosphorylation of Ser296 was only evident following agonist addition.

      2. Immunoblotting is intrinsically variable, as parameters such as antiserum titre in re-used samples are not assessed, and although we are aware that FFA2 displays a degree of constitutive activity (see for example Hudson et al., (2012) J Biol Chem. 287(49):41195-209), we did not make any specific effort to suppress this by, for example, including an inverse agonist ligand. Agonist regulation of phosphorylation of the receptor, as detected in cell lines by the anti-pThr306/pThr310 antiserum, is exceptionally clear-cut in all the images displayed, and, as we note, for the pSer296/pSer297 antiserum this was always, in part, agonist-independent.

      The point about compound 101 not being tested directly in the immunoblotting studies performed on the cell line-expressed receptor is a good one. We have now performed such studies, which are shown as Figure 2E. These illustrate that the GRK2/3 inhibitor compound 101 does not substantially reduce agonist-induced phosphorylation of the receptor, at least as detected by the pThr306/pThr310 antiserum or by the pSer296/pSer297 antiserum. Equally, this compound had little effect on recognition of the receptor. As the PD2 mutations, which correspond to the targets for the pThr306/pThr310 antiserum, have no significant effect on recruitment of arrestin 3 in response to MOMBA (please see additional statistical analysis in modified Figure 2C), this is perhaps not surprising. Moreover, the PD1 mutations that correspond to the pSer296/pSer297 antiserum also, in isolation, have only a partial effect on MOMBA-induced interactions with arrestin 3.

      3. The use of phosphatase inhibitors is an integral part of these studies. As noted in Materials, we used PhosSTOP (Roche, 4906837001). However, we failed to make it sufficiently clear that this reagent was present throughout sample preparation for both cell line and tissue studies. This had been specified previously by two of us (SS, FN; see Fritzwanker S, Nagel F, Kliewer A, Stammer V, Schulz S. In situ visualization of opioid and cannabinoid drug effects using phosphosite-specific GPCR antibodies. Commun Biol. 6, 419 (2023)), but we agree this was insufficient and we now correct this oversight by making this explicit in Results.

      Recommendations

      Reviewer 1

      Competing interest: We apologise for this typographic error. It is now corrected.

      Figures: We have upgraded the figure images to 300 dpi and this markedly improves readability.

      Reviewer 2

      Revisiting writing: We thank the reviewer for their assessment of the text. However, we do not feel that ‘every sentence in the entire manuscript could be clarified’ is a reasonable statement. Neither of the other reviewers commented on this. Each of the authors read and approved the manuscript.

      Figures: see response to Reviewer 1. We have greatly enhanced image quality at this part of the process.

      Statistics on Figure 2: We apologise for this oversight. Although there were no significant differences in potency for MOMBA to promote interactions with arrestin-3 to each of the PD mutants versus wild type receptor, there were in terms of maximal effect. Statistical analysis was performed via one-way ANOVA followed by Dunnett’s multiple comparisons test. This is now detailed directly in Figure 2C and its associated legend. As noted by the reviewer there was indeed a highly significant effect of the GRK2/3 inhibitor compound 101 and this is now also noted in Figure 2D and its associated legend.

      Units on page 9: pEC50 is considered as Molar by default but we have now specified this. PD1-4: It would be cumbersome to write out (and to read) 8 mutations that make up PD1-4 and hence we think this is specified appropriately in the Figure.

      Reviewer 3

      1. Mass spec: Please see our response to point 1 of Reviewer 3’s public review.

      2. Immunoblotting and compound 101: We have done so.

      3. Phosphatase inhibition: see public comments, reviewer 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to all the reviewers for their thoughtful comments and the efforts they put into reviewing our manuscript. These are highly positive and constructive reviews. Thank you! We have updated our manuscript to include further discussion of several important points (as suggested by reviewers) and addressed reviewer suggestions individually below.

      Reviewer #1 (Public Review):

      This remarkable and creative study from the Asbury lab examines the extent to which mechanical coupling can coordinate the growth of two microtubules attached to isolated kinetochores. The concept of mechanical coupling in kinetochores was proposed in the mid-1990s and makes sense intuitively (as shown in Fig. 1B). But intuitive concepts still need experimental validation, which this study at long last provides. The experiments described in this paper will serve as a foundation for the transition of an intuitive concept into a robust, quantitative, and validated model.

      The introduction cites at least 5 papers that proposed mechanical coupling in kinetochores, as well as 5 theoretical studies on mechanical coupling within microtubule bundles, so it's clear that this manuscript will be of considerable interest to the field. The intro is very well written (as is the manuscript in general), but I recommend that the authors include a brief review of the variable size of k-fibers across species, to help the reader contextualize the problem.

      We agree with the reviewer’s suggestion and have added a brief review of variable k-fiber sizes to the Introduction section (lines 30-35).

      For example, budding yeast kinetochores are built around a single microtubule (Winey 1995), so mechanical coupling is not relevant for this species.

      Indeed, the use of yeast kinetochores to study mechanical coupling is an odd fit, because these structures did not evolve to support such coupling. There is no doubt that yeast kinetochores are useful for demonstrating mechanical coupling and for measuring the stiffnesses necessary to achieve coupling, but I recommend that the authors include a caveat somewhere in the manuscript, perhaps in the place where they discuss their use of simple elastic coupling as compared to viscoelastic coupling or strain-stiffening. It's easy to imagine that kinetochores with large k-fibers might require complex coupling mechanisms, for example.

      Even though yeast kinetochores are built around single microtubules, mechanical coupling has still been proposed to help coordinate the dynamics of sister kinetochores in yeast (Gardner et al. 2005, see main text for full reference). We have added this important point to the Introduction section of the manuscript (lines 33-35). The microtubules attached to sister kinetochores are oriented oppositely to one another, in an anti-parallel arrangement that differs from the parallel arrangement we studied here. Nevertheless, it seems likely to us that coordination of anti-parallel microtubule growth between the single microtubules attached to sister kinetochores in yeast relies at least partly on mechanical coupling. One of the many ways we foresee our dual-trap assay being useful in the future is to test how anti-parallel microtubule growth and shortening can be coordinated via mechanical coupling. Of course, since kinetochores can change the dynamics of their attached microtubules (Umbreit et al., 2012, “The Ndc80 kinetochore complex directly modulates microtubule dynamics”), the kinetochores from different species may have also evolved unique mechanisms of modifying microtubule tension-dependent dynamics to achieve coordination of their attached microtubules. Thus far, in vitro reconstitutions using kinetochore assemblies from metazoans have not yet achieved the coupling stability that we routinely achieve with isolated yeast kinetochores. As reconstitutions with kinetochores from other species improve, it will be very interesting to test for species-specific differences in how the kinetochores influence microtubule dynamics and in how effectively they can coordinate microtubules via mechanical coupling.

      We note that the (visco)elastic properties of yeast kinetochores, and their relative simplicity compared to other kinetochores, shouldn’t significantly affect our primary experimental results. Yeast kinetochores are relatively small and the force on each bead changes very slowly in our experiments (see Figure S3-1 for examples), so the kinetochore’s change in length over time is very slow and very small. We have added this point to the Methods section of the manuscript (lines 479-484). We agree that mechanical coupling in species with large k-fibers might rely on more complex material properties, such as viscoelasticity or strain-stiffening. In principle, that type of complexity could be incorporated into our dual-trap experiments by altering the simulated linker. We view this as an interesting area for future study.

      And is mechanical coupling relevant for holocentric kinetochores like those found in C. elegans?

      This is a very interesting question. While holocentric kinetochores do not form k-fiber bundles (O’Toole et al., 2003, “Morphologically distinct microtubule ends in the mitotic centrosome of Caenorhabditis elegans” and Redemann et al., 2017, “C. elegans chromosomes connect to centrosomes by anchoring into the spindle network”), mechanical coupling could be even more important for them than for monocentric kinetochores, because tip-attached microtubules both near each other AND at opposite ends of the same chromosome must grow at similar enough rates to stay attached to the same chromosome. In C. elegans prometaphase, opposite chromosome ends move towards the same pole as the chromosome itself oscillates, suggesting that microtubule plus ends attached to the same chromosome are growing in the same direction at the same time (Maddox et al., 2004, “‘Holo’er than thou: Chromosome segregation and kinetochore function in C. elegans”). Microtubules appear to stop growing or shortening after chromosome alignment is complete (Redemann et al., 2017), at which time the plus ends of kinetochore microtubules are in close proximity to the chromosome surface (O’Toole et al., 2003, Redemann et al., 2017). The tight clustering of kinetochore microtubule tips near the chromosome at metaphase, as well as the coordinated movement of chromosome arms preceding metaphase, suggests a high level of inter-microtubule coordination during the congression leading up to metaphase. We propose this coordination could be achieved by mechanical coupling through the kinetochore proteins on the surface of holocentric chromosomes and through the underlying chromosome itself.

      The paper shows considerable rigour in terms of experimental design, statistical analysis, and presentation of results. My only comment on this topic relates to the bandwidth of the dual-trap assay, which I recommend describing in the main text in addition to the methods. For example, the authors note that the stage position is updated at 50 Hz. The authors should clearly explain that this bandwidth is sufficiently fast relative to microtubule growth speeds.

      Thank you for this suggestion. We have added to the Results section (lines 131-133) that updating the stage position at 50 Hz is sufficient to maintain the desired force. We also modified the Methods section (lines 488-491) to clarify that the stage position is sampled at 200 Hz, which is more than sufficient to accurately show the growth variability present in dual-trap experiments.

      After describing their measurements, the authors use Monte Carlo simulations to show that pauses are essential to a quantitative explanation of their coupling data. Apparently, there is a history of theoretical approaches to coupling, as the introduction cites 5 theoretical studies. I would have appreciated it if the authors had engaged with this literature in the Results section, e.g. by describing which previous study most closely resembles their own and/or comparing and contrasting their approach with the previous work.

      Thank you for this excellent suggestion. We have added a brief comparison of our work to previous theoretical studies examining the role of mechanical coupling in k-fiber coordination to the Results section (lines 179-185).

      Overall, this paper is rigorous, creative, and thought-provoking. The unique experimental approach developed by the Asbury lab shows great promise, and I very much look forward to future iterations.

      Reviewer #2 (Public Review):

      Leeds et al. investigated the role of mechanical coupling in coordinating the growth kinetics of microtubules in kinetochore-fibers (k-fibers). The authors developed a dual optical-trap system to explore how a constant load, redistributed between a pair of microtubules depending on their growth state, coordinates their growth.

      The main finding of the paper is that the duration and frequency of pausing events during individual microtubule growth are decreased when tension is applied at their tips via kinetochore particles coupled to optically trapped beads. However, the study does not offer any insight into the possible mechanism behind this dependency. For example, it is not clear whether this is a specific property of the kinetochore particles that were used in this experiment, whether it could be attributed to specific proteins in these particles, or if this could potentially be an inherent property of the microtubules themselves.

      We agree that the experiments described in our work do not distinguish between tension-dependence inherent to the microtubule itself and tension-dependence conferred by the kinetochore. We speculate about reasons why tension might disfavor pausing in paragraph 5 of the discussion (lines 356-366). Given that microtubule growth is suppressed by compression without the presence of kinetochores or other microtubule-associated proteins (Dogterom & Yurke, 1997, Janson et al., 2003, see main text for full reference), it seems plausible to us that tension-dependent dynamics, including pausing behaviors, might be inherent to microtubules. However, experiments with different tension-bearing plus-end couplers will be required to test this idea rigorously. We view this as an interesting area for future study.

      The authors simulate the coordination between two microtubules and show that by using the parameters of pausing and variability in growth rates both measured experimentally they can explain coordination between two microtubules measured in their experiments. This is a convincing result, but k-fibers typically have many more microtubules, and it seems important to understand how the ability to coordinate growth by this mechanism scales with the number of microtubules. It is not obvious whether this mechanism could explain the coordination of more than two microtubules.

      We wholeheartedly agree, it is of vital importance to understand how the coordination of growth via mechanical coupling scales with the number of microtubules. Indeed, we have already begun studying simulations of bundles of ten to twenty microtubules based on the pausing model developed in this paper. Simulated microtubule tips appear significantly limited when linked by mechanical couplers of similar stiffnesses to those used in the dual-trap assay, supporting the idea that mechanical coupling may be able to explain much of the coordination between microtubules in growing k-fiber bundles. We hope to use these simulations to continue exploring the degree to which mechanical coupling can coordinate k-fiber microtubules in future publications.

      The range of stiffnesses chosen to simulate the microtubule coupling allows linkers to stretch linearly over hundreds of nanometers. However, most proteins, including those at the kinetochore, must have finite size and therefore should behave more like worm-like chains than linear springs. This means they may appear soft for small elongations, but the force would increase rapidly once the length gets close to the contour length. How this more realistic description of the mechanics might affect the conclusions of the work is not clear.

      While the worm-like chain is likely a better model for individual linker molecules, deformation of the underlying centromeric chromatin is also likely to be important, with viscoelastic properties that are still poorly understood. Rather than using a complicated (viscoelastic or worm-like-chain-based) model with many unconstrained parameters, we felt a simple model with a single stiffness parameter to characterize the coupling material was a better starting point, allowing a straightforward comparison between stiffer and softer coupling. In future work, simulations could be used to study the effects of strain-stiffening and viscoelasticity and ask if these effects might further improve (or degrade) the efficacy of mechanical coupling for coordinating kinetochore microtubules.
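      To make the strain-stiffening point concrete, here is a minimal sketch comparing a linear spring with the Marko-Siggia interpolation formula for a worm-like chain, F(x) = (kT/P)[1/(4(1 - x/L)^2) - 1/4 + x/L]. The contour length L, the persistence length P, and the choice to match the spring stiffness to the chain's small-extension slope 3kT/(2PL) are illustrative assumptions, not parameters from the study.

```python
KT = 4.11  # thermal energy near room temperature, pN*nm


def wlc_force(x, contour_length, persistence_length, kt=KT):
    """Worm-like-chain force (pN) at extension x (nm), Marko-Siggia interpolation."""
    if not 0.0 <= x < contour_length:
        raise ValueError("extension must lie in [0, contour_length)")
    u = x / contour_length
    return (kt / persistence_length) * (0.25 / (1.0 - u) ** 2 - 0.25 + u)


def linear_force(x, stiffness):
    """Hookean spring force (pN) at extension x (nm)."""
    return stiffness * x


def wlc_small_extension_stiffness(contour_length, persistence_length, kt=KT):
    """Initial slope dF/dx of the WLC at x = 0, equal to 3 kT / (2 P L)."""
    return 3.0 * kt / (2.0 * persistence_length * contour_length)


if __name__ == "__main__":
    L, P = 100.0, 1.0  # hypothetical linker: 100 nm contour length, 1 nm persistence length
    k = wlc_small_extension_stiffness(L, P)  # spring matched to the WLC near zero extension
    for x in (5.0, 50.0, 90.0, 99.0):
        print(f"x = {x:5.1f} nm   WLC {wlc_force(x, L, P):9.2f} pN   spring {linear_force(x, k):5.2f} pN")
```

      The two descriptions nearly coincide at small extension, but the worm-like chain stiffens sharply as x approaches L while the matched spring keeps stretching linearly, which is the regime where the two coupling models would be expected to diverge.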

      The novel dual-bead assay is interesting. However, it only provides virtual coupling between two otherwise independently growing microtubules. Since the growth of one affects the growth of the other only via software, it is unclear whether the same insight can be gained from the single-bead setup, for example, by moving the bead at a constant speed and monitoring how microtubule growth adjusts to the fixed speed. The advantages of the double-bead setup could have been demonstrated better.

      Thank you for your suggestion to clarify the advantages of our dual-trap approach compared to single-trap experiments. We have added a paragraph to the Discussion section (lines 315-327) to explain the following points: In a real k-fiber bundle, each microtubule can dynamically adjust its growth speed to the current force being applied. In the same way, the dual-trap assay allows us to examine how both leading and lagging tips dynamically adjust to the other’s growth speed simultaneously. In addition, in our dual-trap assay each microtubule in the pair is grown at the same time relative to preparing the slide and comes from an identical batch of kinetochore-bead and tubulin-containing growth buffer. Any differences in growth speeds between paired microtubules can be attributed to intrinsic microtubule variability, rather than prep-to-prep or sample-to-sample differences in microtubule dynamics.

      Reviewer #3 (Public Review):

      Leeds et al. employ elegant in vitro experiments and sophisticated numerical modeling to investigate the ability of mechanical coupling to coordinate the growth of individual microtubules within microtubule bundles, specifically k-fibers. While individual microtubules naturally polymerize at varying rates, their growth must be tightly regulated to function as a cohesive unit during chromosome segregation. Although this coordination could potentially be achieved biochemically through selective binding of polymerases and depolymerases, the authors demonstrate, using a novel dual laser trap assay, that mechanical coupling alone can also coordinate the growth of in vitro microtubule pairs.

      By reanalyzing recordings of single microtubules growing under constant force (data from their own previous work), the authors investigate the stochastic kinetics of pausing and show that pausing is suppressed by tension. Using a constant shared load, the authors then show that filament growth is tightly coordinated when pairs of microtubules are mechanically coupled by a material with sufficient stiffness. In addition, the authors develop a theoretical model to describe both the natural variability and force dependence of growth, using no freely adjustable parameters. Simulations based on this model, which accounts for stochastic force-dependent pausing and intrinsic variability in microtubule growth rate, fit the dual-trap data well.

      Overall, this study illuminates the potential of mechanical coupling in coordinating microtubule growth and offers a framework for modeling k-fibers under shared loads. The research exhibits meticulous technical rigor and is presented with exceptional clarity. It provides compelling evidence that a minimal, reconstituted biological system can exhibit complex behavior. As it currently stands, the paper is highly informative and valuable to the field.

      To provide further clarity regarding the implications of their study, the authors may wish to address the following points in more detail:

      • Considering the authors' understanding of the quantitative relationship between forces, microtubule growth, and coordination, is the dual trap assay necessary to demonstrate this coordination? What advantages does the (semi)experimental system offer compared to a purely in silico treatment?

      Thank you for your suggestion to explain the advantages of our dual-trap approach compared to simulations based on previous recordings of individual microtubules growing under tension. We have added a paragraph about this to the Discussion section (lines 315-327). Previously we knew that a shared load should theoretically tend to coordinate a growing microtubule pair, but we did not know how well, nor did we know the degree of variability that would need to be overcome to achieve coordination. Moreover, there are myriad ways one could model the variability and force dependence in microtubule growth, but not all of them can successfully explain the tip separations we now measure between real microtubule pairs. For instance, our non-pausing model, although entirely derived from force-clamp data, had too much variability and too little coordination between microtubule pairs when we compared simulation results to our dual-trap measurements. Thus, the dual-trap assay allows us to test our assumptions about how variability in microtubule growth arises and how mechanical coupling affects it using real microtubules. Reviewer 2 likewise asked about the advantages of the dual-trap approach relative to single-trap experiments, and we suggest also examining our response to their comment above.

      • What are the limitations of studying a system comprising only two individual microtubules? How might the presence of crosslinkers, which are typically present in vivo between microtubules, influence their behavior in this context?

      This is a very interesting question. K-fiber microtubules in many organisms are subject to forces along their lattices from crosslinkers that attach them to each other and to other microtubules outside the k-fiber. Bridging fibers, for example, are pushed apart at the spindle equator by kinesin motors like Eg5, and are thought to maintain tension on k-fiber microtubule tips by sliding them towards the pole (Vukusic et al., 2017, “Microtubule Sliding within the Bridging Fiber Pushes Kinetochore Fibers Apart to Segregate Chromosomes"). Passive crosslinkers can also produce diffusion-like forces that drive microtubules to move relative to one another (although to our knowledge this has only been demonstrated with antiparallel microtubules—see Braun et al., 2017, “Changes in microtubule overlap length regulate kinesin-14-driven microtubule sliding”). Testing how these various lattice-based forces might affect k-fiber coordination is of great interest to us, but it is not easy to envision how it could be done in our dual-trap setup, where the two coupled microtubules only interact through mechanical forces and are biochemically isolated from one another (in separate assay chambers). Perhaps a clever new assay could be devised in the future to study the role of crosslinkers in combination with mechanical coupling on the coordination of growing microtubules in parallel.

      • How dependent are the results on the chosen segmentation algorithm? Can the distributions of pause and run durations truly be fitted by "simple" Gaussians, as indicated in Figure S5-2? Given the inherent limitations in accurately measuring short durations and the application of threshold durations, it is likely that the first bins in the histograms underestimate events. Cumulative plots could potentially address this issue.

      The qualitative trends of tension suppressing pause entrance and promoting pause exit seemed to be insensitive to the choices we made in our segmentation algorithm. We have added a paragraph to the Methods section (lines 558-569) to explain how other choices we tried (a smoothing window of 5 s compared to 2 s and a minimum event duration of 0.01 s compared to 1 s) had only mild effects on the measured force sensitivities but did not affect their signs. This suggests that while imposing a threshold duration almost certainly underestimates the number of shorter events, it does not substantially affect our overall conclusion that tension reduces the rate of pause entry, accelerates pause exit, and speeds assembly during the ‘runs’ between pauses.

      For segmenting each position-vs-time record into pause and run intervals, we fit the velocity distribution for each individual recording with a mixture of Gaussians. The distributions from some recordings fit quite well to a sum of Gaussians, while others did not fit as well. However, we found that the exact threshold used to separate runs from pauses (typically between 2 and 4 nm/s) had a surprisingly small effect on what the algorithm differentiated as a pause or a run. The segmentation algorithm and its performance on every record we analyzed can be directly viewed by downloading and running our force-clamp viewer, publicly available at https://doi.org/10.5061/dryad.6djh9w16v.
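      A much-simplified sketch of threshold-based pause/run segmentation is shown below. It is not the authors' algorithm: instead of fitting a Gaussian mixture to each record's velocity distribution, it uses a fixed, hypothetical velocity threshold, a forward-difference velocity over a smoothing window, and a minimum interval duration, which are the kinds of choices whose effects are discussed above. The function name and the synthetic record are illustrative.

```python
def segment_trace(positions, dt, window, pause_threshold, min_duration):
    """Split a tip position record (nm) into 'pause' and 'run' intervals.

    Velocity at sample i is a forward difference over `window` samples.
    Samples slower than `pause_threshold` (nm/s) are labeled 'pause'.
    Intervals shorter than `min_duration` (s) are absorbed into the
    preceding interval. Returns (label, start_time_s, duration_s) tuples.
    """
    n_vel = len(positions) - window
    if n_vel <= 0:
        raise ValueError("record is shorter than the smoothing window")
    labels = []
    for i in range(n_vel):
        v = (positions[i + window] - positions[i]) / (window * dt)
        labels.append("pause" if v < pause_threshold else "run")

    # Group consecutive identical labels as [label, first_index, n_samples].
    groups = []
    for i, lab in enumerate(labels):
        if groups and groups[-1][0] == lab:
            groups[-1][2] += 1
        else:
            groups.append([lab, i, 1])

    # Absorb too-short intervals into the previous one, and re-merge
    # neighbours that end up sharing a label.
    merged = []
    for lab, start, count in groups:
        if merged and (count * dt < min_duration or merged[-1][0] == lab):
            merged[-1][2] += count
        else:
            merged.append([lab, start, count])
    return [(lab, start * dt, count * dt) for lab, start, count in merged]


if __name__ == "__main__":
    # Synthetic record sampled at 10 Hz: growth at 10 nm/s, a ~10 s plateau, growth again.
    trace = [float(i) for i in range(100)] + [100.0] * 100 + [100.0 + i for i in range(1, 101)]
    for label, start, duration in segment_trace(trace, dt=0.1, window=5,
                                                pause_threshold=3.0, min_duration=1.0):
        print(f"{label:5s} start {start:5.1f} s  duration {duration:5.1f} s")
```

      In this synthetic example the detected pause (about 9.7 s) is slightly shorter than the plateau because the 0.5 s smoothing window blurs both transitions, a small-scale version of the edge effects that thresholds and minimum durations introduce in real records.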

      Reviewer #2 (Recommendations For The Authors):

      In Figure 3a it would be helpful to see the traces of forces applied to individual microtubules. This would help to understand both how the force is distributed between individual microtubules depending on their dynamic state and how the individual forces fluctuate.

      We completely agree that understanding how force is distributed between microtubules in our dual-trap assay is both interesting and of great value. Although we decided not to include force vs time traces in the main figures, please refer to Figure S3-1, which shows the force-vs-time curves corresponding to the example position-vs-time traces displayed in Figure 3a, plus examples from two additional microtubule pairs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The paper offers some potentially interesting insight into the allosteric communication pathways of the CFTR protein. A mutation to this protein can cause cystic fibrosis, and both synthetic and endogenous ligands exert allosteric control of the function of this pivotal enzyme. The current study utilizes Gaussian Network Models (GNMs) of various substrate and mutational states of CFTR to quantify and characterize the role of individual residues in contributing to two main quantities that the authors deem important for allostery: transfer entropy (TE) and cross correlation. I found the TE of the Apo system and the corresponding statistical analysis particularly compelling. I found it difficult, however, to assess the limitations of the chosen model (GNM) and thus the degree of confidence I should have in the results. This mainly stems from a lack of a proposed mechanism by which allostery is achieved in the protein. Proposing a mechanism and presenting logical alternatives in the introduction would greatly benefit this manuscript. It would also allow the authors to place the allosteric mechanism of this protein in the broader context of protein allostery.

      As detailed below, we went to great lengths to address these concerns, with an emphasis on the limitations of the model and a proposed mechanism. These revisions should hopefully warrant a re-evaluation of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. It would greatly benefit the paper to state a proposed mechanism by which allostery is achieved in this protein. Is this through ensemble selection, ensemble induction, or a purely dynamic mechanism? What is the rationale for choosing the proposed mechanism and what are reasonable alternative mechanisms? How does this mechanism fit in the broader context of protein allostery?

      Following this comment, we added an extensive description of the proposed mechanism by which allostery is achieved in CFTR and present the rationale for choosing this mechanism (lines 445-497 and Figure 7). Briefly, based on previous experimental results and our own results, we propose that no single model can explain allostery in CFTR, and that its allosteric mechanism is a combination of induced fit, ensemble selection, and a dynamic mechanism.

      2. With a proposed mechanism in place, the choice of a GNM to investigate the mechanism and eliminate alternative mechanisms should be rationalized.

      The rationale for choosing GNM (and ANM-LD) to study the proposed mechanism is now given in lines 498-510. Please note, however, that as mentioned in the response to point 1 (and detailed in lines 445-497), the choice of allosteric mechanism, and the ruling out of other alternatives, was not based solely on GNM and ANM-LD, but also on previous experimental results.

      3. A discussion of the strengths and limitations of the GNM is pivotal to understanding the limitations of the results shown. How sensitive are the results to specific details of the model(s)?

      a. A discussion of the strengths and limitations of the GNM have been added to the introduction. Please see lines 107-122.

      b. Sensitivity of the results to the specific details of GNM:

      GNM uses two parameters: the force constant of the harmonic interactions and the cutoff distance within which interactions are considered. The force constant is uniform for all interactions and is taken as unity. Its value affects only the absolute values of the fluctuations (i.e., their scale) but not their distribution. As we only look at fluctuations in relative terms, our results are insensitive to its value. GNM typically uses a cutoff distance of 7-10 Å (10 Å was used in this study). To test the sensitivity of the results to the cutoff distance, we repeated the calculations using 7 Å. As now discussed in lines 170-173 and shown in Figure S2, the results remained largely unchanged.

      c. Sensitivity of the results to the specific details of TE: To identify cause-and-effect relations, TE introduces a time delay (τ) between the movements of residues. The choice of τ is important: when τ is too small, only local cause-and-effect relations (between adjacent amino acids) will be revealed; if τ is too big, few (if any) cause-and-effect relations will manifest. This is analogous to the effects of a stone thrown into a lake: look too soon, before the stone hits the water, and you’ll see no ripples; look too late, and the ripples will have already subsided. In a previous work (PMID 32320672), we studied in detail the effects of choosing different τ values and found that the value of τ that maximizes the collectivity of net TE values is in most cases 3× τopt (where τopt is the time window in which the total TE of residues is maximized). Details of how τ was chosen were added to the Methods section.
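      The scale-invariance argument for the force constant can be checked on a toy network. The sketch below builds a GNM Kirchhoff matrix from Cα coordinates with a distance cutoff and takes relative mean-square fluctuations from the diagonal of its pseudo-inverse, computed via the identity pinv(Γ) = inv(Γ + J/n) - J/n, which holds for a connected contact network (J is the all-ones matrix). The coordinates are a hypothetical 10-residue straight chain, and the small pure-Python solver stands in for the linear-algebra libraries normally used for real proteins.

```python
import math


def kirchhoff(coords, cutoff=10.0, gamma=1.0):
    """GNM Kirchhoff (connectivity) matrix scaled by a uniform force constant."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                K[i][j] = K[j][i] = -gamma
                K[i][i] += gamma
                K[j][j] += gamma
    return K


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def gnm_fluctuations(coords, cutoff=10.0, gamma=1.0):
    """Diagonal of the Kirchhoff pseudo-inverse, i.e. <dR_i^2> / (3 kT).

    Uses pinv(K) = inv(K + J/n) - J/n, valid only when the contact network
    is connected (a single zero eigenvalue).
    """
    n = len(coords)
    K = kirchhoff(coords, cutoff, gamma)
    shifted = [[K[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]


def normalized(values):
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]  # hypothetical straight C-alpha trace
    for gamma in (1.0, 5.0):
        f = gnm_fluctuations(chain, cutoff=10.0, gamma=gamma)
        print(gamma, [round(v, 3) for v in normalized(f)])
    # Changing the cutoff (10 -> 7 Å) changes the contact topology, not just the scale.
    print([round(v, 3) for v in normalized(gnm_fluctuations(chain, cutoff=7.0))])
```

      Scaling the force constant by 5 divides every raw fluctuation by 5 and leaves the normalized profile untouched, whereas changing the cutoff alters which residue pairs are in contact, which is why the cutoff, unlike the force constant, needs an explicit sensitivity test.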

      In general, the limitations of the chosen model(s) are difficult to determine from the current manuscript because it is devoid of details of the model. While I understand that GNMs have been widely used to study protein systems, the specifics of the model are central to the current work and thus should be provided somewhere in the manuscript.

      a. As mentioned in our response above, the limitations of GNM are now presented (lines 107-122).

      b. The specifics of the model are now given in more detail in the methods section.

      c. In addition, as mentioned above, the results are largely independent of the values of the model’s parameters.

      b. Would changing the force constants to a more anisotropic model qualitatively change the results?

      a. GNM assumes isotropic fluctuations, and the calculations are based on this assumption. Therefore, GNM is inherently an isotropic model.

      b. Importantly, we complement the GNM-TE calculations with ANM-LD simulations, which predict the normal modes in 3D using an anisotropic network model.

      4. How repeatable is the difference between no ATP bound and ATP bound CFTR? I worry that the differences in TE in Figures 1 and 3A are mainly due to two different crystallization conditions. Is there evidence that two different structures of the same protein in the same ligand state lead to small changes in TE?

      To address this concern, we repeated the calculations using the structures of the ATP-free and bound forms of zebrafish CFTR. As now explained in text (lines 298-303) and shown in Figure S8 the effects of ATP are highly repeatable.

      5. Collective modes - why should we expect allostery to be in the most collective modes? Let alone the 10 most? Why not do a mode-by-mode analysis? Why, for example, were two modes removed on page 9, first full paragraph?

      a. Collective modes: We have erroneously referred to the slow modes as collective modes. This has now been corrected throughout the manuscript.

      b. Let alone the 10 most?

      c. why should we expect allostery to be in the most collective modes? Residues that are allosterically coupled are expected to display correlated motions. The slow modes (formerly referred to as “collective modes”) are generally the most collective ones, i.e., display the greatest degree of concerted motions. We therefore expect these modes to contain the allosteric information.

      d. Furthermore, as now explained in the text (lines 163-169) and shown in Figure S1, the eigenvalue decays of ATP-free and -bound CFTR demonstrate that the 10 slowest GNM modes sufficiently represent the entire dynamic spectrum (the distribution converges after the 10th slowest mode).

      e. Why not do a mode by mode analysis? It is entirely possible to do a mode-by-mode analysis. However, our view is that the allosteric dynamics of a protein is best represented by an ensemble of modes, rather than by individual ones. We found (as detailed here PMID 32320672) that it is more informative to first use the complete set of modes that encompasses the dynamics (the 10 slowest modes in our case) and then gradually remove the dominant modes.

      f. As explained in the text (lines 254-257) and more elaborately in our previous work (PMID 35644497), the large amplitude of the slowest modes may hide the presence of “faster” modes that may nevertheless be of functional importance. Removal of the 1-2 slowest modes often helps reveal such modes.

      g. Why, for example, were two modes removed on page 9, first full paragraph? As explained for the ATP-free form (lines 257-260), removal of these two slowest modes allowed the “surfacing” of dynamic features that were hidden before. We propose that these dynamic features are functionally relevant (see lines 304-319). Removal of other modes did not provide additional insight.
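      As a rough illustration of why a handful of slow modes can dominate, in GNM each mode k contributes to residue fluctuations in proportion to 1/λ_k, so the cumulative 1/λ weight of the slowest modes can be tracked directly from the eigenvalue spectrum. The sketch below takes a list of eigenvalues and reports how many slow modes are needed before the next one adds less than a chosen fraction of the total. The synthetic spectrum λ_m = 0.1·m² and the convergence tolerance are hypothetical, and this criterion is not necessarily the one used for Figure S1.

```python
def cumulative_slow_mode_weight(eigenvalues, zero_tol=1e-10):
    """Cumulative fraction of GNM fluctuation carried by the slowest modes.

    Mode k contributes in proportion to 1 / lambda_k. The zero eigenvalue
    (rigid-body mode) is dropped before weighting.
    """
    nonzero = sorted(lam for lam in eigenvalues if lam > zero_tol)
    weights = [1.0 / lam for lam in nonzero]
    total = sum(weights)
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running / total)
    return cumulative


def slow_modes_needed(eigenvalues, tol=0.01):
    """Number of slowest modes after which the next mode adds less than `tol`
    of the total 1/lambda weight (a hypothetical convergence criterion)."""
    cumulative = cumulative_slow_mode_weight(eigenvalues)
    for k in range(1, len(cumulative)):
        if cumulative[k] - cumulative[k - 1] < tol:
            return k
    return len(cumulative)


if __name__ == "__main__":
    spectrum = [0.0] + [0.1 * m * m for m in range(1, 31)]  # hypothetical GNM eigenvalues
    cum = cumulative_slow_mode_weight(spectrum)
    print("slowest mode carries", round(cum[0], 3), "of the weight")
    print("modes needed:", slow_modes_needed(spectrum, tol=0.01))
```

      Dropping the first one or two entries of the sorted spectrum before weighting is the simplest way to mimic, in this toy setting, the removal of the dominant slowest modes described in points f and g.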

      Minor issues:

      1. Statements like "see shortly below" should be made more specific (or removed completely).

      Corrected as suggested

      2. "interfered" should be "inferred" page 10 middle of the first full paragraph

      Corrected as suggested

      3. End parenthesis after "(for an excellent explanation about the correlation between TE and allostery see (41)." Page 4 middle of first full paragraph

      Corrected as suggested

      Reviewer #2 (Public Review):

      In this study, the authors used ANM-LD and GNM-based Transfer Entropy to investigate the allosteric communications network of CFTR. The modeling results are validated with experimental observations. Key residues were identified as pivotal allosteric sources and transducers and may account for disease mutations.

      The paper is well written and the results are significant for understanding CFTR biology.

      Reviewer #2 (Recommendations For The Authors):

      Technical comments:

      p4 Please explain how the time delay parameter tau is chosen (i.e., three times the optimum tau value...). It seems this unknown time should depend on the separation between i and j. Is the TE result sensitive to the choice of tau? How does the choice of cutoff distance of GNM affect the TE result?

      a. The choice of τ is important: when τ is too small, only local cause-and-effect relations (between adjacent amino acids) will be revealed; if τ is too big, few (if any) cause-and-effect relations will manifest. This is analogous to the effects of a stone thrown into a lake: look too soon, before the stone hits the water, and you’ll see no ripples; look too late, and the ripples will have already subsided. In a previous work (PMID 32320672), we studied in detail the effects of choosing different τ values and found that the value of τ that maximizes the collectivity of net TE values is in most cases 3× τopt (where τopt is the time window in which the total TE of residues is maximized). Details of how τ was chosen were added to the Methods section.

      b. To test the sensitivity of the results to the cutoff distance we repeated the calculations using 7 Å. As now discussed in lines 170-173 and shown in Figure S2 the results remained largely unchanged.

      It would be nice to directly validate the causal prediction by GNM-based TE. For example, is it in agreement with direct causal observation of MD simulation? If the dimer is too big for MD, perhaps MD is more feasible for the monomer (NBD1+TMD1).

      a. The causality we determined using GNM-based TE is in good agreement with conclusions drawn from single-channel electrophysiological recordings and rate-equilibrium free-energy relationship analysis (Sorum et al., Cell 2015; see lines 86-91 and 364-370).

      b. To the best of our knowledge, causality relations in CFTR are yet to be determined by MD simulations (This is likely because the protein is too big and the conformational changes are very slow). We cannot therefore compare the causality.

      c. Conducting MD simulations on half of CFTR (NBD1+TMD1) is not likely to be very informative: the ATP binding sites are formed at the interface of NBD1 and NBD2, and the ion translocation pathway at the interface of the TMDs.

      p5 How are the TE peak positions different from other key positions as predicted by GNM, such as the hinge positions with minimal mobility of the dominant GNM modes?

      Following this comment, we compared the positions of the GNM-TE peaks and the hinge positions as determined by GNM. As now discussed in lines 173-178 and shown in Figure S3 we observed partial overlap which was nevertheless statistically significant (Figure S3).

      p7 How to select the 10 most collective GNM modes? Why not use the 10 slowest GNM modes?

      We have actually used the 10 slowest GNM modes, but in an attempt to cater for the non-specialist reader, we referred to them as the most collective ones. This has now been corrected throughout the manuscript and the terminology that is now used is “10 slowest modes”

      p9 There exist other ANM-based methods for conformational transition modeling. So it would be nice to discuss their similarity and differences from ANM-LD, and compare their predictions.

      Alternative methods based on ANM (and on other elastic network models) are now mentioned and referenced in lines 144-150. These methods differ from ANM-LD in the details of the all-atom simulations and in their integration with the elastic network model. Reanalyzing CFTR’s allostery using these methods is not trivial and is beyond the scope of this work.

      Regarding the prediction of order of residue motions, can one directly observe such order by superimposing some intermediate conformation of ANM-LD with the initial and end structure?

      This would indeed be a very attractive approach to visualize the order of events, and following this comment we tried to do just that. Unfortunately, we failed: superimposing pairs of frames provided little insight, so we compiled a video comprising all frames, as well as videos based on averages of several time-delayed frames. We found that it is next to impossible to discern with the naked eye the directionality of the fluctuations and to follow the order of conformational changes. We have therefore abandoned this endeavor for now.

      Reviewer #3 (Public Review):

      This study of CFTR, its mutants, dynamics, and effects of ATP binding, and drug binding is well written and highly informative. They have employed coarse-grained dynamics that help to interpret the dynamics in useful and highly informative ways. Overall the paper is highly informative and a pleasure to read.

      The investigation of the effects of drugs is particularly interesting, but perhaps not fully formed.

      This is a remarkably thorough computational investigation of the mechanics of CFTR, its mutants, and ATP binding and drug binding. It applies some novel appropriate methods to learn much about structure's allostery and the effects of drug bindings. It is, overall, an interesting and well written paper.

      There are only two main questions I would like to ask about this quite thorough study.

      Reviewer #3 (Recommendations For The Authors):

      1. Is it possible that the relatively large exothermic ATP hydrolysis itself exerts a force that causes the observed transitions? Jernigan and others have explored this effect for GroEL and some other structures. The effects of ATP binding and hydrolysis are likely often confused, and both are likely to be important.

      It is well established by many studies that ATP hydrolysis is not required to drive the conformational changes or to open the channel, and that ATP binding per se is sufficient. We have clarified this point in lines 521-530.

      2. For the case of ivacaftor, would a comparison of the motion's directions show that ivacaftor might be compensating simply by its mass being located in a site to compensate for the mass changes from the mutations (ENMs with masses needed to address this). We have observed such cases on opposite sides of a hinge.

      We do not think that this is the case, for the following reasons:

      a. Ivacaftor corrects many gating mutations (e.g., G551D, G178R, S549N, S549R, G551S, G970R, G1244E, S1251N, S1255P, G1349D) which are spread all over the protein. Ivacaftor binds to a single site in CFTR, and it is therefore unlikely that its mass contribution corrects all these diverse mass changes.

      b. The residues that comprise the ivacaftor binding site were identified as allosteric “hotspots” in both the ATP-free and -bound forms (Figures 2B, 3B, and 6A), even in the absence of the drug. This indicates that the dynamic traits of this site are intrinsic to it, and that, once bound, the drug acts by modulating these dynamics.

      The Abstract does not repeat some of the more interesting points made in the paper and would benefit from a substantial revision.

      Corrected as suggested

      There are just a few minor points (just words):

      P 3 line 2 of first full ¶: "effects" should be "affects"

      Corrected as suggested

      P 6 first line "per-se" should be "per se"

      Corrected as suggested

      Further down that page "two set" should be "two sets"

      Corrected as suggested

      Even further down that same page "testimony" should be "support"

      Corrected as suggested

      P 10, 5 lines from the bottom "impose that" is awkward

      Changed to “define”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors have previously employed micrococcal nuclease tethered to various Mcm subunits to cut the DNA to which the Mcm2-7 double hexamers (DH) bind. Using this assay, they found that Mcm2-7 DH are located on many more sites in the S. cerevisiae genome than previously shown. They then demonstrated that these sites have characteristics consistent with origins of DNA replication: they contain ARS consensus sequences, coincide with very inefficient sites of initiation of DNA replication in vivo, are free of nucleosomes, display a G-C skew, and are located in intergenic regions of the genome. The authors suggest, consistent with published single-molecule results, that there are many more potential origins in the S. cerevisiae genome than previously annotated.

      The results are convincing and are consistent with prior observations. The analysis of the origin associated features is informative.

      Reviewer #2 (Public Review):

      By mapping the sites of the Mcm2-7 replicative helicase loading across the budding yeast genome using high-resolution chromatin endogenous cleavage or ChEC, Bedalov and colleagues find that these markers for origins of DNA replication are much more broadly distributed than previously appreciated. Interestingly, this is consistent with early reconstituted biochemical studies that showed that the ACS was not essential for helicase loading in vitro (e.g. Remus et al., 2009, PMID: 19896182). To accomplish this, they combined the results of 12 independent assays to gain exceptionally deep coverage of Mcm2-7 binding sites. By comparing these sites to previous studies mapping ssDNA generated during replication initiation, they provide evidence that at least a fraction of the 1600 most robustly Mcm2-7-bound sequences act as origins. A weakness of the paper is that the group-based nature of the analysis (as opposed to an analysis of individual Mcm2-7 binding sites) prevents the authors from concluding that all of the 1,600 sites mentioned in the title act as origins. The authors also show that the locations of Mcm2-7 after loading are highly similar in the top 500 binding sites, although the mobile nature of loaded Mcm2-7 double hexamers prevents any conclusions about the location of initial loading. Interestingly, by comparing subsets of the Mcm2-7 binding sites, they find that at least a subset of these sites tend to be nucleosome depleted, to overlap with at least a partial match to the ACS sequence (found at all of the most well-characterized budding yeast origins), and to show a GC skew, each of which is a characteristic of previously characterized origins of replication.

      Overall, this manuscript greatly broadens the number of sites that are capable of loading Mcm2-7 in budding yeast cells and shows that a subset of these additional sites act as replication origins. Although these sites do have a propensity to include a match to the ACS, these studies suggest that the mechanism of helicase loading in yeast and multicellular organisms is more similar than previously thought.

      Reviewer #1 (Recommendations For The Authors):

      Specific Comments:

      1. The proposal, based on this study, that replication in S. cerevisiae is similar to that in human cells (mentioned in the abstract, introduction and end of discussion) is not supported by the evidence, either in this paper or elsewhere. The authors suggest that even these inefficient origins are directed by specific sequences that load Mcm2-7 DH, but there is no evidence that this occurs outside a limited clade of budding yeasts, and certainly not in human cells. Furthermore, the distribution and efficiency of replication origins in human cells have not been shown to parallel the findings in this paper. Thus, the conclusion should be removed, since it states that S. cerevisiae and human cells have similar mechanisms for origin location. This might confuse non-specialists who do not appreciate the subtleties.

      The reviewer's concern that we could confuse non-specialists is well-founded. We have made the following changes to emphasize the point that, while a wider distribution of origins makes S phase in yeast more like that in humans, the genome replication programs in the two organisms remain distinctly different:

      1) The last sentence of the abstract was changed as follows:

      a. These results shed light on recent reports that as many as 15% of replication events initiate outside of known origins, and they reveal S phase in yeast to be surprisingly similar to that in humans.

      b. These results shed light on recent reports that as many as 15% of replication events initiate outside of known origins, and this broader distribution of replication origins suggests that S phase in yeast may be less distinct from that in humans than is widely assumed.

      2) A sentence in the results was changed as follows:

      a. Another characteristic of known origins that we could use as a criterion to assess the nature of Mcm binding sites is the presence of an ACS.

      b. Another characteristic of known origins in S. cerevisiae (although not in most other organisms) that we could use as a criterion to assess the nature of Mcm binding sites is the presence of an ACS.

      3) We changed the last sentence of the Discussion as follows:

      a. On the other hand, the sharply focused nature of its replication origins made S phase in yeast appear distinct from that in other organisms. Our discovery that sites of replication initiation in yeast are much more widely dispersed than previously believed, with at least 1600 and possibly as many as 5500 origins, emphasizes its continued relevance to understanding genome duplication in humans.

      b. On the other hand, the sharply focused nature of its replication origins made S phase in yeast appear distinct from that in other organisms. Although by no means eliminating this distinction, our discovery that sites of replication initiation in yeast are much more widely dispersed than previously believed, with at least 1600 and possibly as many as 5500 origins, emphasizes yeast's continued relevance to understanding S phase in humans.

      2. The authors discuss in the introduction that origins in S. cerevisiae are equivalent to ARS sequences. Why didn't they ask whether the inefficient origins also confer ARS activity? This would be a valuable addition and a very simple experiment.

      The inefficient origins are not expected to confer ARS activity, because origins that are not licensed in essentially every G1 will be diluted out by cell division. We confirmed the absence of our inefficiently licensed origins in a data set generated by high throughput sequencing of a genomic library that was selected for origin activity (PMID: 23241746), but we did not note the results of this analysis in our manuscript, because the low complexity of the library used made this negative result uninformative. To clarify this point, we added the bolded clauses to the following sentences in the Introduction and Discussion:

      1. Origins vary widely in their efficiency, with some being used in almost every cell cycle while others may be used in only one in one thousand S phases (Boos and Ferreira, 2019), with only the former being capable of supporting plasmid replication in the traditional ARS assay.
      2. "Thus, we can detect Mcm complexes that are loaded in as few as 1 in 500 cells (Foss et al., 2021), even though such low affinity Mcm binding sites are not expected to be capable of supporting autonomous replication of a plasmid."
      3. While the authors have shown that Mcm2-7 is loaded adjacent to the principal ARS consensus sequence, consistent with biochemical studies on pre-RC assembly, two reports have shown that the Mcm2-7 ChIP is dependent on the B2 element of ARS1, but the ORC ChIP is not, suggesting that Mcm2-7 is loaded there (see Lipford and Bell, Mol. Cell 2001 and Zou and Stillman, Mol. Cell. Biol. 2000).

      We have added the following two sentences in the Results section to note these reports:

      "Furthermore, in the case of ARS1, two reports have demonstrated a requirement for the B2 element for Mcm loading, though not for Orc binding, suggesting that Orc may bind to the ACS but then load Mcm at the B2 element (Zou and Stillman 2000; Lipford and Bell 2001). This would still leave Mcm loaded downstream of the ACS, but we note this result to emphasize that not all details of Mcm loading in vitro have been definitively established."

      Reviewer #2 (Recommendations For The Authors):

      Specific points:

      1. The authors state "It is notable that the Mcm-ChEC panel of Figure 3A shows no obvious change in Mcm stoichiometry across the entire range, from low abundance, at the bottom, to high abundance, at the top." The ChEC method does not intrinsically measure stoichiometry so this conclusion needs more explanation. The authors appear to be referring to the distribution of Mcm2-7 reads being similar across all origins, but this does not measure how many double hexamers are present at an origin. If the stoichiometry argument is based on a finding that each origin has only a single 60 bp region that is protected by Mcm2-7 (rather than a distribution of 60 bp regions spread across the origin), then the authors should provide more compelling evidence than what is shown in Fig. 3A.

      We agree with the reviewer that our conclusion needs more explanation, and we have therefore made the following change, which we believe clarifies the point that we were trying to convey:

      1. Original version: It is notable that the Mcm-ChEC panel of Figure 3A shows no obvious change in Mcm stoichiometry across the entire range, from low abundance, at the bottom, to high abundance, at the top. This argues against models in which higher replication activity at more active origins reflect the loading of more Mcm double-hexamers at those origins within a single cell.

      2. Updated version: It is notable that, when Mcm is present, it is present predominantly as a single double-hexamer (right panel of Figure 3A), and that this remains true across the entire range of abundance shown in Figure 3A. This argues against models in which higher replication activity at more active origins is caused by the loading of more Mcm double-hexamers at those origins within a single cell, since such models predict that multiple Mcm footprints should be more prevalent at the top (high abundance) of the Mcm-ChEC heat map in Figure 3A than at the bottom.

      2. The authors state "we estimate that ~1-2 % cells have an Mcm complex loaded at the Mcm binding sites in the eighth cohort (ranks 1401-1600)" but it is not clear how this estimate is calculated. An explanation would help the reader to understand this statement.

      We have expanded on our earlier statement to clarify how we arrived at the estimate:

      1. Original version: Based on our previous analysis of MCM occupancy (Foss et al., 2021), which showed that approximately 90% cells have an MCM complex loaded at one of the most active known replication origins, we estimate that ~1-2 % cells have an Mcm complex loaded at the Mcm binding sites in the eighth cohort (ranks 1401-1600).

      2. Updated version: We have previously used Southern blotting to demonstrate that approximately 90% of the DNA at one of the most active known origins (ARS1103) is cut by Mcm-MNase (Foss et al., 2021), and to thereby infer that 90% of cells have a double helicase loaded at this origin. Using this as a benchmark, we estimate that ~1-2% of cells have an Mcm complex loaded at the Mcm binding sites in the eighth cohort (ranks 1401-1600).

      3. Although there is evidence that some subset of the CMBSs exhibit nucleosome depletion, an ACS, and a GC skew, the authors should do a better job of making the reader aware that a decreasing percentage of the individual origins in a group likely include these characteristics, and that this is a likely factor explaining the increasingly rare use of these sites as Mcm2-7 loading sites and origins of replication.

      We have added the following text to the Discussion to draw the reader's attention to this possibility, while also noting that we do not believe it to be a major factor in the increasingly rare use of sites within the first 5,500 CMBSs as replication origins:

      Furthermore, it is possible that, as one moves to lower abundance groups of CMBSs within the most abundant 5500 sites, a smaller fraction of sites within those groups have any origin function at all. If one takes this model to the extreme, it would suggest that the continuous decline in replication activity seen in Figure 2B between the group comprising ranks 1-200 and that comprising ranks 1401-1600 reflects an ever-increasing fraction of CMBSs with zero origin activity. At the other extreme, the decline in replication activity could be interpreted within a framework in which 100% of CMBSs in each group function as replication origins, but their replication activity declines with rank, perhaps because continuously decreasing fractions of cells in the population contain a single double-hexamer. While the truth presumably lies between these two extremes, we favor a model that tilts toward the latter view, because of the abruptness of the transition that appears around rank 5,000 in (1) nucleosomal architecture (Figures 3A, 3B and S3); (2) intergenic versus genic localization and transcription levels (Figure 4A); (3) EACS position weight matrix scores (Figure 5B); and (4) GC skew (Figure 6B). By these criteria, the CMBSs below rank 5000 appear relatively homogeneous, while still showing a gradual decline in replication activity with MCM abundance within the range of detection (1-1600). Our assumption is that this qualitative homogeneity is more consistent with a quantitative, but not qualitative, change in CMBSs with declining MCM abundance among the top 5000 CMBSs.

      4. The argument that there are as many as 5,500 origins is not well justified. Similarly, the evidence that there are even 1,600 origins is not compelling. As the authors state, to see the peaks observed in the various analyses (ssDNA association, nucleosome depletion, etc.) of the increasingly less populated CMBSs (e.g. those with fewer ChEC reads), only a small subset of the CMBSs in each group need have a given characteristic, and that is likely to be the case. Given that the loading of an Mcm2-7 double hexamer makes any site a potential origin, it would be more appropriate to say that there could be as many as 5,500 potential origins, but many if not most are unlikely to ever direct initiation.

      The reviewer is correct that, because many of our analyses rely on group averages rather than individual measurements, we are often unable to make statements that can be applied to every member of a group. We had tried to emphasize this point in our original manuscript with the following two sentences (in bold), which were in the Results and Discussion, respectively:

      1. First, clear peaks of ssDNA signal extend down to the eighth cohort (brown line), which corresponds to CMBSs ranked 1401-1600. Of course, this does not imply that all of these sites function as replication origins, and nor does it imply that no sites below that rank do so, since we have reached the limits of detection of this ssDNA-based assay. Nonetheless, it suggests that replication activity is common among sites extending at least down to rank 1600.

      2. Of course, we do not conclude that all CMBSs with ranks lower than 5500 function as replication origins, nor that none with ranks above 5500 do so, but only that the number of replication origins is likely to be approximately an order of magnitude higher than widely believed.

      We have now added a third sentence to further underline this point (in bold):

      Second, by averaging signals of replication from multiple Mcm binding sites, we were able to extract weak signals of replication. This is due to the fact that noise, which is randomly distributed, will tend to cancel itself out, while signals of replication will consistently augment the signal at the midpoint of the origin (Figure 2). An inevitable shortcoming to this approach is that it precludes analysis of specific sites; in other words, not every member of the group will share the average characteristic of that group.

      A separate issue that this touches on is the distinction between a replication origin and a site at which Mcm2-7 has been loaded. While it strikes us as unlikely that a loaded Mcm complex would be completely recalcitrant to activation, it is a formal possibility. To alert the reader to this issue, we have added the following clause, in bold, to the Abstract, and we have also added the sentence below that to the Discussion:

      We conclude that, if sites at which Mcm double-hexamers are loaded can function as replication origins, then DNA replication origins are at least 3-fold more abundant than previously assumed, and we suggest that replication may occasionally initiate in essentially every intergenic region.

      Finally, it is important to note that, in equating Mcm binding sites with potential replication origins, we are assuming that if an Mcm double-hexamer is loaded onto the DNA, then it is conceivable that that complex can be activated.

      5. The authors' discussion of the relationship between Mcm2-7 location relative to the ACS and the mechanism of Mcm2-7 loading does not consider that Mcm2-7 double hexamers can slide on DNA after loading (for example, Remus et al., 2009 PMID: 19896182). Thus, the authors are not looking at sites of loading, only at the distribution of Mcm2-7 molecules after loading. In addition, biochemical experiments do not predict a particular Mcm2-7 position relative to the ACS. Indeed, at ARS1, one would predict that the close proximity of the second weak match to the ACS (the B2 element) to the primary ACS would lead to the Mcm2-7 double hexamer being initially formed at a site overlapping the ARS1 ACS. It is much more likely that the distribution of Mcm2-7 locations relative to the ACS is explained by the ORC-bound ACS and the nucleosomes immediately flanking the origin preventing Mcm2-7 from occupying the right side of the origin, as illustrated in Fig. 5D.

      We have tried to emphasize this point more clearly. In our original manuscript, we had brought up the possibility of Mcms sliding after being loaded in the following context (see bolded clause):

      Specifically, in 112 out of 146 instances in which a peak of Mcm signal was within 100 base pairs of a known ACS, that peak was downstream of the ACS. The 34 exceptions may reflect (1) incorrect identification of the ACS; (2) incorrect inference of the directionality of the site; or (3) sliding of the Mcm complex after it has been loaded.

      We have now added the following to further emphasize the point:

      In interpreting the results above, it is important to remember that the locations at which we are detecting Mcm complexes by ChEC do not necessarily reflect the locations at which those complexes were loaded, since Mcm double-hexamers can slide along the DNA after loading (Remus et al. 2009; Gros et al. 2015; Foss et al. 2019).

      We have also softened the following conclusion by changing "confirmation of" to "support for":

      "...our results...provide in vivo support for in vitro predictions of the directionality of Mcm loading by Orc..."

      There are missing references in several places:

      1. "For example, 15 of the 56 genes that contained a high abundance site have been implicated in meiosis and sporulation and are not expressed during vegetative growth (~5 out of 56 expected from random sampling), consistent with previous observations (Mori and Shirahige, 2007)." Should include Blitzblau et al., 2012 (PMC3355065) which showed that Mcm2-7 loading was impacted by differences in meiotic and mitotic transcription.

      2. "In contrast to the low abundance sites, the most abundant 500 sites showed a preference for convergent over divergent transcription (left of vertical dotted line in Figure 4B), in agreement with a previous report (Li et al., 2014)." This preference was first pointed out in MacAlpine and Bell, 2005 (PMID: 15868424).

      3. "This sequence is recognized by the Origin Recognition Complex (Orc), a 6-protein complex that loads MCM (Broach et al., 1983; Deshpande and Newlon, 1992; Eaton et al., 2010; Kearsey, 1984; Newlon and Theis, 1993; Singh and Krishnamachari, 2016; Srienc et al., 1985)." This list should include a reference to Bell and Stillman, 1992 (PMID: 1579162), which first described ORC and showed that it recognized the ACS. It would also be more helpful to the reader to distinguish the references that identified the ACS from those concerning ORC binding to it.

      We thank the reviewer for pointing out these missing references, and we have added them. We have also separated the references that note the identification of the ACS sequence from those that demonstrate Orc binding to that sequence.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      "MAGIC" was introduced by the Rong Li lab in a Nature Letter in 2017. This manuscript is an extension of this original work and uses a genome-wide screen in baker's yeast to decipher which cellular pathways influence MAGIC. Overall, this manuscript is a logical extension of the 2017 study; however, the manuscript is challenging to follow, complicated by the data often being discussed out of sequence. Although the manuscript makes claims of a mechanism being pinpointed, there are many gaps, and how the factors identified in the screen actually influence MAGIC is not clear. A key issue is that many assumptions are drawn from previous literature, but central aspects of the proposed mechanisms are not adequately shown.

      Key comments:

      1. Reasoning and pipelines presented in the first two sections of the results are disordered and do not follow figure order. In some instances, the background to experimental analyses, such as details of the generation of spGFP constructs in the YKO mutant library or the validation of Snf1 activation, is mentioned after the respective results are discussed. This needs to be fixed.

      We thank the reviewer for pointing out potential confusion to readers. We have revised the first two sections according to reviewer’s suggestion. (Page 4-6)

      2. In general, there is a lack of independent data to support the microscopy data and the associated quantification. The validity of these data could be significantly strengthened with accompanying western blots showing accumulation of a given construct in mitochondrial sub-compartments (as was the case in the lab’s original paper in 2017).

      We appreciate the reviewer’s suggestion on biochemical validations. However, the validity of this imaging-based assay for detecting import of cytosolic misfolded proteins into mitochondria, including the use of FlucSM as a model misfolding-prone protein, was carefully established in our previous study by using appropriate controls, super resolution imaging, APEX-based proximity labeling, and classical biochemical fractionation and protease protection assay (Ruan et al., 2017 Nature, ref. 10). We have reminded readers of these validation experiments in the previous study on Page 4, line 14-17.

      In recent years, advances in imaging-based tools have allowed many protein interactions and dynamic processes, which were previously examined by using biochemical assays in lysates of populations of cells, to be observed with various levels of quantitation in live cells with intact cellular compartments. Many of these assays, e.g., the RUSH assay for ER-to-Golgi transport, FRAP-based analysis of nuclear/cytoplasmic shuttling of proteins, or FRET-based assays for protein-protein interactions, have been well accepted and even embraced by the respective fields of study once validated with genetic and biochemical approaches. The advantage of live-cell imaging-based assays is often their unique ability to report dynamic processes or unstable molecular species with spatiotemporal sensitivity. Respectfully, it is our view, based on our own experience, that the traditional protease protection assay is not adequate or sufficiently quantitative for examining the presence of unstable misfolded proteins in mitochondrial sub-compartments, given the obligatorily lengthy in vitro cell lysis and mitochondrial isolation process, during which the unstable proteins are continuously being degraded. This likely explains our previous biochemical fractionation result that only weak protein signals were detected in the matrix fraction (Ruan et al., 2017 Nature, ref. 10). In addition, unlike stably folded, native mitochondrial matrix proteins, misfolded/unfolded proteins such as Lsg1 or FlucSM are highly susceptible to protease treatment. This sensitivity makes the assay unreliable for detecting such proteins if trace amounts of protease penetrate mitochondrial membranes during cell lysis, even without detergent treatment.

      While we agree that the protease protection assay is highly valuable for qualitatively detecting the presence of a protein in certain mitochondrial compartments or determining its topology on membranes, this assay (regrettably, in our hands) does not allow the quantitative comparisons that were necessary for this study, because of inherent sample-to-sample variation, and its laborious, low-throughput nature makes adequate statistical analysis difficult. Furthermore, the level of protein detection in various fractions is highly sensitive to how the sample is treated with protease and detergent. Our imaging-based quantification, on the other hand, allows us to compare increased or decreased presence of GFP11-tagged proteins in mitochondria under different metabolic conditions or in different mutant or wild-type strains. Data from hundreds of cells and at least three independent biological replicates allowed us to apply adequate statistical analysis to support our conclusions.

      3. Much of the proposed mechanism relies on Snf1 activation. This is, however, not shown but assumed to be taking place. Given that this activation is central to the proposed mechanism, it should be explicitly shown here, for example by surveying the phosphorylation status of the protein.

      Both REG1 deletion and low glucose conditions have been extensively demonstrated to cause Snf1 phosphorylation and activation in yeast (e.g., many seminal papers from Marian Carlson’s and other labs, such as ref. 24-28). In our study, we have indeed corroborated this by showing that Mig1 was exported from the nucleus in the Δreg1 mutant and in low glucose conditions (Figure 1—figure supplement 2H and I). The mechanism of Snf1-mediated nuclear export of Mig1 has been characterized in detail as well (e.g., ref. 29-31).

      Recommendations for the authors: please note that you control which revisions, if any, to undertake.

      Reviewer #1 (Recommendations For The Authors):

      SPECIFIC COMMENTS

      Genetic Screen:

      Line 20 - the narrative moves to SNF1, but the reasoning for the selection of this Class I substrate is not defined. What was the basis for this selection, and what happened to the other Class I substrates? It is stated in the text that the other Class I proteins show the same increase in spGFP signal. The data showing this should be included in Supp Figure 1 for transparency.

      We have moved the narratives of Snf1 function to the second section and clarified that we were interested in this gene due to its central role in metabolism and mitochondrial functions that may influence MAGIC (Page 5: line 16-20). Other genes in class 1 are shown in Table S1. Detailed discussion of other genes in this category is beyond the scope of this study.

      Snf1/AMPK prevents MP accumulation in mitochondria:

      The FlucDM data in human RPE-1 mitochondria seems to be added to only increase the significance of the work. The mechanisms suggested here with Hap4 would not be possible in human cells as there is no homologue of this protein in human cells. Making generalisations that these pathways are conserved based on this one experiment is not appropriate.

      We appreciate this feedback. Although the focus of this study is the regulation of MAGIC by the yeast AMPK Snf1, we would like to share our initial observation that suggests a similar role of AMPK in human RPE-1 cells. We acknowledge that the underlying mechanisms regarding the downstream transcription factors and pathway for misfolded protein import could be different in mammalian cells, but the overall effect of AMPK in mitochondrial biogenesis is well known to resemble that of Snf1. To avoid making over-generalization, we changed our statement of conclusion to: ‘These results suggest that AMPK in human cells regulates MP accumulation in mitochondria following a similar trend as in yeast, although the underlying mechanisms might differ between these organisms.’ (Page 7: line 2-4)

      Mechanisms of MAGIC regulation by Snf1:

      While the lysosome is ruled out here, the authors have not considered the proteasome. Is there a reason for this? Given the accumulation of aggregates outside of mitochondria, and previous connections of the proteasome to mitochondrial quality control, this would be an obvious thing to check.

      We examined the role of lysosomal degradation here because it is known to be activated under Snf1-active conditions (ref. 37). We appreciate this feedback and have included a new analysis of MG132-treated FlucSM spGFP strains in which the PDR5 gene was deleted to avoid drug efflux.

      This result suggests that the proteasome inhibitor did not ablate the difference in FlucSM accumulation between these conditions. That MG132 promoted mitochondrial accumulation of FlucSM in both high glucose and low glucose conditions was not surprising, as FlucSM is also degraded by the proteasome in the cytosol (Ruan et al., 2017 Nature, ref. 10), and blocking this pathway could divert more of these protein molecules toward MAGIC. (Page 7: line 26-29).

      Line 13 "we hypothesized that elevated expression of mitochondrial preproteins induced by the activation of Snf1-Hap4 axis (REF) may outcompete MPs for import channels". This statement rests on some assumptions. The authors have not shown that Snf1 is activated in their models and, more importantly, that there is an accumulation of mitochondrial preproteins. The data that follow using the cytosolic domains of the receptors are hard to rationalise without evidence that there is in fact preprotein accumulation or an impact on the mitochondrial proteome in this system.

      As stated in our response to main point [3], Snf1 activation in the reg1 mutant or in low glucose is evidenced by our data showing Mig1 export from the nucleus to the cytoplasm and has also been shown in many previous publications. A recent study (Tsuboi et al., 2020 eLife) also showed a dramatic increase in mitochondrial volume fraction in Δreg1 cells and wild-type cells in respiratory conditions, further supporting the role of Snf1 in mitochondrial biogenesis. We have provided relevant references in the manuscript (ref. 24-28).

      The ability of Tom70 cytosolic domain (Tom70cd), which can bind mitochondrial preproteins but not localize to mitochondria due to lack of N-terminal targeting sequence, to compete with endogenous Tom70 for mitochondrial preproteins has been well documented (ref. 47-49). However, we agree with the reviewer that a future quantitative proteomics study to measure changes in mitochondrial proteome under Tom70cd over-expression could allow more accurate interpretation of our experimental result.

      AMPK protects cellular fitness during proteotoxic stress:

      The inhibition of preprotein import by overexpressing the cytosolic domains of receptors is not supported by proof-of-principle data. If this was working as the authors assume, it is not clear why an effect is observed only with Tom70. The majority of the mitochondrial proteome is imported via Tom20/Tom22, so this does not align with what the authors are suggesting. Are Tom70cd and any associated Hsp proteins facilitating the observed changes to the MPs?

      We thank the reviewer for raising this point. We expressed different TOM receptor cytosolic domains but found that Tom70cd had the strongest rescue of MAGIC under AMPK activation conditions. It is possible that certain Tom70 substrates or Tom70-associated heat shock proteins inhibit the import of MAGIC substrates. We admit that a clear explanation of this unexpected observation necessitates a better understanding of how native and MAGIC substrates are selected and imported by the outer-membrane channel. We can only offer our best interpretation based on the current state of understanding, and we feel that we have been careful to acknowledge this in the manuscript.

      While the effect of AMPK inactivation reducing FUS accumulation was striking, this was all in the context of overexpression and may not be physiologically relevant - or may occur very transiently under basal conditions. Is GST an appropriate control here, why not use WT FUS? Likewise, one representative image is shown in Figure 5 - can the authors show western blotting that mitochondrial accumulation of FUS can be reduced with AMPK activation?

      We thank the reviewer for this suggestion; however, overexpressed FUS WT is also aggregation prone (Zhihui Sun et al., 2011, PLoS Biology; Shulin Ju, 2011, PLoS Biology; Jacqueline C. Mitchell et al., 2013, Acta Neuro). We believe that GST, as a well-folded protein, is an appropriate control (Ruan et al., 2017 Nature, ref. 10). As we discussed in response to main point [2], in vitro assays involving protease protection and western blots do not allow reliable quantitative comparison in our hands.

      In-text changes:

      The analysis pipeline of the YKO mutant library should be introduced at the very start of the first paragraph, not the end.

      Addressed on Page 4, second paragraph

      "Fluc" should be introduced as "Firefly luciferase" within the first paragraph of the first section, also need to define SM and DM in FlucSM/FlucDM - these appear to be missing.

      Addressed in both Introduction (Page 2: line 29; Page 3: line 8-9) and re-clarified in Results (Page 5: line 27-29)

      The role of Reg1 should be explicitly stated in the text, not just in the figure.

      Addressed on Page 6: line 3-6

      Figure 1H legend states Reg1 (WT) is Snf1-inactive and Reg1 KO is Snf1-active. This wording is confusing and is not supported by data, but by assumption. If the authors want to use this wording then evidence needs to be provided - as suggested above.

      We have changed this and other legends to only show genotypes and medium conditions.

      "Tom70cd overexpression also exacerbated growth rate reduction due to FlucSM expression in HG medium (Figure 4A; Figure 4 - figure supplement 1A)" should be figure supplement 1B.

      Fixed on Page 10: line 10

      "These results suggest that glucose limitation protects mitochondria and cellular fitness during FlucSM induced proteotoxic stress through Snf1-dependent inhibition of MP import into mitochondria". The phrase "Snf1-dependent inhibition of MP import into mitochondria" may be misleading, as Snf1 isn't modulating import directly but is acting on transcriptional regulators to modulate mitochondrial import under stress.

      We restated the conclusion as follows: ‘These results suggest that Snf1 activation under glucose limitation protects mitochondrial and cellular fitness under FlucSM-associated proteotoxic stress.’ (Page 10: line 20- 21)

      "... Significantly increased the fraction of spGFP-positive and MMP-low cells in both HG and LG medium (Figure 4G-K)" should be (Figure 4J-K).

      Fixed on Page 11: line 3

      Reviewer #2 (Public Review):

      Work from Rong Li’s lab, published in Nature in 2017 (Ruan et al, 2017), led the authors to suggest that the mitochondrial protein import machinery removes misfolded/aggregated proteins from the cytosol and transports them to the mitochondrial matrix, where they are degraded by Pim1, the yeast Lon protease. The process was named mitochondria as guardian in cytosol (MAGIC).

      The mechanism by which MAGIC selects proteins lacking mitochondrial targeting information, and the mechanism that allows misfolded proteins to cross the mitochondrial membranes, however, remained enigmatic. To my knowledge, additional support for MAGIC has not been published. Consequently, MAGIC is only briefly mentioned in relevant reviews (it is a very interesting possibility!), and the process is described as a "proposal" (Andreasson et al, 2019) or is said to require "further investigation to define its relevance for cellular protein homeostasis (proteostasis)" (Pfanner et al, 2019).

      Rong Li’s lab now presents a follow-up story. As in the original Nature paper, the major findings are based on in vivo localization studies in yeast. The authors employ an aggregation-prone, artificial luciferase construct (FlucSM) in a classical split-GFP assay: GFP1-10 is targeted to the matrix of mitochondria by fusion with the mitochondrial protein Grx5, while GFP11 is fused to FlucSM, which lacks mitochondrial targeting information. In addition, the authors perform a genetic screen based on a similar assay, but using the cytosolic misfolding-prone protein Lsg1 as a read-out.

      My major concern about the manuscript is that it does not provide additional information which helps to understand how specifically aggregated cytosolic proteins, lacking a mitochondrial targeting signal could be imported into mitochondria. As it stands, I am not convinced that the observed FlucSM-/Lsg1-GFP signals presented in this study originate from FlucSM-/Lsg1-GFP localized inside of the mitochondrial matrix. The conclusions drawn by the authors in the current manuscript, however, rely on this single approach.

      In the 2017 paper the authors state: "... we speculate that protein aggregates engaged with mitochondria via interaction with import receptors such as Tom70, leading to import of aggregate proteins followed by degradation by mitochondrial proteases such as Pim1." Based on the new data shown in this manuscript the authors now conclude "that MP (misfolded protein) import does not use Tom70/Tom71 as obligatory receptors." The new data presented do not provide a conclusive alternative. More experiments are required to draw a conclusion.

      In my view: to confirm that MAGIC does indeed result in import of aggregated cytosolic proteins into the mitochondrial matrix, a second, independent approach is needed. My suggestion is to isolate mitochondria from a strain expressing FlucSM-GFP and perform protease protection assays, which are well established to demonstrate matrix localization of mitochondrial proteins. In case the authors are not equipped to do these experiments I feel that a collaboration with one of the excellent mitochondrial labs in the US might help the MAGIC pathway to become established.

      We thank Reviewer 2 for these suggestions, but we would like to respectfully offer our difference in opinion:

      a. Regarding the suggestion “to isolate mitochondria from a strain expressing FlucSM-GFP and perform protease protection assays”, in our previous study (Ruan et al., 2017 Nature, ref. 10) we indeed applied two independent biochemical approaches, APEX-mitochondrial matrix proximity labeling and a classic protease protection assay using non-spGFP strains, both of which consistently confirmed the entry of misfolded proteins into mitochondria under proteotoxic stress. Our super-resolution imaging further confirmed that the split GFP-labeled proteins were imported into mitochondria. Moreover, as we discussed in response to Reviewer 1’s main point [2], while the suggested biochemical assay is useful for validating topology within mitochondria, it is not quantitative and may not reliably report the in vivo accumulation of misfolded proteins in mitochondria, because the isolation process takes hours, during which the unstable proteins could continue to be degraded within mitochondria.

      While we agree with the reviewer that we do not yet understand how misfolded proteins are imported into mitochondria, it would be unfair to state “as it stands, I am not convinced...” simply because the underlying mechanism remains to be elucidated. We would like to point out that the targeting sequences of many well-established mitochondrial proteins are still not well defined. It is well known that mitochondrial targeting sequences are not as uniformly predictable as, for example, nuclear targeting sequences. Our finding that deletion of TOM6 enhances the import of misfolded proteins suggests that their import may involve the TOM channel in a more promiscuous conformation, which may reduce the requirement for a specific sequence-based targeting signal on the substrate.

      b. Regarding the role of Tom70, in our 2017 study, using proteomics and subsequently immunoprecipitation we validated the binding, albeit not necessarily direct, between misfolded protein FlucSM and Tom70. Therefore, “we speculate that protein aggregates engaged with mitochondria via interaction with import receptors such as Tom70”. Recent studies from different labs confirmed the interactions between Tom70 and aggregation prone proteins (Backes et al., 2021, Cell Reports; Liu et al., 2023, PNAS). In the current study, surprisingly, knockout of TOM70 did not block MAGIC, suggesting redundant components of mitochondria import system may facilitate the recruitment of misfolded proteins in the absence of Tom70, and this does not contradict the notion that Tom70 helps tether protein aggregates to mitochondria.

      c. Regarding other studies also showing the import of misfolded or aggregation-prone cytosolic proteins into mitochondria, there have been several recent studies in mammalian cells involving either model substrates or disease proteins (e.g., refs. 12-15, 56-58; Vicario, M. et al. 2019 Cell Death Dis.). These studies are briefly mentioned in the Introduction (Page 3, paragraph 2). The present manuscript documents a major effort from our group, using a whole-genome screen in yeast, to understand the mechanism and regulation of MAGIC. Many of the screen hits have yet to be studied in detail. We fully agree that much remains to be understood about whether and how this pathway affects proteostasis and what the evolutionary origin of such a mechanism might be.

      Additional comments:

      The genetic screen:

      The genetic screen identified five class 1 deletion strains, which lead to enhanced accumulation of Lsg1-GFP, and a larger set of class 2 mutants, which lead to reduced accumulation. Please note that, in my opinion, it is not clear that accumulation of the reporters occurs inside the mitochondria. In any case, the authors selected one single protein for further analysis: Snf1, the catalytic subunit of the yeast SNF complex, which is required for respiratory growth of yeast.

      The results of the screen are not discussed in any detail. The authors mention that ribosome biogenesis factors are abundant among class 2 mutants. Notably, Lsg1 is involved in 60S ribosomal subunit biogenesis. As Lsg1-GFP11 is overexpressed in the screen, this should be discussed. Class 2 mutants also include several 40S ribosomal subunit proteins (only one of the 60S subunit). What does this imply for the MAGIC model? It should also be discussed that the screen did not identify reg1 and hap4, which I had expected as hits based on the data shown in later parts of the manuscript.

      We apologize for the confusion, but the GFP11 tag was in fact knocked into the C-terminus of Lsg1 in the endogenous LSG1 locus, and so Lsg1 was not overexpressed in the screen. We have made sure that this information is clearly conveyed in the revised manuscript (Page 4: line 20-22). How the ribosome small subunit affects MAGIC is beyond the focus of the current study and will be pursued in the future.

      Regarding why certain mutants did not come out of our initial screen, this is not unexpected, as the YKO collection, although extremely valuable to the community, is known to be susceptible to false knockouts, suppressor accumulation and cross-contamination (e.g., Puddu et al., 2019 Nature). Additionally, high-throughput screens can also miss real hits. In our experience using this collection in several studies, we have often found additional hits by analyzing genes implicated by known genetic or biochemical interactions.

      Mutant yeast strains and growth assays:

      The Δreg1 strain grows poorly in all growth conditions and frequently accumulates extragenic suppressor mutations (Barrett et al, 2012). It would be good to make sure that this is not the case in the strains employed in this study. My suggestion is to do (and show) standard yeast plating assays with the relevant mutant strains, including Δreg1, Δsnf1, Δhap4 and Δreg1Δhap4, both without and with the split GFP constructs (i.e. the strains that were used in the assays).

      We thank the reviewer for the suggestion. We were indeed aware of potential accumulation of suppressor mutations from the YKO library. Therefore, deletion mutants like Δreg1 and loss of TFs downstream of Snf1 that we used in the study after the initial screen were all freshly made and validated. At least 3 independent colonies were analyzed for each mutant (mentioned in Methods & Materials; Page 33, line 57). Moreover, the plating assay suggested here may not reveal additional information other than growth, which was taken into consideration during our experiments.

      Activation of Snf1 in the relevant strains should be tested with the commercially available antibody recognizing active Snf1, which is phosphorylated at Snf1-T210.

      Snf1 activation was validated by the export of Mig1 from the nucleus. As noted above, many studies have clearly demonstrated Snf1 activation in the reg1 mutant and under low-glucose growth (e.g., refs. 24-28).

      Effects of Snf1, Reg1, Hap4 and respiratory growth conditions:

      The authors show that split GFP reporters show enhanced accumulation during fermentative growth, in Δsnf1, and Δreg1Δhap4 and fail to accumulate during respiratory growth, in Δreg1 and upon overexpression of HAP4. Analysis of Δhap4 should be included in Fig. 2. The suggestion that upon activation of Snf1 enhanced Hap4-dependent expression "outcompetes" misfolded protein import seems unlikely as only a fraction of mitochondrial genes is under control of Hap4. Without further experimental evidence I do not find that a valid assumption. More likely, the membrane potential plays a role: it is low during fermentative growth, in Δsnf1 and Δreg1Δhap4, and high during respiratory growth and in Δreg1 (Hübscher et al, 2016). Such an effect of the membrane potential seems to contradict the findings in the 2017 paper and the issue should be clarified and discussed. In any case, these data do not reveal that GFP reporters accumulate inside of the mitochondria. Based on the currently available evidence they may accumulate in close proximity/attached to the mitochondria. This has to be tested directly (see above).

      We have included our analysis of Δhap4 on Page 8: line 14-15 and in Figure 2—figure supplement 1H. Consistent with our result for Δreg1Δhap4 in glucose-rich medium, HAP4 deletion also resulted in a significant increase in mitochondrial accumulation of FlucSM in low-glucose medium compared to WT. It had no effect in the high-glucose condition, in which Snf1 is largely inactive.

      It is our view that the importance of Hap4 should not be judged by the number of nuclear-encoded mitochondrial proteins it regulates. Still, this subgroup comprises a considerable number of proteins (at least 55 genes upregulated by Hap4 overexpression, ref. 43), and certain substrates may compete more effectively with misfolded cytosolic proteins for import. Our genetic data strongly suggest that the inhibitory effect of active Snf1 on MAGIC is mediated through Hap4, although we agree with the reviewer that the detailed mechanism by which Hap4 targets may compete with misfolded proteins needs to be addressed in future studies.

      Membrane potential is important for mitochondrial import. During respiratory growth and in Δreg1, membrane potential is well known to be elevated compared to fermentative conditions (e.g., Figure 4C). Our observation that the import of misfolded proteins into mitochondria is reduced under these conditions simply indicates that this reduction is not due to a lack of membrane potential. This does not in any way contradict our 2017 finding that misfolded protein import requires membrane potential (ref. 10).

      Again, the accumulation of misfolded proteins in mitochondria, especially the model protein FlucSM, has been validated by using super resolution imaging (Figure 1—figure supplement 1A) in addition to the protease protection assay in our 2017 study.

      Introduction and Discussion:

      Both are really short, too short in my view. Please provide some background on the general principles of mitochondrial protein import and on how exactly translocation of cytosolic, aggregated proteins (lacking targeting information) is supposed to work. I do not understand exactly how the authors envisage the process.

      We thank the reviewer for the suggestion. In the revised manuscript, we have extended both the Introduction (Pages 2-3) and the Discussion (Pages 11-13).

      The results from the 2022 eLife paper (Liu et al, 2022), which suggests that Tom70 may "regulate both the transcription/biogenesis and import of mitochondrial proteins so the nascent mitochondrial proteins do not compromise cytosolic proteostasis or cause cytosolic protein aggregation" should be discussed with regard to the data obtained with overexpression of the Tom70 soluble domain.

      We thank the reviewer for pointing out that study and we have included a brief comment in Discussion section (Page 12: line 13-16). As the function of Tom70 appears to be complex, we cannot exclude the possibility that overexpression of the cytosolic domain has additional or indirect effects in addition to that due to preprotein binding.

      Andreasson, C., Ott, M., and Buttner, S. (2019). Mitochondria orchestrate proteostatic and metabolic stress responses. EMBO Rep 20, e47865.

      Barrett, L., Orlova, M., Maziarz, M., and Kuchin, S. (2012). Protein kinase A contributes to the negative control of Snf1 protein kinase in Saccharomyces cerevisiae. Eukaryot Cell 11, 119-128.

      Hübscher, V., Mudholkar, K., Chiabudini, M., Fitzke, E., Wolfle, T., Pfeifer, D., Drepper, F., Warscheid, B., and Rospert, S. (2016). The Hsp70 homolog Ssb and the 14-3-3 protein Bmh1 jointly regulate transcription of glucose repressed genes in Saccharomyces cerevisiae. Nucleic Acids Res 44, 5629-5645.

      Liu, Q., Chang, C.E., Wooldredge, A.C., Fong, B., Kennedy, B.K., and Zhou, C. (2022). Tom70-based transcriptional regulation of mitochondrial biogenesis and aging. eLife 11.

      Pfanner, N., Warscheid, B., and Wiedemann, N. (2019). Mitochondrial proteins: from biogenesis to functional networks. Nat Rev Mol Cell Biol 20, 267-284.

      Ruan, L., Zhou, C., Jin, E., Kucharavy, A., Zhang, Y., Wen, Z., Florens, L., and Li, R. (2017). Cytosolic proteostasis through importing of misfolded proteins into mitochondria. Nature 543, 443-446.

      I prefer to have "all in one", also due to time limitation.

      It would be great to be able to upload the review file as otherwise formatting and symbols get lost.

      Reviewer #3 (Public Review):

      In this study, Wang et al. extend their previous finding of a novel quality control pathway, the MAGIC pathway. This pathway allows misfolded cytosolic proteins to be imported into mitochondria, where they are degraded by the LON protease. Using a screen, they identify Snf1 as a regulator of MAGIC. Snf1 inhibits this import via the transcription factor Hap4, through an unknown pathway. This allows cells to adapt to metabolic changes: under high glucose, misfolded proteins can be imported and degraded, whereas under low-glucose growth conditions, import of these proteins is prevented and import of mitochondrial proteins is favored instead.

      This is a nice and well-structured manuscript reporting on important findings about a regulatory mechanism of a quality control pathway. The findings are obtained by a combination of mostly fluorescent protein-based assays. Findings from these assays support the claims well.

      While this study convincingly describes the mechanisms of a mitochondria-associated import pathway using mainly model substrates, my major concern is that the physiological relevance of this pathway remains unclear: what are the endogenous substrates of the pathway, and to what extent are they imported and degraded, i.e. how much does MAGIC contribute to overall misfolded protein removal (none of the experiments reports quantitative "flux" information)? Lastly, it remains unclear by which mechanism Snf1 impacts MAGIC, or whether it is "only" a matter of misfolded proteins being outcompeted by mitochondrial precursors.

      We thank Reviewer 3 for the positive and encouraging comments on our manuscript. We agree with the reviewer that identifying endogenous MAGIC substrates and understanding what percentage of them are degraded in mitochondria are very important issues to be addressed. We are indeed carrying out projects to address these questions. We also agree with Reviewer 3 that the effect of Snf1 on MAGIC may involve mechanisms in addition to precursor competition, such as Tom6-mediated conformational changes of TOM pores. In the revised manuscript, we have added a discussion addressing these comments (Page 12: line 21-28).

      Reviewer #3 (Recommendations For The Authors):

      1. In their screen, the authors utilize differences in GFP intensity as a measure for import efficiency. However, reconstitution of the GFP from GFP1-10 and GFP11 in the matrix might also be affected (folding factors, differential degradation).

      Upon Snf1 activation, the protein abundance of mitochondrial chaperones such as Hsp10, Hsp60, and Mdj1, and mitochondrial proteases such as Pim1 are not significantly changed (ref. 35). Therefore, it is unlikely that the folding and degradation capacity of mitochondrial matrix is drastically affected by Snf1 activation.

      To examine the effect of Snf1 activation on spGFP reconstitution, a Grx5 spGFP strain was constructed in which the endogenous mitochondrial matrix protein Grx5 was C-terminally tagged with GFP11 at its genomic locus, and GFP1-10 was targeted to mitochondria through the cleavable Su9 MTS (MTS-mCherryGFP1-10) (ref. 10). Only a modest reduction in Grx5 spGFP intensity was observed in LG compared to HG, and no significant difference after adjusting for GFP1-10 abundance (spGFP/mCherry ratio) (Figure 1—figure supplement 3A-D). These data suggest that any effect on spGFP reconstitution is insufficient to explain the drastic reduction of MP accumulation in mitochondria under Snf1 activation. Overall, our results demonstrate that Snf1 activation primarily prevents mitochondrial accumulation of MPs, but not that of normal mitochondrial proteins (Page 6: line 17-25).

      We admit, however, that to fully rule out these factors, specific intra-mitochondrial folding or degradation reporter assays would be needed.

      2. Scoring of protein import always relies on fluorescence-based assays. These always require folding of the "sensors" in the matrix. An additional convincing approach that would not rely on matrix folding could be pulse-chase experiments coupled to fractionation assays and immunoprecipitation.

      We thank reviewer 3 for this suggestion. In our previous study, we applied two different biochemical assays: APEX proximity labeling, and mitochondrial fractionation followed by protease protection. Both confirmed the entry of misfolded proteins into mitochondria as observed by using split GFP. As we discussed in response to Reviewer 1’s main point [3], the fractionation assays are not quantitative enough for the comparisons made in our study. In particular, during the over 2-hour assay, misfolded proteins continue to be degraded within mitochondria. By using proper controls, our spGFP system provides quantitative comparisons for mitochondrial accumulation of misfolded proteins in non-disturbed physiological conditions.

      3. Could the pathway be reconstituted in vitro with isolated mitochondria to test the "competition hypothesis"?

      This is an excellent suggestion, but setting up such a reconstituted system is a project on its own. The study documented in this manuscript already encompasses a large amount of work that we feel should be published timely.

      4. Fluorescence figures are not colour-blind friendly (red-green). This should be improved by changing the color scheme.

      We thank reviewer 3 for pointing this out and sincerely apologize for any inconvenience. However, we are unfortunately unable to change all images within a limited time. We will adopt another color scheme in future work.

      5. spGFP in human cells appears to form "spot-like" structures. What are these granules?

      We indeed observed granule-like structures formed by spGFP-labeled FUS in mitochondria, which is interesting, but we did not investigate this further because it is not a focus of this study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers

      To whom it may concern,

      Thank you for your constructive feedback on our manuscript. I appreciate the time and effort that you and the reviewers have dedicated to providing your valuable feedback. We are grateful to the reviewers for their insightful comments and suggestions for our paper. I have been able to incorporate changes reflecting the majority of the suggestions provided. I have updated the analysis scripts (at https://github.com/neurogenomics/reanalysis_Mathys_2019) and have listed these changes in blue below:

      eLife assessment:

      This work is useful as it highlights the importance of data analysis strategies in influencing outcomes during differential gene expression testing. While the manuscript has the potential to enhance awareness regarding data analysis choices in the community, its value could be further enhanced by providing a more comprehensive comparison of alternative methods and discussing the potential differences in preprocessing, such as scFLOW. The current analysis, although insightful, appears incomplete in addressing these aspects.

      We thank the reviewing editors for this note. We agree that the differences in preprocessing will affect the results and conceal which step in our reanalysis resulted in the discrepancies we noted. To address this, we have split our reanalysis into two separate parts. In the main body of the text, we discuss the differences resulting from changing only the differential expression approach, using the same processed data as the authors to enable a fair comparison. Secondly, we still provide the reprocessed data, perform differential expression analysis on it, and discuss the cause and impact of the differences in processing steps on the results.

      Reviewer 1:

      I think readers would be interested to learn more about the genes that were found "significant" by the original paper but sorted out by the authors. Did they just fall short of the cutoffs? If so, how many more samples would have been required to ascertain significance? This would yield a recommendation for future studies and an overall more positive/productive spirit to the manuscript. On the other hand, I suspect a fraction of DEGs were false positives due to differences in the proportions of cells from different individuals compared to the original analysis. Which percentage of DEGs does this apply to? Again, this would raise awareness of the issue and support the use of pseudobulk approaches.

      To investigate how the genes differ across our analyses, we have added a correlation analysis between our different DE approaches (using the same processed data); see paragraph 5 in the manuscript and Supplementary Table 3. In short, we find a high correlation in the genes’ fold change values between our pseudobulk analysis and the authors’ pseudoreplication analysis on the same dataset (Pearson R of 0.87 for an adjusted p-value of 0.05), which is somewhat expected given that both DE approaches are applied to the same dataset. However, the p-values, which reflect the likelihood that a gene’s expression changes are related to the case/control differences in AD, and the resulting DEGs vary considerably because of the artificially inflated confidence of the authors’ approach (Fig. 1c-e). Despite the correlation between the pseudoreplication and pseudobulk approaches here, we do not think it makes sense to consider how many more samples would have been required to ascertain significance. The differences in results between the two approaches cannot be negated by increasing sample size, as many DEGs identified by pseudoreplication will be false positives, as highlighted in previous work (refs. 1-4). However, perhaps we are misinterpreting the reviewer, who may have meant a power analysis, which we have not conducted. Such an undertaking would require analysing many snRNA-Seq datasets with large sample sizes to obtain a confident estimate for power calculations based on pseudobulk approaches. Although we agree with the reviewer that this would be beneficial to the field, we do not believe it is in scope for this work. Regarding the reviewer’s note that a fraction of DEGs may be false positives due to differences in the proportions of cells from different individuals compared to the original analysis: we have analysed the same processed data the authors used, to remove the differences caused by the differing processing steps. We thank the reviewer for this suggestion. We also give more insight into the cause of these differences, namely the filtering out of nuclei with large proportions of mitochondrial reads, and discuss their effect in paragraph 3 (also see Supplementary Figure 2).
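      The pseudobulk principle referred to in this response can be illustrated with a minimal sketch in Python (standard library only). It assumes raw counts held in plain dictionaries keyed by hypothetical cell, gene and individual IDs; it is not the analysis code used in this reanalysis, and real analyses would aggregate a count matrix and test it with a dedicated package such as edgeR. Summing counts per individual means the downstream test sees one replicate per individual rather than one per nucleus, which is what avoids the inflated confidence of pseudoreplication.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell raw counts into one profile per individual (sample).

    cell_counts: dict mapping cell ID -> dict of gene -> raw count
    cell_to_sample: dict mapping cell ID -> individual/sample ID
    Returns a dict mapping sample ID -> dict of gene -> summed count.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, n in counts.items():
            profiles[sample][gene] += n
    # Convert the nested defaultdicts into plain dicts for a clean result
    return {sample: dict(genes) for sample, genes in profiles.items()}


# Hypothetical toy data: five nuclei from two individuals
cells = {
    "c1": {"GENE_A": 3, "GENE_B": 0},
    "c2": {"GENE_A": 1, "GENE_B": 2},
    "c3": {"GENE_A": 0, "GENE_B": 5},
    "c4": {"GENE_A": 4, "GENE_B": 1},
    "c5": {"GENE_A": 2, "GENE_B": 2},
}
donor = {"c1": "ind1", "c2": "ind1", "c3": "ind1", "c4": "ind2", "c5": "ind2"}

bulk = pseudobulk(cells, donor)
print(bulk)
print(len(cells), "cells ->", len(bulk), "replicates")
```

      Here five nuclei collapse into two replicates, so any downstream p-value reflects variation between individuals rather than between nuclei from the same individual.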

      Given there are only a few DEGs, it would be good to show more data about these genes to allow better assessment of the robustness of the results, i.e., boxplots of the pseudobulk counts in the compared groups and perhaps heatmaps of the raw counts prior to aggregation. This could rule out concerns about outliers affecting the results.

      In Supplementary Figure 3, we have added boxplots of the sum-pseudobulked, trimmed mean of M-values (TMM) normalised counts for three of our identified DEGs (b) and three of the authors’ DEGs that they discuss in their manuscript (a), to show the differences in counts between AD pathology and controls for these genes. We hope this gives some insight into the transcriptional changes highlighted by the differing approaches. In our opinion, there is a clear difference in the transcriptional signal for the genes identified by pseudobulk, which is not present for the genes identified by the authors’ approach.

      Overall, I believe the paper would deliver a clearer message by maintaining the QC from the original study and only changing the DE analysis. However, if keeping the part about QC/batch correction:

      • Assess to which degree changes in cell type proportion are indeed due to batch correction (as suggested in the text) and not filtering by looking at the annotated cell types in the original publication and those in your analysis.

      • Also perform the analysis without changing QC and state the # of DEGs in both cases, to at least allow some disentanglement of the effect of different steps of the analysis.

      • Please state the number of cells removed by each QC step in the supplementary note.

      We thank the reviewer for this suggestion. We agree with performing the DE analysis on the same processed data as the original authors and have split our reanalysis into two separate parts, primarily focussing on the discrepancies caused by the choice of differential expression (DE) approach. By splitting our analysis in this manner, we can identify the substantial differences in results caused by changing the DE approach in the study. Secondly, we can also see how differences in preprocessing affect the DE results in isolation (see paragraph 8). In short, the fold change correlation between pseudobulk DE analyses on the reprocessed data and on the authors’ processed data was only moderate (Pearson R of 0.57).

      Regarding the number of cells removed by each QC step, we have added an aggregated view for all samples in Supplementary Table 3 and give the full statistics per sample in our GitHub repository: https://github.com/neurogenomics/reanalysis_Mathys_2019. Moreover, we investigated the root cause of the differences in nuclei numbers, identifying filtering on mitochondrial read proportions as the main cause (Supplementary Figure 2).

      I recommend the authors read the following papers, assess whether their methodology agrees with them, and add citations as appropriate to support statements made in the manuscript.

      We thank the reviewer for this comprehensive list. We have updated the main text and supplementary file throughout to cite many of these where appropriate. We believe this adds context to our choice of tools and approaches in the scFlow processing pipeline and in the differential expression analysis.

      I believe the authors' intention was to show the results of their reanalysis not as a criticism of the original paper (which can hardly be faulted for their strategy which was state-of-the-art at the time and indeed they took extra measures attempting to ensure the reliability of their results), but primarily to raise awareness and provide recommendations for rigorous analysis of sc/snRNA-seq data for future studies.

      We thank the reviewer for this note; this was exactly our intent. Furthermore, we are based in a dementia research institute, and our aim is to ensure that the Alzheimer’s disease research field does not focus on spuriously identified genes. We have updated the text of the manuscript (start of paragraph 2) to explicitly state this so that our message is not misconstrued.

      In my opinion, the purpose of the paper might be better served by focusing on the DE strategy without changing QC and instead detailing where/how DEGs were gained/lost and supporting whether these were false positives.

      We agree that the differences in preprocessing will affect the results and conceal which step in our reanalysis resulted in the discrepancies we noted. To address this, we have split our reanalysis into two separate parts. In the main body of the text, we discuss the differences resulting from changing only the differential expression approach, using the same processed data as the authors to enable a fair comparison. Secondly, we still provide the reprocessed data, perform differential expression analysis on it, and discuss the impact of the differences in processing steps on the results. As previously mentioned, we have also added further investigation into the identified DEGs, looking at the correlation across the differing approaches and plotting the counts for selected genes.

      For instance, filtering to a mitochondrial read percentage of <5% seems harsh and might account for a large proportion of the additional cells filtered out in comparison to the original analysis. There is no blanket "correct cutoff" for this percentage. For instance, the "classic" Seurat tutorial https://satijalab.org/seurat/articles/pbmc3k_tutorial.html uses the 5% threshold chosen by the authors, a MAD-based selection of the cutoff arrived at 8% here https://www.sc-best-practices.org/preprocessing_visualization/quality_control.html, another "best practices" guide chooses 10% by default https://bioconductor.org/books/3.17/OSCA.basic/quality-control.html#quality-control-discarded, etc. Generally, the % of mitochondrial reads varies a lot between datasets.

      Apologies, the 5% cut-off was a misprint; the cut-off actually used was 10%, which, as the reviewer notes, is at the higher end of what is recommended. We have corrected this mistake in the manuscript and discuss the differences in the number of nuclei caused by the two approaches to mitochondrial filtering (paragraph 3). We found that over 16,000 nuclei removed by our QC pipeline were kept by the original authors (Supplementary Fig. 2), which explains the discrepancy in the number of nuclei after QC. Supplementary Fig. 2 shows that the original authors’ approach was ineffective at removing nuclei with high proportions of mitochondrial reads, which is indicative of cell death5,6. We hope this alleviates the reviewer’s concerns about our alternative processing approach. Moreover, as mentioned above, we now compare the DE approaches on the same processed data so that the comparison is not affected by these preprocessing differences.
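      To make the dataset dependence of mitochondrial cut-offs concrete, the sketch below contrasts fixed percentage thresholds with a data-driven median + k·MAD threshold on the same nuclei. It is a minimal illustration only: the function names, the choice of k = 3, the use of the unscaled MAD, and the toy percentages are assumptions for illustration, not the settings used by scFlow, by the original authors, or by the guides linked above.

```python
import numpy as np

def percent_mito(counts, is_mito):
    """Per-nucleus percentage of counts from mitochondrial genes.

    counts: 2-D array (n_nuclei, n_genes) of raw counts.
    is_mito: boolean array of length n_genes marking mitochondrial genes.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(is_mito, dtype=bool)
    return 100.0 * counts[:, is_mito].sum(axis=1) / counts.sum(axis=1)

def mad_upper_threshold(values, n_mads=3.0):
    """Data-driven cut-off: median + n_mads * (unscaled) median absolute deviation."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return median + n_mads * mad

def keep_mask(pct_mito, fixed_cutoff=None, n_mads=3.0):
    """True for nuclei to keep; uses fixed_cutoff (%) if given, else the MAD rule."""
    pct_mito = np.asarray(pct_mito, dtype=float)
    if fixed_cutoff is None:
        cutoff = mad_upper_threshold(pct_mito, n_mads)
    else:
        cutoff = fixed_cutoff
    return pct_mito <= cutoff

# Hypothetical toy percentages: the same six nuclei under three rules.
pct = np.array([1.0, 2.0, 2.0, 3.0, 8.0, 30.0])
print(mad_upper_threshold(pct))                     # 5.5
print(keep_mask(pct).tolist())                      # [True, True, True, True, False, False]
print(keep_mask(pct, fixed_cutoff=5.0).tolist())    # [True, True, True, True, False, False]
print(keep_mask(pct, fixed_cutoff=10.0).tolist())   # [True, True, True, True, True, False]
```

      With these toy values the MAD rule and a 5% cut-off agree, while a 10% cut-off also keeps the nucleus at 8%; on a real dataset, differences of this kind can shift how many nuclei survive QC.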

      Reviewer 2:

      The paper would be better if the authors merged this work with the scFLOW paper so that they can justify their analysis pipeline and show it in an influential dataset.

      We thank the reviewer for this note. We would like to clarify that the purpose of our work was not to showcase the scFlow analysis pipeline on an influential dataset, but rather to raise awareness of, and provide recommendations for, rigorous analysis of single-cell and single-nucleus RNA-Seq (sc/snRNA-Seq) data in future studies, and to help redirect the focus of the Alzheimer’s disease research field away from possibly spuriously identified genes. We have updated our manuscript text to highlight this (see start of paragraph 2). Furthermore, we recognize that reprocessing the data with scFlow, as in our original approach, will affect the results and conceal which step in our reanalysis caused the discrepancies we noted. We have therefore split our reanalysis into two separate parts. In the main text, we discuss the differences that result from changing only the differential expression approach, using the same processed data as the original authors to enable a fair comparison. Second, we still provide the reprocessed data so that the community can benefit from it, perform differential expression analysis on it, and discuss the impact of the differences in the processing steps on the results. We have also added further references supporting the choice of steps and tools used in scFlow to the supplementary text, which should address the reviewer’s concerns about justifying the analysis pipeline. Moreover, we identified the cause of the differences in nuclei counts between the two processing approaches, namely the filtering out of nuclei with large proportions of mitochondrial reads, and discuss its effect in paragraph 3 (see also Supplementary Figure 2).

      A major contribution is the use of the authors' own in-house pipeline for data preparation (scFLOW), but this software has remained unpublished since 2021 and consequently has not yet been refereed. It isn't reasonable to take this pipeline as being validated in the field.

      We believe our answer to the previous point addresses these concerns: we have added references supporting the choice of steps and tools used in scFlow to the supplementary text. Moreover, using the pipeline, we identified that 16,000 of the nuclei kept by the original authors are likely of low quality, with high mitochondrial read proportions indicative of cell death5,6.

      They also worry that the significant findings in Mathys' paper are influenced by the number of cells of each type. I'm sure it is since power is a function of sample size, but is this a bad thing? It seems odd that their approach is not influenced by sample size.

      We thank the reviewer for highlighting this point. As they note, we conclude that the original authors’ number of DEGs is simply a product of the number of cells. However, the reviewer states that ‘It seems odd that their approach is not influenced by sample size’. An increase in the number of cells is not an increase in sample size, because these cells are not independent of one another: they come from the same sample. Therefore, an increase in the number of cells should not result in an increase in the number of DEGs, whereas an increase in the number of samples would. This is the major issue with pseudoreplication approaches, which overestimate confidence in differential expression because they do not account for the statistical dependence between cells from the same patient. See references 1,2,7,8 for more information on this point. We have added a discussion of this point to our manuscript in paragraph 6.

      Moreover, recent work has established that the genetic risk for Alzheimer’s disease acts primarily via microglia9,10. Thus, it would be reasonable to expect that the majority of large effect size DEGs identified would be found in this cell type. This is what we found with our pseudobulk differential expression approach – 96% of all DEGs were in microglia. We have updated the text of our manuscript (paragraph 5) to highlight this last point.
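      The distinction between cells and samples can be illustrated with a minimal pseudobulk aggregation sketch. Assuming a raw cells × genes count matrix for a single cell type and a donor label per cell (both hypothetical inputs here), summing counts per donor yields one row per donor; a downstream count model (for example edgeR, DESeq2 or limma-voom, none of which is shown) would then treat donors, not cells, as replicates. This illustrates the general idea only and is not presented as the exact procedure used in the manuscript.

```python
import numpy as np

def pseudobulk(counts, sample_ids):
    """Sum a cells x genes count matrix into a samples x genes matrix.

    counts: 2-D array (n_cells, n_genes) of raw counts for one cell type.
    sample_ids: length-n_cells sequence giving the donor/sample of each cell.
    Returns (samples, matrix) with samples as sorted unique IDs, one row each.
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    matrix = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, matrix

# Hypothetical toy data: 5 nuclei from 3 donors, 2 genes.
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4], [1, 1]])
donors = ["d1", "d1", "d2", "d2", "d3"]
samples, matrix = pseudobulk(counts, donors)
print(samples.tolist())   # ['d1', 'd2', 'd3']
print(matrix.tolist())    # [[3, 1], [4, 7], [1, 1]]

# Doubling the nuclei per donor changes the sums but not the replicate count.
_, doubled = pseudobulk(np.vstack([counts, counts]), donors * 2)
print(doubled.shape[0])   # 3
```

      Adding more nuclei per donor changes the summed values but not the number of rows, which is the replicate count a pseudobulk model sees; a cell-level test would instead treat every nucleus as an independent observation.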

      References

      1. Murphy, A. E. & Skene, N. G. A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis. Nat. Commun. 13, 7851 (2022).

      2. Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).

      3. Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11, 6077 (2020).

      4. Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).

      5. Ilicic, T. et al. Classification of low quality cells from single-cell RNA-seq data. Genome Biol. 17, 29 (2016).

      6. Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).

      7. Zimmerman, K. D., Espeland, M. A. & Langefeld, C. D. A practical solution to pseudoreplication bias in single-cell studies. Nat. Commun. 12, 738 (2021).

      8. Lazic, S. E. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neurosci. 11, 5 (2010).

      9. Skene, N. G. & Grant, S. G. N. Identification of Vulnerable Cell Types in Major Brain Disorders Using Single Cell Transcriptomes and Expression Weighted Cell Type Enrichment. Front. Neurosci. (2016).

      10. McQuade, A. & Blurton-Jones, M. Microglia in Alzheimer’s disease: Exploring how genetics and phenotype influence risk. J. Mol. Biol. 431, 1805–1817 (2019).

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      The findings of this article provide valuable information on the changes of cell clusters induced by chronic periodontitis. The observation of a new fibroblast subpopulation, named AG fibroblasts, is interesting, and the strength of evidence presented is solid.

      We thank the Reviewing Editor and the Senior Editor for the positive assessment and strong support for our study.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this article, the authors found a distinct fibroblast subpopulation named AG fibroblasts, which are capable of regulating myeloid cells, T cells and ILCs, and proposed that AG fibroblasts function as a previously unrecognized surveillant to orchestrate chronic gingival inflammation in periodontitis. Generally speaking, this article is innovative and interesting.

      We truly appreciate this public review.

      Reviewer #2 (Public Review):

      This study proposed the AG fibroblast-neutrophil-ILC3 axis as a mechanism contributing to pathological inflammation in periodontitis. Single-cell transcriptomic analysis was performed in this study, but the underlying signaling mechanism was not evaluated.

      The authors achieved their aims, and the results partially support their conclusions.

      We agree that we must conduct future studies to evaluate our hypothesis.

      Mouse ligature-induced periodontitis models differ from clinical periodontitis in humans; this study provides a basis for future research in humans.

      This is an important subject. We have previously expressed a concern about the mouse ligature model: the microbial composition of the mouse ligature does not mirror the human oral microbial composition. Therefore, we developed the maxillary topical application (MTA) model, in which human oral biofilm is applied directly to the maxillary gingiva. In this study, the newly developed MTA model was further dissected by single-cell RNA-seq, which revealed that the extracellular substances of human oral biofilm might be an important trigger of gingival inflammation. The RESULTS section has been revised.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the authors' efforts. I think it would be much better to simplify the INTRODUCTION.

      INTRODUCTION has been simplified as suggested.

      Reviewer #2 (Recommendations For The Authors):

      1. Many host cells, such as gingival epithelial cells, participate in immune responses. AG fibroblasts are not the only cells involved in the immune response, and the weight of their role needs to be clarified. The wording of the conclusion should therefore be tempered accordingly.

      RESPONSE: We agree with this comment. Our study identified the AG fibroblast–neutrophil–ILC3 axis as a previously unrecognized mechanism which could play an additional role in the complex interplay between oral barrier immune cells.

      2. The main results should be included in the Abstract.

      Abstract has been revised.


      The following is the authors’ response to the original reviews.

      We thank all reviewers for their constructive critiques. We plan to perform new experiments and revise our manuscript accordingly. The text and Figures are currently being revised. Below, we highlight our revision plan.

      eLife assessment

      The findings of this article provide valuable information on the changes of cell clusters induced by chronic periodontitis. The observation of a new fibroblast subpopulation, named AG fibroblasts, was quite interesting but needs further evidence. The strength of evidence presented is incomplete.

      We discovered a new subpopulation of gingival fibroblasts, named AG fibroblasts, using unbiased single-cell RNA sequencing (scRNA-seq) of mouse gingival samples during the development of ligature-induced periodontitis. AG fibroblasts exhibited a unique gene expression profile: [1] constitutive expression of type XIV collagen; and [2] ligature-induced upregulation of Toll-Like Receptors and their downstream signals, as well as of chemokines such as CXCL12. We therefore hypothesized that AG fibroblasts initially sense pathological stress, including oral microbial stimuli, and secrete inflammatory signals through chemokine expression.

      The current manuscript examined the relationship between AG fibroblasts and oral barrier immune cells, focusing on the chemokines and other ligands derived from AG fibroblasts and their putative receptors on those immune cells. Using scRNA-seq data mining programs, our data provided compelling evidence that AG fibroblasts likely play a critical role in orchestrating oral barrier immunity, at least at the early stages of periodontal inflammation.

      We agree that it is important to explore the functional and pathological roles of AG fibroblasts. In this revision, we further investigated the role of TLRs in the pathogen-sensing mechanism of AG fibroblasts. To accomplish this goal, we used a newly developed mouse model in which oral microbial pathogens were delivered by maxillary topical application (MTA) without ligature placement. After 1 h of exposure to human oral biofilm, but not to planktonic microbiota, the mouse maxillary tissue exhibited measurable degradation, as evidenced by cathepsin K activation. To dissect the role of TLRs, we applied putative stimulants of TLR9 and TLR2/4 using the discrete MTA model. scRNA-seq of the MTA model revealed that application of unmethylated CpG oligonucleotide and of P. gingivalis lipopolysaccharide (LPS), respectively, induced chemokine activation in AG fibroblasts.

      The revised manuscript reports these critical data in detail. Accordingly, additional figures and the corresponding Results, Discussion, and Materials & Methods text were included.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this article, the authors found a distinct fibroblast subpopulation named AG fibroblasts, which are capable of regulating myeloid cells, T cells and ILCs, and proposed that AG fibroblasts function as a previously unrecognized surveillant to orchestrate chronic gingival inflammation in periodontitis. Generally speaking, this article is innovative and interesting, however, there are some problems that need to be addressed to improve the quality of the manuscript.

      We appreciate this comment. As suggested, we further investigated the surveillant function of AG fibroblasts by reanalyzing the scRNA-seq data for stress-sensing receptors such as Toll-Like Receptors (TLRs). In the revision, we addressed the role of TLRs in the activation of AG fibroblasts using a newly developed mouse model employing maxillary topical application (MTA) of putative TLR stimulants. The new information clearly demonstrated that AG fibroblasts play a pivotal role as surveillants, translating pathogenic stimuli into oral barrier inflammation through chemokine expression.

      Reviewer #2 (Public Review):

      This study proposed the AG fibroblast-neutrophil-ILC3 axis as a mechanism contributing to pathological inflammation in periodontitis. However, the immune response in vivo is very complex, and it is difficult to determine which is the cause and which is the result. This study explores the relevant issue from one dimension, which is of great significance for a deeper understanding of the pathogenesis of periodontitis. It should be fully discussed.

      We appreciate this comment. In the Discussion, we expanded on the current understanding of oral immune signal communication and highlighted how AG fibroblasts may fit into it. To address this question, we expanded our investigation of pathological signal detection by AG fibroblasts using the newly developed maxillary topical application (MTA) model. The revised manuscript contains the new information and an expanded discussion in the context of the complex immune response.

      Reviewer #1 (Recommendations For The Authors):

      Detailed comments are listed below:

      Abstract:
      I am confused by the expression "human periodontitis-like phenotype". How do the authors define this concept? Periodontitis is a complex disease; although alveolar bone resorption is a typical manifestation of periodontitis, its characteristics remain to be further studied. I hope the authors can provide some detailed information about this concept or describe it in another way.

      This is an important comment. Radiographically, human periodontitis is diagnosed by alveolar bone resorption starting from the cervical region, not from the root apex. To highlight this, we present dental radiographs of human periodontitis as supplementary information. We agree, however, that our statement should be limited to the alveolar bone resorption pattern in Rag2KO and Rag2gcKO mice. The Abstract has been revised accordingly.

      Introduction:
      It is recommended to simplify the first to third paragraphs and to briefly explain the functions of the various cell types at different stages of periodontitis, as well as the roles that different cluster markers play across the time course of periodontal inflammation development.

      Following this recommendation, INTRODUCTION has been simplified.

      Results:
      1. It is recommended to add HE staining and immunohistochemistry staining to observe inflammation, tissue damage, and repair status from day 0 to day 7, so that readers can relate cell phenotype changes to the periodontitis stage. The observation indices can include inflammation- and vascular-related indicators.

      As recommended, representative histological figures were included. We further performed a new immunohistochemistry experiment on mouse gingival tissue (D0, D1, D3, D7) highlighting the infiltration of CD45+ immune cells. We observed inflammatory vascular formation in the H&E histology, which has been highlighted. To characterize tissue damage, the histological sections were stained with picrosirius red to highlight changes in the collagenous connective tissue of the PDL and gingiva.

      2. Figure 1A-1D could be moved to the supplementary figures.

      Combining the new data above, Figure 1 was revised as suggested.

      3. I suggest that the authors present the detection of AG fibroblasts before exploring their relationship with other cell types.

      4. The layout of the figures should be closely related to the topic of the article. It is recommended to readjust the figure layout. Figure 1 should show the detection of AG cells and their proportion changes from day 0 to day 7. In other figures, the authors can separately describe the proportion changes of myeloid cells, T cells and ILCs, and explore the association between AG fibroblasts and these cell types.

      As suggested, the presentation order of Figures and text was revised to bring the information about AG fibroblasts first. The chemokine-receptor analysis was moved below.

      5. Please provide the complete form of "KT" in Line 162.

      The full form of KT fibroblasts (fibroblasts keeping a typical phenotype) is now given in the text.

      Methods:
      It is recommended to list the statistical methods in a separate section. The statistical method used in the article should be one-way ANOVA.

      A separate statistical methods section has been created. As pointed out, we used one-way ANOVA with a post-hoc Tukey test (when multiple groups were compared).
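      For readers unfamiliar with this workflow, a one-way ANOVA followed by Tukey's post-hoc test can be sketched with SciPy (`scipy.stats.f_oneway`, and `scipy.stats.tukey_hsd`, which requires SciPy 1.8 or later). The helper name and the group values below are hypothetical placeholders, not data or code from the study.

```python
from scipy import stats

def anova_with_tukey(*groups):
    """One-way ANOVA across all groups plus Tukey HSD pairwise comparisons.

    Each group is a 1-D sequence of measurements (e.g. one time point).
    Returns (F-test result, Tukey HSD result).
    """
    f_result = stats.f_oneway(*groups)
    tukey_result = stats.tukey_hsd(*groups)
    return f_result, tukey_result

# Placeholder values for three hypothetical groups (not study data).
d0 = [1.0, 2.0, 3.0]
d1 = [1.0, 2.0, 3.0]
d7 = [10.0, 11.0, 12.0]
f_result, tukey_result = anova_with_tukey(d0, d1, d7)
print(f_result.statistic)          # approximately 81 for these values
print(f_result.pvalue)             # omnibus p-value across all three groups
print(tukey_result.pvalue[0, 2])   # pairwise p-value, first vs third group
```

      The omnibus F-test asks whether any group mean differs; the Tukey HSD matrix then gives family-wise-adjusted p-values for every pair, which is why the two are usually reported together when more than two groups are compared.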

      Discussion:
      I suggest the authors remove references to Figures 3-6 from the Discussion section. For example, in Line 283, "(Figure 3 and 4)" should be removed.

      Revised as suggested.

      Reference:
      Some information for the references is missing. For example, "Lin P, et al. Application of Ligature-Induced Periodontitis in Mice to Explore the Molecular Mechanism of Periodontal Disease. Int J Mol Sci 22, (2021)" should be "Lin P, et al. Application of Ligature-Induced Periodontitis in Mice to Explore the Molecular Mechanism of Periodontal Disease. Int J Mol Sci 22, 8900 (2021)". It is necessary to recheck all references.

      The references have been checked for accuracy, and the omission pointed out was corrected. Although we used the EndNote program, we found additional inaccuracies in the references, which were corrected manually. We appreciate your suggestion.

      Reviewer #2 (Recommendations For The Authors):

      1. Many host cells, such as gingival epithelial cells, participate in immune responses. AG fibroblasts are not the only cells involved in the immune response, and the weight of their role needs to be clarified. The wording of the conclusion should therefore be tempered accordingly.

      Following this critique, we revised INTRODUCTION, DISCUSSION and CONCLUSION, to highlight how AG fibroblasts function within a comprehensive immune response network.

      2. This study cannot directly answer the issue of the relationship between periodontitis and systemic diseases.

      We agree with this critique. We either deleted or de-emphasized the relationship between periodontitis and systemic diseases throughout the text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cell death plays a critical role in regulating organogenesis. During tooth morphogenesis, apoptosis of embryonic dental tissue plays critical roles in regulating tooth germ development. The current study focused on ferroptosis, another form of cell death that has rarely been investigated in tooth development, and showed that it may also play an important role in regulating tooth dimensions. The topic is novel and interesting, but the experimental design has many flaws that significantly compromise the study.

      1. The entire study was based on ex vivo tooth germ explant culture. Mandibular tooth germs at E15.5 (bell stage) were isolated for ex vivo culture. Most tooth germ explant culture experiments actually use tooth germs at much earlier stages (E11.5-E13.5) for organ culture. After E16.5, both the large size and the initially formed enamel/dentin could prevent nutrients from penetrating inside. Also, using tooth germs at an earlier stage would help identify the impact of ferroptosis on early tooth development.

      2. Due to limited penetration, the ex vivo culture in the study lasted for no more than 5 days. I would recommend the authors to perform kidney capsule transplantation as an alternative approach, which can support tooth germ development much longer even into root formation.

      3. The major justification of using tooth germ ex vivo culture as the model in the study was to "conduct high-throughput analysis". However, the study could hardly be qualified as a high-throughput analysis. I would recommend the authors perform RNA sequencing for comparing tooth germs before/after erastin treatment. Such experiments won't take too much time or resource.

      We are grateful for the insightful feedback on our ex vivo tooth germ culture model. We initially chose the E15.5 tooth germ over earlier stages due to peak Gpx4 expression and iron accumulation during molar development, which occurs between E15.5 and E17.5 (Figure 1A & 1B). This period may be the most sensitive to ferroptotic stress during tooth development. Our experiments also demonstrated that the tooth germ displays robust growth after seven days of ex vivo cultivation (Figure supplement 1B).

      Kidney capsule transplantation is indeed an ideal alternative to ex vivo tooth germ culture. However, in our studies, we used erastin, a classic ferroptosis inducer, which is unstable in vivo; this constrained our ability to use kidney capsule transplantation.

      Our results on Gpx4 expression in the tooth germ during development (Figure 1A) showed a spatiotemporal pattern. This pattern suggests that bulk RNA sequencing of the tooth germ might not accurately reveal changes in ferroptosis-related genes. We are presently using transgenic mice to further study the impact of excessive in vivo ferroptotic stress on tooth development. In these experiments, we intend to conduct single-cell RNA sequencing to explore detailed alterations in the tooth germ.

      4. Although the study mostly used molars as the model, in vivo iron concentration was demonstrated only in incisors, not in molars (Figure 1).

      We have updated Figure 1B to include images of molars, which illustrate the accumulation of iron during molar development. The iron concentration peaks at E17.5, then decreases at PN0. Interestingly, unlike Gpx4 expression, iron accumulation rebounds at PN3. To gain a more accurate understanding, further in vivo studies utilizing transgenic mice are required.

      5. Phenotype analysis in Figure 2 is too superficial. Only dimensional information was provided. Cusp number, cusp distribution pattern, and root/furcation formation were not evaluated. Differentiation of ameloblasts/odontoblasts was not evaluated. The proliferation rate in the dental epithelium/mesenchyme was not analyzed.

      Cusp number and distribution pattern are not influenced by erastin treatment in the current model (Figure 2A & 2C). The current ex vivo tooth germ culture model cannot be used to investigate the possible function of ferroptotic stress in root/furcation formation, since root formation mainly begins between PN4 and PN7. The proliferation and differentiation of the dental epithelium/mesenchyme will be analyzed in vivo using transgenic mice.

      6. Low-magnification images should be included in Figure 3 to display the entire tooth germs.

      The emission spectrum of the iron probe used here broadens as the iron concentration increases. This property makes counterstaining of tissue samples unfeasible, so the structure of the ex vivo cultured tooth germ can only be recognized at high magnification. The quantification nonetheless represents the alteration across the entire tooth germ.

      7. In Figure 4, does the ferroptosis inhibitor eliminate iron accumulation in the tooth germ? What about the expression levels of the target genes shown in Figure 3?

      In Fig 5, Fer-1 reduced iron accumulation in the tooth germ. Different inhibitors suppress ferroptosis in different ways: Lip-1 mainly inhibits lipid peroxidation; DFO is an iron chelator that reduces the labile iron pool; and Fer-1 is reported both to inhibit lipid peroxidation and to reduce the labile iron pool. Their effects on iron accumulation may therefore vary. The core risk factors of ferroptosis are lipid peroxidation and iron accumulation; thus, in Fig 5, we analyzed 4HNE expression and iron accumulation to illustrate the suppression of ferroptosis, instead of detecting individual regulatory genes.

      8. The manuscript has many typos and grammar mistakes. All "submandibular" should be simply "mandibular". "eastin" should be "erastin" (line 92). "partly" should be "partially" (line 611).

      We have corrected all the grammatical and typographical errors.

      Reviewer #2 (Recommendations for The Authors):

      This is a very well done study. However, writing is absolutely substandard. The authors should check and review extensively for improvements to the use of English. This is not just about language but also about style of the paper and presentation. As written, the abstract is not concise at all, and the overall logic of the study is not well presented. Currently, the abstract reads like another introduction.

      We improved our presentation.

      Reviewer #3 (Recommendations for The Authors):

      This is an interesting study reporting that ferroptosis is involved in tooth morphogenesis. The authors showed that Gpx4, the core anti-lipid peroxidation enzyme in ferroptosis, is upregulated during tooth development using an ex vivo culture system. They convincingly demonstrated that ferroptosis, but not apoptosis, was present in tooth morphogenesis. The findings are interesting and novel. This is one of the earliest studies of ferroptosis in tooth morphogenesis. There are several minor concerns.

      1) The abstract is too long and should be shortened.

      We modified the abstract to make it concise.

      2) Can the Gpx4 quantitatively be measured by qRT-PCR?

      3) How is Gpx4 regulated during development? If this is unknown, the authors should at least discuss it.

      4) Are there any tooth developmental defects associated with ferroptosis? If there is one, the authors should discuss it.

      Our research on Gpx4 expression in the tooth germ during development (Figure 1A) highlights a specific spatiotemporal pattern. This pattern suggests that bulk RNA sequencing of the tooth germ may not provide accurate insight into changes in ferroptosis-related genes.

      The developmental role of Gpx4 had been studied even before ferroptosis was formally described (before 2012). In situ hybridization showed Gpx4 expression in all developing germ layers during gastrulation and, at the somite stage, in the developing central nervous system and heart; Gpx4 (-/-) mice die in utero by midgestation (E7.5), with a lack of normal structural compartmentalization. Specific deletion of Gpx4 during development showed that it participates in the maturation and survival of cerebral and photoreceptor cells. In recent years, further ferroptosis-related functions of Gpx4 have been discovered in neutrophils and chondrocytes of adult mice, in which specific deletion leads to ferroptosis-induced organ dysregulation and degeneration.

      At present, no systematic study has been conducted on ferroptosis or ferroptotic stress in relation to tooth developmental defects. However, as early as the 1930s, pioneering dental biologists had already identified the presence of iron in the teeth of various animals. They also found that some enamel defects in mice were related to abnormal iron metabolism. Lipid metabolism and lipid peroxidation, which are other key risk factors of ferroptosis, were also described in the initial stages of dental biology research.

      We are currently generating transgenic mice with dental epithelium/mesenchymal specific deletions of Gpx4. This will allow us to further investigate the developmental defects related to ferroptosis and ferroptotic stress.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors performed an RNAi screen to identify epigenetic regulators involved in oxygen-glucose deprivation (OGD)-induced neuronal injury using the immortalized mouse hippocampal neuronal cell line HT-22. They identified PRMT5 as a novel negative regulator of neuronal cell survival after OGD. Both in vitro and in vivo experiments were then performed to evaluate the roles of PRMT5 in OGD and ischemic stroke-induced injury. The authors found that genetic and pharmacological inhibition of PRMT5 protected against neuronal cell death in both in vitro and in vivo models. Furthermore, they found that in response to OGD and ischemia, PRMT5 was translocated from the cytosol to the nucleus, where PRMT5 bound to the chromatin and promoter regions of targeted genes to repress the expression of downstream genes. Further, they showed that silencing PRMT5 significantly altered the OGD-induced changes in a large number of genes. In a mouse model of middle cerebral artery occlusion (MCAO), the PRMT5 inhibitor EPZ015666 protected against neuronal death in vivo. This study reveals a potential therapeutic target for the treatment of ischemic stroke. Overall, the authors have done elegant work showing the role of PRMT5 in neuronal cell survival. However, the essential mechanisms underlying PRMT5 nuclear translocation have not been investigated, and the in vivo animal studies should be further strengthened.

      Thank you very much for your comments and suggestions. Stroke is the second leading cause of death globally, and the burden of post-onset disability is substantial, rising faster in low- and middle-income countries than in high-income countries. The search for new drugs for stroke treatment therefore has profound societal implications. The concept of neuroprotective drug development is not novel; over the past half-century, considerable research and resources have been invested in this field. Yet progress has been notably limited, and interest is currently waning.

      Our research team is dedicated to devising rapid and cost-effective functional screening strategies grounded in the nervous system. Through this forward research approach, we aim to delve into potential neuroprotective targets across various neurological diseases. This endeavor not only bears significance for acute stroke but also holds potential application value for a spectrum of generalized nerve injuries.

      Building on your insights, our upcoming studies will include in vivo animal experiments that investigate the mechanism of PRMT5 nuclear translocation. We hope that our continued research will benefit from further professional insights and guidance from you.

      Reviewer #2 (Public Review):

      Haoyang Wu et al. have shown that the symmetric arginine methyltransferase PRMT5 binds to the promoter regions of several essential genes and represses their expression, leading to neuronal cell death. Knocking down PRMT5 in HT-22 cells by shRNA leads to a notable improvement in cell survival under oxygen-glucose deprivation (OGD) conditions. In another set of experiments, inhibition of the catalytic activity of PRMT5 by a specific inhibitor, EPZ015666, in a middle cerebral artery occlusion (MCAO) mouse model also showed protective effects against neuronal cell death. In this manuscript, the authors have established the negative role of PRMT5 in cerebral ischemia both in vitro and in vivo.

      However, my primary concern is the novelty of the manuscript. It has already been reported that inhibition of PRMT5 attenuates cerebral ischemia/reperfusion injury (Inhibition of PRMT5 attenuates cerebral ischemia/reperfusion-induced inflammation and pyroptosis through suppression of NF-κB/NLRP3 axis. Xiang Wu et al. Neuroscience Letters, Volume 776, 2022, 136576, ISSN 0304-3940, https://doi.org/10.1016/j.neulet.2022.136576.). Those authors also showed that treatment with the PRMT5-specific catalytic inhibitor LLY-283 could rescue the ischemia-induced over-expression of inflammation-related factors.

      In addition, it would be advisable to verify the specificity of the inhibitor, EPZ015666, against other methyltransferases, to be sure that the rescue is indeed mediated by inhibition of PRMT5 catalytic activity.

      Thank you sincerely for taking time from your busy schedule to review our paper. Your comments and suggestions are immensely valuable to us and contribute significantly to improving our work. We frankly acknowledge that this research journey has been a prolonged and challenging experience.

      The major functional study, as indicated by the ChIP-seq data record, was conducted between 2017 and 2019. Since then, our efforts and resources have been devoted to in-depth research on the mechanism and regulation of PRMT5. Notably, PRMT5 is involved in 4-5 types of histone arginine methylation, and it also mediates complex modifications of proteins in the cytoplasm. Despite our use of a variety of investigative methods, understanding and controlling for these intricate mechanisms in the experimental design has proven quite challenging. This not only places us at a disadvantage compared to some competitors but also hinders the creative potential of our lab team.

      We firmly believe that there is ample room for further research on the role of PRMT5 in the nervous system. We aspire to collaborate with other research teams to explore this area collectively.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors use an optical tweezers (OT) setup to measure the DNA gripping and DNA slipping dynamics of the phage lambda terminase motor as it interacts with DNA. They discover major differences in the dynamics of these two events compared with the phage T4 motor, which they previously investigated. They attribute these differences to the presence of the TerS (small terminase) subunit, in addition to the TerL (large terminase) subunit, in the phage lambda motor complex, whereas in T4 only the TerL subunit is present. By exposing the stalled phage lambda procapsid-DNA complex (stalled with ATP-gammaS) to solutions containing 1) no nucleotide, 2) a poorly hydrolyzable ATP analog, and 3) ADP, they found that gripping persistence is strongest with the ATP analog, weaker with ADP, and weakest with no nucleotide. This demonstrates nucleotide-dependent DNA gripping and friction of the motor. However, both the persistence of gripping and the friction are dramatically stronger than in the T4 TerL motor, due to the presence of the TerS subunit. While TerS was believed to be essential for the initiation of packaging in vivo, its role during DNA translocation was unclear. This study reveals the key role played by TerS in DNA gripping and DNA-motor friction, highlighting its role in DNA translocation, where TerS acts as a "sliding clamp".

      The study also provides a method to investigate factors affecting the stability of the initiation complex in viral packaging motors.

      Strengths:

      The experiments are well carried out and the conclusions are justified. These findings are of great significance and advance our understanding of viral motor function in the DNA packaging process and packaging dynamics.

      Weaknesses:

      Although the collected OT data are quantitative, there is no further quantitative analysis of the motor packaging dynamics with respect to the functions of the different motor subunits or the presence of nucleotides.

      We thank the reviewer for the feedback and we will address the additional recommendations in a revised manuscript. Regarding the comment about quantitative analysis of the packaging dynamics, we emphasize that the present study focuses only on analysis of the grip/slip dynamics in the absence of ATP, since we have already studied the packaging dynamics (DNA translocation dynamics) with ATP in prior studies (refs 34, 35, 39-43). Note that in the present paper we do relate the present studies to these prior studies (such as on p. 7-8 regarding the mechanism of DNA gripping/release during translocation, on p. 8 regarding the finding that the T4 motor (without TerS) exhibits more frequent slipping during packaging, and on p. 8-9 regarding the cause of pauses during packaging).

      Reviewer #2 (Public Review):

      Summary:

      In their paper, Rawson et al. investigate the nanomechanical properties of the lambda bacteriophage packaging motor in terms of its ability either to allow DNA to slip out of the capsid or to grip the DNA, thereby preventing slipping. They use a fascinatingly elegant single-molecule biophysics approach, in which gentle forces, generated and controlled by optical tweezers, are used to pull on the DNA molecule about to be packaged by the virus. A microfluidic device is then used to change the nucleotide environment of the reaction, so that the packaging motor can be investigated in its nucleotide-free (apo), ADP-bound, and non-hydrolyzable ATP-analog-bound states. The authors show that the apo state is dominated by DNA slippage, which is impeded by friction. The slippage is stochastically halted by gripping stages. In ADP, the DNA-gripped state becomes dominant, resulting in much slower DNA slippage. In non-hydrolyzable ATP analogs, DNA slippage is essentially halted and the gripped state becomes exclusive. The authors also show that the slipping and gripping states are controlled not only by nucleotides but also by the force exerted on the DNA. Altogether, DNA transport through/by the lambda-phage packaging motor is regulated by nucleotides and mechanical force. Furthermore, the authors document an intriguing DNA end-clamping mechanism that prevents the DNA from slipping entirely out of the capsid, which would otherwise make the packaging process inefficient even at the statistical level. The authors claim that their findings are likely related to the function of a small terminase subunit (TerS) in the lambda-phage motor, which may act as a sliding clamp.

      Strengths:

      Altogether this is a very elegantly executed, thought-provoking, and interesting work with numerous significant practical implications. The paper is well-written and nicely documented.

      Weaknesses:

      There are really no major weaknesses, apart from a few minor issues detailed below in my recommendations.

      We thank the reviewer for the feedback and we will address the minor issues in a revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We have substantially revised our manuscript based on the extensive and highly constructive comments of the reviewers. We have included new data, refined existing data, and revised the text. To do this, some figures had to be split and several figures had to be renumbered. The additional experiments presented at the end of the Results also led us to expand our discussion of current limitations of our story.

      Recommendations for the authors

      Reviewer #1:

      To improve the manuscript, I have some recommendations for the authors.

      1) The cell size was quantified using flow cytometry (forward scatter). While this approach provides a convenient way to measure cell size, it only allows a relative comparison of cell size. A 10% increase in FSC value does not necessarily mean a 10% increase in diameter; this depends on the instrument. Consequently, claims of density changes, such as those based on panel 5B, may be incorrect. It would also be useful to perform some experiments with a Coulter Counter or imaging-based quantification of cell size.

      We agree and this is precisely why we had also measured cell diameters by imaging (reported at the bottom of page 7 and figure supplement 1D in the initial version of the manuscript). In the revised manuscript, we have added a cautionary note in the same context. Regarding density changes, those measurements by FRAP are independent of assumptions about cell diameter. When cell density is down and cells are larger by whatever factor, one can safely conclude that total protein did not scale.

      2) When the Hsp90a/b KOs are introduced on page 9, it would be helpful to know at this stage whether the double KO cells are viable to understand why the individual KOs rather than double knockout cells were used.

      We have now added a statement to indicate that total Hsp90 KOs are not viable in eukaryotes.

      3) How the following can be reconciled with previous work is a bit unclear and needs some clarification: Neurohr et al 2019 identifies cytoplasmic dilution in larger cells, but in this manuscript WT cells maintain the same cytoplasmic density while becoming larger under chronic stress while the Hsp90 KO cells have reduced cytoplasmic density. Does this mean that the cytoplasmic dilution does not relate to cell size but is indirect and related to heat stress? Or is this related to uncoupling of cell size and density only in excessively large cells as for example HEK cells only increase their diameter by 30% based on the flow cytometry analysis?

      Yes, indeed, beyond a certain threshold, excessively enlarged cells can no longer scale protein. In the revised manuscript, we now also look at cells exposed to stress for much longer (up to one month) (see last paragraph of revised Results). These cells become even bigger, and in agreement with Neurohr and colleagues, we find that protein scaling breaks down.

      4) Related to the previous point, the authors state that "Hsp90 levels rather than a specific isoform are critical for maintaining the cytoplasmic density", but there is no direct evidence connecting Hsp90 levels to cell size. Given the number of proteomics experiments done in this work, can a correlation between Hsp90 levels and cell size/cell density be identified? Or is this related to the way cell size increases under chronic stress, given that the authors later report that Hsp90α/β KO cells treated with the CDK4/6 inhibitor can scale total protein?

      We have previously determined total Hsp90 levels quantitatively by mass spec (Bhattacharya et al., 2022; see Figure S8 there) (now explicitly mentioned in the same context as our revision related to point #2, see above), and we have now also added the quantitation, including that of total Hsp90 levels, in what is now Figure 9.

      5) Page 17 states "Hsp90α/β KO cells increase cell size while translation is still reduced. Thus, cell size and translation must be coupled for adaptation to chronic stress." This feels like an important conclusion of the paper, yet the direct evidence is rather limited and the authors are clearly not sure how the Hsp90 KO cells increase their size without increasing their translational capacity. Admittedly, a potential explanation is provided immediately afterward, as the authors show that Hsp90α/β KO cells subjected to chronic HS also have reduced proteasomal activity. Reducing protein degradation allows cells to gain more protein even if the synthesis rate does not increase (steady-state protein levels are a balance between synthesis and degradation). As stated by the authors in the discussion, the KO cells "fail to couple cell size increase to translation" simply because they can increase total protein, and cell size, by reducing protein degradation.

      Yes, reducing protein turnover might be a viable strategy, but here, reduced protein degradation in the Hsp90 KOs is clearly not enough since total protein levels cannot keep up with the cell size increase.

      6) What is unclear to me is to what extent these results (where chronic heat stress increases cell size and cells proliferate) relate to large senescent cells which are arrested. The discussion speculates that a failure to adapt to stress leads to aging, but direct evidence is lacking.

      Even though we feel quite strongly that (some) speculation should be allowed, we now provide more direct evidence for senescence (see Figure 10 of revised manuscript and corresponding text). Moreover, we had already demonstrated in Bhattacharya et al. 2022 that senescence is triggered by below-threshold levels of Hsp90 (i.e. cells express senescence markers). But note that senescence is only manifest upon prolonged exposure to chronic mild stress, and that our standard protocol for chronic mild stress was established in such a way as to avoid much of an effect on viability and proliferation (see Figure 1). So no, at least for wild-type cells, except for the experiments of Figure 10, what we studied are not large senescent and arrested cells.

      7) The clarity and content of the figures need some improvement. For example, in Fig 1, it is difficult to see the small symbols specifying the cell lines, as the replicates often overlap. The font for p values is also too small. For Fig 2, the legend says "the statistical significance between the groups was analyzed by two-tailed unpaired Student's t-tests." but no statistics are shown. The use of statistical testing is also inconsistent across different figures and panels, for example Fig 3A vs 3C and 5A vs 5H. In Fig 4, the legend refers to p-values, but the y axis in the panels shows q-values. The authors need to clarify this by mentioning that these are adjusted p-values. Fig 7 should also explain "Rapa" in the legend or state "Rapamycin" in the figure.

      To avoid overloading figures further with enlarged text, we prefer not to increase the font size of the p-values, and for graphs where data points are too small or overlap, we remind the reviewer that all original data will be available with the paper (and linked to from each figure). For Figure 2, we removed the indicated orphaned statement. We've now added stats for Figure 3C, and double checked all others; note that in most cases where the differences are really obvious, we did not add p-values. Wherever there were q-values as Y axis, we have now also added the term "adjusted p-value" in the legend. As for "Rapa", it was and still is defined in the legend.

      8) The data in Fig 5A look curious, as the 39°C response is bimodal, suggesting that only some cells adapt to the heat stress. Or could this be a technical issue with the measurements?

      The reason for this is that the data points are from 2 independent experiments. This means that the measurements were done on different days with a microscope that had to be calibrated again and may have been in a slightly different mode. This is not uncommon with this type of data. As an example of that, please see Fig. 3C of Persson et al. Cell 183:1572-1585 (https://doi.org/10.1016/j.cell.2020.10.017).

      Reviewer #2:

      Specific comments for authors:

      Major comments:

      1. Fig. 1F: if cells are not split for 7 days, then they start growing in multi-layers. The density within a plate affects their proliferation rate as well as their translation rate. Therefore, a proliferation curve (with counting) in which cells are kept at sub-confluent density (ideally <90%) for the duration of the 7-day experiment would be much more informative in this case, and would also help to understand the dynamics within the time course. For example, if there is initially a cell cycle arrest (at day 1, as shown in Fig. 1d), then proliferation rates should reflect that.

      See next point.

      1. On a more general note: what is the confluence in the 4-7 day experiments? Initial density can change the cells' behavior, not only for RPE cells (as shown in fig. 7e); HEK cells are sensitive to it as well. It is critical that experiments on translation, protein content, cell size, etc. be done under sub-confluent conditions, as over-confluency alone could be a confounder for cell size, translation rates, etc. If this is indeed how it was done, this should be clarified. Otherwise, this is a critical confounder that should be eliminated.

      The risk of the confounding effects of overcrowding is indeed an important point, which we avoided, unfortunately without explicitly mentioning it in the manuscript (assuming that it went without saying). While we had already mentioned the seeding density and type of plate in Materials and Methods, we now address it explicitly both with additional data (new figure supplement 1B) and clarifying additions in the text. In our experience, the most common problem with confluent plates is not that cells grow on top of each other, but that they come off the plate and die. Regarding the cell cycle analysis of Fig. 1D and the proliferation assays of Fig. 1G, note that in the latter, we standardized cell numbers to those of day 1.

      1. The speculations about the link to aging and senescence are very interesting; however, since these are only hypotheses at this stage, the current phrasing in the abstract is a bit misleading. In fact, primed by the abstract, I was expecting at least one experiment dealing with aging/senescence.

      You are perfectly correct. We have now added new experimental evidence showing that cells display activity of the senescence marker SA-βgal after prolonged chronic stress (Figure 10). Please see our response to point #6 of reviewer #1 for further comments.

      1. Fig. 2D - nuclei are also getting much larger - what is the contribution of the nuclear increase to the overall cell increase? Does it scale linearly? Or does it contribute more/less compared to the entire cell?

      Good point! We now include additional data on nuclear size in Figure 2E and figure supplement 2D, and corresponding additions in Results and Discussion. And as you correctly spotted, nuclei become bigger, too. The data suggest that the ratio of cytoplasm to nuclear size is more or less maintained. One can speculate that nuclei are larger because of partial "unfolding" (opening) of chromatin, which might very well be driven by the activation of Hsf1. But that's for future studies to figure out.

      1. Fig 3a-c: in fig. 2a it looks like the knockout of one isoform leads to a basal increase in the expression of the other. However, since different antibodies are used for alpha and beta, the question of whether this increase leads to complete compensation of the total levels of hsp90 cannot be answered. qPCR for common regions could help answer this question, and this could help explain the increased hsf1 activity in the knockouts.

      As pointed out in response to reviewer #1, point #4, we had previously determined total Hsp90 levels quantitatively by mass spec (Bhattacharya et al., 2022; see Figure S8 there), and we now mention that explicitly. Moreover, we have now added new data including the quantitation of total Hsp90 levels in Figure 9. RT-PCR might not be of much help considering that we had shown in Bhattacharya et al. 2022 that below-threshold Hsp90 levels (even less than what happens here) trigger translation through an IRES in the Hsp90β mRNA, whose levels don't change.

      1. What is the HSE-luc construct used for the hsf1 activity? Is that an artificial HSE? Or the Hspa7 promoter? It would be interesting to check the activity of the hsp90 promoter using a similar assay, to understand whether compensating for the overall reduction in hsp90 levels is the primary "goal" of hsf1 activation.

      The HSE-luc reporter is an artificial construct (we now clarify this in the Materials and Methods). Although Hsp90 is important, Hsf1's goal in life goes well beyond it. It also regulates many genes in the absence of stress, notably in cancer cells. Fig. 4B is an example of a blot showing that chronic stress does not dramatically affect the levels of Hsp90α/β.

      1. The proteomics data are very interesting, however additional details are missing and it is hard to extract them from source data 1. Specifically - focusing on the 2 hsp90s, what do they look like? The compensation questions above could be answered using the proteomics data as well.

      As mentioned above in response to this reviewer's point #5 (and #4 of reviewer #1), we have previously addressed that in a paper that was focused on precisely this issue, and we have adapted the current manuscript accordingly.

      1. How many proteins go up/down in the proteomics data? How does this compare between WT and knockout cells? The authors should detail the specific differences: which pathways? Which proteins? Otherwise, the volcano plots on their own are not very informative.

      We have now added a GO analysis (Figure 5C), and heat maps for chaperones/co-chaperones and Hsp90 interactors (new figure supplements 4 and 5). We have still left some volcano plots because they are a good visualization of the overall changes. The text has been revised accordingly, notably also to clarify what we are trying to show with volcano plots (GO analysis and heat maps).

      1. Fig. 3f: cells with hsf1 knockdown even decreased in size after HS. Is this significant? Why could that be?

      To be honest, we do not know. A wild speculation would be that Hsf1 is not only required to drive the cell size increase, but that a certain minimal level of Hsf1 is required to maintain normal cell size (specifically in A549 cells?).

      1. The siHSF1 cells showing no change in cell size is central to the paper's claims. This should be done in HEK293 cells at least, for which much of the data in the paper is shown, preferably also in RPE1 cells.

      We have now added new data with the results obtained with HEK293T cells (Fig. 3F).

      1. Technical note: it is very strange that MAFs can be transfected for luciferase assay. Such primary cells, to my knowledge, are largely non-transfectable. How was transfection performed in these cells? The authors should show that these cells can be transfected using imaging, or give a reference.

      We did both. We gave references and the experimental details in Materials and Methods, but we now say it even more explicitly in there. Note that the transfection efficiency is not so critical in luciferase assays as one only reads out the activities of the transfected cell population.

      1. The claim that proteostasis remains intact and the complexity of the proteome is unchanged should be examined more quantitatively. Specifically, analysis directly comparing between WT and KO cells should be performed: are the induced and repressed proteins the same? Is there a correlation between the levels of significantly changed proteins between WT and KO cells? This analysis should be done for chaperones, hsp90 interactors, as well as for the total proteome. Additionally, proteins whose levels differ could suggest (additional) mechanisms underlying the effects.

      This comment also relates back to point #8. We hope that our newly added comment in the Results section associated with the new heat maps makes it clearer what purpose the proteomic data serve, and that it is beyond the scope of this paper to quantitate differences further or to home in on this or that protein (with the exception of those proteins we have done immunoblots for). Going deeper into mechanisms would be a full project (or several) in itself.

      1. "Surprisingly, we found that Hsp90α/β KO cells do even better than WT cells under basal conditions (37°C) (Figure 4D)." This is not so surprising, in light of the fact that HSF1 activity in these cells is higher, thus their chaperoning capacity should be better (for example, more HSP70 present?), as the authors themselves point out later in the text.

      It is surprising considering that there is less of a major molecular chaperone. It's definitely not the first thing you suspect when you knock out Hsp90. But to avoid confusion, we have taken out "surprisingly" and reworded the statement.

      1. "Similarly, Hsp90α/β KO cells might do better than WT cells under chronic HS because of their ability to further increase the levels of other molecular chaperones, such as Hsp27, Hsp40, and Hsp70, during chronic HS." This relates to the point above - the authors can directly quantify the changes in the levels of all other chaperones, since they have the proteomics data, and substantiate these claims, which are now only suggestions.

      The subordinate clause ("... because...") is not a speculation, it is a statement based on the data (Fig. 4B and figure supplement 4A-B, and yes, of course, the proteomic data). However, that KOs indeed do better because of that remains to be proven (hence, the "might do better").

      1. In A549 cells, knockout of Hsp90 led to lower basal diffusion coefficient (proxy for cytosolic density) at normal temperatures. Then, at 40 degrees, it seems that the coefficient goes back to being more or less equal to that of WT cells (fig. S5D). How can the authors explain this?

      One cannot really compare them one on one. After all, the Hsp90 KOs are different cell lines, their EGFP expression levels may differ, and their heat sensitivity definitely differs. What can be compared is cells of a given cell line (i.e. WT or KO), transfected as a pool and then split to be cultured at different temperatures.

      1. P-eIF2alpha and other translation marker western blots should be repeated, quantified, and also performed in A549 KO cells. The latter is very important, as in A549 WT cells the changes in all translation regulatory markers during adaptation (p-eIF2alpha, p-mTOR, and most strikingly total mTOR) are sky-rocketing, while in HEK cells these remain constant. As mTOR is a well-known regulator of cell size and a target of Hsp90, could it be the major mediator of this effect in A549 cells? And if so, what is the substitute in HEK cells?

      We now include bar graphs with quantitation of multiple experiments for both HEK and A549 cells, including for the KOs (Figure 6C-D - figure supplement 8). What they show is that p-mTOR levels increase during chronic stress. But since overall it also increases in Hsp90α/β KO cells, we had to conclude that this cannot explain the differences between cells of different genotypes. We have added a statement to that effect in the corresponding Results section.

      1. Figs. 5D (and S5F) are both for HEK cells, while Fig. 5H is for A549. The corresponding plots for both cell lines should be provided for clarity, as the magnitudes in 5D and S5F seem much larger in HEK cells than seen in 5H. If there are differences between the cell lines these should be pointed out, as currently, showing some figures for one and not the other is confusing.

      The HEK and A549 cells in these experiments serve different purposes. We now explicitly state in the text of the Results which cell line is used. Hopefully that makes it less confusing.

      1. Fig. 6C lacks a pvalue.

      It's missing because it cannot be calculated. The graph shows the average of "only" 2 biologically independent samples (as stated in the legend).

      1. Fig. S6C: the legend doesn't match the figure. Additionally, the number of aggregates should be normalized to the respective number of cells in each micrograph, and p-values should be presented for those normalized values.

      For what is now figure supplement 9C, this has been fixed as suggested.

      1. Also, under non-HS conditions, Hsp90 knockout cells show less aggregates than the WT. Is this significant (numbers are small, so perhaps it isn't)? What does this mean for the basal proteostasis state of Hsp90 knockout cells? Is it perhaps better than that of the WT?

      The suggested way of quantitating the aggregates took care of that. There is no clear difference anymore between WT and KO, but clearly many more aggregates under chronic stress (figure supplement 9C).

      1. The data on the connection between size and survival under chronic stress are highly compelling, even though correlative. In the discussion, the authors speculate about one possible explanation for how the enlarged size protects against chronic stress. In fact, their proteomics dataset has the potential to help address, at least in part, their hypothesis about thresholds of certain proteins, by showing which proteins cross the detectability threshold in the data and which processes these relate to.

      What the proteomic data say is that most things don't change (standardized to total protein). While it is possible that a few proteins do change in interesting ways, characterizing those is beyond the scope of this study.

      1. Fig. 7G should have a respective quantification with a p-value.

      We have added additional data. What is now Fig. 9 shows the quantitation of multiple biological replicates (with p-values).

      Minor comments:

      1. "it is known that acute HS causes ribosomal dissociation from mRNA, which results in a translational pause (Shalgi et al., 2013)." - This paper showed that acute HS causes ribosomal pausing on mRNAs, not ribosomal dissociation.

      We corrected this.

      1. Fig. 7E - size bar is missing.

      It was actually there, but hard to see. We have improved that in what is now Fig. 8E (and it is now also mentioned in the legend).

      Reviewer #3:

      My main points are outlined in the Public Review. Only a few additional comments are included here:

      1. The manuscript is quite long and there are places where it could be shortened and tightened for clarity. I'd recommend going through carefully and trying to shorten to improve readability.

      We hope that our revisions to address all of the reviewers' comments (and to accommodate more data) make the text more readable. But to make it shorter would have come at the expense of clarity.

      1. It wasn't clear to me that the increased luciferase folding in HSP90 KO lines was surprising. It is demonstrated that knockdown of these isoforms can activate HSF1, which increases many chaperones known to promote luciferase refolding.

      We address this point in our response to point #13 of reviewer #2 (basically: we took out "surprisingly").

      1. Along the same lines, HSP90 knockdown activates HSF1 but doesn't increase basal cell size. However, exogenous overexpression of HSF1 or activation of HSF1 with capsaicin increases cell size. Why are similar things not observed for HSP90 knockdown? Is it the extent of HSF1 activation? This seems a bit unlikely because activation looked similar in KO and capsaicin-treated cells.

      This must be due to the specifics of these different assays. The levels of Hsf1 protein and activity, and the time course of Hsf1 activity may be different. Moreover, it is likely that the reporter gene readout does not accurately report on all Hsf1 activities at a genome-wide scale.

      1. As noted above, does HSP90 depletion impact ISR signaling induced by other types of stress (e.g., ER or mitochondrial stress)? Specifically, do you see sustained translational attenuation (and eIF2a phosphorylation) when HSP90 is depleted under these conditions? In other words, does HSP90 have a specific role in globally resolving eIF2a phosphorylation as part of the ISR, or is that specific to certain types of stress?

      Although we now include data to show that tunicamycin (and therefore presumably the UPR/ISR) also induces a cell size increase, comprehensively analyzing what we refer to as RSR across different types of stresses (including mitochondrial and ER stresses) in the background of different Hsp90 genotypes and cell lines goes well beyond the scope of the current study.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors sought to directly compare the predictions of two models of somatosensory processing: the attenuation model, which states that the sensation of touch on one hand is reduced when it is the predictable result of an active movement by the other hand; and the enhancement model, which states that the sensation of touch is actually increased, as long as the active hand does not receive touch simultaneously with the passive hand (no double stimulation). The authors achieved their aims, with results clearly demonstrating (1) attenuation in the case of self-touch, (2) that previously observed enhancement is a consequence of the comparison condition (false enhancement), and (3) that attenuation involves predictive mechanisms and does not result simply from double stimulation. These findings, and the methodology, should particularly impact future studies of perceptual attenuation, sensory prediction error, and motor control more generally. The opposite conclusions obtainable by selecting different comparison conditions are particularly striking.

      Experiment 1 affirms that a touch to the passive finger caused by the active finger tapping a force sensor is perceived as weaker (attenuated) compared to a baseline not involving the active finger, but that if double stimulation is prevented (active finger moves, but no contact), neither attenuation nor enhancement occurs. Experiment 2 includes the three original conditions, plus the no-go condition used as a comparison in these earlier studies. Results suggest that the comparisons used by previous studies would result in the false appearance of enhancement. Finally, Experiment 3 tests the hypothesis that the lack of attenuation in the no-contact condition is due to the absence of double stimulation rather than predictive mechanisms. Contact and no-contact trials were mixed in an 80:20 ratio, so that participants would form predictions about the consequence of their active finger movement even though some trials lacked contact. In this case, attenuation was observed for both contact and no-contact trials, supporting the idea that attenuation is related to predictive processes linked to moving the active finger, and is not a simple consequence of double stimulation.

      The methodology and analysis plans for all three experiments were pre-registered prior to data collection. We can therefore be very confident that the results were not influenced by hypotheses developed only after seeing the data. The three experiments were each performed in a new set of participants. Experiments 2 and 3 included conditions that replicated the Experiment 1 effects, allowing us to be very confident that the results are robust.

      While the study has significant strengths, some aspects of the interpretation need to be clarified. In particular, the authors' interpretation depends on the idea that attenuation is absent in the no-contact condition because this action-sensory consequence relationship is an "arbitrary mapping." It is not clear what makes it arbitrary. The self-touch contact condition could also be considered somewhat arbitrary and different from real self-touch; the 2N test force was triggered by the right finger tapping a force sensor. If participants' tapping forces were recorded, it would be useful to include this information, particularly about how variable participants' taps were. In other words, unlike real self-touch, in this paradigm the force of the active finger tap did not affect the force delivered to the passive finger.

      By ‘arbitrary’, we refer to non-ecological mappings between a movement and a somatosensory stimulus; in other words, a mapping that does not resemble how one touches one's own body (natural self-touch). Examples of such arbitrary mappings are moving the right finger in the air and receiving simultaneous touch on the other hand, as in Thomas et al. (2022), or moving a joystick or potentiometer with one hand and receiving a touch on the other hand. These joystick or potentiometer conditions are typically used as a control condition when studying somatosensory attenuation because they include an arbitrary sensorimotor mapping (Shergill et al., 2005, 2003; Teufel et al., 2010; Wolpe et al., 2016).

      We understand the reviewer’s point about the relationship between the forces applied with the right hand and the forces received on the left hand. First, we would like to clarify that we recorded the forces that the participants applied to the sensor in every experiment. We have now added a figure (Figure 3 – figure supplement 3) showing the forces over time across all participants in every experiment, which is referred to in the Methods on Lines 727-730. As we wrote in the Methods (Lines 720-727), and in line with previous studies (Asimakidou et al., 2022; Kilteni et al., 2021; Kilteni and Ehrsson, 2022), we asked participants to tap, neither too weakly nor too strongly, with their right index finger, “as if tapping the screen of their smartphone”. We did so because participants do not have an intuitive sense of how strong a force of 2 N is, and this instruction allowed them to apply forces of similar magnitude from trial to trial while receiving the same touch on their left index finger. Indeed, as shown in Figure 3 – figure supplement 3 (D-F), participants showed low trial-to-trial variability in the applied forces, with an average variability (s.e.m.) of only ± 0.13 N in Experiment 1, ± 0.12 N in Experiment 2 and ± 0.11 N in Experiment 3. In other words, they generated similar forces with their right index finger across all trials while receiving the same force on their left index finger, establishing an approximately constant gain between movement and touch and a perceived causality between the two (Bays and Wolpert, 2008; Kilteni, 2023). Critically, Bays and Wolpert (Experiment 1 in that book chapter) previously showed that the magnitude of attenuation remains unaffected when halving or doubling the gain between the force applied by the active finger and the force delivered on the passive hand, as long as the gain remains constant throughout the experiment (Bays and Wolpert, 2008). This should not be surprising given that when one finger transmits a force through an object to another finger, the resulting force also depends on the object's properties (e.g., shape, material and contact area) and the angle at which the finger contacts the object. This is outlined in Lines 733-736 of the manuscript.

      One additional potential weakness is that participants' vision was occluded in Experiment 3, but not in Experiments 1 and 2. The authors do not discuss whether this difference could confound any of the analyses that compare results across experiments.

      We thank the reviewer for the comment. We do not think that blindfolding is a weakness of our study, as we designed our experiment to take this factor into account. Specifically, we blindfolded participants to ensure that they would not know when the force sensor was retracted on (unexpected) no-contact trials. This was essential for establishing an expectation that they would contact the force sensor. Importantly, participants were blindfolded in all conditions of Experiment 3 (contact, no-contact and baseline), so any effect of blindfolding was present across all conditions of Experiment 3. Since in the analyses of Experiment 3 (Lines 342-354) we always compared conditions with one another, blindfolding per se could not explain any differences between conditions, as any putative effects of blindfolding are effectively removed when contrasting two conditions in which participants were blindfolded. Notably, this argument also applies to the comparisons that we made between Experiment 3 and Experiments 1 and 2, since all these analyses (Lines 362-376) compare the difference between contact and no-contact trials (e.g., PSE values) between the experiments. Once again, any putative effects of blindfolding were effectively removed. We should also emphasize that the participants’ left index finger, as well as the motor that delivered the force to their left index finger, were occluded from view in Experiments 1 and 2. This was done to prevent participants from using any visual cues to discriminate between the two forces. This has been included in the Methods section (Lines 772-775).

      In conclusion, blindfolding cannot explain the results of Experiment 3, and it did not alter the interpretation of any of our results derived by comparing the experiments. We have clarified this point in the manuscript (Lines 823-827).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript the authors perform a detailed analysis of the impact of food type on reproduction in C. elegans. They find that, in comparison with the standard OP50 strain of E. coli that is ubiquitously used to maintain C. elegans in the laboratory setting, the CS180 strain results in a reduction in the number of progeny that may be a consequence of an early transition from spermatogenesis to oogenesis that reduces total sperm number. They also find that the rate of oocyte fertilization is increased in animals fed CS180 vs. OP50. Using mutants and laser ablations, the authors show that, whereas the insulin-like peptide INS-6 acts in the ASJ sensory neurons to mediate the food type effect on total progeny and early oogenesis, the increased fertilization rate phenotype does not require ASJ or insulin-like signaling and instead requires the AWA olfactory neurons.

      The major strengths of the manuscript are the establishment of INS-6 as a link between food type and reproduction and the detail and rigor with which the experiments were executed. The results presented generally support the authors' model. This role of insulin-like signaling in connecting food type and reproduction makes it a plausible target for evolutionary forces that may have shaped insulin-like signaling in invertebrates. As such, this work contributes broadly to our understanding of how insulin signaling may have evolved prior to the emergence of vertebrates.

      We thank the Reviewer for these nice comments.

      A weakness of the work is the epistasis analysis of insulin-like pathway components, which is incomplete and at times difficult to interpret.

      We conducted an epistasis analysis between ins-6 and daf-16 with regard to early oogenesis onset on the CS180 diet. Through recombination of lin-41::GFP with the daf-16 deletion mutation on chromosome I, we showed that daf-16 mutants exhibit early oogenesis at mid L4 on CS180 (Figure 5C and F), unlike the ins-6 deletion (null) mutants or the reduction-of-function mutants in daf-2. Both ins-6 and daf-2 mutants exhibit delayed oogenesis on CS180 (Figure 5B, D, and F). Interestingly, the delayed oogenesis phenotype of ins-6 null mutants was not rescued by loss of daf-16, suggesting that wild-type ins-6 promotes early oogenesis independently of daf-16 (Figure 5F). This is reminiscent of the Arur lab’s findings, where daf-2 promotes germline meiotic progression independently of daf-16 in response to food availability (Lopez et al., Dev Cell 2013, vol 27, pp 227-240).

      Reviewer #2 (Public Review):

      The manuscript by Mishra et al. examines the modulation of the nervous system by different bacterial foods to influence reproductive phenotypes, specifically the onset of oogenesis, fertilization rate, and progeny production. Defining how animal reproduction could be modulated by bacterial food cues through neuroendocrine signaling is a fascinating subject of study for which C. elegans is well-suited. However, the overall scope of the current study is limited, and some of the central data do not provide compelling evidence for the authors' underlying hypothesis and model.

      1) Two strains of E. coli are examined: the standard C. elegans bacterial food strain OP50 and an E. coli strain that Alcedo and colleagues have previously shown to influence aging and longevity through nervous system modulation. While the authors determine that differences in LPS structure between the strains do not account for the food-dependent effects, there is little further insight regarding the bacterial features that contribute to the observed differences in reproductive physiology. Moreover, at least two of the phenotypes examined (total progeny and fertilization rate) are known to be affected by bacterial food quality and may be affected by bacteria in many ways, so the description of these phenotypes is somewhat less compelling than the study of the onset of oogenesis.

      Our study focused on how specific sensory neurons mediate the effects of different bacterial diets on three different aspects of C. elegans reproductive physiology—total progeny, oogenesis onset and fertilization rates. We examined the effects of three different bacteria, E. coli OP50, CS180 and CS2429, on these three phenotypes and the effects of two Serratia marcescens strains, Db11 and Db1140, on oogenesis onset. Of these five bacteria, only CS180 and its derivative CS2429, promote early C. elegans oogenesis.

      In the revised manuscript, we included the effects of a fourth E. coli strain, the K-12 strain HT115, on total progeny (Figure 2—supplement 1), oogenesis onset (Figure 2E) and fertilization rates (Figure 2F). We found that HT115 does not elicit the same response as CS180 on oogenesis onset and fertilization rates. Thus, the oogenesis-inducing and fertilization-enhancing cue(s) appear to be specific to CS180 and its derivative CS2429. We have started characterizing the potential nature of these CS180-derived cue(s). So far, we found that these cues are unlikely to be free, small metabolites, since they were lost upon filtration of the CS180-conditioned LB media through a nylon membrane that has a pore size of 0.45 µm (Figure 2G and H). While we agree with the Reviewer that the identification of these cues is important, we believe that it is beyond the scope of this manuscript.

      More importantly, we showed that the sensory neuron ASJ does modulate the timing of oogenesis and that this involves the insulin-like peptide ins-6 (please see our responses to the Essential Revisions section and Figures 5 and 6). We also showed that ASJ (Figure 7G and K) or ins-6 (Figure 8D) does not affect the food type-dependent fertilization rates, which are modulated by a different sensory neuron, the olfactory neuron AWA (Figure 7J and K). AWA in turn has no effect on the timing of oogenesis (Figure 7L). Thus, this manuscript links specific sensory neurons and insulin-like peptides to distinct aspects of oocyte biology, which we believe is a significant advance in the field of reproductive biology.

      2) The onset of oogenesis phenotype, using the lin-41::GFP reporter, seems more specific and tractable, and the authors nicely decouple this phenotype from the total progeny and fertilization rate phenotypes through experiments that shift animals to different bacterial food at specific developmental stages.

      We thank the Reviewer for this comment.

      However, as it stands, the data regarding the role of ins-6 and ASJ in modulating this phenotype, and the model that exposure to CS180 bacterial food causes a change in the ASJ expression of ins-6, which is sufficient to promote the earlier onset of oogenesis at the mid-L4 stage, seem somewhat incomplete and have some inconsistencies to be addressed.

      a) The ins-6 mutant phenotype is rescued by genomic ins-6 and partially rescued by ins-6 expressed under an ASJ-specific promoter. The lack of rescue from an ASI promoter is puzzling given the secreted nature of ins-6.

      We address this in Essential Revisions, point 3. Briefly, we disagree that this is puzzling, since several labs have already shown that there are functional differences between the INS-6 produced from ASI and the INS-6 produced from ASJ, using different experimental approaches (Chen et al., 2013; Tang et al., 2023; and this work). Indeed, cell-specific activities of a secreted signal are not limited to INS-6, but have also been described for other secreted peptides, such as INS-1 (Kodama et al., 2006; Tomioka et al., 2006; Takeishi et al., eLife 2020, vol 9, e61167). Thus, the interesting question is why functional differences exist between the INS-6 peptides from the two neurons. This is a fascinating question, but beyond the scope of this manuscript.

      b) The ins-6 mutant phenotype with regard to delaying the early expression of lin-41::GFP on CS180 appears weaker than the daf-2 mutant phenotype. This is difficult to reconcile with what is known about the relative strength of the daf-2 mutant alleles relative to ins-6 for a wide range of phenotypes.

      There is evidence in the literature that the ins-6 mutant phenotype will not look exactly like that of daf-2 (Chen et al., 2013; Cornils et al., Development 2011, vol 138, pp 1183-93; Fernandes de Abreu et al., PLoS Genet 2014, vol 10, e1004225). The DAF-2 insulin-like receptor is predicted to bind multiple insulin-like peptides (Pierce et al., Genes Dev 2001, vol 15, pp 672-686), some of which can act antagonistically to DAF-2 function (Pierce et al., 2001; Cornils et al., 2011; Chen et al., 2013; Fernandes de Abreu et al., 2014). Thus, the oogenic effects of the reduction-of-function mutations in daf-2 are likely the sum of the effects of multiple insulin-like peptides, some of which might also delay oogenesis. This could explain why the manipulation of an individual insulin-like peptide, INS-6, which could bind DAF-2 to promote oogenesis, does not closely resemble the phenotype of daf-2 mutants.

      c) The daf-16 loss-of-function phenotype and suppression of daf-2 and ins-6 mutant phenotypes are not shown for the lin-41::GFP expression phenotype.

      We address this in the Public Review comments of Reviewer 1. Briefly, we focused on the epistasis analysis between ins-6 and daf-16 and showed that ins-6 promotes early oogenesis independent of daf-16.

      d) The modest difference in ins-6p::mCherry expression in the ASJ neurons (Figure 5D) makes the idea that this difference causes the onset of oogenesis somewhat implausible.

      We disagree that this change is modest and that the oogenic effect of such a change is implausible.

      First, the change in ins-6p::mCherry expression in ASJ on CS180 is comparable to other physiologically important expression changes that have been reported for other genes (for example, Entchev et al., eLife 2015, vol 4, e06259, for the tryptophan hydroxylase tph-1 and the TGF-β daf-7; and Tataridas-Pallas et al., PLoS Genet 2021, vol 17, e1009358, for the neuronally expressed NRF transcription factor skn-1b). Second, it is worth noting that we were using a single-copy reporter for ins-6 expression, where detected changes will be smaller but should be closer to physiological responses. Multiple-copy reporters might give larger changes, but those would be further from a physiological response. Third, the change in ins-6p::mCherry expression is comparable in scale to the ins-6 mutant phenotype. Our results showed that the 35% increase in ASJ expression of ins-6 is due to food type (Figure 6A; mean fluorescence on OP50 = 1526 ± 94; mean fluorescence on CS180 = 2056 ± 104). This change in magnitude is similar to the loss of lin-41::GFP expression in mid L4 of ins-6 mutants versus controls: about 30% to 43% of control worms express lin-41::GFP, whereas 0% of ins-6 mutants express the same reporter at mid L4 on CS180 (Figure 5 and its associated supplement).
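      The 35% figure follows directly from the two group means quoted above. The short sketch below only reproduces that arithmetic from the point estimates; it ignores the ± error terms and is not the statistical analysis used in the study. The helper name percent_increase is hypothetical.

```python
def percent_increase(baseline: float, treated: float) -> float:
    """Hypothetical helper: relative increase of `treated` over `baseline`, in percent."""
    return (treated - baseline) / baseline * 100.0


# Point estimates copied from the response (mean ASJ ins-6p::mCherry fluorescence, a.u.);
# the reported +/- error terms are deliberately not used here.
op50_mean = 1526.0
cs180_mean = 2056.0

change = percent_increase(op50_mean, cs180_mean)
print(f"{change:.1f}%")  # 34.7%, i.e. roughly 35%
```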

      e) The strain carrying a genetic ablation of ASJ appears to have a markedly different baseline of kinetics of lin-41::GFP expression (even at lethargus, less than half of the animals appear to express lin-41::GFP). Given this phenotype, it seems difficult to draw conclusions about bacterial food-dependent effects on expression of lin-41::GFP. Additional characterization corroborating timing of oogenesis independent of the lin-41::GFP marker may be helpful, but something seems amiss.

      We address this in Essential Revisions, point 4. Briefly, we disagree that the kinetics of lin-41::GFP expression in ASJ-ablated animals is puzzling, compared to the kinetics observed in insulin signaling mutants. Besides ins-6, ASJ expresses multiple signals (Taylor et al., 2021), some of which might also regulate the multiple functions of oogenic lin-41::GFP. Thus, it should not be surprising that loss of ASJ will have a markedly different effect on oogenesis than the loss of ins-6.

      Reviewer #3 (Public Review):

      I very much enjoyed reading this paper by Shashwat Mishra and team from Joy Alcedo's and from Queelim Ch'ng's laboratories dissecting how sensory signals regulate reproduction in worms. The mechanisms by which sensory inputs affect the function of the germline, the balance between growth and differentiation within this tissue, are of broad interest not only to those interested in reproduction and differentiation, but also to those interested in the mechanisms of plasticity that enable organisms to adjust to changing environmental conditions. These mechanisms are only now beginning to be characterized. Here the focus is on the role of insulin signals expressed in sensory neurons. This work builds on previous findings by the Alcedo lab that sensory perception of bacterial-type dependent signals regulates C. elegans lifespan. Here their focus is on the effects on reproduction, and on the communication of that information by insulin-like signals.

      We thank the Reviewer for these nice comments.

      Worms have a huge family of 40 insulin-like genes, which the Alcedo and Ch'ng labs have been studying for many years. The paper starts with the interesting premise that the brood size of the worms is food type dependent. The authors show that this is due to effects on the timing of the onset of oogenesis during larval development (which constrains the size of the pool of sperm available for subsequent oocyte fertilization) as well as on effects on the rate of oocyte fertilization during adulthood. Using clever timing for food switching, they show that the effects on oogenesis onset and on fertilization rate are separable. In addition, these effects did not appear to be merely the outcome of indirect effects of food ingestion, but were, instead, at least in part, due to the perception of environmental information by specific sensory neurons. Using mutants affecting transduction of sensory information in specific neurons and genetic ablation of specific neurons, the authors show that the onset of oogenesis and the rate of reproduction were controlled by different sensory neurons, ASJ and AWA, respectively. One of these neurons, ASJ, transmitted environmental information via the ins-6 neuropeptide.

      Altogether, the paper advances our understanding of how environmental determinants influence reproduction.

      We thank the Reviewer for these nice comments.

    1. Author Response

      Reviewer #3 (Public Review):

      Comment 1: I'm having some difficulty understanding the logic of Figure 5 in determining cis processing. It is an inverse of figure 4, and in my view, provides further evidence of trans processing. A better experiment would be to use WT-citrine tagged protein with catalytic dead mcherry and image them together. This would show WT cis processing occurs faster than trans processing as citrine specks should appear earlier than the mCherry ones. Can also do colocalization and FRET-based assays with the pair.

      We thank the reviewer for pointing this out. While our data demonstrate that the same molecule must be catalytically active and competent for processing at the IDL (Figure 5), we agree that the data do not rule out trans-processing as a mechanism for speck formation. We have therefore modified the interpretation of these findings accordingly (pp. 7-8). We agree that some of the quantitative assays the reviewer has suggested would strengthen this logic, and we are making efforts to carry out a kinetic FRET-based assay for our upcoming biochemistry-focused manuscript to better characterize the enzymatic affinity of Casp11 for cis- vs. trans- based autoprocessing, and how either impacts Casp11 speck assembly.

      Comment 2: Do those casp11 specks still contain CARDs?- i.e. is the second cleavage necessary for speck formation? Is CARD necessary at all? Would adding the TEV site at CDL and b/w p20 and p10 rescue? i.e. trans-activate?

      We are grateful to the reviewer for these insightful questions, which we had also considered. We addressed this question in two ways. First, we replaced the CARD with a DmrB dimerizable domain, which allows inducible dimerization of Casp11 in the presence of the dimerizing drug AP20187. Critically, inducible dimerization of DmrB-ΔCARD-Casp11-mCherry significantly enhances Casp11-mCherry speck formation, and this speck formation requires catalytic activity, even in the presence of dimerizer (Figure 6A-C). Second, we generated CARD-less Casp11-mCherry constructs containing wild-type p20-p10 and catalytically inactive p20-p10. Intriguingly, the CARD was dispensable for spontaneous Casp11-mCherry speck formation, which again was dependent on catalytic activity (Figure 6-figure supplement 2A-B). While we do not currently have data with a TEV-cleavable CDL construct, our data here demonstrate that the CARD is dispensable for speck formation in an overexpression system, implying that the p20/p10 contains all the information that is necessary and sufficient to mediate spontaneous assembly of Casp11 specks in HEK293T cells. Nonetheless, as forced dimerization enhances speck formation, we hypothesize that CARD-LPS interactions act to facilitate catalytic activity and promote cooperative assembly of the Casp11 speck.

      To address whether both the N-terminal CARD and C-terminal p10 domains are present in Casp11 specks, we performed a dual-fluorophore co-localization assay in which we transiently expressed C-terminal mCherry-tagged Casp11 constructs (Casp11-mCherry) in HEK293T cells that stably express N-terminal Flag-tagged Casp11 (2xFLAG-Casp11). As expected, Casp11-mCherry formed specks spontaneously in this setting (Figure 3-figure supplement 1). Critically, both the N-terminal FLAG and C-terminal mCherry were found together in these specks, indicating the presence of both Casp11 N- and C- termini within the specks. Moreover, the wild-type Casp11-mCherry also recruited catalytically inactive 2xFLAG-Casp11C254A, again supporting the finding that wild-type Casp11 can recruit a catalytic mutant to noncanonical inflammasome complexes.

      Comment 3: What are the equations that fit experimental data points and R2 for? E.g. Figure 1E. What are the parameters being fitted/compared and how are those interpreted? A table of fitted values and proper interpretation should be provided.

      We thank the reviewer for this request to clarify how the curves were fit to the experimental data points. We have modified our ‘Statistical Analysis’ section and all figure legends that contain dose-response curves to reflect the equations used to fit each curve. Additionally, please find a table of raw values in the corresponding source data provided for each dose-response curve (Figure 2 Source Data 5; Figure 4 Source Data 3, 6; Figure 5 Source Data 3, 4; Figure 7 Source Data 2; and Figure 4-figure supplement 1 Source Data 1).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines different signaling networks and attempts to give general results for when the network will exhibit biphasic behavior, which is the situation when the output of the network is a non-monotonic function of its inputs. The strength of the paper is in the approach it takes. It starts with the simplest network motifs that produce biphasic behavior and then asks what happens when these motifs are parts of larger networks. Their approach is in contrast to the usual way in which this question is tackled, which tends to be within the confines of a specific signaling network, where general results like the ones that the authors are after might be hard to spot.

      We thank the reviewer for the careful reading of the manuscript and for the comments, and we appreciate that the reviewer regards the approach as the strength of the paper.

      The weakness of the paper, in my opinion, is the rather formal description of the results which I am afraid will be of rather limited utility to experimental groups seeking to make use of them. The paper attempts to provide general rules for when to expect biphasic behavior and it was hard to assess to what extent such rules exist as behaviors can change depending on the context of a larger network in which the smaller biphasic one is embedded. The other thing that made assessing the generality of the results difficult is that the input-output functions shown in all the figures are computed for a specific choice of parameters and I was left wondering how different choices of parameters might change the reported behaviors. The lack of specific proposals for how their results should guide future experiments on different signaling networks is another weakness.

      We address these points in a number of ways. Initially, our presentation was intended to highlight unambiguously which systems (especially the substrate modification building blocks) were capable of biphasic responses and which were not, and to highlight the dependence on intrinsic kinetic parameters. Based on both referees' comments, we have made a number of changes:

      (a) We highlight the rationale for choosing the suite of biochemical substrate modification systems: enzyme/substrate sharing is a key driver for the origins of biphasic responses, and the suite of systems we employ allows us to explore this systematically (see Response to Essential Revisions). These systems are building blocks of many pathways.

      (b) Biphasic responses emerge from a built-in competing effect. For every substrate modification system, we now highlight the mechanistic underpinning that gives rise to the competing effect responsible for the biphasic response. This will help experimentalists and modellers alike gain insight into how such behaviour may arise and into the ingredients that facilitate it (which may be relevant in other systems). Similarly, we highlight how altered behaviour at the network level may arise from a biphasic interaction pattern, providing the intuition behind it and guiding further experimental investigation (also see Response to Essential Revisions).

      (c) With regard to parameters (also see Response to Essential Comments), we first emphasize that, at the substrate modification level, we completely characterize whether biphasic responses are possible as a function of the intrinsic kinetic constants. This is done for every system studied. Fig 2 depicts this, along with sample biphasic dose responses. The essential point, however, is that the dependence on intrinsic kinetic parameters is completely characterized: we indicate in which cases biphasic responses are impossible irrespective of the intrinsic kinetic parameters, where they can be obtained for every value of the intrinsic kinetic parameters, and where obtaining them imposes partial restrictions on the intrinsic kinetic parameter space. In the revision we have performed further parametric analysis to assess the impact of species total amounts, providing further insights. We have also shown that in all these systems biphasic responses can be obtained for kinetic parameters in ranges similar to those found experimentally (e.g. Wistel et al., 2018) and for species total amounts that are reasonable in systems and synthetic biology. This is analyzed and depicted in Figure 2-figure supplement 3 and Figure 2-figure supplement 4.

      (d) Also, in response to another comment (about behaviour changing in networks): we first emphasize that we start at the substrate modification level to uncover the drivers of biphasic responses at this level. Biphasic responses arise from an in-built competing effect, and we demonstrate different ways in which such an effect arises, through the sharing of enzymes or substrates. While it is true that the behaviour can change as part of a network, (i) these in-built competing effects (involving both substrates and enzymes) can still generate biphasic responses, and these can manifest at the pathway or network level under suitable conditions; and (ii) the fact that behaviour at the network level may be altered is exactly why we include studies at the network level, examining both biphasic patterns of interaction (where the overall behaviour is determined by the motif together with the biphasic pattern of interaction) and the interplay of biphasic responses at both the network and substrate modification levels (subsection: The network level).

      (e) We have also expanded the paragraph on testable predictions in the Conclusions (p10).

      Taken together, we believe that these results should interest both experimentalists and modellers and have intrinsic value as well.

      While I appreciate that the authors adopted a style of presenting their results such that all the mathematics is buried in the figures, I found that it made reading the paper quite difficult, and contributed to my confusion about which results are general and insensitive to parameter choices and which are not. I believe a narrative that integrated the math with some simple intuition might have been more effective. For example, when the authors say in the text that model M0 is incapable of displaying biphasic response, how general is that result? Later on, when discussing model M2, they provide a criterion for biphasic response in terms of products of rate constants satisfying an inequality, but the meaning of this condition is not described. Such things make it hard to learn from the authors' work.

      This has indeed been incorporated, and we agree that presenting the intuition and mechanistic underpinning for the behaviour aids readability. In addition to the points about parameters, which are now explained at length in the paper, there are a number of paragraphs providing the mechanistic underpinning and intuition for why the behaviour is obtained. Both are discussed at length in Response to Essential Revisions. Thus, both the mechanistic intuition and the role of parameters are addressed in detail in the revision.

      When we state that M0 is incapable of yielding biphasic responses, we mean just that: irrespective of any parameter choice in the model. The meaning of the criterion for Model M2 is now discussed. We take the point about not being able to learn from the work seriously and have made various changes, both to convey the intuition and to clarify the impact of parameters.

      The text is sprinkled with statements like "this reveals the plurality of information processing behaviors..." where the meaning is quite opaque (for this example, there is no description of "information processing" and what it might mean in this context) and therefore it makes it hard to understand what are the lessons learned from these calculations. Another example is found in the description of Erk regulation where the authors speak of "significant robustness" but what is meant by "significant" is also unclear.

      Yes, we agree that these phrases are distracting and not adding much and so we have removed them.

      Overall, I think this is an interesting attempt to provide a general mathematical framework for analyzing biphasic response of signaling networks, but the authors fall short for the reasons described above. I think a lot can be fixed by improving the way the results are presented.

      We have indeed taken these comments on board and aimed to improve the presentation.

      Reviewer #2 (Public Review):

      Biphasic responses are widely observed in biological systems and the determination of general design principles underlying biphasic responses is an important problem. The authors attempt to study this problem using a range of biochemical signaling models ranging from simple enzymatic modification and de-modification of a single substrate to systems with multiple enzymes and substrates. The authors used analytical and computational calculations to determine conditions such as network topology, range of concentrations, and rate parameters that could give rise to biphasic responses. I think the approach and the result of their investigation are interesting and can be potentially useful. However, the conditions for biphasic responses are described in terms of parameter ranges or relationships in particular biochemical models, and these parameters have not been connected to the values of concentrations or rates in real biological systems. This makes it difficult to evaluate how these findings would be applicable in nature or in experiments. It might also help if some general mechanisms in terms of competition/cooperation of time scales/processes are gleaned which potentially can be used to analyze biphasic responses in real biological systems.

      We thank the reviewer for a careful reading of the manuscript and for the various comments, and are happy that the reviewer finds the approach interesting. We address these comments in more detail below.

      Reading these comments, we recognized how various analyses and algebraic equations could appear opaque to a reader, both in terms of what they convey and their import. To address this, we made a number of changes.

      1. First and foremost, we provide the mechanistic underpinning and intuition for why a competing effect emerges in the first place. We do this for every substrate modification system we analyze and make further comments in the subsection focussing on the network level, as well as in the ERK subsection. This intuition should help a reader understand where the result comes from and then judge whether it might apply in a quite different system. This is discussed in detail in Response to Essential Revisions.

      2. Secondly, we have discussed many aspects of the parameters in more detail. Our goal, especially for substrate modification systems, was to completely characterize the role of intrinsic kinetic parameters: whether biphasic responses were impossible irrespective of parameters, whether they were possible for every value of the intrinsic kinetic parameters, or whether they were possible only in a subset of kinetic parameter space. This has been done for every substrate modification system and is discussed more explicitly in the revision. Furthermore, when biphasic responses were possible, we aimed to determine the impact of the species total amounts that facilitated the response. Here we performed additional analytical and semi-analytical work. With the semi-analytical work and parameters chosen in ranges very similar to those found experimentally (e.g. Wistel et al., 2018), we are able to show that biphasic responses can indeed be obtained in experimentally feasible ranges. Further aspects of the parameters are discussed in detail in the Response to Essential Revisions. In particular, a number of new paragraphs (p2-3, p6) and plots (Figure 2-figure supplement 3 and Figure 2-figure supplement 4) specifically deal with this.

      Taken together, these changes address the reviewer's points.

    1. Author Response

      Reviewer #1 (Public Review):

      This interesting manuscript sets out to develop for the mouse a series of important concepts and models that this group has previously developed for models of monkey brains, where they showed that in a large-scale model, anterior → posterior spatial gradients such as spine density (and thus inferred strength of local coupling) lead to a transition from transient stimulus responses to persistent responses, capable of supporting working memory (WM). No such spine density gradient is found in the mouse. Here, the authors propose, and use modeling to explore, the idea that the corresponding gradient may be that of the density of inhibitory PV cells in different regions of the brain.

      The goal of the study - a large-scale, anatomically-constrained model of WM - is an extremely valuable one, and the authors' efforts in this direction should be supported. That said, some of the main claims in the manuscript were not, at least as currently written, clearly supported by the data, a number of important clarifications need to be made, and some claims of novelty are made in a way that, for a typical reader, may obscure the actual contribution being made.

      The biggest issue is that one of the main claims, that together with cell-type specific long-range targeting, "density of cell classes define working memory representations" (abstract), is not terribly clear. For example, Figs. 2D and 2E show that a brain region's hierarchical location tightly predicts its persistent firing rate (2D), but that PV cell fraction has a far weaker correlation (2E). Is hierarchical location sufficient? If PV cell fraction were constant across model brain regions, would we still get persistent activity modes? It seems likely that the answer may be "yes", but the answer, easily within reach of the authors, is surprisingly not in the current version of the manuscript. Figure 3D, for the thalamocortical model, shows no significant correlation of firing rate with PV density.

      Given the claim about PV density (in the abstract and the first main point of the discussion), this is a big concern. Yet it seems easily addressable: e.g. if indeed the authors found that hierarchy was sufficient and PV density immaterial, the model would be no less interesting. And if the authors demonstrated clearly that a PV density gradient is required, that would make the claim a solid one. If, within the model, such a causal demonstration is present, this reader at least missed it.

      MAJOR CONCERNS:

      (1) The model appears to be a model of a single side of the brain. Perhaps each brain region in the model could be considered an amalgam of that region across both sides of the brain. Yet given results like Li et al. Nature 2016, who show that persistent activity is robust to inhibition of one side, but not both sides of ALM, at the very least discussion of the issue is warranted.

      The model is indeed a one-hemisphere model, and an expansion to a bihemispheric model is considered for future work. We have added the following sentence in the Discussion section:

      “Future versions of the large-scale model may consider different interneuron types to understand their contributions to activity patterns in the cortex (Kim et al., 2017; Meng et al., 2023; Tremblay and Rudy, 2016; Nigro et al., 2022), the role of interhemispheric projections in providing robustness for short-term memory encoding (Ni et al., 2016), and the inclusion of populations with tuning to various stimulus features and/or task parameters that would allow for switching across tasks (Yang et al., 2018).”

      (2) The authors make an interesting attempt to distinguish core WM regions from other regions such as "readout" regions, defined as showing persistent activity yet not having an effect on persistent activity elsewhere in the network.

      However, this definition seemed problematic: for example, consider a network that consists of 20 brain regions, all interconnected to each other, and all equivalent to each other, capable of displaying persistent activity thanks to mutual connectivity. Imagine that inhibition of any one of these regions is not sufficient to significantly perturb persistent activity in the other 19. Then they would all be labeled as "readout". Yet, by construction in this thought experiment, they are all equivalent to each other and are all core areas. Such redundancy may well be present in the brain. How would the authors address this redundancy issue?

      We acknowledge the importance of this thought experiment. Although we initially restricted the definition of core area to how a single area contributes to working memory, we proceeded with concurrent inhibition of multiple readout areas (see Essential Revisions response 6 above).

      (3) Also important to discuss would be the fact that every brain region in this model is set up as composed of two populations, and when long-range interactions are strong and the attractors strongly coupled, the entire brain is set up as a 1-bit working memory. How would results and the approach be impacted by considering WM for more flexible situations?

      We have used a model of two populations as the simplest way to integrate large-scale connectivity and inhibitory gradients. Indeed, future work should consider more realistic connectivity and populations with various degrees of tuning to different task parameters. (see Reviewer 1 response 1 above)

      (4) Another concern that is important yet easily addressed is the authors' use of the term "novel cell-type specific graph theory measures". Describing in the abstract and elsewhere the fact that what they mean is to take into account the sign of connections, not just their magnitude, would transmit to readers the essence of the contribution in a manner very simple to understand. Most readers would fail to grasp the essential point of the current labeling, which sounds potentially very vague and complex.

      We have reworded the abstract - see also Essential Revisions response 2 above.

      (5) Finally, the overall significance of the study, and advances over previous work, were not entirely clear. In the discussion, the authors identify three major findings: (1) WM function is shaped by the PV cell density gradient. But as above, further work is required to make it clear that this claim is supported by the model. (2) if local recurrent excitation is insufficient to generate persistent activity, then long-range recurrent excitation is needed to generate it. I had trouble understanding why a model was needed to reach this conclusion - it seems as if it is simply a question of straightforward logic. The discussion states that in this regard, the work here "offers specific predictions to be tested experimentally", but I had trouble identifying what these specific predictions are. (3) Taking into account sign, not only magnitude, of connections, is important. This last point once again seemed a matter of straightforward logic, making its novelty difficult to assess.

      We thank the reviewer; we have addressed these issues in Essential Revisions response 3 above.

      Reviewer #2 (Public Review):

      This paper uses the mouse mesoscale connectome, combined with data on the number and fraction of PV-type interneurons, to build a large-scale model of working memory activity in response to inputs from various sensory modalities. The key claims of the paper are two-fold. First, previous work has shown that there does not appear to be an increase in the number of excitatory inputs (spines) per pyramidal neuron along the cortical hierarchy (and this increase was previously suggested to underlie working memory activity occurring preferentially in higher areas along the cortical hierarchy). Thus, the claim is that a key alternative mechanism in the mouse is the heterogeneity in the fraction of PV interneurons. Second, the authors claim to develop novel cell type-specific graph theory.

      I liked seeing the authors put all of the mouse connectomic information into a model to see how it behaved and expect that this will be useful to the community at large as a starting point for other researchers wishing to use and build upon such large-scale models. However, I have significant concerns about both primary scientific claims. With regard to the PV fraction, this does not look like a particularly robust result. First, it's a fairly weak result to start, much smaller than the simple effect of the location of an area along the cortical hierarchy (compare Figs. 2D, 2E; 3C, 3D). Second, the result seems to be heavily dependent upon having subdivided the somatosensory cortex into many separate points and focusing the main figures of the paper (and the only ones showing rates as a function of PV cell fraction) solely on simulations in which the sensory input is provided to the visual cortex. With regards to the claim of novel cell type-specific graph theory, there doesn't appear to be anything particularly novel. The authors simply make sure to assign negative rather than positive weights to inhibitory connections in their graph-theoretic analyses.

      Major issues:

      1) Weakness of result on effect of PV cell fraction. Comparing Figures 2D and 2E, or 3C and 3D, there is a very clear effect of cortical hierarchy on firing rate during the delay period in Figures 2D and 3C. However, in Figure 2E relating delay period firing rate to PV cell fraction, the result looks far weaker. (And similarly for Figs. 3C, 3D, with the latter result not even significant). Moreover, the PV cell fraction results are dominated by the zero firing rate brain regions (as opposed to being a nice graded set of rates, both for zeros and non-zeros, as with the cortical hierarchy results of Figures 2D), and these zeros are particularly contributed to by subdividing somatosensory (SS) into many subregions, thus contributing many points at the lower right of the graph.

      Further, it should be noted that Figure 2E is for visual inputs. In the supplementary Figure 2 - supplement 1, the authors do apply sensory inputs to auditory and somatosensory cortex...but then only show the result that the delay period firing rate increases along the cortical hierarchy (as in Figure 2D for the visual input), but strikingly omit the plots of firing rate versus PV cell fraction. This omission suggests that the result is even weaker for inputs to other sensory modalities, and thus difficult to justify as a defining principle.

      We have now made an effort to exhaustively compare the contributions of PV versus hierarchy in defining the firing rate activity patterns in the model - see Essential Revisions response 1 above. Moreover, we have included plots of firing rate versus PV cell fraction for other sensory modalities, and these results still support a common architecture for short-term memory maintenance.

      2) Graph theoretic analyses. The main comparison made is between graph-theoretic quantities when the quantities account for or do not account for, PV cells contributing negative connection strengths. This did not seem particularly novel.

      See Essential Revisions response 2 above.

      3) It was not clear to me how much the cell-type specific loop strength results were a result of having inhibitory cell types, versus were a result of the assumption ('counter-stream inhibitory bias') that there is a different ratio of excitation to inhibition in top-down versus bottom-up connections. It seems like the main results were more a function of this assumed asymmetry in top-down vs. bottom-up than it was a function of just using cell-type per se. That is, if one ignored inhibitory neurons but put in the top-down vs. bottom-up asymmetry, would one get the same basic results? And, likewise, if one didn't assume asymmetry in the excitatory vs. inhibitory connectivity in top-down versus bottom-up connections, but kept the Pyramidal and PV cell fraction data, would the basic result go away?

      We have addressed the issue of cell-type specific loop strength in Essential Revisions response 2 above.

      4) In the Discussion, there is a third 'main finding' claimed: "when local recurrent excitation is not sufficient to sustain persistent activity...distributed working memory must emerge from long-range interactions between parcellated areas". Isn't this essentially true by definition?

      We have addressed this important issue in Essential Revisions response 3 above.

      5) I don't know if it's even "CIB" that's important or just "any asymmetry (excitatory or inhibitory) between top-down vs. bottom-up directions along the hierarchy". This is worth clarifying and thinking more about, as assigning this to inhibition may be over-attributing a more basic need for asymmetry to a particular mechanism.

      We found that this asymmetry is indeed crucial, which may be provided by CIB or, in some regimes, it is sufficient that a PV gradient is present - see Essential Revisions response 1 above.

      Other questions:

      1) Is it really true that less than 2% of neurons are PV neurons for some areas? Are there higher fractions of other inhibitory interneuron types for these areas, and does this provide a confound for interpreting model results that don't include these other types?

      Maybe related to the above, the authors write in the Results that local inhibition in the model is proportional to PV interneuron density. However, in the Methods, it looks like there are two terms: a constant inhibition term and a term proportional to density. Maybe the former term was used to account for other cell types. Also, is local excitation in the model likewise proportional to pyramidal cell density (and, if not, why not?)?

      The reviewer is correct in pointing out that the ‘constant inhibition term’, which we interpret as a minimal inhibition, accounts for other cell types. We have added the respective explanation in the Methods section. Future versions of the model may include different interneuron types - see Reviewer 1 Response 1 above.

      2) Non-essential areas. The categorization of areas as 'non-essential' as opposed to, e.g. "inputs" is confusing. It seems like the main point is that, since the delay period activity as a whole is bistable, certain areas' contributions may be small enough that, alone, they can't flip the network between its bistable down and up states. However, this does not mean that such areas (such as the purple 'non-essential' area in Figure 5a) are 'non-essential' in the more common sense of the word. Rather, it seems that the purple area is just a 'weaker input' area, and it's confusing to thus label it as 'non-essential' (especially since I'd guess that, whether or not an area flips on/off the bistability may also depend on the assumed strength of the external input signal, i.e. if one made the labeled 'input area' a bit too weak to alone trigger the bistability, then the purple area might become 'essential' to cross the threshold for triggering a bistable-up state).

      This is an important point, and a similar point was also raised by Reviewer 1. For simplicity, we have restricted the definition of the function of an area (e.g., input vs. core vs. non-essential) to how a single area contributes to working memory. The existence of ‘subnetworks’ for any of these functions is indeed plausible, and potentially important, but we have left this for future modeling work (see Essential Revisions response 6 above). What distinguishes ‘input’ from ‘non-essential’ areas is simply whether inhibiting the area during the stimulus period affects stimulus-specific persistent activity. Surely some of the areas that we have classified as ‘non-essential’ have important roles, even for the contents of working memory; however, they are not essential for producing the activity pattern we observe here.

      3) Relation between 'core areas' and loop strength. The measure underlying 'prediction accuracy = 0.93' in Figure 6D and the associated results seems incomplete by being unidirectional. It captures the direction: 'given high cell-type specific loop strength, then core area' but it does not capture the other direction: 'given a cell is part of a core area, is its predicted cell-type specific loop strength strong?'. It would be good to report statistics for both directions of association between loop strength and core area.

      Indeed, the prediction accuracy refers to the direction loop strength → core area, for which we estimate how well a continuous variable (loop strength) predicts a binary variable (whether or not an area is a core area). A prediction in the reverse direction, namely predicting a continuous variable from a binary one, is not well defined, so the reverse association can only be inferred indirectly from Figure 8D.

      4) More justification would be useful on the assumption that the reticular nucleus provides tonic inhibition across the entire thalamus.

      Relatively little is known about how specific this inhibition may be. We have included references in the Discussion section that speak to this fact (Crabtree, 2018; Hardinger et al., 2023).

      5) Is NMDA/AMPA ratio constant across areas and is this another difference between mice and monkeys? I am aware of early work in the mouse (Myme et al., J. Neurophys., 2003) suggesting no changes at least in comparing two brain regions' layer 2/3, but has more work been performed related to this?

      Recent anatomical in vitro autoradiography work in the macaque shows that the NMDA/AMPA ratio (in terms of receptor density) varies across the cortical hierarchy (Klatzmann et al., 2022). Functionally, NMDA receptors seem important for persistent activity in PFC L2/3, while in V1 they contribute relatively little to the stimulus response, which is dominated by AMPA-mediated excitation. This was shown by a recent physiological study in the macaque (Yang et al., 2018). This could indeed point to a species difference, although like-for-like comparisons of equivalent experiments across species are lacking in the literature. We have included this and other related references in our Discussion - see Essential Revisions response 4 above.

      6) Are bilateral connections between the left and right sides of a given area omitted and could those be important?

      These potentially important connections were omitted for simplicity in the model, please see Reviewer 1 Responses 1, 3 above.

      Reviewer #3 (Public Review):

      Combining dynamical modelling and recent findings of mouse brain anatomy, Ding et al. developed a cell-type-specific connectome-based dynamical model of the mouse brain underlying working memory. The authors find that there is a gradient across the cortex in terms of whether mnemonic information can be sustained persistently or only transiently, and this gradient is negatively correlated to the local density of parvalbumin (PV) positive inhibitory cells but positively correlated with mesoscale-defined cortical hierarchy. In addition, weighing connectivity strength by PV density at target areas provides a more faithful relationship between input strength and delay firing rate. The authors also investigate a model where cortical persistent activity can only be sustained with thalamus input intact, although this result is rather separate from the rest of the study. The authors then use this model to test the causal contributions of different areas to working memory. Although some of the in silico perturbations are consistent with existing experimental data, others are rather surprising and need to be further discussed. Finally, the authors investigate patterns of attractor states as a result of different local and long-range connections and suggest that distinct attractor states could underlie different task demands.

      The importance of PV density as a predictor for working memory activity patterns in the mouse brain is in contrast to recent computational findings in the primate brain where the number of spines (excitatory synapses per pyramidal cell) is the key predictor. This finding reveals important species differences and provides complementary mechanisms that can shape distributed patterns of working memory representation across cortical regions. The method of biologically-based near-whole-brain dynamical modeling of a cognitive function is compelling, and the main conclusions are mostly well supported by evidence. However, some aspects of the method, result, and discussion need to be clarified and extended.

      1) Based on existing anatomical data, the authors reveal a negative correlation between cortical hierarchy (defined by mesoscale connectivity; this concept needs to be explicitly defined in the Results section, not just in the Methods section) and local PV density (Fig. 1). In the dynamical model, the authors find that working memory activity is positively (and strongly) correlated with cortical hierarchy and negatively (and less strongly) correlated with PV cell density (Fig. 2), and conclude that working memory activity depends on both. But could the negative correlation between activity and PV density simply result from the inherent relationship between hierarchy and PV density across regions? To strengthen this result, the authors should quantify the predictive power of local PV density on working memory activity beyond the predictive power of cortical hierarchy.

      We have systematically compared the relationship between PV and hierarchy in generating delay-patterns of activity - see Essential Revisions response 1 above.

      2) In Fig. 4, the authors find that cell-type-specific graph measures more accurately predict delay-period firing rates. Specifically, the authors weigh connections with a cell-type-projection coefficient, which is smaller when the PV cell fraction is higher in the target area. Considering that local PV cell fraction is already correlated with delay activity patterns, weighing the input with the same feature will naturally result in a better input-output relationship. This result will be strengthened if there is a more independent measure of cell-type-projection coefficient, such as the spine density of PV vs excitatory cells across regions, or even the percentage of inhibitory versus excitatory cells targeted by upstream region (even just for an example set of brain regions).

      We have compared different measures of cell-type projection coefficients and how they predict delay-patterns of activity and whether an area is a core area - see Essential Revisions response 2 above.

      3) The authors aim to identify a core subnetwork that generates persistent activity across the cortex by characterising delay activity as well as the effects of perturbations during the stimulus and delay period. Consistent with existing data, the model identifies frontal areas and medial orbital areas as core areas. Surprisingly, areas such as the gustatory area are also part of the core areas. These more nuanced predictions from the model should be further discussed. Also surprisingly, the secondary motor cortex (MOs), which has been indicated as a core area for short-term memory and motor planning by many existing studies is classified as a readout area. The authors explain this potential discrepancy as a difference in task demand. The task used in this study is a visual delayed response task, and the task(s) used to support the role of MOs in short-term memory is usually a whisker-based delayed response task or an auditory delay response task. In all these tasks, activity in the delay period is likely a mixture of sensory memory, decision, and motor preparation signals. Therefore, task demand is unlikely the reason for this discrepancy. On the other hand, motor effectors (saccade, lick, reach, orient) could be a potential reason why some areas are recruited as part of the core working-memory network in one task and not in another task. The authors should further discuss both of these points.

      We have addressed this important point in Essential Revisions response 5 above.

      4) For a non-expert in the field, it is rather difficult to grasp the relationship between the results in Fig. 7 and the rest of the paper. Are all the attractor states related to working memory? If so, why are the core regions for different attractor states so different? And are the core regions identified in Fig. 5 based on arbitrary parameters that happen to identify certain areas (e.g., PL) as core? The authors should at least further clarify the method used and discuss these results in the context of previous results in this study.

      Attractor states that have a stable baseline are, by definition, related to working memory, in that the model has both a baseline state and a memory state. Some areas, such as PL, are more likely to be associated with different core subnetworks given their position in the hierarchy. In the current version of the manuscript, we provide a motivation for the different attractor states and how they may relate to cognitive function.
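      As a minimal illustration of what "a stable baseline plus a memory state" means, the sketch below simulates a single self-exciting rate unit. This is an assumption chosen for clarity, not the authors' multi-area model. With a strong enough recurrent weight w, a sigmoidal transfer function gives two stable fixed points: a brief cue switches the unit from baseline to a persistent high state that outlasts the cue. All parameter values (and the millisecond units) are hypothetical.

```python
import math

def sigmoid(x, theta=5.0, k=1.0):
    """Sigmoidal rate transfer function with threshold theta and slope 1/k."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / k))

def simulate(cue_amp, w=10.0, tau=10.0, dt=1.0, t_total=400.0, cue=(100.0, 150.0)):
    """Euler-integrate tau * dr/dt = -r + f(w * r + I(t)) for one self-exciting rate unit."""
    r, trace = 0.0, []
    for step in range(int(t_total / dt)):
        t = step * dt
        inp = cue_amp if cue[0] <= t < cue[1] else 0.0  # transient cue only
        r += dt / tau * (-r + sigmoid(w * r + inp))
        trace.append(r)
    return trace

with_cue = simulate(cue_amp=6.0)  # brief cue between 100 and 150 ms
no_cue = simulate(cue_amp=0.0)    # same unit, never cued
print(f"before cue: {with_cue[99]:.3f}, end of trial: {with_cue[-1]:.3f}, no cue: {no_cue[-1]:.3f}")
```

      Without the cue the unit stays near its low fixed point; after the cue it remains near the high fixed point, which is the defining property of a memory-holding attractor.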

    1. Author Response

      Reviewer #1

      While the article clearly outlines the strengths of the chosen approach, it lacks an equally clear exposition of its limitations and a more thorough comparison to established approaches. Two examples of limitations that should be stated more clearly, in my opinion: models need to be small enough to fit on a single machine (in contrast to e.g. NEURON and NEST which support distributed computation via MPI), and only single-compartment models are supported; both limitations are mentioned in passing in the discussion, but would merit a more upfront mention.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges the restrictions of BrainPy's paradigm, which is based on object-oriented programming in Python. It highlights three main categories of limitations: (a) approach limitations, (b) functionality limitations, and (c) parallelization limitations. These limitations identify areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems.

      In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      The study does not verify the accuracy of the presented framework. While its basic approach (time-step-based simulation, standard numerical integration algorithms) is sufficiently similar to other software to not expect major discrepancies, an explicit comparison would remove any doubt. Quantitative measures of accuracy are particularly important in the context of benchmarks (see below), since simulations can be made arbitrarily fast by sacrificing accuracy.

      We agree that an explicit comparison would help alleviate any doubts and provide a more comprehensive understanding of our framework's accuracy. We have revised our manuscript to include a dedicated section, Appendix 11, in which we verify that all simulators generate consistent average firing rates for the given benchmark network models (figure 1 and figure 2 in Appendix 11). These verifications were performed for different network sizes (ranging from 4×10^3 to 4×10^5) and on different computing platforms (CPU, GPU and TPU). We also qualitatively compared the overall network activity patterns produced by each simulator to ensure they exhibited the same dynamics (figure 3 and figure 4 in Appendix 11). While exact spike-to-spike reproducibility cannot be guaranteed across different simulator implementations, we confirmed that our simulator produced activity consistent with the reference simulators in terms of both firing rates and network-level dynamics. BrainPy therefore does not sacrifice simulation accuracy for speed: despite using single-precision floating point, it produced consistent firing rates and raster diagrams across all simulations (see figure 3 and figure 4 in Appendix 11).
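      The firing-rate comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions: spikes are given as hypothetical (neuron_id, time_ms) tuples, and the 5% relative tolerance in rates_consistent is an illustrative choice, not the criterion used in Appendix 11.

```python
def mean_firing_rate(spikes, n_neurons, duration_ms):
    """Population-averaged firing rate (Hz) from a list of (neuron_id, time_ms) spike events."""
    return len(spikes) / n_neurons / (duration_ms / 1000.0)

def rates_consistent(rate_a, rate_b, rtol=0.05):
    """True if two mean rates differ by at most rtol relative to the larger one."""
    return abs(rate_a - rate_b) <= rtol * max(abs(rate_a), abs(rate_b))

# Hypothetical rasters from two simulators for the same 4-neuron, 1 s run:
# individual spike times differ, but the population rate is the same.
sim_a = [(0, 12.5), (1, 40.0), (0, 512.3), (3, 800.1)]
sim_b = [(0, 13.0), (2, 41.5), (0, 515.0), (3, 799.7)]
rate_a = mean_firing_rate(sim_a, n_neurons=4, duration_ms=1000.0)
rate_b = mean_firing_rate(sim_b, n_neurons=4, duration_ms=1000.0)
print(rate_a, rate_b, rates_consistent(rate_a, rate_b))
```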

      We hope these revisions can ensure that our manuscript provides a clear and robust validation of the accuracy of our simulator.

      Benchmarking against other software is obviously important, but also full of potential pitfalls. The current article does not state clearly whether the results are strictly comparable. In particular: are the benchmarks on the different simulators calculating results to the same accuracy (use of single or double precision, same integration algorithm, etc.)? Does each simulator use the fastest possible execution mode (e.g. number of threads/processes for NEST, C++ standalone mode in Brian2, etc.)? What is exactly measured (compilation time, network generation time, simulation execution time, ...) - these components will scale differently with network size and simulation duration, so summing them up makes the results difficult to interpret. Details are also missing for the comparison between the XLA operator customization in C++ vs. Python: was the C++ variant written by the authors or by someone else? Does the NUMBA→XLA mechanism also support GPUs/TPUs? This comparison also seems to be missing from the GitHub repository provided for reproducing the paper results.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. In particular, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we describe the hardware and software specifications and the experimental details of the benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we describe the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process (see "Accuracy evaluations" in Appendix 11, L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we report the simulation speed of the other brain simulators in different acceleration modes (see Figure 8). For instance, for Brian2 we use the fastest available option, the C++ standalone mode. Furthermore, we have asked the developers of the compared simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
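      The separation of timing components in item 3 can be sketched with a small harness that reports construction, warm-up (where a JIT-based simulator would compile) and execution times separately. The function benchmark and its arguments are hypothetical helpers, not part of any simulator's API. For JAX-based code, asynchronous dispatch means the output should also be synchronized (for example with .block_until_ready()) before the timer is stopped.

```python
import time

def benchmark(build_model, run_model, n_warmup=1):
    """Time model construction, warm-up (e.g. JIT compilation) and execution separately."""
    t0 = time.perf_counter()
    model = build_model()
    t1 = time.perf_counter()
    for _ in range(n_warmup):
        run_model(model)        # first call may trigger compilation in JIT-based simulators
    t2 = time.perf_counter()
    result = run_model(model)   # only this call is reported as execution time
    t3 = time.perf_counter()
    return {"build_s": t1 - t0, "warmup_s": t2 - t1, "run_s": t3 - t2, "result": result}

# Toy stand-ins for "build the network" and "run the simulation".
timings = benchmark(lambda: list(range(100_000)), sum)
print({k: v for k, v in timings.items() if k != "result"})
```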

      Regarding the comparison between XLA operator customization in C++ and Python, we used a C++ version that we implemented ourselves, which is available in Appendix 8, Listing 2. At present, the NUMBA→XLA mechanism does not support GPUs/TPUs; however, we are working on extending this capability to other platforms. We have clarified this in the revised manuscript as well (see L1278 - L1285).

      While the authors convincingly argue for the merits of their Python-based/object-oriented approach, in my opinion, they do not fully acknowledge the advantages of domain-specific languages (NMODL, NestML, equation syntax of ANNarchy and Brian2, ...). In particular, such languages aim at a strong decoupling of the mathematical model description from its implementation and other parts of the model. In contrast, models described with BrainPy's approach often need to refer to such details, e.g. be aware of differences between dense and sparse connectivity schemes, online, or batch mode, etc. It might also be worth mentioning descriptive approaches to synaptic connectivity as supported by other simulators (connection syntax in Brian2, Connection Set Algebra for NEST).

      We have made revisions to better acknowledge the merits of DSLs while providing a more comprehensive comparison. These revisions are incorporated in Discussion (L452 - L466) and Appendix 1 (L778 - L788).

      Reviewer #2

      While the results presented are impressive, publishing further details of the benchmarks in an appendix would be helpful for evaluating the claims and the overall conclusion would be more convincing if the performance benefits were demonstrated on a wider selection of test cases. Unsatisfyingly, the authors gave up on making a direct comparison to Brian running on GPUs with GeNN which would have been a fairer comparison than CPU-based simulations. The code for the chosen benchmarks is also likely to be highly optimised by the authors for running on BrainPy but less so for the other platforms - a fairer test would be to invite the authors of the other simulators to optimise the same models and re-evaluate the benchmarks.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and other pertinent details to enhance the clarity and replicability of our results. In particular, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and the delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of the reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we describe the hardware and software specifications and the experimental details of the benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text (section "Efficient performance of BrainPy") and in Appendix 11, we describe the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process (see "Accuracy evaluations" in Appendix 11, L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we report the simulation speed of the other brain simulators in different acceleration modes (see Figure 8). For instance, for Brian2 we use the fastest available option, the C++ standalone mode. Furthermore, we have asked the developers of the compared simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
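      One common way to keep activity comparable as a network grows is to hold the connection probability fixed and scale each synaptic weight inversely with network size, so that the expected summed input per neuron stays constant. This is offered as an assumption, since the exact scheme is described in Appendix 11 rather than in this response; other schemes, such as scaling with the square root of the number of inputs, are also used. The helper names and parameter values below are hypothetical.

```python
import math

def scaled_weight(w_ref, n_ref, n):
    """Scale a synaptic weight so the total input is preserved when size goes n_ref -> n."""
    return w_ref * n_ref / n

def mean_input(n, p, w, rate_hz):
    """Expected summed input per neuron: p * n presynaptic partners of weight w firing at rate_hz."""
    return p * n * w * rate_hz

w_ref, n_ref, p, rate = 0.6, 4000, 0.02, 5.0  # hypothetical reference values
reference = mean_input(n_ref, p, w_ref, rate)
for n in (4_000, 40_000, 400_000):
    drive = mean_input(n, p, scaled_weight(w_ref, n_ref, n), rate)
    print(n, drive, math.isclose(drive, reference))
```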

      Regarding the wider selection of test cases, we understand the importance of demonstrating the performance benefits on a broader range of scenarios. Specifically, we have designed two kinds of benchmark models:

      • Sparse connection models. Models in this category include the COBA-LIF network and the COBA-HH network. The former is a standard E/I balanced network for comparing the simulation speed of brain simulators, while the latter uses the complex, computationally expensive HH neuron model as its elements. Both models effectively demonstrate the capability of a brain simulator for sparse and event-driven computation.

      • Dense connection models. The local circuits of a cortical column are usually connected densely (Science 366, 1093). Particularly, we use the decision making network proposed by (Wang, 2002) for evaluations.
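      The computational difference between the two benchmark categories can be sketched in a few lines. In this illustrative version (plain Python, hypothetical helper names), the dense update visits every synapse at each step, while the event-driven update visits only the outgoing synapses of neurons that spiked. Both give the same synaptic input, but the event-driven cost scales with the number of spikes times the fan-out rather than with the size of the full weight matrix.

```python
def dense_input(weights, spiked):
    """Dense update: visit every (post, pre) pair, cost ~ N_post * N_pre per step."""
    return [sum(w * s for w, s in zip(row, spiked)) for row in weights]

def to_out_edges(weights):
    """Convert a dense post x pre weight matrix into per-presynaptic lists of (post, weight)."""
    edges = [[] for _ in range(len(weights[0]))]
    for post, row in enumerate(weights):
        for pre, w in enumerate(row):
            if w != 0.0:
                edges[pre].append((post, w))
    return edges

def event_driven_input(out_edges, spiked, n_post):
    """Event-driven update: only outgoing synapses of neurons that spiked are visited."""
    g = [0.0] * n_post
    for pre, s in enumerate(spiked):
        if s:
            for post, w in out_edges[pre]:
                g[post] += w
    return g

W = [[0.0, 0.5, 0.0, 1.0],  # hypothetical 3 post x 4 pre weight matrix
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.5]]
spiked = [1, 0, 0, 1]       # presynaptic neurons 0 and 3 fired this step
print(dense_input(W, spiked), event_driven_input(to_out_edges(W), spiked, len(W)))
```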

      In the revised version, we include extensive experiments on these three benchmark models (COBA-LIF, COBA-HH, and the decision-making network) on different kinds of computing platforms (including CPU, GPU, and TPU) to strengthen the overall conclusion and provide a more comprehensive evaluation of our approach.

      Regarding the comparison to Brian running on GPUs with GeNN, we apologize for not including it in our initial submission. We have conducted the necessary experiments on all three benchmark models used in our evaluations and included these results in the revised version of the paper (see Figure 8). This addition enhances the credibility of our findings and allows a more meaningful comparison between different simulation platforms. Furthermore, we have reached out to the authors of other simulators and invited them to optimize the same models used in our benchmarks. We believe this collaborative approach will ensure a more equitable evaluation of the simulators and provide a more robust and convincing analysis of our work.

      Furthermore, the manuscript reads like an advertisement for the platform with very little discussion of its limitations, weaknesses, or directions for further improvement. A more frank and balanced perspective would strengthen the manuscript and give the reader greater confidence in the platform.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges the restrictions of BrainPy's paradigm, which is based on object-oriented programming in Python. It highlights three main categories of limitations: (a) approach limitations, (b) functionality limitations, and (c) parallelization limitations. These limitations identify areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems. In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      Since simulators wax and wane in popularity, it would be reassuring to see a roadmap for development with a proposed release cadence and a sustainable governance policy for the project. This would serve to both clearly indicate the areas of active development where community contributions would be most valuable and also to reassure potential users that the project is unlikely to be abandoned in the near future, ensuring that their time investment in learning to use the framework will not be wasted.

      We appreciate the reviewer raising the point of demonstrating the project's sustainability. In response to this feedback, we have made the following efforts.

      First, we have added, and will maintain, a "Development roadmap" section on the BrainPy GitHub homepage (https://github.com/brainpy/BrainPy). This gives the community a clear picture of the project's direction and the areas of active development. In addition, the "Future work" section of our revised paper outlines a comprehensive roadmap for the next stages of BrainPy development.

      Second, to address the concern about the sustainability of the project and the potential risk of abandonment, we have added an ACKNOWLEDGMENTS.md file to the GitHub repository (https://github.com/brainpy/BrainPy/blob/master/ACKNOWLEDGMENTS.md) that outlines our sustained funding support. This support demonstrates our commitment to the long-term maintenance and development of the project, and may help to dispel users' concerns about the project being abandoned.

      Similarly, a complex set of dependencies, which need to be modified for BrainPy, will likely make the project hard to maintain and so a similar plan to those given for the CI pipeline and documentation generation for automation of these modifications would be a good addition. It is also important to periodically reflect on whether it still makes sense to combine all the disparate tools into one framework as the codebase grows and starts to strain under modifications required to maintain its unification.

      We appreciate the reviewer's valuable suggestions on the BrainPy framework.

      First, BrainPy is a self-contained package designed specifically for brain dynamics programming. It boasts minimal dependencies, relying only on fundamental packages within the Python scientific computing ecosystem. In essence, BrainPy relies on numpy for array-based computations and utilizes jax and jaxlib for JIT compilation. While we currently utilize numba to customize dedicated operators, we can also remove this dependency by rewriting these operators with C++ code. We incorporate the use of brainpylib, a package developed by ourselves, which provides dedicated operators for CPUs and GPUs in the context of brain dynamics modeling. Additionally, BrainPy leverages mature solutions within the field for certain auxiliary functions. For instance, we integrate the use of tqdm to facilitate the display of a progress bar during model execution, and employ matplotlib for visualization purposes, capitalizing on its well-established capabilities in the scientific community.

      Second, we agree that there is a risk of overly complex dependencies and architectural strain. To mitigate this risk, we have made the following changes:

      • We prioritize good software engineering practices such as loose coupling, high cohesion and modularity in the framework design. This isolates dependencies and changes to specific components. For example, the brainpy.visualize module defines abstract visualization functions whose visualization backend can be changed at any time.

      • We invest in automating the build, test, and release process to reduce the burden of manual maintenance. We rely heavily on GitHub Actions for testing BrainPy code and building the documentation.

      • We document dependencies clearly and maintain backwards compatibility when possible. For each new API, we will clearly state the BrainPy version from which it is supported, and deprecated APIs will be phased out over multiple release cycles.

      • We continuously monitor code complexity metrics and refactor/simplify the architecture when needed.

      • When new tools have significantly different requirements, we will consider spinning them off into separate projects rather than forcing them into the core framework.
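      The deprecation policy in the list above can be implemented with Python's standard warnings module. The decorator below is a generic sketch, not BrainPy's actual mechanism; the function names old_api and new_api and the version strings are hypothetical.

```python
import functools
import warnings

def deprecated(since, removed_in, alternative):
    """Mark a function as deprecated while keeping it working for a few release cycles."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be removed in "
                f"{removed_in}; use {alternative} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(since="1.1", removed_in="1.3", alternative="new_api")  # hypothetical versions
def old_api(x):
    return 2 * x
```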

      Finally, a live demonstration would be a very useful addition to the project. For example, a Jupyter notebook hosted on mybinder.org or similar, and a fully configured Docker image, would each enable potential users to quickly experiment with BrainPy without having to install a stack of dependencies and troubleshoot version conflicts with their pre-existing setup. This would greatly lower the barrier to adoption and help to convince a larger base of modellers of the potential merits of BrainPy, which could be major, both in terms of the computational speed-up and ease of development for a wide range of modelling paradigms.

      We appreciate the reviewer's valuable feedback and suggestion. We have hosted a Jupyter notebook and a fully configured Docker image on mybinder.org (https://mybinder.org/v2/gh/brainpy/BrainPy-binder/main). Users can easily experiment with BrainPy without the need to install multiple dependencies or troubleshoot version conflicts.

      Reviewer #3

      One potential issue is that the scope of the neuro-simulator is not very clearly explained and the target audience is not well defined: is BrainPy primarily intended for computational neuroscientists or for neuro-AI practitioners? The simulator offers very detailed neural models (HH, fractional order models), classical point-models (LIF, AdEx), rate-coded models (reservoirs), but also deep learning layers (Conv, MaxPool, BatchNorm, LSTM). Is there an advantage to using BrainPy rather than PyTorch for purely deep networks? Is it possible to build hybrid models combining rate-coded reservoirs or convnets with a network of HH neurons? Without such a hybrid approach, it is unclear why the deep learning layers are needed.

      We appreciate the reviewer's concern regarding the scope of BrainPy and the need for clarification regarding the target audience.

      BrainPy is designed to cater to both computational neuroscientists and neuro-AI practitioners by integrating detailed neural models, classical point models, rate-coded models, and deep learning models. The platform aims to provide a general-purpose programming framework for modeling brain dynamics, allowing users to explore the dynamics of brain or brain-inspired models that combine insights from biology and machine learning.

      In particular, brain dynamics models (provided in the brainpy.dyn module) and deep learning models (provided in the brainpy.dnn module) are closely integrated with each other in BrainPy. First, to build brain dynamics models, users should use the building blocks in the brainpy.dnn module to create synaptic projections.

      Second, to build brain-inspired computing models for machine learning, users can also take advantage of the neuronal and synaptic dynamics provided in the brainpy.dyn module.

      To that end, BrainPy provides building blocks of detailed conductance-based models like Hodgkin-Huxley, as well as common deep learning layers like convolutions.

      Regarding the advantage of using BrainPy over PyTorch for purely deep networks, we acknowledge that existing deep learning libraries like Flax in the JAX ecosystem provide extensive tools and examples for constructing traditional deep neural networks. While BrainPy does implement standard deep learning layers, our primary focus is not to compete directly with those libraries. Instead, we provide these layers so that they integrate seamlessly with BrainPy's core modeling abstractions, including variables and dynamical systems. This integration allows researchers to incorporate common deep learning layers into their brain models. Additionally, the deep learning layers included in BrainPy serve as examples for customization and facilitate the development of tailored layers for neuroscience research. Researchers can modify or extend the implementations to suit their specific needs.

      In summary, BrainPy's scope focuses on the general-purpose brain dynamics programming. The target audience includes computational neuroscientists who want to incorporate insights from machine learning, as well as some ML researchers interested in integrating brain-like components.

      In terms of plasticity, only external training procedures are implemented (backpropagation, FORCE, surrogate gradients). No local plasticity mechanism (Hebbian learning for rate-coded networks, STDP and its variants for spiking networks) seems to be implemented, apart from STP. Is it a planned feature? Appendix 8 refers to bp.synplast.STDP(), but it is not present in the current code (https://github.com/brainpy/BrainPy/tree/master/brainpy/_src/dyn/synplast). Spiking networks without STDP are not going to be very useful to computational neuroscientists, so this suggests that the simulator targets primarily neuro-AI, i.e. AI researchers interested in using spiking models in a machine learning approach.

      We thank the reviewer for raising the limitations of BrainPy in terms of local plasticity mechanisms, and we apologize for the delay in implementing STDP models in BrainPy. We now provide very general implementations of STDP, which are compatible with any synaptic model (such as Exponential, Dual Exponential, AMPA, GABA, and NMDA dynamics) and with common connection patterns (such as dense and sparse connectivity):

      bp.dyn.STDP_Song2001(pre, post, delay, syn, comm, out)

      It can also easily be combined with short-term plasticity models. The modular design of BrainPy's framework also makes the plasticity component straightforward to implement and integrate into existing models.
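      For readers unfamiliar with the underlying rule, the sketch below implements additive pair-based STDP with exponentially decaying traces for a single synapse, a common formulation in the spirit of Song and colleagues' work. It is written in plain Python for illustration and is not BrainPy's implementation; whether it matches STDP_Song2001 in detail is an assumption, and the amplitudes, time constants and weight bounds are assumed values.

```python
import math

def stdp_trace(pre_times, post_times, w0=0.5, w_max=1.0,
               a_plus=0.01, a_minus=-0.0105, tau_plus=20.0, tau_minus=20.0):
    """Additive pair-based STDP with exponentially decaying traces (all-to-all pairing, ms)."""
    events = sorted([(t, "pre") for t in pre_times] + [(t, "post") for t in post_times])
    w, x_pre, x_post, t_last = w0, 0.0, 0.0, 0.0
    for t, kind in events:
        # Decay both traces from the previous event to the current one.
        x_pre *= math.exp(-(t - t_last) / tau_plus)
        x_post *= math.exp(-(t - t_last) / tau_minus)
        t_last = t
        if kind == "pre":
            x_pre += a_plus
            w = min(max(w + x_post, 0.0), w_max)  # depression from earlier post spikes
        else:
            x_post += a_minus
            w = min(max(w + x_pre, 0.0), w_max)   # potentiation from earlier pre spikes
    return w

print(stdp_trace([10.0], [15.0]))  # pre before post: weight increases
print(stdp_trace([15.0], [10.0]))  # post before pre: weight decreases
```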

      A second weakness of the paper concerns the demos and benchmarks used to demonstrate the versatility and performance of BrainPy, which are not sufficiently described. In Fig. 4, it is for example not explained how the reservoirs are trained (only the readout weights, or also the recurrent ones? Using BPTT only makes sense when the recurrent weights are also trained.), nor how many neurons they have, what the final performance is, etc. The comparison with NEURON, NEST, and Brian2 is hard to trust without detailed explanations. Why are different numbers of neurons used for COBA and COBAHH? How long is the simulation in each setting? Which time is measured: the total time including compilation and network creation, or just the simulation time? Are the same numerical methods used for all simulators? It would also be interesting to discuss why the only result involving TPUs (Fig 8c) shows that it is worse than the V100 GPU. What could be the reason? Are there biologically-realistic networks that would benefit from a TPU? As the support for TPUs is a major selling point of BrainPy, it would be important to investigate its usage further.

      We appreciate the reviewer for raising the important question about the demos and benchmarks used to demonstrate the versatility and performance of BrainPy. To address these concerns, we have added more details in the revised paper, including:

      • In Fig. 4, we explain how the reservoirs are trained in Appendix 10, in which only the readout weights are trained, and they are trained using backpropagation, FORCE learning, and ridge regression algorithms, respectively. We also specify the number of neurons in each reservoir (see L1397), and the final performance of the reservoirs on the task (see Figure 4).

      • To enable readers to better interpret the simulator comparisons in Fig. 8, we have also added more detailed explanations of the comparison with NEURON, NEST, and Brian2 in Appendix 11.

      • In the revised paper, we provide a comprehensive analysis of BrainPy's compatibility with different hardware platforms, including TPUs, and identify the specific conditions under which TPUs may offer advantages (see Figure 8 and Appendix 11—figure 7). We have also discussed the potential benefits of TPUs for biologically realistic networks (see L514 - L521). In particular, for biological networks with arbitrary sparsity, TPUs do not show an advantage over GPUs (see Appendix 11—figure 7). TPUs are best at exploiting certain kinds of structured sparsity, for example block sparsity.
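      Readout-only training, as described above for the reservoirs in Fig. 4, can be sketched as follows: the recurrent and input weights stay fixed at random values and only a linear readout is fitted, here by ridge regression on a one-step-ahead sine prediction task. Everything in this block (reservoir size, gain, ridge penalty, task) is a hypothetical choice, and plain Python is used so the example has no dependencies; in practice NumPy or JAX linear algebra would replace the hand-written solver.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def run_reservoir(inputs, n=20, gain=0.9, seed=0):
    """Drive a fixed random tanh reservoir; the recurrent and input weights are never trained."""
    rng = random.Random(seed)
    scale = gain / math.sqrt(n / 3.0)  # rough spectral-radius normalisation for U(-1, 1) entries
    w_rec = [[rng.uniform(-1, 1) * scale for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-1, 1) for _ in range(n)]
    x = [0.0] * n
    feats = []
    for u in inputs:
        x = [math.tanh(sum(w * xj for w, xj in zip(w_rec[i], x)) + w_in[i] * u) for i in range(n)]
        feats.append(x + [u, 1.0])  # reservoir state, raw input and a bias term
    return feats

def ridge_readout(feats, targets, lam=1e-4):
    """Fit only the linear readout: solve (X^T X + lam * I) w = X^T y."""
    d = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    xty = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(d)]
    return solve(xtx, xty)

# Hypothetical task: predict the next sample of a sine wave.
dt = 0.1
signal = [math.sin(k * dt) for k in range(301)]
washout = 50
feats = run_reservoir(signal[:-1])[washout:]
targets = signal[1:][washout:]
w_out = ridge_readout(feats, targets)
preds = [sum(w * f for w, f in zip(w_out, row)) for row in feats]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
print(f"training MSE: {mse:.2e}")
```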

    1. Author Response

      Reviewer #1 (Public Review):

      Due to complicated and often unpredictable idiosyncratic differences, comparing fMRI topography between subjects typically requires additional, expensive scan time and laborious analysis steps, using dedicated functional localizer runs that contrast each subject's fMRI responses to different stimulus categories. To overcome this challenge, hyperalignment tools have recently been developed (e.g., Guntupalli et al., 2016; Haxby et al., 2011) that align subjects' fMRI responses to a given movie in a high-dimensional space of voxels. In the present study, Jiahui and colleagues propose a significantly improved version of hyperaligning functional brain topography between individuals. This new version, based on fMRI connectivity, works robustly on datasets in which subjects watched different movies and were scanned with different parameters/scanners at different MRI centers.

      Robustness is the major strength of this study. Although datasets from different subjects watching different movies at different MRI centers with different scan parameters were used, the functional brain topographies obtained from between-subject, connectivity-based hyperalignment were comparable to the gold standard of within-subject functional localization, and significantly better than regular surface-based anatomical alignment. These results also support the claim that the present approach is a useful improvement over previous hyperalignment methods based on time-locked fMRI voxel responses, which require normative samples of subjects watching the same movie.
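      Hyperalignment (Haxby et al., 2011) aligns subjects with a Procrustes transformation. The toy sketch below shows that step in two dimensions, where the optimal rotation has a closed form; real hyperalignment operates on many vertices per searchlight and solves the Procrustes problem with an SVD. In the connectivity-based variant discussed here, the rows being matched would be connectivity targets rather than movie time points, which is what allows different movies to be used. The data and helper names are hypothetical.

```python
import math

def procrustes_rotation_2d(source, target):
    """Angle of the rotation that best maps source points onto target points (least squares)."""
    num = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    den = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(num, den)

def rotate(points, theta):
    """Rotate 2-D points counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical responses of subject A: rows are samples, columns are two "voxels".
subject_a = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (0.5, -1.5)]
# Subject B encodes the same information in a rotated voxel space.
subject_b = rotate(subject_a, 0.7)

theta = procrustes_rotation_2d(subject_b, subject_a)  # recovers the inverse rotation
aligned = rotate(subject_b, theta)
print(round(theta, 6), aligned)
```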

      We thank the reviewer for the appreciation of our work.

      Given its robustness, this new version of hyperalignment would provide much greater statistical power for group-level comparisons, at lower cost in time and effort than collecting and analyzing data from the large samples required by current stringent standards, and is likely to be useful to the whole functional neuroimaging community. That said, more discussion of the limits of the present hyperalignment approach would be helpful to potential eLife readers. For example, to what extent would the present hyperalignment approach be applicable to individuals with atypical functional brain topography, such as brain lesion patients with, e.g., acquired prosopagnosia? Even in typical populations, while bilateral fusiform face areas can be identified in the majority of individuals through functional localizer scans, the left fusiform face area sometimes cannot be found. Moreover, many top-down factors are known to modulate functional brain topography. Due to these factors, brain responses and functional connectivity may differ even when the same subject watches the same movie twice (e.g., Cui et al., 2021).

      We thank the reviewer for the suggestion and agree that it would be fascinating if predictions could be made with high fidelity in neuropsychological populations. Although we are optimistic that our algorithm can generalize across diverse populations, to date no previous study has provided empirical evidence of its effectiveness beyond typical brains, including any optimizations and special applications this would require. Besides neuropsychological populations, it would also be valuable to study generalization across a broad age range, for example, from infants to the elderly. The brain changes with age both anatomically and functionally, so predicting functional topographies from a normative group that includes only young participants is challenging. With all these potential applications in mind, future research is needed to demonstrate efficacy, build the pipeline, and construct representative normative groups that meet the requirements of accurate individualized predictions in diverse populations.

      In typical populations, participants show great individual variability in their functional topographies; for instance, some participants have distinguishable patches of activation in their left ventral temporal cortex while others do not. Our algorithm successfully captured these individual differences in its predictions. The figure below shows, as an example, the face-selective topographies of two individuals whose face-selective topographies in the left ventral temporal cortex differ markedly. The participant on the left has prominent face-selective areas in the left ventral temporal cortex, similar in size to those on the right side, while the participant on the right has only a few small, scattered face-selective spots on the left side. Regardless of what their face-selective areas look like, our algorithm accurately recovered their individualized locations, shapes, and sizes, retaining the individual variability in functional topography.

      Functional connectivity profiles based on naturalistic stimuli are very stable across the cortex, even when participants watch different movies. As shown in Figure 4-figure supplement 9, for most searchlights (radius = 15 mm) the mean correlations of the fine-scale connectome between viewings of The Grand Budapest Hotel and Raiders of the Lost Ark were generally around 0.8. The mean correlations between the first and second halves of the same movie were about 0.9, even though the content of the two halves differed. Thus, fine-grained functional connectivity profiles remain highly stable and reliable across movie content, which contributes to the robustness of our algorithm’s predictions across movies, time, and other parameters (e.g., scanner models and scanning parameters).
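      The stability argument above rests on computing each participant’s connectivity profile from their own scan. A minimal sketch of that idea follows, under simplifying assumptions that are ours rather than the authors’: one searchlight, connectivity targets given as a shared set of regional time series, and synthetic data in which each participant has a fixed, hypothetical coupling matrix `W` between targets and searchlight vertices. Because the profile is a within-subject correlation, the two "movies" may differ in content and length.

```python
import numpy as np


def zscore(ts):
    """Z-score each column of a (n_timepoints, n_features) array."""
    return (ts - ts.mean(axis=0)) / ts.std(axis=0)


def searchlight_connectome(sl_ts, target_ts):
    """Within-subject connectivity profile of one searchlight.

    sl_ts:     (n_timepoints, n_vertices) searchlight vertex time series.
    target_ts: (n_timepoints, n_targets) connectivity-target time series from
               the same scan, so no time-locking across scans is required.
    Returns an (n_targets, n_vertices) matrix of Pearson correlations.
    """
    return zscore(target_ts).T @ zscore(sl_ts) / sl_ts.shape[0]


def profile_similarity(conn_a, conn_b):
    """Pearson correlation between two flattened connectivity profiles."""
    return np.corrcoef(conn_a.ravel(), conn_b.ravel())[0, 1]


rng = np.random.default_rng(0)
n_targets, n_vertices = 50, 30


def simulate_scan(W, n_timepoints):
    """Synthetic scan: vertices are a fixed mix of target signals plus noise."""
    targets = rng.standard_normal((n_timepoints, n_targets))
    vertices = targets @ W + 0.5 * rng.standard_normal((n_timepoints, n_vertices))
    return vertices, targets


W_subject = rng.standard_normal((n_targets, n_vertices))  # hypothetical coupling
W_other = rng.standard_normal((n_targets, n_vertices))    # a different person

conn_movie1 = searchlight_connectome(*simulate_scan(W_subject, 600))  # "movie 1"
conn_movie2 = searchlight_connectome(*simulate_scan(W_subject, 450))  # different, shorter movie
conn_other = searchlight_connectome(*simulate_scan(W_other, 600))

same_subject_r = profile_similarity(conn_movie1, conn_movie2)
other_subject_r = profile_similarity(conn_movie1, conn_other)
print(f"same subject, different movies: r = {same_subject_r:.2f}")
print(f"different subjects:             r = {other_subject_r:.2f}")
```

      In this toy the coupling is fixed by construction, so the high same-subject correlation only illustrates the logic; the values of about 0.8 and 0.9 reported above come from the real datasets.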

      We added a paragraph to the Discussion section to address these concerns (pages 18-19):

      “This study successfully illustrated that accurate individualized predictions are both robust and applicable across a variety of conditions, including movie types, languages, scanning parameters, and scanner models. Importantly, the intricate connectivity profiles remain consistent even when participants view entirely different movies, as evidenced by Figure 4-figure supplement 9, reinforcing the stability of the predictions across scenarios. However, all four datasets in this study included only typical participants with anatomically intact brains. An open question is whether the individualized topographies of neuropsychological populations with atypical cortical function (e.g., developmental prosopagnosics) or with lesioned brains (e.g., acquired prosopagnosics) could also be accurately predicted using hyperalignment-based methods. To our knowledge, no previous study has investigated this question. Beyond neuropsychological groups, it is also valuable to investigate how well the predictions generalize across a wide age range, from infants to the elderly. Future research is essential to adapt our algorithms to diverse populations.”

      Reviewer #2 (Public Review):

      Guo and her colleagues develop a new approach to map category-selective functional topographies in individual participants based on their movie-viewing fMRI data and functional localizer data from a normative sample. Connectivity hyperalignment is used to derive the transformation matrices between participants according to their functional connectomes during movie watching. The transformation matrices are then used to project the localizer data from the normative sample into the new participant’s space and create the idiosyncratic cortical topography for that participant. The authors demonstrate that a target participant’s individualized category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. The new approach allows researchers to estimate a broad range of functional topographies based on naturalistic movies and a normative database, making it possible to integrate datasets from laboratories worldwide to map functional areas for individuals. The topic is of broad interest to the neuroimaging community; the rationale of the study is straightforward, the experiments were well designed, and the results are comprehensive. I have some concerns that the authors may want to address, particularly regarding the details of the pipeline used to map individual category-selective functional topographies.

      We thank the reviewer for the encouragement.

      1) How does the length of the movie-viewing fMRI scan affect the accuracy of predicting the idiosyncratic cortical topography? Similarly, how does the number of participants in the normative database affect the prediction of the category-selective topography? This information is important for researchers who are interested in using the approach in their studies.

      To investigate the influence of movie-viewing data length and the number of participants in the normative database on prediction performance, we systematically varied these parameters. Specifically, we altered the number of runs utilized in the analysis for both the normative and target data and experimented with varying the number of participants in the normative dataset using the Budapest and the Sraiders datasets. We have included a new Figure 4-figure supplement 5 to present a summary of these findings.

      The results reveal that both within-dataset and between-dataset prediction performance is positively correlated with the length of the movie-viewing fMRI data used for both the normative and target groups. A similar trend was observed for the number of participants included in the normative dataset. Importantly, even when we analyzed as little as one run of movie-viewing data (roughly 10-15 minutes), our hyperalignment-based prediction performance was significantly higher than that achieved using traditional surface alignment. This held true even when the normative dataset included as few as five participants.

      In summary, our results show that prediction performance generally improves with longer movie-viewing sessions and larger normative datasets. However, even with minimal data (about 10 minutes of movie viewing and a small number of participants in the normative dataset), our algorithm still significantly outperforms traditional surface alignment methods.
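      The benefit of larger normative groups is what one would expect if each training participant’s projected map is the target’s true map plus independent noise, so that averaging more maps reduces the noise. A toy sketch of how such a group-size sweep could be scored with Pearson r follows; the noise model, the function names, and all numbers are assumptions for illustration with synthetic maps, not the authors’ data or code.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D maps."""
    return np.corrcoef(a, b)[0, 1]


def accuracy_by_group_size(projected_maps, target_map, group_sizes, n_draws=50, seed=0):
    """Mean prediction accuracy as a function of normative-group size.

    projected_maps: (n_train, n_vertices) contrast maps of training participants,
                    already projected into the target participant's space.
    target_map:     (n_vertices,) the target's own localizer-based map.
    For each size k, draw random subsets of k training participants, average
    their maps into a prediction, and correlate it with the target map.
    """
    rng = np.random.default_rng(seed)
    n_train = projected_maps.shape[0]
    results = {}
    for k in group_sizes:
        scores = []
        for _ in range(n_draws):
            subset = rng.choice(n_train, size=k, replace=False)
            prediction = projected_maps[subset].mean(axis=0)
            scores.append(pearson_r(prediction, target_map))
        results[k] = float(np.mean(scores))
    return results


# Synthetic demo: each projected map is the target map plus independent noise.
rng = np.random.default_rng(1)
target_map = rng.standard_normal(2000)
projected_maps = target_map + 2.0 * rng.standard_normal((19, 2000))
acc = accuracy_by_group_size(projected_maps, target_map, group_sizes=[1, 5, 19])
print(acc)
```

      Real projected maps also carry alignment error that does not average out, which is one reason the observed gains level off rather than approaching r = 1.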

      We also added sentences in the discussion section (page 15):

      “We investigated the influence of naturalistic movie length and training group size on the prediction accuracy of individualized functional topographies. By incrementally increasing both the number of movie runs in the training and target datasets and the number of participants in the training group in the Budapest and Sraiders datasets, we observed enhanced prediction accuracy (Figure 4-figure supplement 5). Notably, even with just one movie run in the training or target dataset, or with only five participants in the training group, our prediction performance (Pearson r) ranged from about 0.6 to 0.7. This accuracy significantly exceeded that obtained using surface-based alignment.”

      2) The data show that category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. I’m wondering whether the functional connectome from resting state fMRI can do the same job as movie-watching fMRI. If so, it will expand the approach to broader data.

      We agree with the reviewer that demonstrating applicability to resting state data would expand the application scenarios of this approach. Most previous findings on resting state connectivity, including comparisons between naturalistic and resting state paradigms, focused on macro-scale similarities and differences (e.g., Samara et al., 2023). Very few studies have investigated the fine-scale connectome based on resting state data. The study introducing connectivity hyperalignment (Guntupalli et al., 2018) demonstrated a shared fine-scale connectivity structure among individuals that coexists with the common coarse-scale structure, and built an algorithm that successfully hyperaligns individuals to the shared fine-scale space. Another study from our lab (Feilong et al., 2021) revealed that fine-scale connectivity profiles in both resting and task states are highly predictive of general intelligence, indicating reliable and biologically relevant fine-scale resting state connectome structure. Thus, it is highly plausible that our approach can be generalized to resting state data, generating significantly better predictions of individualized functional topographies than traditional surface alignment. However, the current datasets do not include resting state data, so we could not perform this analysis. We are in the process of collecting new data to explore this hypothesis in future work.

      We added sentences to the discussion section to discuss this idea (page 18):

      “Studies comparing movie-viewing and resting state functional connectivity have shown that both paradigms yield overlapping macroscale cortical organizations (29), though naturalistic viewing introduces unique modality-specific hierarchical gradients. However, there remains a gap in research comparing the fine-scaled connectomes of naturalistic and resting state paradigms. Guntupalli and colleagues (14) revealed a shared fine-scale structure that coexists with the coarse-scale structure, and connectivity hyperalignment successfully improved intersubject correlations across a wide variety of tasks. Feilong et al. (13) noted that the fine-scaled connectivity profiles in both resting and task states are highly predictive of general intelligence. This suggests a reliable and biologically relevant fine-scale resting state connectivity structure among individuals. Therefore, it is plausible that individualized functional topography could be effectively estimated using resting state functional connectivity, expanding the applicability of our approach. Future studies are needed to explore this direction.”

      3) The authors averaged the hyperaligned functional localizer data from all subjects to predict individual category-selective topographies. As there is large spatial variability in functional areas across subjects, averaging the data from many subjects may blur the boundaries of the functional areas. A better solution might be to average only those subjects whose connectomes are highly similar to that of the target subject.

      We appreciate the reviewer’s insightful question about optimizing prediction performance by selecting the participants whose functional connectivity is most similar to that of the target individual. This is a promising but difficult direction. Our approach hyperaligns participants based on fine-scale connectomes, so different groups of participants may be most similar to the target participant in different searchlights. In addition, as discussed in our response to Q1, the more participants included in the normative dataset, the better the prediction performance. Thus, there is a trade-off between the number of participants included in the normative dataset and the overall similarity of those participants to the target participant.

      To explore this idea quantitatively, we used a searchlight in the right ventral temporal cortex, roughly at the location of the posterior fusiform face area (pFFA). We sorted participants by the similarity of their connectomes to that of each target participant and then examined prediction performance based on either the nine most similar or the nine least similar participants. Our results, presented in Figure 4-figure supplement 8, reveal that hyperalignment consistently outperforms surface alignment regardless of the subset of participants used. Notably, using the nine most similar participants did not significantly alter prediction performance (Tukey test, z = -0.09, p = 0.996), while using the nine least similar participants did reduce it (Tukey test, z = 2.492, p = 0.034). Interestingly, hyperalignment-based predictions remained stable even when only a subset of participants was used, in contrast to the variability observed in surface-alignment-based predictions.

      Overall, these findings suggest that while selecting functionally similar participants is a promising avenue for future optimization, the process will require nuanced, searchlight-specific criteria. Each searchlight may necessitate its own set of optimal participants to balance between the performance boost from having more participants and the fidelity gained from participant similarity.
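      The per-searchlight selection discussed above can be sketched as follows. This is a minimal illustration under our own assumptions: similarity is measured as the Pearson correlation of flattened searchlight connectomes that are already in a common vertex space (for example, after hyperalignment into the target’s space), and the function names are hypothetical. A full implementation would repeat the ranking for every searchlight and then choose how to trade off the subset size `k` against similarity.

```python
import numpy as np


def connectome_similarity(conn_a, conn_b):
    """Pearson correlation between two flattened searchlight connectomes."""
    return np.corrcoef(conn_a.ravel(), conn_b.ravel())[0, 1]


def rank_training_participants(train_conns, target_conn):
    """Indices of training participants sorted from most to least similar."""
    sims = np.array([connectome_similarity(c, target_conn) for c in train_conns])
    return np.argsort(sims)[::-1], sims


def select_subset(train_conns, target_conn, k, most_similar=True):
    """Pick the k most (or least) similar training participants for one searchlight."""
    order, _ = rank_training_participants(train_conns, target_conn)
    return order[:k] if most_similar else order[-k:]


# Synthetic demo: training connectomes are increasingly noisy copies of the target's.
rng = np.random.default_rng(2)
target_conn = rng.standard_normal((100, 30))
noise_levels = [0.2, 0.5, 1.0, 2.0, 4.0, 8.0]
train_conns = [target_conn + s * rng.standard_normal(target_conn.shape) for s in noise_levels]

top = select_subset(train_conns, target_conn, k=2, most_similar=True)
bottom = select_subset(train_conns, target_conn, k=2, most_similar=False)
print("most similar:", top, "least similar:", bottom)
```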

      We added the following to the discussion in the manuscript (page 16):

      “In our study, we used fine-scale connectomes, noting that some participants are more similar to the target participant in specific searchlights. An interesting question is whether predictions could be enhanced by selecting only those participants most similar to the target participant. To explore this option, we examined a searchlight in the right ventral temporal cortex, roughly at the location of the posterior fusiform face area (pFFA), using the nine participants most similar and the nine least similar to each target participant, as measured by fine-scale connectome similarity in the Budapest dataset. In general, using all or a subset of the participants for the prediction generated similar results (Figure 4-figure supplement 8). Compared to using all participants, using only the nine participants most similar to the target participant did not significantly improve the prediction (Tukey test, z = -0.09, p = 0.996), but using only the nine least similar participants generated significantly lower prediction accuracies (Tukey test, z = 2.492, p = 0.034). This suggests a trade-off between the number of participants included in the prediction and their similarity to the target. Future studies are needed to determine the optimal number of participants to include for each searchlight to refine the algorithm.”

      4) It is good to see that predictions made with hyperalignment were close to and sometimes even exceeded the reliability values measured by Cronbach's alpha. But, please clarify how the Cronbach's alpha is calculated.

      Cronbach’s alpha estimates the reliability of localizer-based maps from the consistency (covariance) of the maps obtained from individual localizer runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, reliability was estimated with split-half correlations. For example, Guntupalli et al. (2016) used correlations of category-selectivity maps between odd and even localizer runs as the measure of reliability. Because each half contains only half of the data, the odd/even split underestimates reliability and necessitated recalculating correlations between maps using only half the data to provide valid comparisons. In contrast, Cronbach’s alpha uses all localizer runs and provides a more accurate statistical estimate of the reliability of topographies estimated from localizer runs.

      Cronbach’s alpha has been used in many previously published works from our lab (e.g., Feilong et al., 2021; Jiahui et al., 2020, 2023). The code for implementing this metric is publicly accessible on the first author’s Github repository (https://github.com/GUOJiahui/face_DCNN/blob/main/code/cronbach_alpha.py).
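      For readers who want the formula behind the description above, the standard definition is α = k/(k − 1) · (1 − Σᵢ σᵢ² / σ²_total), where k is the number of runs, σᵢ² is the variance of run i’s map across vertices, and σ²_total is the variance of the summed map. The sketch below assumes runs play the role of "items" and vertices the role of "observations"; it is our own illustration of the textbook formula and not the code in the repository linked above.

```python
import numpy as np


def cronbach_alpha(run_maps):
    """Cronbach's alpha of a map averaged over localizer runs.

    run_maps: (n_runs, n_vertices) array; each row is the contrast map estimated
              from one localizer run. Runs act as "items", vertices as "observations".
    """
    run_maps = np.asarray(run_maps, dtype=float)
    k = run_maps.shape[0]
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two runs")
    run_variances = run_maps.var(axis=1)         # variance of each run's map across vertices
    total_variance = run_maps.sum(axis=0).var()  # variance of the summed map across vertices
    return k / (k - 1) * (1.0 - run_variances.sum() / total_variance)


identical = np.tile([1.0, 2.0, 3.0, 4.0], (3, 1))
orthogonal = np.array([[1.0, -1.0, 1.0, -1.0],
                       [1.0, 1.0, -1.0, -1.0]])
print(cronbach_alpha(identical))   # 1.0 up to floating-point rounding
print(cronbach_alpha(orthogonal))  # 0.0: the two runs share no signal
```

      Unlike an odd/even split, every run enters the estimate, and the value refers to the reliability of the map averaged over all k runs.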

      We added the detailed explanation above to the Material and Methods section (page 24):

      “Cronbach’s alpha estimates the reliability of localizer-based maps from the consistency (covariance) of the maps obtained from individual localizer runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, reliability was estimated with split-half correlations. Because each half contains only half of the data, the common odd/even split underestimates reliability and necessitated recalculating correlations between maps using only half the data to provide valid comparisons. In contrast, Cronbach’s alpha uses all localizer runs and provides a more accurate statistical estimate of the reliability of topographies estimated from localizer runs.”

      5) Which algorithm was used to perform surface-based anatomical alignment? Can the state-of-the-art Multimodal Surface Matching (MSM) algorithm from the HCP achieve better performance?

      We preprocessed our datasets using fMRIPrep, which employs algorithms from FreeSurfer’s recon-all for surface-based anatomical alignment. It is worth noting that different alignment methods can yield varying degrees of performance. For instance, a study by Coalson et al. (2018) compared the localization performance of multiple surface-based alignment methods, including Multimodal Surface Matching (MSM) and FreeSurfer. The study found that MSM outperformed FreeSurfer in terms of peak probabilities and spatial clustering, suggesting better overall localization.

      Additionally, Guntupalli et al. (2018) evaluated intersubject correlations (ISC) of functional connectivity from movie-viewing data using both Connectivity Hyperalignment (CHA) and MSM-All with the Human Connectome Project (HCP) dataset. The study showed that although MSM-All yielded marginally better ISC than traditional surface alignment, CHA’s performance was significantly superior.

      In summary, while using a more advanced alignment algorithm like MSM could marginally improve prediction performance, its advantages may not be substantial when compared to our CHA-based predictions. The combination of MSM and CHA represents an intriguing direction for future research, although it falls outside the scope of our current study.

      6) Is it necessary to project the time course of the functional localizer from the normative sample into the new participants? Does it work if we just project the contrast maps from the normative sample to the new subjects?

      This is an interesting question and a practical one for researchers, who would want to know whether time series of the localizer runs are required to obtain reasonable predictions, because in some scenarios contrast maps may be the only data available. To explore this possibility quantitatively, we applied the transformation matrices derived from the movie data to the training participants’ individual pre-calculated contrast maps for all four categories and evaluated the predictions. We found very similar prediction performance for the two variants, both within and across datasets (Figure 4-figure supplement 7). However, applying the transformation matrices directly to contrast maps did not benefit as much from the iterative steps of the enhanced CHA as the time-series variant did, perhaps due to scale changes across multiple iterations and the difficulty of properly normalizing t-maps compared to regular time series.

      Overall, although our algorithm is originally designed to be used on the time course of the functional localizer runs, relatively comparable results can be generated even when the contrast maps are directly projected from the normative group to the target participant. However, to derive the best results with our approach, time series are recommended when the situation permits.
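      Why the two variants can give very similar results is clearest in the simplest case. The sketch below assumes a single orthogonal transformation `R` applied once (no iterations) and an ordinary least-squares GLM; the design, contrast, and data are synthetic and hypothetical. Because OLS betas are linear in the data, projecting the time series and then fitting gives the same contrast map as fitting and then projecting. T-maps divide by a per-vertex noise estimate, so they do not commute with `R`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_vertices = 200, 40

# Hypothetical design: 4 category regressors (on/off) plus an intercept.
X = np.column_stack([rng.integers(0, 2, n_timepoints) for _ in range(4)]
                    + [np.ones(n_timepoints)]).astype(float)
contrast = np.array([1.0, -1 / 3, -1 / 3, -1 / 3, 0.0])  # e.g. faces vs. the other three

# Synthetic localizer time series of one training participant (time x vertices).
Y = X @ rng.standard_normal((5, n_vertices)) + rng.standard_normal((n_timepoints, n_vertices))

# A hypothetical orthogonal transformation into the target's vertex space.
R, _ = np.linalg.qr(rng.standard_normal((n_vertices, n_vertices)))


def contrast_map(Y, X, c):
    """Contrast of OLS beta estimates, one value per vertex."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return c @ beta


def t_map(Y, X, c):
    """OLS t-statistics for contrast c, one value per vertex."""
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid ** 2).sum(axis=0) / dof     # per-vertex noise variance
    c_var = c @ np.linalg.inv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(sigma2 * c_var)


# Route A: transform the time series, then fit the GLM.
# Route B: fit the GLM in the training space, then transform the map.
beta_a = contrast_map(Y @ R, X, contrast)
beta_b = contrast_map(Y, X, contrast) @ R
t_a = t_map(Y @ R, X, contrast)
t_b = t_map(Y, X, contrast) @ R
print("contrast maps match:", np.allclose(beta_a, beta_b))
print("t-maps match:", np.allclose(t_a, t_b))
```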

      We have also added the contents into the Discussion section (page 16):

      “Our original algorithm is designed to apply transformation matrices to the time series of localizer data of training participants before generating contrast maps. To explore whether directly applying these matrices to pre-calculated contrast maps yields comparable results, we conducted an additional analysis across the four categories. Our findings indicate that the prediction outcomes were indeed quite similar between the two approaches for both the within- and across-datasets predictions (Figure 4-figure supplement 7). However, it is worth noting that the improvements observed with enhanced CHA were not as pronounced when applied directly to the contrast maps as opposed to the time series.”

      7) Saygin and her colleagues have demonstrated that structural connectivity fingerprints can predict cortical selectivity for multiple visual categories across the cortex (Osher DE et al., 2016, Cerebral Cortex; Saygin et al., 2011, Nat. Neurosci). I think there is a connection between those studies and the current study. If the authors could discuss the connection between them, it may help us understand why CHA works so well.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derived the transformation matrices between the training and target participants in the high-dimensional space based on functional connectivity calculated from the movie data. We then applied these transformation matrices to the training participants’ localizer data to accomplish the prediction. In contrast, Saygin and colleagues directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on anatomical-functional correspondence. They evaluated predictions by calculating the mean absolute error (AE) between actual and predicted contrast responses. Although AE reflects overall prediction quality, this single mean value makes it difficult to measure precisely how well the shape, size, and location of functional areas are predicted. With our algorithm, we were able to predict the general location and size of the areas and recover their individualized shapes, generating more powerful predictions. We also used searchlight analysis to evaluate performance systematically across the cortex. In addition, Osher et al. (2016) and Saygin et al. (2012) each reported a few participants for whom connectivity-based predictions were not better than those of the group-average method. Our algorithm is more stable: every participant in all four datasets had better prediction performance with our algorithm than with the group average. However, although we did not directly use anatomical-functional correspondence from DWI, the relationship between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings of this robust and accurate prediction.
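      The computational logic described above can be sketched for a single searchlight. This is a simplified illustration under our own assumptions, not the authors’ enhanced CHA: connectomes have rows indexed by shared connectivity targets, each training participant is mapped directly to the target with a plain orthogonal Procrustes fit (no scaling, no intermediate common space), and the data are synthetic, with each training participant’s vertices a random rotation of the target’s. The actual method additionally combines many overlapping searchlights and iterative refinement.

```python
import numpy as np


def procrustes(source, target):
    """Orthogonal R minimising ||source @ R - target||_F (no scaling, no translation)."""
    u, _, vt = np.linalg.svd(source.T @ target, full_matrices=False)
    return u @ vt


def predict_target_map(train_conns, train_maps, target_conn):
    """Average of the training participants' localizer maps mapped into the target's space.

    train_conns: list of (n_targets, n_vertices) connectomes, one per training participant.
    train_maps:  list of (n_vertices,) localizer maps (time series also work: T x n_vertices).
    target_conn: (n_targets, n_vertices) connectome of the target participant.
    """
    projected = []
    for conn, loc in zip(train_conns, train_maps):
        R = procrustes(conn, target_conn)  # (n_vertices_train, n_vertices_target)
        projected.append(loc @ R)
    return np.mean(projected, axis=0)


rng = np.random.default_rng(3)
n_targets, n_vertices = 100, 30
target_conn = rng.standard_normal((n_targets, n_vertices))
true_target_map = rng.standard_normal(n_vertices)  # the map we want to predict

train_conns, train_maps = [], []
for _ in range(5):
    # Each training participant's vertices are a hypothetical rotation Q of the target's.
    Q, _ = np.linalg.qr(rng.standard_normal((n_vertices, n_vertices)))
    train_conns.append(target_conn @ Q.T + 0.05 * rng.standard_normal((n_targets, n_vertices)))
    train_maps.append(true_target_map @ Q.T)

predicted = predict_target_map(train_conns, train_maps, target_conn)
r = np.corrcoef(predicted, true_target_map)[0, 1]
print(f"correlation between predicted and true map: {r:.3f}")
```

      Note that the transformation is estimated only from connectivity, while the localizer data are used only after alignment; this separation is what lets the movie and localizer data come from different sessions.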

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our method converge, they differ substantially in implementation and application, and these differences make our approach more suitable for predicting individualized topographies. The most significant difference is that, instead of focusing on within-individual connectivity profiles, cSRM uses inter-subject functional connectivity (ISFC) in its initial step. This design requires that all participants have time-locked time series, making the algorithm unusable for cross-content prediction and incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, offering a more flexible framework that permits generalization across different types of stimuli and experimental settings and makes it possible to bring together data from laboratories around the world. Second, cSRM predominantly focuses on region of interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to cover the entire cortical sheet. Whole-brain coverage is needed to generate a topography that reflects patterns across the cortex. Finally, with the optimized 1step method, our approach directly hyperaligns the training and target participants, avoiding the accumulation of errors from an intermediate common space. cSRM, with an implementation similar to classic connectivity hyperalignment, creates a shared information space and hyperaligns all participants to it. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges of predicting individualized whole-brain functional topographies. Moreover, our approach generalizes well across a variety of contexts and stimuli, offering a significant advantage when dealing with diverse experimental settings and datasets.
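      The key difference discussed above, within-subject connectivity versus ISFC, can be made concrete with a small sketch. The function names are hypothetical and this is not cSRM’s implementation; it only shows that a within-subject connectome needs nothing beyond one person’s own scan, whereas ISFC pairs one person’s vertices with another person’s targets time point by time point and therefore needs time-locked data.

```python
import numpy as np


def column_corr(a, b):
    """Pearson correlations between columns of a (T x m) and b (T x n); returns m x n."""
    az = (a - a.mean(axis=0)) / a.std(axis=0)
    bz = (b - b.mean(axis=0)) / b.std(axis=0)
    return az.T @ bz / a.shape[0]


def within_subject_fc(vertex_ts, target_ts):
    """Connectivity of one person's vertices with targets from the SAME scan."""
    return column_corr(target_ts, vertex_ts)


def isfc(vertex_ts_a, target_ts_b):
    """Inter-subject FC: person A's vertices vs person B's targets, time point by time point."""
    if vertex_ts_a.shape[0] != target_ts_b.shape[0]:
        raise ValueError("ISFC needs time-locked scans with matching time points")
    return column_corr(target_ts_b, vertex_ts_a)


rng = np.random.default_rng(4)
# Person A watched one movie (300 volumes); person B watched another (250 volumes).
a_vertices, a_targets = rng.standard_normal((300, 30)), rng.standard_normal((300, 50))
b_vertices, b_targets = rng.standard_normal((250, 30)), rng.standard_normal((250, 50))

fc_a = within_subject_fc(a_vertices, a_targets)  # (50, 30), computable
fc_b = within_subject_fc(b_vertices, b_targets)  # (50, 30), computable
try:
    isfc(a_vertices, b_targets)
except ValueError as err:
    print("ISFC not computable:", err)
```

      The length check is only the minimum requirement: even with equal-length scans, ISFC is meaningful only when the time points correspond to the same moments of a shared stimulus, which resting-state data never satisfy.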

      We have added the contents to the discussion section (page 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivity (24). These findings suggest that individuals’ unique anatomical connectivity patterns may serve as a foundational mechanism contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) used connectivity to functionally align individuals, similarly to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most importantly, cSRM used inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, cSRM does not support cross-content applications and is not suitable for use with resting-state data. Second, cSRM is implemented on a predefined cortical parcellation rather than on the overlapping, regularly spaced cortical searchlights used in our method, which are not constrained by areal borders. In terms of application, cSRM has mainly been used for ROI analysis rather than for estimating whole-brain topography, which requires broader coverage of the cortex with a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      Reviewer #3 (Public Review):

      In this paper, Jiahui and colleagues propose a new method for learning individual-specific functional magnetic resonance imaging (fMRI) patterns from naturalistic stimuli, extending existing hyperalignment methods. They evaluate this method, enhanced connectivity hyperalignment (CHA), across four datasets, each comprising between nine (Raiders) and twenty (Budapest, Sraiders) participants.

      The work promises to address a significant need in existing functional alignment methods: while hyperalignment and related methods have been increasingly used in the field to compare participants scanned with overlapping stimuli (or lack thereof, in the case of resting state data), their use remains largely tied to naturalistic stimuli. In this case, having non-overlapping stimuli is a significant constraint on application, as many researchers may have access to only partially overlapping stimuli or wish to compare stimuli acquired under different protocols and at different sites.

      It is surprising, however, that the authors do not cite a paper that has already successfully demonstrated a functional alignment method that can address exactly this need: a connectivity-based Shared Response Model (cSRM; Nastase et al., 2020, NeuroImage). It would be relevant for the authors to consider the cSRM method in relation to their enhanced CHA method in detail. In particular, both the relative predictive performance and the associated computational costs would be useful for researchers to understand when considering enhanced CHA for their applications.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derived the transformation matrices between the training and target participants in the high-dimensional space based on functional connectivity calculated from the movie data. We then applied these transformation matrices to the training participants’ localizer data to accomplish the prediction. In contrast, Saygin and colleagues directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on anatomical-functional correspondence. They evaluated predictions by calculating the mean absolute error (AE) between actual and predicted contrast responses. Although AE reflects overall prediction quality, this single mean value makes it difficult to measure precisely how well the shape, size, and location of functional areas are predicted. With our algorithm, we were able to predict the general location and size of the areas and recover their individualized shapes, generating more powerful predictions. We also used searchlight analysis to evaluate performance systematically across the cortex. In addition, Osher et al. (2016) and Saygin et al. (2012) each reported a few participants for whom connectivity-based predictions were not better than those of the group-average method. Our algorithm is more stable: every participant in all four datasets had better prediction performance with our algorithm than with the group average. However, although we did not directly use anatomical-functional correspondence from DWI, the relationship between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings of this robust and accurate prediction.

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our method converge, the two differ substantially in implementation and application, and these differences make our approach more suitable for predicting individualized topographies. The most significant difference is that, instead of focusing on within-individual connectivity profiles, cSRM uses inter-subject functional connectivity (ISFC) in its initial step. This design requires all participants to have time-locked time series, which makes the algorithm unusable for cross-content prediction and incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, offering a more flexible framework that generalizes across different types of stimuli and experimental settings and makes it possible to combine data from laboratories around the world. Second, cSRM predominantly focuses on region-of-interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to cover the entire cortical sheet. Whole-brain coverage is needed to generate a topography that reflects patterns across the cortex. Finally, with the optimized 1-step method, our approach directly hyperaligns the training and target participants, avoiding the accumulation of errors through an intermediate common space. cSRM, with an implementation similar to classic connectivity hyperalignment, creates a shared information space and hyperaligns all participants to it. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges of predicting individualized whole-brain functional topographies. Moreover, it generalizes across a variety of contexts and stimuli, which is a significant advantage when dealing with diverse experimental settings and datasets.
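      The time-locking requirement can be illustrated with a toy sketch, assuming a simple linear model in which every participant's regions share the same connectivity structure. Region counts, noise levels, and function names are hypothetical. Within-subject connectivity is preserved even when two participants see different content, whereas ISFC computed between participants who are not time-locked collapses toward zero.

```python
import numpy as np

def corr_matrix(a, b):
    """Pearson r between every column of a and every column of b."""
    za = (a - a.mean(0)) / a.std(0)
    zb = (b - b.mean(0)) / b.std(0)
    return za.T @ zb / a.shape[0]

def offdiag(m):
    """Off-diagonal entries of a square matrix as a flat vector."""
    return m[~np.eye(m.shape[0], dtype=bool)]

rng = np.random.default_rng(1)
n_t, n_regions = 1000, 8
mixing = rng.standard_normal((n_regions, n_regions))  # shared connectivity structure (toy assumption)

def subject(drive):
    """Region time series of one participant driven by a stimulus time course."""
    return drive @ mixing + 0.5 * rng.standard_normal((n_t, n_regions))

movie = rng.standard_normal((n_t, n_regions))
other_content = rng.standard_normal((n_t, n_regions))  # a different movie, or rest

a = subject(movie)
b_locked = subject(movie)            # same stimulus, same timing as participant a
b_unlocked = subject(other_content)  # not time-locked to participant a

# Within-subject connectivity: each participant's regions vs. their own regions.
sim_within = np.corrcoef(offdiag(corr_matrix(a, a)),
                         offdiag(corr_matrix(b_unlocked, b_unlocked)))[0, 1]
# ISFC: participant a's regions vs. participant b's regions.
isfc_locked = corr_matrix(a, b_locked)
isfc_unlocked = corr_matrix(a, b_unlocked)
print(round(sim_within, 2),
      round(np.abs(offdiag(isfc_locked)).mean(), 2),
      round(np.abs(isfc_unlocked).mean(), 2))
```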

      We have added this content to the Discussion section (pages 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivity (24). These findings suggest that the unique anatomical connectivity patterns of individuals may serve as a foundational mechanism, contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) uses connectivity to functionally align individuals, similar to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most importantly, cSRM uses inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, cSRM does not support cross-content applications and is not suitable for use with resting-state data. Second, cSRM is implemented on a predefined cortical parcellation rather than on the overlapping, regularly spaced cortical searchlights used in our method, which are not constrained by areal borders. In terms of application, cSRM has mainly been used for ROI analysis rather than for estimating whole-brain topography, which requires the broader cortical coverage of a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      With this in mind, I noted several current weaknesses in the paper:

      First, while the enhanced CHA method is a promising update on existing CHA techniques, it is unclear why this particular six-step, iterative approach was adopted. That is: why were six steps chosen over any other number? At present, it is not clear if there is an explicit loss function that the authors are minimizing over their iterations. The relative computational cost of six iterations is also likely significant, particularly compared to previous hyperalignment algorithms. A more detailed theoretical understanding of why six iterations are necessary, or whether other researchers could adopt a variable number according to the characteristics of their data, would significantly improve the transferability of this method.

      In the advanced connectivity hyperalignment implementation, we gradually increased the number of connectivity targets. The six steps were not chosen a priori; they resulted from increasing the number of targets up to the finest possible level, namely single cortical vertices.

      Our datasets were resampled to a cortical mesh with 18,742 vertices across both hemispheres (approximately 3 mm vertex spacing; icoorder 5; 20,484 vertices before removing non-cortical vertices). Step 1 was the classic connectivity hyperalignment implementation based on the anatomically aligned data. Because using dense connectivity targets (e.g., all 18,742 vertices on the surface) with anatomically aligned data produces poor functional correspondence across participants (Busch et al., 2021), we used 1,284 vertices (icoorder 3, before removing the medial wall) as connectivity targets in step 1. After the first iteration of connectivity hyperalignment, however, it is beneficial to include more targets when calculating connectivity patterns, and repeated iterations lead to a better solution by gradually aligning information at finer scales. To improve alignment across participants, we repeated the alignment two more times (steps 2 and 3) with the same 1,284 coarse connectivity targets before increasing the number of targets. In step 4, we increased the number of targets to 5,124 (icoorder 4, before removing the medial wall) and used this number of targets for two iterations in total (steps 4 and 5) before using all vertices as targets. In the final step (step 6), all vertices were used as connectivity targets.
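      The coarse-to-fine iteration can be sketched as below, under loudly stated assumptions: connectivity targets are approximated by averaging contiguous blocks of vertices (a hypothetical stand-in for the 13 mm and 7 mm searchlight means), the alignment step is orthogonal Procrustes, and the toy schedule `[6, 6, 6, 20, 20, 60]` only mirrors the pattern of three coarse steps, two intermediate steps, and a final all-vertex step. At each step, connectivity is re-estimated on the already-aligned data and the per-step transformations are composed into one matrix.

```python
import numpy as np

def zscore(x):
    return (x - x.mean(0)) / x.std(0)

def procrustes(source, target):
    """Orthogonal R minimizing ||source @ R - target|| (Frobenius)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def target_timeseries(data, n_targets):
    """Hypothetical stand-in for searchlight means: average contiguous vertex blocks.
    When n_targets equals n_vertices, every vertex is its own target (as in step 6)."""
    blocks = np.array_split(np.arange(data.shape[1]), n_targets)
    return np.column_stack([data[:, b].mean(axis=1) for b in blocks])

def connectivity(data, n_targets):
    """Within-subject connectivity profile: (n_targets, n_vertices) Pearson r."""
    t = target_timeseries(data, n_targets)
    return zscore(t).T @ zscore(data) / data.shape[0]

def iterative_cha(source, target, schedule):
    """Re-estimate connectivity on the aligned source data at each step and compose transforms."""
    r_total = np.eye(source.shape[1])
    for n_targets in schedule:
        aligned = source @ r_total
        r_step = procrustes(connectivity(aligned, n_targets), connectivity(target, n_targets))
        r_total = r_total @ r_step
    return r_total

rng = np.random.default_rng(2)
latent = rng.standard_normal((500, 60))
source = latent + 0.3 * rng.standard_normal((500, 60))
target = latent[:, rng.permutation(60)] + 0.3 * rng.standard_normal((500, 60))
schedule = [6, 6, 6, 20, 20, 60]  # toy analogue of 1,284 x3, 5,124 x2, then all vertices
R = iterative_cha(source, target, schedule)
```

      Because each per-step transformation is orthogonal, their product is orthogonal as well, so the composed matrix can be applied to a localizer map just like a single-step transformation.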

      It is true that the multiple iteration steps substantially increased the computational cost compared with classic connectivity hyperalignment, but prediction performance increased steadily across all datasets and became comparable to that of response hyperalignment, which requires time-locked stimuli. We did not use an explicit loss function in the algorithm but followed the natural progression of the number of potential connectivity targets. On the other hand, the difference in performance between the improved and the classic connectivity hyperalignment was relatively small (difference in r < 0.05), which indicates that the classic algorithm is already effective. Researchers can choose the number of iterations and how quickly the number of targets increases at each step. If computational resources are limited or a shorter total computation time is the priority, classic connectivity hyperalignment may be the best option to balance these trade-offs.

      The Materials and Methods section describes the implementation in detail (pages 22-23):

      “Using dense connectivity targets (e.g., using all 18742 vertices on the surface) with anatomically aligned data usually generates poor functional correspondence across participants (33). It is, however, beneficial to include more targets for calculating connectivity patterns after the first iteration of connectivity hyperalignment, and repeated iterations lead to a better solution by gradually aligning the information at finer scales.

      We used six steps to further improve the connectivity hyperalignment method. Step 1 was the initial connectivity hyperalignment step as described above, based on the raw anatomically aligned movie data. The resulting transformation matrices were applied to those movie runs, and the hyperaligned data were then used in step 2 to calculate new connectivity patterns and new transformation matrices. We repeated this procedure iteratively six times and derived transformation matrices for each step. In steps 1, 2, and 3, 642 × 2 (icoorder 3, before removing the medial wall) connectivity targets were defined with 13 mm searchlights. In steps 4 and 5, 2562 × 2 (icoorder 4, before removing the medial wall) connectivity targets were used with 7 mm searchlights to calculate target mean time series. In the final step 6, all 18742 vertices were included as separate connectivity targets, using each vertex’s time series rather than the mean in a searchlight. Each step of this advanced connectivity hyperalignment algorithm increased the prediction performance (Figure 4-figure supplement 2).”

      To help readers understand the logic of the advanced connectivity hyperalignment algorithm used in this study, we also expanded the Discussion section (page 15):

      “Because using dense connectivity targets (e.g., using all vertices as connectivity targets) with anatomically aligned data often leads to suboptimal alignment across participants (33), we started with coarse connectivity targets and gradually increased the number of connectivity targets to form a denser representation of connectivity profiles. The iterations improved the prediction performance step by step, and at the final step in this analysis (step 6, in which all vertices were used as connectivity targets), the enhanced CHA generated performance comparable to RHA (Figure 4-figure supplement 4).”

      Second, the existing evaluations for enhanced CHA appear to be entirely based on image-derived correlations. That is, the authors compare the predicted image from CHA with the ground-truth image using correlation. While this provides promising initial evidence, correlation-based measures are often difficult to interpret given their sensitivity to image characteristics such as smoothness. Including Cronbach's alpha reliability as a baseline does not address this concern, as it is similarly an image-based statistic. It would be useful to see additional predictive experiments using frameworks such as time-segment classification, intersubject decoding, or encoding models.

      We appreciate the reviewer’s concern regarding the stability of local correlations in relation to image characteristics. To address this, we conducted additional analysis using different searchlight sizes (with radii of 10 mm, 15 mm, and 20 mm) to evaluate the predicted category-selective maps, focusing specifically on the Budapest dataset. The local correlations between the predicted category-selective maps (obtained using enhanced CHA) and participants’ own maps based on classic localizer runs were calculated for each searchlight. We averaged these correlations across participants and plotted the resulting maps, as shown in Figure 4-figure supplement 10. Although using a larger searchlight radius is similar to employing a larger smoothing kernel, the results remained relatively stable across different searchlight sizes, particularly in regions selectively responsive to the specific category. This stability suggests that while the evaluation may be influenced by image-related features, the conclusion would remain consistent under varying parameters.
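      The searchlight evaluation can be sketched as follows, assuming Euclidean distance on a flat toy patch as a stand-in for geodesic distance on the cortical surface; the coordinates, maps, and noise level are hypothetical. For each vertex, the predicted and observed maps are correlated over all vertices within the chosen radius, and the resulting local r values can then be averaged across vertices or, in the real analysis, across participants.

```python
import numpy as np

def searchlight_correlation(coords, predicted, observed, radius):
    """Local Pearson r between two maps inside a sphere of `radius` mm around each vertex.
    Uses Euclidean distance as a stand-in for geodesic distance on the cortical surface."""
    local_r = np.full(len(coords), np.nan)
    for i, center in enumerate(coords):
        inside = np.linalg.norm(coords - center, axis=1) <= radius
        if inside.sum() >= 3:  # need a few vertices for a meaningful correlation
            local_r[i] = np.corrcoef(predicted[inside], observed[inside])[0, 1]
    return local_r

rng = np.random.default_rng(3)
n = 2000
coords = np.column_stack([rng.uniform(0, 100, n), rng.uniform(0, 100, n), np.zeros(n)])  # flat toy patch (mm)
observed = rng.standard_normal(n)              # participant's own localizer map
predicted = observed + rng.standard_normal(n)  # hypothetical noisy prediction

for radius in (10, 15, 20):
    print(radius, round(np.nanmean(searchlight_correlation(coords, predicted, observed, radius)), 2))
```

      With spatially independent noise, as here, the mean local r stays close to the same value for every radius; spatially smooth maps would behave differently, which is the concern the additional radii are meant to probe.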

      As for the use of enhanced CHA, it serves as an optimized version of the classic CHA, specifically designed for predicting individualized functional topographies. Evaluating prediction performance in our study is based on t-value contrast maps for each participant. Given this, it is unclear how time-segment classification or other decoding/encoding models could be appropriately implemented for performance evaluation. However, prior research from our lab has already established the effectiveness of classic CHA. Specifically, Guntupalli et al. (2018) showed that classic CHA significantly improved intersubject correlations (ISC) of connectivity profiles across the cortex. They also showed that CHA captured fine-scale variations in connectivity profiles for nearby cortical nodes across participants and led to improved between-subject multivariate pattern classification (bsMVPC) accuracies for movie segments. These findings serve as robust evidence for the effectiveness of classic CHA, laying the groundwork for our enhanced CHA approach.
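      For readers unfamiliar with bsMVPC of movie segments, a generic leave-one-participant-out version can be sketched as follows. This assumes correlation-based nearest-neighbor classification and is not necessarily the exact procedure of Guntupalli et al. (2018). The simulated data, segment length, and function names are hypothetical. A held-out participant's segment is classified correctly when it correlates best with the same segment in the average of the other participants.

```python
import numpy as np

def segment_patterns(data, seg_len):
    """Cut a (time, vertex) matrix into non-overlapping segments and flatten each one."""
    n_seg = data.shape[0] // seg_len
    return data[: n_seg * seg_len].reshape(n_seg, seg_len * data.shape[1])

def bsmvpc_accuracy(subjects, seg_len):
    """Leave-one-participant-out segment classification by maximum correlation."""
    accuracies = []
    for i, held_out in enumerate(subjects):
        others = np.mean([s for j, s in enumerate(subjects) if j != i], axis=0)
        test, train = segment_patterns(held_out, seg_len), segment_patterns(others, seg_len)
        n = len(test)
        r = np.corrcoef(test, train)[:n, n:]  # r[a, b]: test segment a vs. train segment b
        accuracies.append(np.mean(r.argmax(axis=1) == np.arange(n)))
    return np.array(accuracies)

rng = np.random.default_rng(4)
shared = rng.standard_normal((300, 20))  # stimulus-driven response in a common space
aligned = [shared + rng.standard_normal((300, 20)) for _ in range(5)]
misaligned = [s[:, rng.permutation(20)] for s in aligned]  # idiosyncratic vertex layouts

aligned_acc = bsmvpc_accuracy(aligned, seg_len=6).mean()
misaligned_acc = bsmvpc_accuracy(misaligned, seg_len=6).mean()
print(round(aligned_acc, 2), round(misaligned_acc, 2))  # chance = 1/50 segments
```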

      We added Figure 4-figure supplement 10 to the supplementary material:

      Addressing these concerns and considering cSRM as a comparison model would significantly strengthen the paper. There are also notable strengths that I would encourage the authors to further pursue. In particular, the authors have access to a unique dataset in which the same Raiders of the Lost Ark stimulus was scanned for participants within the Budapest (SRaiders) dataset as well as non-overlapping participants in the Raiders dataset. Exploring the relative performance for cross-movie prediction within a dataset as compared to a shared movie prediction across datasets is particularly interesting for methods development. I would encourage the authors to explicitly report results in this framework to highlight both this unique testing structure as well as the performance of their enhanced CHA method.

      We appreciate the reviewer's suggestion to examine a shared time-series but non-overlapping participants scenario using the Sraiders and Raiders datasets. However, there are significant differences between the two datasets that preclude such direct comparison. These differences include varying scanning parameters, MRI scanners, localizer types, and data collection procedures. Due to these methodological divergences, the datasets cannot be treated as identical time-series.

      Firstly, the scanning parameters differ considerably. The Sraiders dataset was scanned with TR = 1 s (TR/TE = 1000/33 ms, flip angle = 59°, 2.5 mm isotropic voxels, matrix size = 96 × 96, FoV = 240 × 240 mm, multiband acceleration factor = 4, no in-plane acceleration), and the Raiders dataset was scanned with TR = 2.5 s (TE = 35 ms, flip angle = 90°, matrix size = 80 × 80, FoV = 240 × 240 mm, resolution = 0.938 mm × 0.938 mm × 1.0 mm).

      Secondly, participants in the Sraiders dataset were scanned on a 3T Siemens Magnetom Prisma MRI scanner with a 32-channel head coil, whereas the Raiders dataset, collected more than 10 years ago, used a 3T Philips Intera Achieva scanner with an eight-channel head coil.

      Thirdly, the stimulus presentation differed. In the Sraiders dataset, the movie Raiders of the Lost Ark was split into eight parts (~15 min each); the first four parts were watched outside the scanner before scanning (~56 min), and the latter four parts were watched in the scanner with audio (57 min). In the Raiders dataset, the audio-visual movie was also split into eight parts (~15 min each), and participants watched all eight parts in the scanner with audio (one part per run).

      Fourthly and critically, the two datasets used different types of localizers. The Sraiders dataset included dynamic localizer runs, whereas the Raiders dataset contained only a static localizer similar in design to that of the Forrest dataset.

      Given these four differences, it is not appropriate to treat the two datasets as identical time series. The difference in localizer type is a further issue: the topographies generated from the two types of localizers are dissimilar in many ways. For all categories, the dynamic localizer elicited stronger and broader category-selective activations than the static localizer, and the searchlight analysis showed that the dynamic localizer had higher reliability across the cortex, especially in regions selectively responsive to the target category. Because of these differences, cross-dataset predictions yielded lower correlations than within-dataset predictions. This does not indicate a methodological failure but reflects the diverging topographies activated by different localizers.

      In the manuscript, we have extensively analyzed cross-dataset predictions (Figure 2-figure supplement 1-Figure 4-figure supplement 4 & 6).

      ● Figure 2-figure supplement 1 demonstrates that, despite the limitations of cross-localizer-type evaluation, both R-to-S (Raiders to Sraiders) and S-to-R (Sraiders to Raiders) predictions significantly outperformed surface alignment methods across categories.

      ● Figure 2-figure supplement 2 confirms that the prediction performance remained stable across individual participants, underscoring the robustness of our methodology.

      ● Figure 3-figure supplement 1 & Figure 3-figure supplement 2 display contrast maps generated from both native and alternate localizers, revealing that the maps share similar topographies irrespective of the dataset origin.

      ● Figure 4-figure supplement 1 presents a correlation analysis of local similarities in R-to-S and S-to-R predictions, highlighting particularly strong correlations in the ventral face regions.

      ● Figure 4-figure supplement 2 employs histograms to showcase performance across major cortices and furnishes additional evidence regarding the influence of localizer types on the results.

      ● Figure 4-figure supplement 3 offers a searchlight analysis for other categories, enriching the scope of our investigation.

      ● Figure 4-figure supplement 4 affirms that the advanced CHA is effective in both R-to-S and S-to-R predictions.

      ● Figure 4-figure supplement 6 compares the efficacy of 1-step vs. 2-step prediction methods for R-to-S and S-to-R, showing a clear advantage for the 1-step approach.

      These analyses affirmed that our approach outperforms surface alignment methods. However, the inherent limitations in data collection and localizer types preclude a direct test of the reviewer’s hypothesis, which will require further research to validate fully.

      Overall, I share the authors' enthusiasm for the potential of cross-movie, cross-dataset prediction, and I believe that methods such as enhanced CHA are likely to significantly improve our ability to make these comparisons in the near future. At present, however, I find that the theoretical and experimental support for enhanced CHA is incomplete. It is therefore difficult to assess how enhanced CHA meets its goals or how successfully other researchers would be able to adopt this method in their own experiments.

      We hope our new analysis and replies addressed the reviewer’s concerns.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The authors demonstrate that Isw1 has a role in responding to antifungals in Cryptococcus. However, it is not clear if changes in Isw1 stability represent a general response to stress. This study would have benefited from experiments to test: (1) if levels of Isw1 change in response to other stressors (e.g., heat, osmotic, or oxidative stress) and (2) if loss of Isw1 impacts resistance to other stressors.

      A series of experiments were conducted to illustrate and measure phenotypic traits associated with virulence. These traits encompassed capsule formation, melanin synthesis, cell proliferation under stressful conditions, and Isw1 expression levels in response to diverse environmental stimuli. Please see Figure 3a, 3b, 3c, Figure 3-figure supplement 1 and line 237-241.

      2) The authors demonstrate a critical role in the acetylation of K97 and ubiquitination of K441 in regulating Isw1 stability. Additionally, this study shows that K113 is also likely involved in this process. However, it appears that K113 can be either acetylated or ubiquitinated, and it is, thus, less clear if one of the two modifications or both modifications is critical at this residue. Additional experiments may be required to answer this question. This study would have benefited from an additional discussion on the results related to the modification of K113.

      We are genuinely grateful for this insightful critique of the K113 site. In our mass spectrometry data, we observed both acetylation and ubiquitination at K113, suggesting that one proportion of Isw1 is acetylated at this site while another is ubiquitinated. To analyze the function of K113, we generated triple, double, and single mutations at positions K89, K97, and K113, using K-to-R substitutions to mimic the deacetylated and deubiquitinated state and K-to-Q substitutions to mimic the acetylated and deubiquitinated state. In our dataset, neither the K-to-R nor the K-to-Q single mutation of K113 produced a discernible drug resistance phenotype. This suggests that, within the physiological context of the Isw1 protein, neither post-translational modification (PTM) of K113 has a substantial impact on the regulation of drug resistance. We attribute this to the acetylation of K97, which interferes with the ubiquitination of Isw1 by reducing the interaction between Isw1 and the E3 ligase Cdc4. Hence, ubiquitination of K113 does not play a crucial role in regulating Isw1 protein stability when K97 is fully acetylated. However, when K97 was deacetylated, we observed a notable increase in Isw1 protein abundance upon substitution of K113 with R. This finding strongly supports the notion that ubiquitination of K113 plays a crucial role in regulating Isw1 protein stability in this context. Thus, when K97 is acetylated, the PTMs of K113 are not required for maintaining Isw1 protein levels, whereas when K97 is deacetylated, ubiquitination of K113 becomes crucial for regulating protein stability. Given the intricate PTM regulation at the K113 site, it would be advantageous to generate antibodies specific to K113ac and K113ub to comprehensively investigate the role of K113 in these regulatory processes. Nevertheless, antibodies against site-specific ubiquitination are rarely available. We regret any confusion caused by our previous description and have revised the manuscript to address this issue. Please refer to lines 485-500.

      3) The authors demonstrate that overexpression of ISW1 in select clinical isolates of Cryptococcus increases sensitivity to antifungals. However, these experiments would have benefited from additional controls, such as including overexpression of ISW1 in the wild-type strain (H99) and antifungal-sensitive isolate (CDLC120).

      In response to your concern, we successfully generated the strains as required. In the revised manuscript, we demonstrated that the overexpression of the stable variant of Isw1 in H99 and CDLC120 strains induces heightened susceptibility to antifungal drugs. Please see Figure 8e, 8i and line 404-413.

      Reviewer #3 (Public Review):

      1) ISWI chromatin remodellers are well-characterised in many organisms. How many ISWI proteins does Cryptococcus contain? Why did the authors focus on ISWI?

      We thank the reviewer for this comment. Isw1 was identified in follow-up work building on our previously published data (Li Y, 2019). In that study, the acetylome of C. neoformans was comprehensively analyzed, and a series of knockout strains were created to investigate the relationship between fungal pathogenicity and acetylation; the Isw1 mutant was identified as a modifier of drug resistance. Fungal paralogs of ISW genes were first observed in Saccharomyces cerevisiae, a yeast that has undergone whole-genome duplication and carries two paralogs, Isw1 and Isw2, that arose from this event (Kellis M, 2004; Tsukiyama T, 1999; Wolfe KH, 1997). Because C. neoformans has not undergone whole-genome duplication, its genome encodes only one ISW gene. Please see lines 129-134.

      2) What is the ISWI protein complex(es)? The Mass-Spec analysis should reveal this.

      Prior research in Saccharomyces cerevisiae has shown that the ISWI complex comprises several subunits, namely Isw1, the Ioc proteins, Itc1, Chd1, and Sua7 (Mellor J, 2004; Smolle M, 2012; Sugiyama and Nikawa, 2001; Vary JC Jr, 2003; Yadon AN, 2013). Upon thorough examination of the C. neoformans genome, we were unable to identify homologs of the IOC gene family. This likely reflects an evolutionary loss of the IOC gene family in C. neoformans, as also suggested by the FungiDB website. However, C. neoformans does have Itc1, Chd1, and Sua7. While we agree that Mass-Spec data can reveal potential protein-protein interactions and help identify subunits of the ISWI complex, the PTM Mass-Spec method we used is designed solely to identify potential protein modification sites. To investigate the cryptococcal ISWI complex comprehensively, we therefore performed a standard Isw1-Flag immunoprecipitation followed by Mass-Spec analysis. In total, 22 proteins interacting with Isw1 were identified, 11 of which have previously been reported to be associated with Isw1-containing regulatory networks. Itc1 co-immunoprecipitated with Isw1 in the mass spectrometry results. Although Chd1 and Sua7 were not detected by Mass-Spec, co-IP and immunoblotting showed that Chd1 co-immunoprecipitates with Isw1. No interaction between Isw1 and Sua7 was detected by either method. In brief, the cryptococcal ISWI regulatory machinery is distantly related to that of S. cerevisiae. Please see Figure 2 and lines 206-219.

      3) Is Cryptococcus ISWI a transcriptional activator or repressor?

      We regret the erroneous description of Isw1 in the previous version of the manuscript. We had mischaracterized Isw1 as a transcriptional regulator, whereas it functions as a chromatin remodeler, and the text has been revised accordingly. In the revised manuscript, we present a comprehensive transcriptome analysis of the isw1Δ strain with and without FLC treatment, which offers insight into the gene regulatory patterns associated with Isw1. In our dataset, Isw1 negatively regulates the expression of genes encoding drug pumps while positively regulating the expression of genes essential for 5-FC resistance. Moreover, ChIP-PCR demonstrated that Isw1 binds to the promoter regions of the genes of interest. Hence, the chromatin remodeler Isw1 plays a dual role, activating certain genes and repressing others in the context of different forms of drug resistance. Please see lines 142-153.

      4) Is ISWI function in drug resistance linked to its chromatin remodelling activity?

      To investigate whether the chromatin-remodeling activity of Isw1 contributes to multidrug resistance, we performed protein truncation experiments. Specifically, we deleted the DNA-binding domain, the helicase domain, and the SNF2 domain, which have been shown to regulate Isw1 chromatin activity in the model organism S. cerevisiae (Grune T, 2003; Mellor J, 2004; Pinskaya M, 2009; Rowbotham SP, 2011). All truncated Isw1 variants showed a growth phenotype consistent with that of the deletion strain isw1Δ, and their gene expression levels were also similar to those of isw1Δ. This provides evidence that these domains, which are critical for chromatin remodeling, are required for Isw1's regulation of drug resistance. Moreover, chromatin immunoprecipitation and PCR experiments with the Isw1-Flag strain revealed that Isw1 binds directly to the promoter regions of target genes. These new findings substantially support the hypothesis that the chromatin activity of Isw1 is crucial to its function as a central regulator of drug resistance in C. neoformans. Please see revised Figure 1g, 1h, 1i and lines 186-199 in the revised manuscript.

      5) Does ISWI interact with chromatin? If so, which are ISWI-target genes? Does drug treatment modulate chromatin binding?

      To address this concern, we pursued two distinct approaches to demonstrate the chromatin regulatory effects of Isw1. First, we deleted the DNA-binding domain of Isw1. The resulting truncated Isw1 mutants showed a multidrug-resistant growth phenotype matching that of the isw1Δ deletion strain, and gene expression levels in this strain were comparable to those in isw1Δ. This supports the notion that the DNA-binding capability of Isw1 is required for its control of drug resistance. Second, chromatin immunoprecipitation and PCR assays with the Isw1-Flag strain demonstrated that Isw1 binds directly to the promoter regions of target genes. Together, these results provide significant evidence that Isw1 interacts with chromatin and that its chromatin activity plays a pivotal role in its function as a central regulator of drug resistance in C. neoformans. In addition, we performed a transcriptome analysis of the wild-type and isw1 deletion strains in the absence of FLC treatment. Comparing the conditions with and without FLC revealed a notable difference in gene expression control between the two. In the isw1 deletion strain exposed to FLC, 21 genes, including members of the ABC/MFS family and efflux pumps, showed significant changes in expression: 9 were downregulated and 12 were upregulated. In the absence of FLC, 9 genes showed altered expression: 3 were downregulated and 6 were upregulated. Therefore, Isw1 activates certain genes while repressing others, and its regulatory program is reconfigured in response to drugs. Although ChIP-seq analysis would have been valuable here, drug treatment markedly decreased the abundance of Isw1 protein in fungal cells owing to activation of Isw1 protein degradation, leaving too little Isw1 protein for successful enrichment and ChIP-seq analysis (please see Figure 4a and 4c). Nevertheless, the data collectively support the idea that Isw1 serves as a crucial master regulator of drug resistance in C. neoformans. The text has been revised to present our findings precisely and thoroughly. Please see Figure 1c, 1g, Supplementary File 2, and lines 145-153 and 186-188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Multimodal experiences that for example contain both visual and tactile components are encoded as associative memories. This manuscript is a valuable contribution supporting structural and functional brain plasticity following associative training protocols that pair together different types of sensory stimuli. The results provide solid support for this plasticity being a basis for cross-modal associative memories.

      We appreciate the eLife assessment of our finding that associative memory neurons are recruited in the cerebral cortices as a hub for first-order and second-order associative memory. Synaptic interconnections among associative memory neurons mediate the reciprocal retrieval, conversion, and translation of associated signals learned across the lifespan.

      Reviewer #1 (Public Review):

      This manuscript by Xu and colleagues addresses the important question of how multi-modal associations are encoded in the rodent brain. They use behavioral protocols to link stimuli to whisker movement and discover that the barrel cortex can be a hub for associations. Based on anatomical correlations, they suggest that structural plasticity between different areas can be linked to training. Moreover, they provide electrophysiological correlates that link to behavior and structure. Knock-down of nlg3 abolishes plasticity and learning. This study provides an important contribution as to how multi-modal associations can be formed across cortical regions.

      We sincerely thank Reviewer 1 for these comments, which are a great motivation for us to continue revealing the specific roles of neural circuits in associative memory and in its related cognitive activities and emotional reactions.

      Reviewer #2 (Public Review):

      This manuscript by Xu et al. explores the potential joint storage/retrieval of associated signals in learning/memory and how that is encoded by some associative memory neurons using a mouse model. The authors examined mouse associative learning by pairing multimodal signals including olfactory, tactile, gustatory, and pain/tail heating signals. The key finding is that after associative learning, barrel neurons respond to other multimodal stimulations. They found these barrel cortical neurons interconnect with other structures including piriform cortex, S1-Tr and gustatory cortical neurons. Further studies showed that Neuroligin 3 mediated the recruitment of associative memory neurons in the paired stimulation group. The authors found that knockdown of Neuroligin 3 in the barrel cortex suppressed the associative memory cell recruitment in the paired stimulation learning. Overall, while the findings of this study are interesting, the concept of associative learning involving multiple functionally connected cortical regions is not that novel. While some data presented are convincing, others seem to lack rigor. In addition, more details and clarification of the experimental methods are needed.

      Thank you for your comments on our studies of the recruitment of associative memory neurons as the hub for the joint storage and reciprocal retrieval of multimodal associated signals. You are right that the concepts of associative memory neurons and of newly established interconnections among cerebral cortices in the formation of associative memory are not novel. The original finding was reported by the senior author's lab many years ago and was also presented in the book by Jin-Hui Wang, "Associative Memory Cells: Basic Units of Memory Trace" (Springer Nature, 2019). We have made certain clarifications in our revision; more detailed information about the experimental approaches and concepts can be found in our previous publications and in this book.

      Reviewer #1 (Recommendations For The Authors):

      I have two points that I find would strengthen the manuscript further:

      1. Associative memories are also based on specificity, which is not addressed in this manuscript. The authors could discuss this and also the magnitude of plasticity. In general, I would suggest also testing plasticity in response to a non-linked stimulus to prove specificity.

      This is a good point. We have addressed the specificity of associative memory in our model in previous studies, such as Wang et al., “Neurons in the barrel cortex turn into processing whisker and odor signals: a cellular mechanism for the storage and retrieval of associative signals,” Frontiers in Cellular Neuroscience 9:320, 1-17 (2015), and in the book by Jin-Hui Wang, “Associative Memory Cells: Basic Units of Memory Trace” (Springer Nature, 2019).

      1. Nlg3 knock-down is a strong intervention. The authors could discuss the implications of interfering with synapse assembly and mechanistic implications at the synaptic level. It could help to compare the consequences of this intervention to a post-training lesion.

      This is a good point. To address the possibility that the Nlg3 knockdown intervention produces lesion-like effects, we used an shRNA-scramble control. In addition, we have added a discussion of the synapse-level implications of Nlg3 knockdown to the Discussion.

      1. In general, the clarity of the wording in some sections/sentences could be improved.

      We have reworded certain sentences in our revision.

      Reviewer #2 (Recommendations For The Authors):

      1. The writing of the manuscript needs major editing, there are grammatical errors even in the title. The extremely long introduction and discussion section with repeated details can be distracting from the main focus of the work.

      We have addressed this point in our revision.

      1. Many bar graphs, such as Figure 5C and 5G, Figure 6C-6G, have low-resolution images, meaning that the axis titles and labels are unreadable.

      The resolution of the figures has been improved in our revision.

      1. The bar graph with data points and illustration in Figure 1E and 1G are misplaced.

      This mistake has been corrected in our revision.

      1. On page 23, Figure 2B, which layer(s) of the PC, S1Tr and GC were the images taken from? In the PSG group, why is there no red axon terminal signal observed in the three regions? Does it indicate that there is no significant projection from the BC axon to PC, S1Tr, or GC neurons? Given that Thy1-YFP labeled glutamatergic neurons at PC, S1Tr, and GC and there is no discernible co-localization of yellow and green cells, can we assume that the glutamatergic neurons at PC, S1Tr, and GC are not involved in the associative learning after PSG paradigm? Lastly, the number of synapse contacts in Figure 2E is only 1-2 per 100 µm dendrite, but this is not quite consistent with the confocal images in Figure 2D. In Figure 2D, there are at least three tdTomato boutons on the cropped dendrite, which is ~16 µm according to the scale bar.

      When Figure 2B is magnified, red boutons are visible; they can be seen at higher magnification in Figure 2C. In addition, because the distribution of synapse contacts is variable, Figure 2E shows values averaged over dendrites, so a single representative image may not exactly match the statistical data.

      1. Figure 4C and Figure 8C, how were the percentages of associative neurons calculated after LFP recording? More details are needed on the method of this in vivo LFP/single unit recordings, including the spike sorting algorithm.

      The total number of neurons recorded in each group is given in the Results. For instance, 70 neurons were recorded from PSG mice (Figure 4), and this number was used as the denominator. The percentage of associative memory neurons recruited by associative learning was calculated as the number of neurons that responded to two or more signals divided by this total. This information has been added to the Results in our revision.
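As a minimal sketch of the percentage calculation described above, assuming each recorded neuron is represented by the set of signal types it responded to: the data layout, the function name `percent_associative`, and the signal labels are hypothetical, and the toy data are not from the study.

```python
from typing import Iterable, Set


def percent_associative(responses: Iterable[Set[str]], min_signals: int = 2) -> float:
    """Return the percentage of recorded neurons that responded to at least
    `min_signals` distinct signal types.

    `responses` holds one set per recorded neuron (an empty set means the
    neuron responded to none of the tested signals), so the number of
    entries is the denominator: all neurons recorded from one group.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("no recorded neurons")
    n_associative = sum(1 for signals in responses if len(signals) >= min_signals)
    return 100.0 * n_associative / len(responses)


# Hypothetical toy recording of four neurons (not data from the study).
toy = [
    {"whisker", "odor"},          # two signals -> counted
    {"whisker"},                  # one signal -> not counted
    set(),                        # no response -> not counted
    {"whisker", "odor", "tail"},  # three signals -> counted
]
print(percent_associative(toy))  # 50.0
```

Keeping non-responding neurons in the list matters here, because the denominator is the total number of neurons recorded in a group (70 for the PSG group in Figure 4), not only the responsive ones.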

      1. The rationale for the authors choosing Neuroligin 3 as the target for investigating the formation of new synapse interconnections between BC, PC, S1Tr, and GC after PSG should be more clearly spelled out. Synaptic CAMs include SynCAM, NCAM, Neurexin, Cadherin et al all play a role in new synapse formation. Neuroligin 1 is expressed specifically in the CNS at excitatory synapses. Why did the authors choose to study Neuroligin 3 instead of Neuroligin 1?

      This is a good point. Our previous data showed that miRNA-324, which degrades neuroligin-3 mRNA, is upregulated during associative learning in our mouse model. We therefore studied the role of neuroligin-3 in the formation of new synapses and the recruitment of associative memory neurons in this paper.

      1. The behavioral results in Figure 5B-5G indicated that after pair-stimulation of WS-OS, WS-TS, or WS-GS, the memory learned in piriform, S1-Tr and gustatory cortical neurons can be retrieved from each other, by jumping over the barrel cortex. Is it possible that there is some direct interconnection formed between piriform, S1-Tr, and gustatory cortical neurons? Maybe they can try to do barrel cortical lesion or chemogenetic inhibition after PSG training and then repeat the behavioral tests as in Figure 5B-5G.

      We have performed experiments to examine potential direct interconnections among piriform, S1-Tr, and gustatory cortical neurons after associative learning (about twelve days). At present, we have no convincing data to support this possibility.

      1. Some of the images showing the location of virus injections look VERY similar, such as Figure 3A left and right, Figures 7A and 7D. Larger variability of different animals/injection sites is definitely expected.

      The viruses injected in Figures 3 and 7 are different, since different AAV-carried fluorescent proteins were used in different cortical areas. In addition, when the right and left panels of Figure 3A are carefully enlarged, the morphology of the AAV-transfected areas can be seen to differ. The similarity of the injection areas noted by Reviewer 2 reflects the precision of our virus injections.

      1. On page 49, are the green neurons in Figure 9B the BC cells? Just to be consistent, the authors should use the same color for BC cells as in Figure 9A. Also, label the primary and the secondary associative memory cells in Figure 9.

      Figure 9 has been thoroughly changed in our revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Soudi, Jahani et al. provide a valuable comparative study of local adaptation in four species of sunflowers and investigate the repeatability of observed genomic signals of adaptation and their link to haploblocks, known to be numerous and important in this system. The study builds on previous work in sunflowers that have investigated haploblocks in those species and on methodologies developed to look at repeated signals of local adaptations. The authors provide solid evidence of both genotype-environment associations (GEA) and genome-wide association study (GWAS), as well as phenotypic correlations with the environment, to show that part of the local adaptation signal is repeatable and significantly co-occur in regions harboring haploblocks. Results also show that part of the signal is species specific and points to high genetic redundancy. The authors rightfully point out the complexities of the adaptation process and that the truth must lie somewhere between two extreme models of evolutionary genetics, i.e. a population genetics view of large effect loci and a quantitative genetics model. The authors take great care in acknowledging and investigating the multiple biases inherent to the used methods (GEA and GWAS) and use a conservative approach to draw their conclusions. The multiplicity of analyses and their interdependence make them slightly hard to understand and the manuscript would benefit from more careful explanations of concepts and logical links throughout. This work will be of interest to evolutionary biologists and population geneticists in particular, and constitutes an additional applied example to the comparative local adaptation literature.

      Some thoughts on the last paragraph of the discussion (L481-497): I think it would be fine to have some more thoughts here on the processes that could contribute to the presence/absence of inversions, maybe in an "Ideas and Speculation" subsection. To me, your results point to the fact that though inversions are often presented as important for local adaptation, they seem to be highly contingent on the context of adaptation in each species. First, repeatability results are only at the window/gene level in your results, the specific mutations are not under scrutiny. Is it possible that inversions are only necessary when sets of small effect mutations are used, opposite to a large effect mutation in other species? Additionally, in a model with epistasis, fitness effects of mutations are dependent on the genomic background and it is possible that inversions were necessary in only certain contexts, even for the same mutations, i.e. some adaptive path contingency. Finally, do you have specific demographic history knowledge in this system that maps to the observations of the presence of inversions or not? For example, have the species "using" inversions been subject to more gene flow compared to others?

      Thank you for the great suggestions and helpful comments. Regarding the question of demography, each of the species actually harbours quite a large number of haploblocks (13 in H. annuus spanning 326 Mb, 6 in H. argophyllus spanning 114 Mb, and 18 in H. petiolaris spanning 467 Mb; see Todesco et al. 2020 for more details), so there does not seem to be any clear association with demography. We agree about the complexities that might underlie the evolution of inversions that you outline above, and have refined some of the text where we discuss their evolution in the Discussion.

      Reviewer #2 (Public Review):

      In this study the authors sought to understand the extent of similarity among species in intraspecific adaptation to environmental heterogeneity at the phenotypic and genetic levels. A particular focus was to evaluate if regions that were associated with adaptation within putative inversions in one species were also candidates for adaptation in another species that lacked those inversions. This study is timely for the field of evolutionary genomics, due to recent interest surrounding how inversions arise and become established in adaptation.

      Major strengths

      Their study system was well suited to addressing the aims, given that the different species of sunflower all had GWAS data on the same phenotypes from common garden experiments as well as landscape genomic data, and orthologous SNPs could be identified. Organizing a dataset of this magnitude is no small feat. The authors integrate many state-of-the-art statistical methods that they have developed in previous research into a framework for correlating genomic Windows of Repeated Association (WRA, also amalgamated into Clusters of Repeated Association based on LD among windows) with Similarity In Phenotype-Environment Correlation (SIPEC). The WRA/CRA methods are very useful and the authors do an excellent job at outlining the rationale for these methods.

      Thank you!

      Major weaknesses

      The study results rely heavily on the SIPEC measure, but I found the values reported difficult to interpret biologically. For example, in Figure 4 there is a range of SIPEC from 0 to 0.03 for most species pairs, with some pairs only as high as ~0.01. This does not appear to be a high degree of similarity in phenotype-environment correlation. For example, given the equation on line 517 for a single phenotype, if one species has a phenotype-environment correlation of 1.0 and the other has a correlation of 0.02, I would postulate that these two species do not have similar evolutionary responses, but the equation would give a value of (1+0.02)10.02/1 = 0.02, which is a pretty typical "higher" value in Figure 4. I also question the logic behind using absolute values of the correlations for the SIPEC, because if a trait increases with an environment in one species but decreases with the environment in another species, I would not predict that the genetic basis of adaptation would be similar (as a side note, I would not question the logic behind using absolute correlations for associations with alleles, due to the arbitrary nature of signing alleles). I might be missing something here, so I look forward to reading the authors' responses on these thoughts.

      The reviewer makes a very good point about the range of SIPEC, and we have changed our analysis to reflect this, now reporting the maximum value of SIPEC for each environment (across the axes of the PCA on phenotypes that cumulatively explain 95% of the variance) in Figure 4 and Supplementary Figures S2 and S13. For consistency among manuscript versions and to illustrate the effect of this change, we retain the mean SIPEC value in one supplementary figure (S12), which shows that this change has little effect on the qualitative patterns. Figure 4 now shows that the maximum SIPEC value is regularly quite strong, which should address the reviewer's concern by showing that the pattern is not driven by anomalous, small values. We appreciate this point and think this change more closely reflects the biological feature we are trying to estimate: whether some axis of phenotypic space is strongly responding (or not) to selection from the environmental variable.

      With respect to the logic behind using absolute value, we still feel this is justified for traits, because if a trait evolves to be bigger or smaller, it may still use the same genes. For example, flowering time may change to be later or earlier, which would result in opposite correlations with a given environment, but might use the same gene (e.g. FT) for this. As such, we think keeping absolute value is more representative as otherwise species with strong but opposite patterns of adaptation would look like they were very different. We have added a statement on line 584 in the methods section to further clarify the reason for this choice.

      An additional potential problem with the analysis is that from the way the analysis is presented, it appears that the 33 environmental variables were essentially treated as independent data points (e.g. in Figure 4, Figure 5). It's not appropriate to treat the environmental variables independently because many of them are highly correlated. For example in Figure 4, many of the high similarity/CRA values tend to be categorized as temperature variables, which are likely to be highly correlated with each other. This seems like a type of pseudo replication and is a major weakness of the framework.

      This is a good point and we fully agree. It is for this reason that we didn’t present any p-values or statistical tests of the overall patterns that are shown in these figures (i.e. the linear relationship between SIPEC and number of CRAs in figure 4 and the tendency for most points to fall above the 1:1 line in figure 5). But to make sure this is even more clear, we have added statements to the captions of these figures to remind readers that points are non-independent. We still feel that in the absence of a formal test, the overall patterns are strongly consistent with this interpretation. A smaller number of non-pseudo-replicated points in Figure 4 would still likely show linear patterns. Similarly, there are almost no significant points falling below the 1:1 line in Figure 5, and it seems unlikely that pseudoreplication would generate this pattern.

      Below I highlight the main claims from the study and evaluate how well the results support the conclusions.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments" (abstract)
      Given the questions above about SIPEC, I did not find this conclusion well supported with the way the data are presented in the manuscript.

      We have changed the reporting of the SIPEC metric so that it more clearly reflects whichever axis of phenotypic space is most strongly correlated with environment in both species (using max instead of mean). This shows similar qualitative patterns but illustrates that this happens across much higher values of SIPEC, showing that it is in fact driven by high correlations in each species (or non-similar correlations resulting in low values of SIPEC). While we agree about the pseudo-replication problem preventing formal statistical test of this hypothesis, the visual pattern is striking and seems unlikely to be an artefact, so we think this does still support this conclusion.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments, which are particularly enriched within regions of the genome harbouring an inversion in one species." (Abstract) And "increased repeatability found in regions of the genome that harbour inversions" (Discussion)
      These claims are supported by the data shown in Figure 4, which shows that haploblocks are enriched for WRAs. I want to clarify a point about the wording here, as my understanding of the analysis is that the authors test if haploblocks are enriched with WRAs, not whether WRAs are enriched for haploblocks. The wording of the abstract is claiming the latter, but I think what they tested was the former. Let me know if I'm missing something here.

      We are actually not interested in whether WRAs are enriched for haploblocks; we want to know if WRAs tend to occur more commonly within haploblocks than outside of them. We have tried to clarify that this is our aim in various places in the manuscript. Our analysis for Figure 5 is the one supporting these claims, and it uses the Chi-square test statistic to assess the number of WRAs and non-WRAs that fall within vs. outside of inversions, and a permutation test to assess the significance of this observation, for each environmental variable and phenotype. We don’t think that this test has any direction to it – it’s simply testing if there is non-random association between the levels of the two factors. Thus, we think the wording we have used is consistent with the test result and our aims. Perhaps the confusion arose from the two methods that we present in the Methods (one is used for Figure 5, the other for Figure S6C & D), so we have added clarifications there.
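The point that the chi-square test has no direction can be illustrated with a minimal Python/NumPy sketch of a 2x2 test (WRA vs. non-WRA windows, inside vs. outside haploblocks) with a permutation-based p-value. It assumes windows are independent units and shuffles WRA labels freely across them; the function names and toy data are hypothetical, and the permutation scheme used in the manuscript may differ (for example, by preserving genomic position or LD structure). Swapping the roles of the two factors gives the same statistic, so the direction of any association has to be read from the WRA rates inside vs. outside haploblocks.

```python
import numpy as np


def chi_square_2x2(is_wra: np.ndarray, in_block: np.ndarray) -> float:
    """Pearson chi-square statistic for the 2x2 table of
    (WRA, non-WRA) x (inside, outside haploblock)."""
    table = np.array(
        [
            [np.sum(is_wra & in_block), np.sum(is_wra & ~in_block)],
            [np.sum(~is_wra & in_block), np.sum(~is_wra & ~in_block)],
        ],
        dtype=float,
    )
    # Expected counts under independence: row total * column total / grand total.
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    nonzero = expected > 0  # skip empty margins to avoid division by zero
    return float(np.sum((table[nonzero] - expected[nonzero]) ** 2 / expected[nonzero]))


def permutation_test(is_wra, in_block, n_perm=999, seed=0):
    """Compare the observed statistic with a null built by shuffling WRA labels."""
    rng = np.random.default_rng(seed)
    observed = chi_square_2x2(is_wra, in_block)
    null = np.array(
        [chi_square_2x2(rng.permutation(is_wra), in_block) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Hypothetical toy genome: 100 windows, 20 inside a haploblock,
# 8 WRAs inside and 4 WRAs outside (not data from the study).
in_block = np.zeros(100, dtype=bool)
in_block[:20] = True
is_wra = np.zeros(100, dtype=bool)
is_wra[:8] = True
is_wra[20:24] = True

stat, p = permutation_test(is_wra, in_block)
swapped = chi_square_2x2(in_block, is_wra)  # same value: the test is symmetric
rate_in, rate_out = is_wra[in_block].mean(), is_wra[~in_block].mean()
print(f"chi2={stat:.2f} (swapped {swapped:.2f}), p={p:.3f}, "
      f"WRA rate inside={rate_in:.2f}, outside={rate_out:.2f}")
```

In this toy case the WRA rate is higher inside the haploblock than outside, which corresponds to a point above the 1:1 line in a Figure 5-style plot; the statistic alone would be equally large for a comparable depletion.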

      Notwithstanding the concerns about highly correlated environments potentially inflating some of the patterns in the manuscript, to my knowledge this is the first attempt in the literature to try this kind of comparison, and the results do generally suggest that inversions are more likely capturing, rather than accumulating adaptive variation. However, I don't think the authors can claim that repeated signatures are enriched with haploblock regions, and the authors should take care to refrain from stating the relative importance of different regions of the genome to adaptation without an analysis.

      Actually, we don’t have a strong feeling about whether inversions are capturing vs. accumulating adaptive variation, as these results could be consistent with either. As described above, we do not understand why we can’t claim that repeated signatures are enriched within haploblocks. Perhaps the reviewer is referring to the fact that the points in the figures are pseudo-replicated due to environment? We note that a very large number of points are significantly different from random in terms of the distribution of WRAs within vs. outside of haploblocks (light- vs. dark-shaded symbols), and that almost all of them fall above the 1:1 line. While there may be pseudo-replication preventing a test of the bigger multi-environment/multi-species hypothesis across all phenotypes and environments, there is almost a complete lack of significant results in the other direction. This seems like quite strong evidence about enrichment of WRAs within haploblocks, across many environments/species contrasts. We have added some text to the description of patterns in figure 5 to try to clarify this.

      "While a large number of genomic regions show evidence of repeated adaptation, most of the strongest signatures of association still tend to be species-specific, indicating substantial genotypic redundancy for local adaptation in these species." (Abstract)
      Figure 3B certainly makes it look like there is very little similarity among species in the genetic basis of adaptation, which leaves the question as to how important the repeated signatures really are for adaptation if there are very few of them. (Is 3B for the whole genome or only that region?). This result seems to be at odds with the large number of CRAs and the claims about the importance of haploblock regions to adaptation, which extend from my previous point.

      Figure 3B is for the whole genome, we have added text to the figure caption to clarify this. We think that both interpretations are possible: that most of the regions of the genome that are driving adaptation are non-repeated, but that a small but significant proportion of regions driving adaptation are repeated above what would be expected at random. Thus, it seems that there is high redundancy, coupled with adaptation via some genes that seem particularly functionally important and non-redundant, and therefore repeated. We added clarifying text on lines 541-548.

      "we have shown evidence of significant repeatability in the basis of local adaptation (Figure 4, 5), but also an abundance of species-specific, non-repeated signatures (Figure 3)"
      While the claim is a solid one, I am left wondering how much of these genomes show repeated vs. non-repeated signatures, how much of these genomes have haploblocks, and how much overlap there really is. Finding a way to intuitively represent these unknowns would greatly strengthen the manuscript.

      We agree, and really struggled to find the best way to communicate both the repeated patterns and the large amount of non-repeated signatures. Unfortunately, we have more confidence in the validity of repeated patterns because for the non-repeated patterns, a strong signature of association to environment in only one species could just be the product of structure-environment correlation, as we didn’t control for population structure. Thus, trying to quantify the proportion of non-repeated signatures is difficult to do with any accuracy and we preferred to avoid putting too much emphasis on the simple calculation of the proportion of top candidate windows that were also WRAs.

      Overall, I think the main claims from the study, the statistical framework, and the results could be revised to better support each other.

      Although the current version of the manuscript has some potential shortcomings with regards to the statistical approaches, and the impact of this paper in its present form could be stifled because the biology tended to get lost in the statistics, these shortcomings may be addressed by the authors.

      With some revisions, the framework and data could have a high impact and be of high utility to the community.

      Thank you for your very helpful comments and suggestions on our paper, we really appreciate it.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Editor's comments:

      The reviewers make a series of reasonable suggestions that I echo. I found the paper quite hard to follow, and got fairly lost in the various layers of analyses done. Partially, this represents the complexity of empirical genomic data, which rarely deliver simple stories of convergence at a few genes. However, the properties of the various statistics used to detail local adaptation and convergence are not particularly clear and the figures presented were not intuitive representations of the data. This leaves the reader with an incomplete view of how much weight to put in the various lines of evidence marshaled. I would suggest simplifying the presentation of the results considerably. I add a few additional comments below.

      Great suggestion, we’ve added a schematic overview of the methods and main research questions to Figure S1 in the supplementary materials.

      A figure would help showing some of the signals of SNPs with putative signals of convergent environmental correlations across species, e.g. frequencies plotted against climate variables. This would help readers get a sense of how strong these signals were. These could be accompanied by the statistics calculated for these SNPs, that would allow the reader to start to get some intuitive sense of what the numbers mean.

      Great suggestion, we have added a schematic overview of the methods to Figure S1 that shows some of the values and illustrates how the methods work using visual examples from our data.

      In general, the introduction and some of the discussion of the inversion results feel oddly framed:
      Abstract line 36: "This shows that while inversions may facilitate local adaptation, at least some of the loci involved can still make substantial contributions without the benefit of recombination suppression."

      We have changed “some of the loci involved can still make substantial contributions without the benefit of recombination suppression” here to “some of the loci involved can still harbour mutations that make substantial contributions without the benefit of recombination suppression in species lacking a segregating inversion” as it hopefully clarifies that we’re not talking about individual alleles that are present in both species.

      Models of the role of local adaptation in the establishment of inversions (Kirkpatrick & Barton) assume that there are multiple locally adapted alleles already present. It is the load created by these alleles being constantly maintained in the face of migration and subsequent recombination that allows an inversion to be selected for because it keeps together locally adapted alleles. Thus these models predict that there could well be standing local adaptation at these loci in the absence of the inversion in other species, and that these locally adapted alleles while not fixed may be at high frequency. (After establishment, inversions housing locally adapted alleles can shield more weakly, locally beneficial alleles from migration, allowing other alleles to build up.) Empirically it's interesting to find signals of local adaptation in other species that don't contain putative inversions. But the logic of the different predictions is not particularly clear from the introduction, and only becomes somewhat clearer in the discussion.

      Thank you for pointing out this murkiness, we have re-written portions of both the Introduction and Discussion to clarify this aspect.

      From the introduction: Inversions have been implicated in local adaptation in many species (Wellenreuther and Bernatchez 2018), likely due to their effect to suppress recombination among inverted and noninverted haplotypes, and thereby maintain LD among beneficial combinations of locally adapted alleles (Rieseberg 2001; Noor et al. 2001; Kirkpatrick and Barton 2006). This has been approached by models studying the establishment of inversions that capture combinations of locally adapted alleles present as standing variation (e.g., Kirkpatrick and Barton 2006), as well as models examining the accumulation of locally adapted mutations within inversions (e.g., Schaal et al. 2022). If there is variation in the density of loci that can potentially contribute to local adaptation, inversions would be expected to preferentially establish and be retained in regions harbouring a high density of such loci (and this expectation would hold for both the capture and accumulation models). We would also expect to see stronger signatures of repeated local adaptation in such high density regions. Despite mounting evidence of their importance in adaptation, it is unclear how inversions may covary with repeatability of adaptation among species. A fundamental parameter of importance in these models is the relationship between migration rate and strength of selection on individual alleles, which may not make persistent contributions to local adaptation without the suppressing effects of recombination if selection is too weak (Yeaman and Whitlock 2011; Bürger and Akerman 2011). If most alleles have small effects relative to migration rate and can only contribute to local adaptation via the benefit of the recombination-suppressing effect of an inversion, then we would expect little repeatability at the site of an inversion – other species lacking the inversion would not tend to use that same region for adaptation because selection would be too weak for alleles to persist. 
On the other hand, if some loci are particularly important for local adaptation and regularly yield mutations of large effect, with these patterns being conserved among species, repeatability within regions harbouring inversions may be substantial. Thus, studying whether adaptation at the same genomic region harbouring an inversion is observed in other species lacking the inversion can give insights about the underlying architecture of adaptation, and the evolution and maintenance of inversions.

      From the Discussion: The observed repeatability associated with inversions further supports the local adaptation model as an explanation for the long-term persistence of segregating inversions (at least in sunflowers), rather than mechanisms based on dominance or meiotic drive (Rieseberg 2001). If the density of loci with the potential to be involved in local adaptation varies across the genome, then under this model the establishment and maintenance of inversions would be biased towards regions harbouring a high density of such loci. If the genomic basis of local adaptation is conserved among species, then these same regions are also more likely to show high repeatability. Thus, our observation that genomic regions harbouring inversions are enriched for WRAs is consistent with this general model of inversion evolution. Unfortunately, our observations provide little insight into whether inversions evolve under a capture (e.g., Kirkpatrick and Barton 2006) or an accumulation (e.g., Schaal et al. 2022) type of model, as either would be consistent with our results. Most of the sunflower inversions are >1 My old and therefore predate current patterns of local adaptation, but they likely do not predate the genes underlying local adaptation (which appear to be shared among the species we studied). The alleles underlying local adaptation may be younger than the inversions; however, as our work suggests, these regions are prone to harbouring locally adaptive alleles, so they may also have harboured other, ancestral locally adaptive alleles.

      As a minor comment, there are a fair number of places where a more nuanced view of the field is needed, e.g.: "Models in evolutionary genetics tend to focus on extremes: population genetic approaches explore cases where strong selection deterministically drives a change in allele frequency". This seems like a strange strawman. Population genetic models span a huge parameter range. The empirical approach of looking for sweeps by detecting genome-wide statistical outliers is predicated on strong selection, but numerous papers have looked for signals of weak selection genome-wide.

      Good point, we have changed our wording here.

      Reviewer #1 (Recommendations For The Authors):

      Comments

      My main comment on the manuscript is that the different levels and diversity of analyses are slightly hard to follow on the first, and even second, read. As there are several layers of correlations and comparisons, as well as some independent analyses, I wonder if it might be helpful to have a summary schematic figure of how all analyses fit together.

      Great idea, we have added Figure S1 that summarizes the main flow of the methods and research questions.

      • L169-171: Would it be more accurate to say that SIPEC is maximized when both species have strong correlations for an environmental variable across the same phenotypes? But maybe I misunderstood the index.

      Good point, we have now simplified SIPEC, reporting the max instead of the mean, which we think better reflects when similar patterns are happening in both species for some phenotype.

      • L191: Given the discussion in the introduction and elsewhere about the correction for population structure, which version is used here? Same for Figure 3.

      We have added clarification there.

      • L348: One [environmental] variable?

      Added

      • L353: Maybe add a percentage indication for 387 so that it is comparable to the following 23.3%.

      Good point, added

      • L388 and paragraph: You mention "significant repeatability", but from the results at this point it is hard to get a broad sense of how much of the signal is repeatable. Would it be possible to add a quantitative measure of the proportion of signal that is or is not repeatable, even if only approximate?

      I wish we could, but any such approximation would carry a huge amount of uncertainty and likely inaccuracy. Because it is so hard to determine conclusively how many loci are significant but non-repeated, we don't have a good handle on the denominator here. We are fairly confident that the repeated loci are strongly enriched for true positives, but the non-repeated loci are almost certainly also strongly enriched for false positives. While we would like to quantify this explicitly, we don't think it is possible with our data.

      • L415-418: "If there is variation [...] involved in local adaptation": I do not follow this argument; could you rephrase?

      Changed

      • L447-450: As you say in the Supplementary Methods, your analyses exclude 3/4 of the genome. Do you think this choice has a large impact on the number of outliers observed here, given that the genome-wide baseline would change?

      This is a very good question, but one that is quite complex and without a clear answer; we chose not to delve into it in the paper to keep the Discussion streamlined. My (SY) feeling is that regions harbouring transposable elements are unlikely to contribute much to adaptation, but we really don't know whether that is true. Even after excluding the ¾ of the genome harbouring TEs, the remaining ¼ still constitutes a huge amount of sequence and a very large number of genes, and it seems plausible that most genes and genic regions would not contribute to adaptation for a given trait. I therefore doubt this choice would change the results qualitatively, although it would almost certainly change the number of significant windows.

      • L455-457: "As we are unable [...] potentially important drivers" Could you provide the logical link here between loci of small effect and them being important drivers. I presume you mean that the large effect loci found here only account for a small proportion of the heritability?

      Yes that’s what we meant here, so we’ve added some clarification.

      • L482: "enriched within inversions": should that be 'in genomic regions where there exist inversions in at least one species'?

      Thanks for catching that, yes. Changed.

      • Methods/SIPEC L512: Compared with the Results section, it is unclear here what "environment" refers to. Is it a single variable or a set of environmental variables?

      This is done per environmental variable.

      I find the presence of the PCA for environment variables in Figure 2 misleading as my first interpretation was that PCs for environment were also used.

      Good point; we have clarified this on lines 190-193.

      One potential addition would be an environment-variable index $j$ in the formula, so that it reads "$SIPEC_j = \sum_i (|r_{ij,1}| + ...) ...$ where ... between environment variable $j$". I initially had difficulty understanding how SIPEC was computed in relation to the environmental variables, and this might help.

      Given the other changes we made to SIPEC, we felt it was simpler to just present it as a single calculation on a given combination of phenotype and environment for a pair of species, and then discuss taking the mean and maximum of this later.

      Finally, PCA axes explaining 95% of the variance are used, I would find it interesting to see how many PCs are used in comparison to the number of traits being measured.

      We have added the following sentence to the Methods describing this:

      "For comparisons including H. argophyllus, 95% of the variance was typically explained by 8-10 PC axes (out of 28 or 29 phenotypes), whereas for comparisons among other taxa this included 21 or 22 PC axes (out of 65 or 66 phenotypes)."

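      To illustrate the 95%-of-variance criterion in the sentence added to the Methods, here is a minimal sketch in Python/NumPy of how the number of retained PC axes can be chosen. It assumes the phenotypes are standardized (z-scored) before PCA and that no phenotype is constant; neither the scaling nor the software actually used is stated in the response, and the function name n_pcs_for_variance is hypothetical.

```python
import numpy as np


def n_pcs_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative share of
    variance reaches `threshold`.

    X is an (individuals x phenotypes) array. Columns are z-scored first
    (an assumption), so each phenotype contributes equally; constant columns
    would cause a division by zero and must be removed beforehand.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the centred matrix are proportional to PC variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    cumulative = np.cumsum(var) / var.sum()
    # First index where the cumulative share is >= threshold, converted to a count.
    return int(np.searchsorted(cumulative, threshold) + 1)


# Toy data: five phenotypes driven by only two independent underlying traits.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a, a, b, b])
print(n_pcs_for_variance(X))  # 2: two PCs capture essentially all the variance
```

      Standardizing matters for this kind of count: on raw scales, phenotypes with large numeric ranges would dominate the leading PCs, and fewer axes would appear to reach 95%.
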
      Typos

      L52: --

      Changed

      L254: portions [of] their

      Changed

      L399: additional closing parenthesis

      Changed

      L458: signatures [of] repeated association

      Changed

      L554: performed [on]

      Changed

      L578: 5 ~~kp~~/kb windows

      Changed

      L601: ~~casual~~/causal SNPs

      Changed

      L615: ~~widow~~/window

      Changed

      L732: ~~Banding~~/Banting Postdoctoral Fellowship

      Changed

      L1002 & L960: [Supplementary] Figure

      Changed

      Supplementary: Some figure titles are in bold and others are not.

      Changed

      Reviewer #2 (Recommendations For The Authors):

      Overall I found the writing to be very clear and easy to follow. Despite my comments, it was clear that a lot of thought went into how to conduct the tests and visualize the results. I recommend ending the Discussion on a positive note, rather than an impossible test.

      Thanks for the positive suggestion, we have done this.

      In Figure 5, is the temperature variable missing in the legend and in the plot?

      No; in this plot we combined the temperature and precipitation variables into a single variable called "climate".

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The first major issue relates to the imaging and tracking experiment examining the formation and migration of F-actin foci, as illustrated in Figure 3. The formation and centripetal migration of F-actin foci, which promote the switch of B cells from a spreading to a contraction response, is a significant finding of this MS. Thus, I recommend that the authors conduct a more rigorous fluorescent molecular tracking experiment to confirm this phenomenon. Molecular tracking usually requires low labeling density, and the LifeAct-GFP labeling used here does not meet this requirement, which may cause misidentification of the moving molecules. Permeable dye-based fluorescent speckle microscopy is recommended to track the actin foci, if applicable (P. Risteski, Nat. Rev. Mol. Cell Biol., 2023, DOI: 10.1038/s41580-023-00588w & K. Hu, et al, Science, 2007, 315, 111-115).

      We thank the reviewer for the suggestion. We conducted the suggested experiment using membrane-permeable SiR-actin to track B-cell actin dynamics. Unfortunately, two significant issues prevented us from confirming the LifeAct-GFP results using fluorescent speckle microscopy. First, the concentration of SiR-actin required to visualize F-actin in the contact zone of mouse primary B-cells was relatively high due to their smaller sizes (~6 µm diameter) and non-adherent nature. With such a relatively high concentration of SiR-actin, we could not perform fluorescent speckle microscopy. Second, we observed that SiR-actin appeared to stabilize actin structures and reduce actin dynamics, further limiting its use in studying actin dynamics in B-cells.

      Additionally, kymographs are used for foci tracking in Figures 3 and 4. Kymography is indeed a powerful tool for tracking cell protrusion and retraction but is not well suited here, since an F-actin focus is a concentrated point that may not move strictly along the eight lines selected to generate the kymographs. Another image-processing method should be used to track the foci; for example, a time-series max projection is recommended, if applicable.

      We thank the reviewer for the suggestion and have tried the time-series max projection. Unfortunately, it did not provide sufficient resolution to identify individual actin foci, again probably because of the small size of primary mouse B-cells. While kymographs may not capture the entire paths of these moving foci, we believe that the conclusions drawn from the kymography analysis in Figures 3 and 4 are reasonable. We generated eight kymographs for each cell in Figure 3 and three kymographs for each cell in Figure 4 to follow as many actin foci as possible within the spreading-to-contraction transition window. Our analysis in Figure 3 identifies the fraction of actin foci originating from lamellipodia. In Figure 4, we used the kymographs to trace the paths of putative clusters and used these to calculate their relative lifetimes and speeds. Although this is not the approach suggested by the reviewer, our analysis provides information qualitatively similar to that from a time-series max projection and allows reasonable comparisons between contracted and non-contracted cells, inhibitor-treated and untreated cells, and wild-type and WASP KO cells.
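      The lifetime and speed measurements described here reduce to simple arithmetic on a focus track traced in a kymograph, where one axis is position along the line and the other is time. The following is a minimal sketch assuming a straight-line track defined by its start and end points; the KymoTrack structure and function name are hypothetical, and the pixel size and frame interval in the example are placeholder values, not the study's acquisition settings. The response does not state how speeds were normalized to give relative speed, so only absolute speed is computed.

```python
from dataclasses import dataclass


@dataclass
class KymoTrack:
    """Endpoints of one focus track traced on a kymograph (hypothetical structure)."""
    x_start_px: float   # position along the kymograph line at appearance (pixels)
    t_start_frame: int  # frame in which the focus appears
    x_end_px: float     # position at disappearance (pixels)
    t_end_frame: int    # frame in which the focus disappears


def track_lifetime_and_speed(track, pixel_size_um, frame_interval_s):
    """Return (lifetime in s, mean speed in um/s) for a straight-line track."""
    lifetime_s = (track.t_end_frame - track.t_start_frame) * frame_interval_s
    if lifetime_s <= 0:
        raise ValueError("track must span at least one frame interval")
    distance_um = abs(track.x_end_px - track.x_start_px) * pixel_size_um
    # On a kymograph, the slope of the track (distance / time) is the speed.
    return lifetime_s, distance_um / lifetime_s


# Example: a focus moving 20 px over 20 frames (assumed 0.1 um/px, 2 s/frame).
lifetime, speed = track_lifetime_and_speed(KymoTrack(10, 0, 30, 20), 0.1, 2.0)
print(lifetime, speed)  # 40.0 s and about 0.05 um/s
```

      Because each track is read from a fixed line, a focus that drifts off the line ends its track early; this is the limitation Reviewer #1 raised, and it mainly biases lifetimes downward rather than speeds.
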

      The second major issue concerns the relationship between actin foci formation and NMII recruitment in Figure 5. The authors conclude that 'N-WASP and Arp2/3 mediated branched actin polymerization promotes the recruitment and the reorganization of NMII ring-like structures by generating inner F-actin foci in the contact zone'. However, strong evidence is lacking to directly show the mechanism by which myosin is recruited and the upstream/downstream relationship between actin foci migration and myosin recruitment. Since myosin-induced actin retrograde flow is a classical model in adherent cells, is it possible that, in activated B cells as well, the recruited myosin drives the formation and migration of actin foci? I recommend that the authors investigate whether myosin blocking (e.g., using Y27632) eliminates F-actin foci formation and migration.

      This is an excellent suggestion! In the revised manuscript, we have included new data showing that treatment with the non-muscle myosin II motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci ring-like structures, which are associated with B-cell contraction. These results together suggest that the generation of inner actin foci ring-like structure depends on the coordination between N-WASP-mediated actin polymerization and myosin contractile activity. We chose to use blebbistatin rather than Y27632 to inhibit non-muscle myosin II because in addition to the ROCK pathway, myosin light chain kinase can also activate myosin II, and Y27632 may have additional effects besides inhibiting myosin activity. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses: Minor as listed below. The working hypothesis of molecular crowding as a way to push signalling molecules out of the BCR-dense foci is interesting. The authors provide evidence that this is an active process mediated by N-WASP- and Arp2/3-induced actin foci. Another possibility is that BCR-dense foci formation is an indirect consequence of lamellipodia retraction. Future work should define the specific roles of N-WASP, Arp2/3 and actin in forming BCR-dense foci, especially as the BCR continues to signal in the cytoplasm.

      We thank the reviewer for the comments. We have included the possibility that lamellipodial retraction may be involved in increasing the molecular density of BCR clusters and suggested future studies on the potential roles of N-WASP-dependent inner actin foci and actomyosin structures in BCR internalization and intracellular signaling in the Discussion section.

      Reviewer #3 (Public Review):

      The authors support their claims by means of thorough image analysis, mainly observing and quantifying the fluorescence and dynamics of single clusters of antigen and actin foci and analyzing two-color dynamic images. They perform their observations in control cells, in pharmacologically perturbed cells in which the action of Arp2/3 or N-WASP is inhibited, and in modified primary cells (derived from genetically engineered mice) in which N-WASP or WASP is silenced. The work is sound and complete, and the experiments are technically excellent and well explained. Some experiments and discussions are objectively harder to describe, and given the length of the work, the reader might sometimes get lost. A graphical abstract/summary of the main way N-WASP ultimately controls signal attenuation would solve this minor point.

      We greatly appreciate the reviewer’s confirmation of our data quality and are delighted to accept the reviewer’s suggestion. In the revised manuscript, we have included a new figure (Figure 10) in the Discussion section, summarizing the results presented in the manuscript as a working model.

      Reviewer #1 (Recommendations For The Authors):

      Some minor points: Figure 1C, E, G and I show three individual symbols, indicating the three independent experiments described in the legend. Please double-check for accuracy.

      It is better to show statistical data from a representative repeat, not the merged means of independent experiments. For example, Figure 1C even shows three "0" data points for CK-666-treated cells, meaning no contracting cell was found among ~75 cells, while other repeats show 45%-50% contracting cells. This applies to all figures involving individual-cell imaging data, such as Figure 2D, in which 30 cells from three independent experiments were pooled. The authors should clearly state that those independent experiments are statistically indistinguishable before pooling the data.

      We agree with the reviewer’s comments that these data have variability from individual mice, the quality of isolated primary B-cells, and the lateral mobility of planar lipid bilayers. To show the variability, we displayed the data from each experiment as individual data points. In the revised manuscript, we have utilized three colors of dots to represent three independent experiments in Figure 1C, E, G, and I, Figure 2B-G, and new Figure 5H, which show that the data from the three experiments have the same trend despite the variability.

      In Figure 7B-C, Figure 8 and Figure 9, it was hard to tell which groups were compared in the significance tests. Please describe this in more detail in the figure legend or the Methods section.

      In the legend, the authors state that blue points in Figure 7B represent individual pCD79a clusters within an equal number of BCR clusters from each time point. The authors used means to quantify the change in the distribution of blue points. This should be clearly stated in the Methods. The total number of BCR clusters should also be shown. This applies to Figures 7B, 7C, 7D and all panels in Figures 8 and 9.

      We thank the reviewer for pointing this out. We have revised Figures 7-9, using square brackets to indicate the groups of clusters (blue points) being compared. We have also provided additional information in the figure legends and the Methods section.

      Reviewer #2 (Recommendations For The Authors):

      199-200: What is the consequence of increased WASP activation in N-WASP knockout B cells? Was this evaluated as increased pWASP activity and/or increased actin polymerization of WASP knockout B cells? Do WASP and N-WASP have an additive or counteractive effect on each other during spreading and contraction?

      Indeed, the relationship between WASP and N-WASP, which are co-expressed in B-cells and other immune cells, is fascinating. Our previous studies, using WASP germline knockout, B-cell-specific N-WASP knockout, and WASP and N-WASP double knockout mice, showed that WASP and N-WASP have both additive and counteractive effects during B-cell spreading, but B-cell contraction depends only on N-WASP (Liu et al. 2013. PLoS Biol). Double knockout B-cells fail to spread, and WASP knockout B-cells show reduced spreading but still contract, showing their additive effects. However, WASP and N-WASP suppress each other's activation, as detected by their phosphorylation. Phosphorylated WASP increases in the B-cell contact zone first, and phosphorylated N-WASP increases later, when the phosphorylated WASP level decreases. Knocking out one of them enhances the phosphorylation of the other. Consequently, N-WASP knockout B-cells show increased spreading, probably due to enhanced activation of WASP, but exhibit delayed contraction. The revised manuscript has expanded the discussion on this area to relate it to the results presented in this manuscript.

      560-563: Was Syk and SHIP-1 measured in the same cell? If not, the conclusion should be tempered.

      Unfortunately, the antibodies specific for Syk and SHIP-1 were raised in the same host species, which prevented us from staining both in the same cells. The revised manuscript discusses this as a shortcoming of our work.

      1204-1205: Explain better "three randomly positioned kymographs were generated" - how were they selected?

      We apologize for this unclear sentence. The three kymographs were positioned to track as many inner F-actin foci as possible.

      328: Change "abolished" to "reduced" to describe the data. 354-356: Unclear sentence, please edit. 1171: (H) should be (G). 1325: "PI" should be "FI".

      We thank the reviewer for finding these typos and unclear sentences. We have made the corrections accordingly.

      Methods: The description of the TIRF microscopy method is good. Regarding the image analysis, it is somewhat difficult to understand what was analyzed just by reading the text. Please show an example of the analysis pipeline, from a raw image through the processing steps.

      Figure 6-figure supplement 2 shows the image analysis process for tracking Fab’ clusters. We utilized the same approach for the image analysis of Figures 7-9.

      Discussion: Add a paragraph stating the limitations of the study. How do the findings here translate to in vivo activation of B cells, and how can this be addressed based on the data presented in this study?

      We thank the reviewer for the suggestion. In several paragraphs of the revised Discussion section, we have brought up the limitations of the study and how these limitations affect the data interpretation. In addition, we have added Figure 10 and the associated text to present our working model, which explains how our findings reveal the cellular mechanism by which BCR surface signaling amplification transitions into attenuation, likely occurring in vivo.

      Figure 2: Add an example of the image analysis for foci determination. From the images, it is not always clear what is a focus and what is not, which makes the "number of foci" data difficult to evaluate.

      We have added arrows to Figure 2A to indicate all identified inner F-actin foci in images.

      Figure 3: add a kymograph for the WKO analysis.

      In the revised Figure 4, we have provided a kymograph of a WKO B cell.

      Figure 4M: the "relative speed" of the "WT" samples is lower than in the other control samples, "DMSO" and "CK-689". The conclusion is that WKO cells have a "relative speed" similar to "WT" cells, but in fact the "WT" cells may have responded poorly in this experiment. What is the authors' experience and explanation?

      We agree that the relative speeds of inner actin foci in the contact zone of WT and WKO B-cells are relatively low compared with DMSO and CK-689. In our experience, this parameter is very sensitive to the lateral mobility of planar lipid bilayers. We could only image one pair of conditions by live-cell imaging at a time. The WT and WKO experiments were done last and may have used relatively aged liposomes. However, this did not affect the number of inner actin foci formed or their relative lifetimes, consistent with their similar relative speeds. Unfortunately, we lost the LifeAct-GFP-expressing WKO mouse colony and cannot redo this experiment using freshly made liposomes within a reasonable time.

      Figure 7B-D: Add a more detailed legend for the black and brown lines in the dot plots.

      We have expanded the legend for Figure 7B-D to provide additional details.

      Figure 8-9: Show representative images for SYK, pSYK, SHIP-1 and pSHIP-1. Add a more detailed legend for the black and brown lines in the dot plots.

      We have provided representative images for Syk, pSyk, SHIP-1, and pSHIP-1 in revised Figure 8 and 9.

      Reviewer #3 (Recommendations For The Authors):

      From the paper, one understands that NMII is recruited by the actin foci and that this recruitment pushes the foci towards the center of the synapse, in what resembles a positive feedback loop. Could the authors better elucidate this point? What happens at the peak of NMII recruitment? Could this be a mechanism used by the cell to end the contact and detach (which probably cannot be observed in this experimental setup)?

      This is an excellent comment! We have recently shown that NMIIA recruitment peaks right before B-cell contraction occurs, and inhibition of NMII by inhibitors or B-cell conditional knockout blocks B-cell contraction and enhances signaling (Seeley-Fallen et al. 2022. Frontiers in Immunology). In the revised manuscript, we have included new data showing that treatment with the NMII motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci associated with B-cell contraction. These results together suggest that the generation of inner actin foci depends on the coordination between N-WASP-activated actin polymerization and myosin contractile activity, supporting the reviewer’s comment. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Whether the recruited NMII pulls B-cells away from antigen-presenting surfaces remains an interesting question. We have previously shown that high-affinity interaction of surface BCRs with membrane-anchored antigen can cause NMII-dependent B-cell membrane permeabilization, which triggers lysosome exocytosis and lysosomal enzyme-mediated antigen cleavage, allowing antigen internalization and presentation to T-cells (Maeda et al. 2021. eLife). Furthermore, NMII is required for B cells to internalize surface antigens (Natkanski et al. 2013. Science). These results support the possibility that actomyosin structures formed during B-cell contraction may further drive B-cells to internalize antigen. We have discussed this interesting point in the revised manuscript.

      Some experiments/quantifications are more complex than others, and readers might find them hard to follow (in particular Figs 7, 8 and 9). Comprehension could be improved by providing a guide to reading them. For example, it is not clear what the population distribution represents (it is not particularly affected by any manipulation). How were the groups for testing chosen? They seem to be based on intensity categories taken every 100 units: is this the case? Even if arbitrary, this should be stated in the legend.

      We thank the reviewer for understanding the complexity of the image analysis and for pointing out the unclear points. Based on the reviewer's comments, we have revised Figures 7-9 and their legends. We used square brackets to indicate the groups of clusters (blue points) being compared. The comparison groups were chosen arbitrarily based on Fab' peak fluorescence intensity, in bins of 90 units for Figures 7 and 8 and 100 units for Figure 9.
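      The grouping described here, fixed-width bins of Fab' peak intensity with a summary value per bin, can be sketched as follows. This is an illustrative assumption, not the study's code: the bins are taken as half-open intervals starting at zero, each bin is summarized by its mean (the reviewer notes that means were used), and the function name and the per-cluster signal values are hypothetical.

```python
import math
from collections import defaultdict


def bin_by_intensity(clusters, width):
    """Group clusters into fixed-width bins of Fab' peak intensity.

    clusters: iterable of (fab_peak_intensity, signal) pairs, where `signal` is
    a per-cluster measurement such as pCD79a intensity (hypothetical input).
    Bins are half-open intervals [k*width, (k+1)*width) starting at zero.
    Returns {bin_start: mean_signal}, ordered by bin_start.
    """
    groups = defaultdict(list)
    for intensity, signal in clusters:
        start = math.floor(intensity / width) * width
        groups[start].append(signal)
    return {start: sum(vals) / len(vals) for start, vals in sorted(groups.items())}


clusters = [(10, 1.0), (80, 3.0), (95, 5.0), (179, 7.0), (200, 2.0)]
print(bin_by_intensity(clusters, width=90))  # {0: 2.0, 90: 6.0, 180: 2.0}
```

      The bin width is the only free parameter; the response reports 90 units for Figures 7 and 8 and 100 units for Figure 9.
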

      Can the author speculate on how the actin organization passes from actin foci to recruitment of NMII and arc formation? Is it a rearrangement of the actin network (percolation) or simply recruitment of monomers?

      Our previous and new results show that both N-WASP-activated Arp2/3 and NMII are required to form inner F-actin foci. Based on these results, we speculate that N-WASP- and Arp2/3-mediated actin polymerization may initiate the process and recruit NMII, and that the recruited NMII coordinates with actin polymerization to reorganize actin structures, promoting inner actin foci maturation and arc formation. We have included these possibilities in the revised Discussion.

      The role of SHIP recruitment as way to inhibit the signal downstream of the BCR is an interesting finding. Is this related to the termination of the synapse? Could we relate the time scales (accurately measured in this work) to contact times observed in vivo?

      The reviewer raises an interesting question. In the discussion section, we have speculated that the actomyosin structures responsible for B-cell contraction are potentially the precursor cytoskeleton structures for antigen internalization. However, the relationship of B-cell contraction and signaling attenuation with the termination of the synapse remains unclear.

      The BCR has been shown to be internalised mechanically: do these new data suggest a mechanism for force generation in antigen internalization at the actin foci? Related to that, how do the dynamics of N-WASP recruitment relate to the forces highlighted in Traction Force Microscopy experiments (see for example Wang Sci.Signal. 2018, Kumari Nat.Comm.2019)? What happens in situations where the actin foci cannot be transported, e.g., in the more classical configuration of antigen on coverslips?

      Indeed, our results allow us to speculate that the actomyosin structures responsible for B-cell contraction potentially contribute to antigen internalization by mechanical forces. We previously showed that the B-cell-specific N-WASP knockout drastically reduced BCR internalization of soluble antigen (Liu et al. 2013. PLoS Biol), and that NMII is required for BCR internalization of membrane-associated antigen (Maeda et al. 2021. eLife and Natkanski et al. 2013. Science). The effect of N-WASP knockout on the internalization of membrane-associated antigen and traction forces generated at the contact membrane and whether traction forces are generated from the inner F-actin foci have not been determined but will be pursued in the future.

      Our previous publication compared the BCR and actin dynamics of B-cells interacting with Fab' tethered to planar lipid bilayers (Fab'-PLB) and to cover glass (Fab'-G) (Ketchum et al. 2014. Biophys J). B-cells interacting with Fab'-G neither contract nor generate inner F-actin foci, and they exhibit less dynamic BCR clusters and actin cytoskeleton than B-cells interacting with Fab'-PLB. On glass, actin foci remain coincident with Fab' clusters, whereas on PLB they are positioned behind Fab' clusters, where they drive the clusters' centripetal movement.

      Minor remarks: When several experiments (mice) are presented in dot plots (e.g. fig 2D-G 4J-M), color dot plot (so called "smart plot") where each experiment is identified by a color, could be used to highlight the sample-to-sample variability.

      This is an excellent suggestion. In the revised manuscript, we have utilized three shades of dots to represent the data points from three independent experiments.

      Fig 6A: the fluorophore should be indicated in the picture (Fab'-AF546)

      The suggested correction has been made.

      Fig 6D: how is the contraction phase (purple rectangle) determined? Curve by curve or on the average curve? Please specify this in the legend.

      The contraction phase (purple rectangle) was determined using the average curve of the contact area by IRM over time. We have added this sentence to the revised figure legend.

      Minor typos in the Materials and Methods: in some cases C56BL/6 is written instead of C57BL/6.

      Corrected.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels. The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung. Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal mechanism for the excess aldosterone observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored. The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear. The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      1) In the reviewer's experience, comparing C57BL/6J wild-type mice with knock-out mice whose genetic background and breeding history are not precisely documented can lead to misinterpretation of the results. Hence, MARCO-deficient mice should be compared with true littermate controls.

      2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1. Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary, the experiments should be repeated with an adequately powered cohort.

      We thank the reviewer for their comments. We are prepared to carry out these power calculations and repeat the experiment if necessary.
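      As a rough guide to the power calculation discussed above, the per-group sample size for a two-sided comparison of two means can be approximated as n = 2((z_{1-α/2} + z_{1-β})σ/Δ)². The sketch below assumes equal group sizes and a common standard deviation; Δ and σ would be taken from the observed aldosterone difference and corticosterone variability, and the function name is hypothetical. Because the knockout group appears more variable, a t-based or Welch-type calculation would be more appropriate for the final estimate and typically gives a slightly larger n.

```python
import math
from statistics import NormalDist


def mice_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison
    of means (normal approximation, equal group sizes, common SD)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)             # quantile corresponding to desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)                   # round up to whole animals
```

      For an effect equal to one standard deviation at α = 0.05 and 80% power, `mice_per_group(1.0, 1.0)` returns 16 per group; halving the effect size, `mice_per_group(0.5, 1.0)`, raises this to 63.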

      2. All of the data throughout the MS (particularly data in the lung) should be presented for both male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. It would also be useful if the single-cell data could be examined by sex; this should be possible even post hoc using Xist etc.

      We are prepared to measure the expression of Ace and of the biosynthetic enzymes in female mice by qPCR, and ACE protein expression by IF. Additionally, we will test females using the dexamethasone suppression study. The single cell RNA seq analysis was used primarily to inform our model, not for experimental readout. We will explore the dataset as the reviewer suggests and will add additional plots if the analysis substantively changes our previous findings.

      3. IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We have negative controls for antibody staining. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung.

      4. Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      The vendor has validated this antibody by cell treatment to confirm that it binds the stated antigen. We are prepared to validate the antibody further using other tissues as controls, though we point out that ACE is expressed, albeit at lower levels, in endothelial cells throughout the body, so some signal is to be expected in most if not all tissues.

      5. The link between alveolar macrophage Marco and ACE is poorly explored.

      We are prepared to do co-culture experiments with alveolar macrophages and endothelial cells and to measure ACE/Ace expression in response.

      6. Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      We argue that this would be outside the scope of this project, though we would consider exploring such experiments in future studies.

      7. Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We are prepared to measure blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice.

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We have access to an AM cell line which we plan to use to do co-culture experiments with an ACE-expressing endothelial cell line. In this way we will test whether this effect is linked to AMs.

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We are prepared to measure blood electrolytes and blood pressure (via tail cuff method) in Marco-deficient and Marco-sufficient mice.

      4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern, we plan to do a co-culture experiment as outlined above.

      Broadly, we thank the reviewers for taking the time to critically appraise this manuscript. The reviewers' primary concern seems to be the lack of direct evidence of an effect of AMs on endothelial Ace expression, which we plan to address as outlined above. We will adjust our conclusions as appropriate based on the results of these experiments.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Our comments on the initial eLife assessment

      “This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. The claim that these findings are of relevance to psychosis risk and have implications for policy changes are partially supported by the results”

      We sincerely appreciate the editor and reviewers for their valuable feedback and their willingness to accommodate our perspectives in the first revision. In this revision, the comments from the reviewers have allowed us to further improve our manuscript. Regarding the eLife assessment, we would like to discuss two points.

      Firstly, regarding the statement that our “findings are of relevance to psychosis risk…partially supported…”, we wish to emphasize that our study is closely related to psychosis risk. Childhood psychotic-like experiences (PLEs) are closely linked to psychotic risk and have been shown to increase the risk of general psychopathology, as mentioned in our Introduction and Discussion.

      The reviewers asked for clearer differentiation between PLEs and schizophrenia, which we incorporated in this revision (line 100~111; line 419~430). So, this revised version now clearly points out that findings are relevant primarily to psychosis risk, and only partially relevant to schizophrenia risk.

      Secondly, regarding “…implications for policy changes are partially supported…”, we have described our study’s social contribution more clearly and specifically. Incorporating the comments, we now state that our study offers insight for future studies by showing the importance of integrative approaches that consider multi-factorial neurocognition and psychopathology ranging from genes to environment (line 503~512), rather than offering direct policy implications.

      Our collaboration with eLife and the reviewers has proven satisfactory and enriching. The community, coupled with the innovative system and culture established around eLife, has significantly advanced the progression of scientific research. We are privileged to contribute to this endeavor.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I am happy with the revisions provided by the authors and I think most of my concerns have been addressed satisfactorily. One remaining concern is the authors' conflation of PLEs and schizophrenia. They stated, for example, that it is necessary to adjust for schizophrenia PGS. Even though studies have found a statistical relationship between schizophrenia PGS and PLEs, this relationship is not very strong (although statistically significant) and other studies have found no relationship. Similarly, having PLEs increases the risk of developing psychosis, but that does not necessarily mean that this risk is substantial or specific. I think this needs more nuance in the manuscript and the term 'schizophrenia' should be used sparsely and very carefully as the paper has focused on PLEs. Otherwise, great work on the revisions, thank you.

      Thank you for your comment on the use of PLEs and schizophrenia. We clearly understand the differences between the two and we made relevant corrections throughout the manuscript. In particular, we added that PLEs are not a direct predictor of schizophrenia and corrected any expressions that may imply that PLEs are closely related to schizophrenia in the Introduction.

      “Psychotic-like experiences (PLEs), which are prevalent in childhood, indicate the risk of psychosis (van der Steen et al., 2019; Van Os & Reininghaus, 2016). Although they are not a direct precursor of schizophrenia, children reporting PLEs in ages of 9-11 years are at higher risk of psychotic disorders in adulthood (Kelleher & Cannon, 2011; Poulton et al., 2000). PLEs also point towards the potential for other psychopathologies including mood, anxiety, and substance disorders (van der Steen et al., 2019), are linked to deficits in cognitive intelligence (Cannon et al., 2002; Kelleher & Cannon, 2011) and show a stronger association with environmental risk factors during childhood than other internalizing/externalizing symptoms (Karcher, Schiffman, et al., 2021).

      Maladaptive cognitive intelligence may act as a mediator for the effects of genetic and environmental risks on the manifestation of psychotic symptoms (Cannon et al., 2000; Keefe et al., 2006; Reichenberg et al., 2005).” (line 100~111)

      We also revised any expressions that could be perceived as implying relevance to schizophrenia in the Discussion. “Prior research identifying the mediation of cognitive intelligence focused on either genetic (Karcher, Paul, et al., 2021) or environmental factors (Lewis et al., 2020) alone. Studies with older clinical samples have shown that cognitive deficit may be a precursor for the onset of psychotic disorders (Eastvold et al., 2007; Fett et al., 2020; Vorstman et al., 2015). Our study advances this by demonstrating the integrated effects of genetic and environmental factors on PLEs through the cognitive intelligence in 9-11 years old children. Such comprehensive analysis contributes to assessing the relative importance of various factors influencing children's cognition and mental health, and it can aid future studies designed for identifying health policy implications. Considering the directions and magnitudes of the effects, though the effects of PGS remain significant, aggregated effects of environmental factors account for much greater degrees on PLEs.” (line 419~430)

      Reviewer #2 (Recommendations For The Authors):

      I thank the authors for addressing most of my comments. I feel the manuscript has already greatly improved.

      I have a few more comments.

      1) Although I did not make this comment, I find the authors' reply to the following comment by Reviewer #1 unclear: Original comment 'I like that the assessment of CP (cognitive performance) and self-reports PLEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL (Child Behavior Checklist) were used and how did they correlate with the child-reported PLEs? And how was distress taken into account in the child self-reported PLEs measurement? Which PLEs measures were used?'

      The authors' response refers to correlation coefficients, but I think Reviewer #1's inquiry was on more than these correlations.

      Thank you for your concern. We think that this comment was referring to our previous manuscript submitted elsewhere. In our initial submission to eLife, we already added the details about the four items from the parent-reported CBCL and how distress was considered in the child self-reported PLEs measurement (Appendix S1, page 48).

      2) Regarding the authors' reply that they have 'standardized the use of 'cognitive capacity' - I do not understand what this means. How exactly was this term standardized? In fact, I can find the term 'cognitive capacity' only once and it seemed to have been deleted from the manuscript. This is fine, but it doesn't clearly align with the statement that this term has been standardized.

      We apologize for causing such confusion. What we meant was that throughout our revised manuscript, we used the term “cognitive phenotypes” instead of “cognitive capacity”.

      3) Regarding my initial comment that 'it needs to be described how cognitive performance was defined in Lee 2018.' - I believe this is still not clarified. The authors write 'CP was measured as the respondent's score on cognitive ability assessments', but it remains unclear what exactly these assessments were.

      Thank you for pointing this out. We added that “CP, measured as the respondent's score on cognitive ability assessments of general cognitive function and verbal-numerical reasoning, was assessed in participants from the COGENT consortium and the UK Biobank” (line 204~206).

      4) Regarding the authors' reply to my comment 'In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?'

      I can see that the authors explained 'components' (=the weighted sum of observed variables), but please also add what you mean by 'factors' - and how these are different from 'components' (line 284). Furthermore, I don't think it is correct that SEMs can only model latent factors, but not components (=measured variables). I also cannot see how using a weighted sum of observed variables controls more effectively for bias in estimation than latent factors. However, even though I do have some knowledge on this method, I'm not an expert and would appreciate the authors, other reviewer and/or editor to weigh in on this point.

      Thank you for pointing this out. We added that latent factors are indirectly measured indicators that explain the covariance among observed variables (line 263~271). We also added that the standard SEM method using latent factors assumes that observed variables within each construct share a common underlying factor; if this assumption is not met, standard SEM cannot effectively control for biases. This is why we used the IGSCA method, which addresses this limitation by allowing both composite and latent factors to be used as constructs.

      “Standard SEM using latent factors (i.e., indirectly measured indicators that explain the covariance among observed variables) to represent indicators such as PGS or family SES relies on the assumption that observed variables within each construct share a common underlying factor. If this assumption is violated, standard SEM cannot effectively control for estimation biases. The IGSCA method addresses this limitation by allowing for the use of composite indicators (i.e., components)—defined as a weighted sum of observed variables—as constructs in the model, more effectively controlling bias in estimation compared to the standard SEM. During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components.” (line 263~271)

      5) I overall disagree with the authors' following statement 'It has been suggested from prior studies that these variables (PGS, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor', but I appreciate the authors' argument.

      Thank you for your comment. To clarify our statement in the manuscript, we changed the sentence to “Considering that the observed variables of the PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs are evaluated as a composite index by prior research, the IGSCA method can mitigate bias more effectively by representing these constructs as components” (line 274~277).

      6) Regarding 'genetic ethnicity': please describe your methods on how this was defined.

      Genetic ethnicity was defined as the genetic ancestry of participants, which is one of the variables provided in the original ABCD Study data. To avoid further confusion, we corrected ‘genetic ethnicity’ to ‘genetic ancestry’ throughout the manuscript.

      7) Regarding 'a more direct genetic predictor of PLEs' - I still don't understand what the contrast is here. More direct than what else?

      The description was unclear; we removed it from our manuscript.

      8) Regarding the factor loadings in Figure 3: I don't understand how deprivation loads positively on 'low neighborhood SES', but poverty loads negatively. Shouldn't they both show the same direction of effect/loading on neighbourhood SES, while 'years of residency' should show the opposite direction (i.e., deprivation and poverty = risk, while years of residency = protective)? Are these unexpected loadings?

      The authors did not yet respond to this point: 'Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?' Were these correlations not modelled? Why not?

      Figure 3B is still unclear. Was intelligence included here? What is the difference between Figure 3A and B? The legend suggests that 3B shows the indirect effects, but figure 3B looks like a direct effect, while 3A seem to show the indirect effect.

      The reviewer’s confusion resulted from our incorrect description. The factor loadings of low neighborhood SES were marked incorrectly. The loadings for ‘years of residence’ and ‘poverty’ should be switched: -0.3648 for ‘years of residence’ and +0.877 for ‘poverty’. This was a mistake made when applying the factor loadings in the Figure. We thank you for pointing this out.

      We apologize for missing your point on autocorrelation. Adding autocorrelations between the three PLEs is unrelated to our research goal. In this paper, we investigated how genetic and environmental factors explain the variations in PLEs between participants, regardless of changes over time. Since we used PLEs of multiple follow-ups to ensure that the results are robust irrespective of the timing of PLE measurements, taking autocorrelation into account is not necessary.

      The decision to add autocorrelation, which involves using the outcome variable at time (t-1) as a predictor for the outcome variable at time t, depends on the research focus. If your interest lies in explaining inter-individual variation in the rate of change in PLEs over a one-year period, then autocorrelation should be controlled for (typically, predictors measured at different time points are used in such cases). However, this was not the focus of this paper, which is why we did not apply autocorrelation in the SEM analysis.

      We apologize for the confusion between Figure 3A and 3B. To clarify, we added titles in the figure images as “Direct effects” and “Indirect effects”. We also changed the legend as well.

      “A. Direct pathways from PGS, high family SES, low neighborhood SES, and positive environment to cognitive intelligence and PLEs. Standardized path coefficients are indicated on each path as direct effect estimates (significance level *p<0.05). B. Indirect pathways to PLEs via intelligence were significant for polygenic scores, high family SES, low neighborhood SES, and positive environment, indicating the significant mediating role of intelligence.” (line 968~973)

      Figure 3A shows direct effects: i.e., the coefficients of paths from PGS, family SES, neighborhood SES, and positive environment to intelligence and PLEs, as well as the coefficient of the path from intelligence to PLEs. This is why Figure 3A shows colored arrows starting from PGS, family and neighborhood SES, and positive environment towards intelligence and PLEs, as well as the arrows from intelligence to PLEs. In Figure 3B, by contrast, the colored arrows starting from PGS, family and neighborhood SES, and positive environment go through intelligence and head towards PLEs. This was meant to show that the indirect effects in Figure 3B are the specific effects of PGS, family SES, neighborhood SES, and positive environment on PLEs mediated by intelligence.

      In short, Figure 3 can be seen as a diagram drawn from Table 2: direct effects of the genetic and environmental variables on intelligence and PLEs, and direct effects of intelligence on PLEs are shown in Figure 3A; indirect effects of genetic and environmental variables on PLEs mediated by intelligence are shown in Figure 3B.

      9) Regarding Supporting Information tables: to make these more digestible, I suggest using Excel and adding one table per sheet with a clear title and legend, indicating what each table shows. For example, Table S1 has 9(?) different subsections, all called the same (Linear Mixed Model: Multiethnic). It is not clear how each subsection differs from the others. Separate tables in separate excel sheets might be easier.

      Also, I think two decimal points might be good enough, enhancing readability of these tables.

      Thank you for your suggestion. We moved the supplementary tables into an external Excel file, with each sheet showing different tables, as well as titles, legends, and clear subsections.

      10) Regarding reporting exact p-values in Table 2: I don't understand. At the moment, categorical significance statements are reported. Were these not based on exact p-values (or how else was it decided if a finding was significant at a 0.05 (?) significance level).

      Either remove the significance column completely (as p-values cannot be estimated due to non-normality) or specify exactly what this column shows and how it was derived.

      We apologize for the confusion. In Table 2, we assessed the significance of each path using 95% confidence intervals from 5,000 bootstrap iterations. Since a 95% confidence interval that excludes zero corresponds to significance at the 0.05 level, we believe this is an appropriate alternative for reporting the significance of each path in the SEM model.

      We specified the reason why we were not able to calculate exact p-values (clean copy: line 299~303). “As a trade-off for obtaining robust nonparametric estimates without distributional assumptions for normality, the IGSCA method does not return exact p-values (Hwang, Cho, Jung, et al., 2021). As a reasonable alternative, we obtained 95% confidence intervals based on 5,000 bootstrap samples to test the statistical significance of parameter estimates.”
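      The decision rule described above (an estimate is significant when its bootstrap 95% interval excludes zero) can be illustrated with a percentile bootstrap on a simple statistic. The sketch below resamples a sample mean 5,000 times; it is not the IGSCA estimator, which would refit the whole model on each bootstrap sample, and the function names are hypothetical.

```python
import random
import statistics


def percentile_bootstrap_ci(data, stat=statistics.fmean, n_boot=5000,
                            level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `data`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int(round((1 - level) / 2 * n_boot))      # e.g. 2.5th percentile
    hi_idx = int(round((1 + level) / 2 * n_boot)) - 1  # e.g. 97.5th percentile
    return boots[lo_idx], boots[hi_idx]


def excludes_zero(ci):
    """True when the interval lies entirely above or entirely below zero."""
    lo, hi = ci
    return lo > 0 or hi < 0
```

      Applied to an all-positive sample, the interval lies above zero and the estimate is flagged as significant; for a sample centred on zero the interval straddles zero and it is not.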

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers for their time and effort in their critical review of our manuscript, and we appreciate the opportunity to address these comments. We thank the reviewers for recognizing that our experimental design is well crafted and contributes to the broader understanding of dietary and exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers' recommendations and hope that they appreciate the revised version.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added some theories in the discussion on page 14.

      2) Figure 4 and Figure 7 can be moved to supplementary and text in the description can be arranged accordingly to make a better flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high protein diet (36% of calories from protein) and a low protein diet (7% of calories from protein) for this study. The authors should explain whether this mouse diet is practically comparable to a human high protein diet (2.0 g/kg BW) and low protein diet (less than 0.8 g/kg BW).

      The high protein diet is comparable to a human diet of 180 grams of protein per day ((0.36 x 2,000 calories)/4 calories per gram = 180 g), which is in a range that some people consume, particularly bodybuilders and athletes. The low protein diet is equivalent to 35 grams of protein per day ((0.07 x 2,000 calories)/4 calories per gram = 35 g), and a diet of just 7% protein is not recommended for humans per the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% dietary protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
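      The conversions above assume a 2,000 kcal/day reference diet and 4 kcal per gram of protein. A minimal sketch of that arithmetic follows; the function names and the 80 kg body mass used for the per-kilogram comparison are hypothetical illustrations, not values from the study.

```python
KCAL_PER_G_PROTEIN = 4.0  # Atwater factor for protein


def protein_grams(fraction_of_kcal, daily_kcal=2000.0):
    """Grams of protein per day when `fraction_of_kcal` of energy comes from protein."""
    return fraction_of_kcal * daily_kcal / KCAL_PER_G_PROTEIN


def grams_per_kg(grams, body_mass_kg):
    """Express a daily protein intake relative to body weight (g/kg BW)."""
    return grams / body_mass_kg


high = protein_grams(0.36)  # high protein diet, 36% of calories
low = protein_grams(0.07)   # low protein diet, 7% of calories
# For a hypothetical 80 kg adult: roughly 2.25 and 0.44 g/kg BW per day.
print(round(grams_per_kg(high, 80), 2), round(grams_per_kg(low, 80), 2))
```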

      4) The color coding of the error bar and lines does not match with the group description in almost every figure. Maybe the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E it seems like the number of biological samples is not consistent in the LP+WP group. If the authors have excluded any outlier from the analysis, that should be included in the methodology.

      We described how outliers were identified in the Statistics section of the Methods (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”
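      Grubbs' test examines one candidate outlier at a time: it computes G = max|x_i - mean| / s, with s the sample standard deviation, and compares G with a critical value derived from the t-distribution. The sketch below computes only G and the index of the candidate point. The critical-value comparison is deliberately omitted because it requires a t quantile that Python's standard library does not provide. This illustrates the statistic and is not GraphPad's implementation; the function name is hypothetical.

```python
import statistics


def grubbs_statistic(values):
    """Return (index, G) for the value farthest from the mean.

    G = max |x_i - mean| / s, where s is the sample standard deviation.
    Whether G exceeds the Grubbs critical value is not tested here.
    """
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return idx, abs(values[idx] - mean) / s
```

      For example, `grubbs_statistic([1, 2, 3, 4, 100])` identifies the last value (index 4) as the candidate, with G of about 1.79.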

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This important study from Godneeva et al. establishes a Drosophila model system for understanding how the activity of Tif1 proteins is modified by SUMO. The authors nicely show that Bonus, like homologous mammalian Tif1 proteins, is a repressor, and that it interacts with other co-repressors Mi-2/NuRD and setdb1 in Drosophila ovaries and S2 cells. They also show that Bonus is SUMOylated by Su(var)2-10 on at least one lysine at its N-terminus to promote its interaction with setdb1. By combining nice biochemistry with an elegant reporter gene approach, they show that SUMOylation is important for Bonus interaction with setdb1, and that this SUMO-dependent interaction triggers high levels of H3K9me3 deposition and gene silencing. While there are still major questions of how SUMO molecularly promotes this process, this study is a valuable first step that opens the door for interesting future experimentation.

      Major Point:

      The RNAseq and ChIPseq data are not available. This is critical for the review of the paper and would help readers and reviewers interpret the Bonus mutant phenotype and its mechanism of repressing genes.

      The sequencing data have been deposited in the NCBI GEO archive. The accession number for the RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      1) The authors' conclusion that Bonus SUMOylation is "essential for its chromatin localization" is not supported by the data. Figure 5F shows less 3KR mutant in the chromatin fraction, but there is still significant signal.

      We appreciate the reviewer's feedback and agree that the term "essential" was not appropriate in this context. We have revised the manuscript to replace "essential" with "contributes to" to accurately reflect our findings.

      2) The authors' conclusion that Bonus is SUMOylated at a single site close to its N-terminus is not necessarily true. In several SUMO and Bonus blots throughout the paper (5B, 6C, S4A), there are >2 differentially migrating species that could represent more than one SUMO added to Bonus. While the single K20R mutation eliminates all of these species in Fig 5C, it is possible that K20 SUMOylation is required for additional SUMOylation events on other residues. One way to determine whether Bonus is SUMOylated on multiple sites is to add recombinant SUMO protease to the extract and see whether the multiple higher molecular weight bands collapse into a single migrating species (implying multiple SUMOs) or into multiple migrating species (implying that something else is altering gel migration).

      We appreciate the suggestion made by the reviewer. While we acknowledge the occasional presence of multiple bands in SUMO Western blots, the predominant pattern is unmodified Bon plus a single additional band corresponding to SUMO-modified Bon. To investigate the possibility of multi-site SUMOylation, we performed the requested experiment: we added SENP2 SUMO protease to the extract and examined Bon SUMOylation. In the presence of NEM, we observed the unmodified form of Bon as well as a single additional band representing SUMO-modified Bon. Following SENP2 treatment, the SUMOylated form of Bon was completely abolished in all samples, leaving only the unmodified Bon band (Extended Data Fig. 4D). This indicates that Bon is not SUMOylated on multiple sites and that the additional differentially migrating species likely result from other factors affecting gel migration.

      3) The authors state that most upregulated genes in BonusGLKD are not highly enriched in H3K9me3. The heatmap in Figure 3D is not an ideal presentation of this argument. The authors should show an example of what the signal on a highly enriched gene looks like for comparison. The authors also argue that because most upregulated genes in BonusGLKD are not highly enriched in H3K9me3, they must be indirectly repressed. Another possibility is that bonus-mediated H3K9me3 is only important (and present) during early nurse cell differentiation and is later lost and dispensable during the rapid endocycles. After bonus establishes repression through H3K9me3, it might be maintained through bonus-Mi2/Nurd, something else, or nothing at all. The authors could discuss this possibility or perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation rather than in whole ovaries, which are enriched for later stages.

      We thank the reviewer for their thoughtful comments and suggestions. In our revised manuscript, we have included tracks for a gene that is highly enriched in H3K9me3 but remains unchanged upon Bon GLKD (Extended Data Fig. 3B). This addition allows a visual comparison and better supports our argument that the majority of genes upregulated in Bon GLKD are not enriched in the H3K9me3 mark. We also appreciate the reviewer's suggestion regarding the potential temporal dynamics of Bon-mediated H3K9me3. It is indeed possible that Bon's role in establishing H3K9me3 is more prominent during early nurse cell differentiation and less critical in later stages. We have included a discussion of this possibility in the revised manuscript. To explore this further, it would be valuable to perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation. However, given our current resource and time constraints, we were unable to perform these experiments for the revised manuscript.

      4) The BonusGLKD RNAseq analysis is underwhelming. The conclusion that "Bonus represses tissue-specific genes" has limited value. Every gene that is not expressed in ovaries is "tissue-specific." What subset of tissue-specific genes does Bonus repress? What common features do these genes have and how do they compare to other sets of tissue-specific genes, such as those reportedly repressed by setdb1, Polycomb proteins, small ovary, l(3)mbt, and stonewall (among others in female germ cells). Comparing these available data sets could help the authors understand the mechanism of Bonus repression and how BonusGLKD leads to sterility. The authors could also further analyze the differences between nos-Gal4 and MT-Gal4 to better understand why nos- but not MT-driven knockdown is sterile.

      We appreciate the reviewer's feedback regarding the RNA-seq analysis and acknowledge the importance of identifying the specific subset of tissue-specific genes. Figure 2C shows the specific tissues in which genes derepressed upon Bon GLKD are normally expressed, such as the head, digestive system, and nervous system. The reviewer's suggestion to compare our findings with existing datasets is valid and could indeed provide a more comprehensive understanding of Bon-mediated repression and its implications in female germ cells. However, many of the published datasets are based on mutant fly lines or use different GAL4 drivers to induce knockdowns, making direct comparisons challenging. We have conducted a preliminary analysis of available data, specifically nos-Gal4>SetDB1KD (GSE109852), and found that 135 of the 464 genes upregulated upon nos-Gal4>BonusKD are also affected by SetDB1 knockdown. We have included this result in the revised manuscript.

      Main Study Limitations:

      1) It is unclear which genes are directly vs indirectly regulated by bonus, which makes it difficult to understand Bonus's repressive mechanism. Several lines of experimentation could help resolve this issue: (1) Bonus ChIPseq, which the authors mentioned was difficult; (2) RNAseq of BonusGLKD rescued with the 3KR mutation. This would help separate SUMO/setdb1-dependent regulation from Mi-2-dependent regulation. Similarly, comparing differentially expressed genes in Su(var)2-10GLKD, setdb1GLKD, 3KR rescue, and Mi-2 GLKD could identify overlapping targets and help refine how bonus represses subsets of genes through these different corepressors.

      We appreciate the reviewer's suggestions and agree that discriminating between direct and indirect Bon targets should be the next step in understanding the Bon repressive mechanism. We have previously attempted to determine direct Bon targets using a ChIP-seq approach. However, despite multiple efforts using both native Bon antibodies and GFP-tagged Bon fly lines, analysis of the ChIP-seq data did not reveal specific enrichment, indicating that Bon, like many other chromatin-bound proteins, is not amenable to ChIP. The recommendation for RNA-seq analysis of Bon GLKD rescued with the 3KR mutation is valuable, and we will certainly consider it for future investigations.

      We compared differentially expressed genes in Su(var)2-10 GLKD and Mi-2 GLKD and found limited overlap: out of the 231 genes affected by Bon GLKD, 39 genes were affected in Mi-2 GLKD and 42 in Su(var)2-10 GLKD. We acknowledge the importance of understanding which genes are directly or indirectly regulated by Bon and the potential for further experiments to address this question.

      2) The paper falls short in discussing how SUMO might promote repression. This is important when considering the conservation (or lack thereof) of SUMOylation sites in Tif1 proteins in distantly related animals. One piece of data that was not discussed is the apparent localization of SUMOylated bonus in the cytoplasmic fraction of the blot in Figure 5F. Su(var)2-10 is mostly a nuclear protein, so is bonus SUMOylated in the nucleus and then exported to the cytoplasm? Also, setdb1 is a nuclear protein, so it is unlikely that SUMOylated bonus directly interacts with setdb1 on target genes. Together with Fig 5E (unSUMOylatable Bonus aggregates in the nucleus), one could make a model where SUMO solubilizes bonus (perhaps by disassembling aggregates) and indirectly allows it to associate with setdb1 and chromatin. It is also important to note that in Figure 5I, the 3KR mutation appears to lessen but not eliminate Bonus interaction with setdb1. This again disfavors a model where SUMO establishes an interaction interface between setdb1 and Bonus. To determine which form of Bonus interacts with setdb1, the authors could perform a setdb1 pulldown and monitor the SUMOylation state of coIPed Bonus through mobility shift. If mostly unSUMOylated bonus interacts with setdb1, and SUMO indirectly promotes Bonus interaction with setdb1 (perhaps by disassembling Bonus aggregates), then the precise locations of Bonus SUMOylation sites could more easily shift during evolution, disfavoring the authors' convergent evolution hypothesis.

      We appreciate the reviewer's valuable feedback. Regarding the observation of SUMOylated Bon in the cytoplasmic fraction in Figure 5F, we recognize its significance. This finding has prompted us to consider a model in which SUMOylation may play a role in translocating Bon from the nucleus to the cytoplasm, potentially influencing its interactions with SetDB1 and chromatin indirectly. Furthermore, Figure 5I, which shows only a partial reduction in the Bon-SetDB1 interaction with the 3KR mutation, suggests that SUMO may not be the primary mediator of this interaction. We recognize the need for further investigation to clarify SUMO's exact role in this context. In response to the reviewer's suggestion, we conducted SetDB1 pulldown experiments in S2 cells. The results show that SetDB1 primarily interacts with unmodified Bon, which is far more abundant than the SUMOylated form (Extended Data Fig. 5C). We note that this experiment presents technical challenges: the signal for Bon when used as prey in co-IP experiments is relatively faint, making it inherently difficult to detect the lower levels of SUMO-modified Bon. Additionally, in the revised manuscript we have added new mass spectrometry data identifying Bon interactors in the ovary, which show that SetDB1 associates with wild-type but not SUMO-deficient Bon. While our data support the idea that SUMO may contribute to Bon solubilization, possibly by disassembling aggregates, thereby indirectly facilitating its association with SetDB1 and chromatin, we acknowledge that the precise mechanism remains unclear.

      Reviewer #2 (Public Review):

      Summary:

      The authors analyze the functions and regulation of Bon, the sole Drosophila ortholog of the TIF1 family of mammalian transcriptional regulators. Bon has been implicated in several developmental programs; however, the molecular details of its regulation have not been well understood. Here, the authors reveal the requirement of Bon in oogenesis, thus establishing a previously unknown biological function for this protein. Furthermore, careful molecular analysis convincingly established the role of Bon in transcriptional repression. This repressor function requires interactions with the NuRD complex and histone methyltransferase SetDB1, as well as sumoylation of Bon by the E3 SUMO ligase Su(var)2-10. Overall, this work represents a significant advance in our understanding of the functions and regulation of Bon and, more generally, the TIF1 family. Since Bon is the only TIF1 family member in Drosophila, the regulatory mechanisms delineated in this study may represent the prototypical and important modes of regulation of this protein family. The presented data are rigorous and convincing. As discussed below, this study can be strengthened by a demonstration of a direct association of Bon with its target genes, and by analysis of the biological consequences of the K20R mutation.

      Strengths:

      1. This study identified the requirement for Bon in oogenesis, a previously unknown function for this protein.
      2. Identified Bon target genes that are normally repressed in the ovary, and showed that the repression mechanism involves the repressive histone modification mark H3K9me3 deposition on at least some targets.
      3. Showed that Bon physically interacts with the components of the NuRD complex and SetDB1. These protein complexes are likely mediating Bon-dependent repression.
      4. Identified a Bon sumoylation site (K20) that is conserved in insects. This site is required for repression in a tethering transcriptional reporter assay, and SUMO itself is required for repression and interaction with SetDB1. Interestingly, the K20-mutant Bon is mislocalized in the nucleus in distinct puncta.
      5. Showed that Su(var)2-10 is a SUMO E3 ligase for Bon and that Su(var)2-10 is required for Bon-mediated repression.

      Weaknesses:

      The study would be strengthened by demonstrating a direct recruitment of Bon to the target genes identified by RNA-seq. Given that the global ChIP-seq was not successful, a few possibilities could be explored. First, Bon ChIP-qPCR could be performed on the individual targets that were functionally confirmed (e.g. rbp6, pst). Second, a global Bon ChIP-seq has been reported in PMID: 21430782 - these data could be used to see if Bon is associated with specific targets identified in this study. In addition, it would be interesting to see if there is any overlap with the repressed target genes identified in Bon overexpression conditions in PMID: 36868234.

      We greatly appreciate the reviewer's suggestion to demonstrate the direct recruitment of Bon to its target genes. As described in our answer to Reviewer #1, we attempted to determine direct Bon targets by ChIP-seq using both native Bon antibodies and GFP-tagged Bon fly lines. However, analysis of the ChIP-seq data did not reveal specific enrichment. Similarly, Bon ChIP-qPCR on individual targets gave the same result, suggesting that Bon, like many other chromatin-bound proteins, is not amenable to ChIP, at least under standard conditions. To explore this issue further, we analyzed the results of the global Bon ChIP-seq reported in PMID: 21430782. We did not find Bon binding to individual targets and, more importantly, we did not see clear Bon enrichment elsewhere in the genome, supporting the conclusion that Bon targets on chromatin cannot be determined by ChIP. Additionally, we explored the possibility of overlap between target genes repressed by Bon in our study and those observed under Bon overexpression conditions in PMID: 36868234. While we did identify 41 genes in common, the datasets are derived from different tissues (pupal eyes vs. ovaries), making direct comparison problematic.

      The second area where the manuscript can be improved is to analyze the biological function of the K20R mutant Bonus protein. The molecular data suggest that this residue is important for function, and it would be important to confirm this in vivo.

      We appreciate the reviewer's suggestion to analyze the biological function of the K20R mutant Bon protein. While we did not use the single-site K20R mutant for in vivo experiments, we demonstrated that the mutant with the three-residue substitution (3KR) is incapable of inducing repression (Figure 5G). Given that other experiments consistently showed that K20 is the primary SUMOylation site, this result supports the conclusion that K20 SUMOylation plays an important role in Bon-mediated transcriptional silencing.

      Reviewer #1 (Recommendations for The Authors):

      Make the RNAseq and ChIPseq data publicly available!

      The sequencing data have been deposited in the NCBI GEO archive. The accession number for the RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      Reviewer #2 (Recommendations for The Authors):

      It would be interesting to identify the biological basis of aberrant ovary development in Bon depletion conditions. Previous studies (e.g. PMID: 11336699) suggested that Bon loss of function clones are cell lethal, and the developmental defects in oogenesis presented in the current study offer an opportunity to delve more into the causes of cell loss, e.g. by showing that the cells die via apoptosis.

      Thank you for your valuable suggestion. In response to your comment, we performed a TUNEL assay to investigate whether germ cells in nos-Gal4>BonusKD ovaries undergo apoptosis. Our results indeed indicate that germ cells in these ovaries exhibit apoptosis, as evidenced by the TUNEL signal (Extended Data Fig. 1C). This information has been included in the revised manuscript to provide insights into the biological basis of aberrant ovary development in Bon depletion conditions.

      The K20 residue could also be ubiquitinated. This possibility could at least be discussed, particularly given the presence of the RING Ub ligase domain in Bon that might potentially perform self-ubiquitination.

      Indeed, the possibility that Bon can be ubiquitinated is a valid consideration, and we have explored it. We did not detect any signal with the ubiquitin antibody in either the wild-type Bon immunoprecipitate or the triple-mutant (3KR) ovaries (in which K20 is also mutated) (Extended Data Fig. 4C). This suggests that K20 is more likely responsible for Bon SUMOylation rather than ubiquitination. We appreciate the reviewer's suggestion and have included this information in the revised manuscript.

    1. Author Response

      We very much appreciate all the reviewers’ positive feedback and additional comments and suggestions for this manuscript!

      In this provisional reply, we’d like to quickly address only one selected key point, for which we have already collected relevant experimental data:

      Reviewer 1 suggests that ‘it would have been more rigorous for the authors to independently reproduce the kinetics reported for nsp8/9 using their specific experimental conditions.’ We absolutely agree with this and have already carried out these kinetic experiments while our paper was under review. We have now measured kinetic parameters for cleavage of the nsp8/9 peptide in our own hands under the same conditions as we used for nsp4/5 and TRMT1. We measured kcat and KM values of 0.019 ± 0.002 s⁻¹ and 40 ± 7.5 µM, respectively, for nsp8/9 cleavage; these data are very much in line with the previously reported values from MacDonald et al (kcat = 0.013 ± 0.001 s⁻¹, KM = 36 ± 6.0 µM) that we used for comparison in Figure 4 and listed in Table S2. We will add our own measured kinetic values for nsp8/9 in the next version of our manuscript, but wanted to report these numbers as soon as possible, because this further supports and validates our claim that the human TRMT1 sequence is cleaved at a similar rate to the known nsp8/9 viral polypeptide cleavage site.
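      The comparison above reports kcat and KM separately. A common single-number summary of how efficiently a protease processes a substrate at low substrate concentration is the specificity constant kcat/KM. The Python sketch below computes it from the nsp8/9 values quoted in this reply; it is an illustration rather than an analysis from the manuscript, and it does not propagate the reported uncertainties.

```python
def specificity_constant(kcat_per_s, km_uM):
    """Return kcat/KM in M^-1 s^-1, converting KM from micromolar to molar."""
    return kcat_per_s / (km_uM * 1e-6)


if __name__ == "__main__":
    measured = specificity_constant(0.019, 40.0)  # nsp8/9, values measured in this reply
    reported = specificity_constant(0.013, 36.0)  # nsp8/9, MacDonald et al. values quoted above
    print(f"measured: {measured:.0f} M^-1 s^-1")
    print(f"reported: {reported:.0f} M^-1 s^-1")
    print(f"ratio measured/reported: {measured / reported:.2f}")
```

      With these inputs the sketch gives about 475 M⁻¹ s⁻¹ for the new measurement and about 361 M⁻¹ s⁻¹ for the previously reported values, a ratio of roughly 1.3.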

      We will provide a detailed, point-by-point reply to all reviewer comments accompanying the forthcoming revised manuscript, in which we intend to have new and updated data and additional MD simulations that directly address key questions raised by the reviewers.

    1. Author Response

      We thank the reviewers for their suggestions for improving the manuscript. We are currently working on a formal revision and plan to submit a revised manuscript in the near future. However, we would be remiss if we did not address concerns regarding the conceptual merits of the paper. Below, we address the major points raised in selected reviewer comments and in the eLife assessment of our manuscript.

      eLife assessment:

      However, the strength of evidence is incomplete due to the concern that larval contraction is a result of chilling the nervous system and muscles, which causes spreading depolarization and mechanical contraction of the body, rather than an active sensorimotor response to cold.

      Reviewer #3:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no.

      We thank the reviewer for these questions and clarify our position here. Exposure to cold temperatures does elicit a recognizable and repeatable pattern of behavior across multiple strains, including both wildtype and genetic control strains (w1118, Oregon R), and across numerous previously published control conditions (Himmel et al., 2021, Himmel et al., 2023, Patel et al., 2022, Turner et al., 2016, Turner et al., 2018, Tenedini et al., 2019). Our initial publication on Drosophila cold nociception demonstrated a variety of cold-evoked behavioral responses, including head and/or tail raising of the larva as well as contraction behavior. These behaviors were repeatedly observed in assays involving either local cold stimulation with a cold probe or global cold stimulation on a cold plate. Head and/or tail raise behaviors are consistent with behavior that displaces the larval body from the cold surface; however, exposure to increasingly colder temperatures leads to an increasing level of cold-evoked contraction (CT) responses, which result in a reduction of larval area (Turner et al., 2016). Presumably, increasing the level of CIII md neuron activation leads to greater activation of downstream circuitry. We previously performed optogenetic dose-response assays to further clarify the increased prevalence of the CT response to strong noxious cold stimuli and investigated how CIII md neurons discriminate between innocuous touch and noxious cold stimuli. In that study, we found that lower-level activation of CIII md neurons led predominantly to touch-evoked behaviors, whereas high-level activation led predominantly to cold-evoked responses (Turner et al., 2016). These analyses were coupled with stimulus-evoked calcium imaging, which revealed that touch-evoked Ca2+ levels were significantly lower than cold-evoked Ca2+ levels (Turner et al., 2016).

      In this manuscript, we confirm our previously published findings that neural silencing of CIII md neurons, either by tetanus toxin expression or by impairing action potential propagation, results in impaired cold-evoked CT responses (Turner et al., 2016, Turner et al., 2018). However, neural silencing of CIII md neurons did not eliminate cold-evoked CT responses. We interpret this finding as evidence that some component of the cold-evoked CT response may be due to cold-induced muscle contraction. Furthermore, in this manuscript, we show that chordotonal (Ch) neurons are required for cold-evoked CT and demonstrate cold-evoked Ca2+ increases in Ch neurons. In addition, neural silencing of multiple sensory neuron types (CIII + Ch or CIII + CII) resulted in greater deficits in cold-evoked behaviors (Turner et al., 2016). Thus, the noxious cold stimulus is detected by multiple types of peripheral sensory neurons, and inhibiting neural activity in CIII md neurons alone cannot eliminate cold-evoked CT responses.

      This manuscript and several other publications show that optogenetic activation of CIII md neurons, or of CIII neurons together with CII or Ch neurons, elicits CT-like responses (Hwang et al., 2007, Shearin et al., 2013, Turner et al., 2016). Conversely, optogenetic stimulation of CIII md neurons with knockdown of paralytic, the α-subunit of the voltage-gated sodium channel, did not elicit blue light-evoked CT responses because action potential propagation was impaired. Collectively, these analyses indicate that CIII md neuron activation is sufficient to elicit CT-like responses. Additionally, we have previously published electrophysiological recordings of CIII md neurons under cold exposure. To address potential confounds of cold-induced muscle contraction on the cold-induced electrical activity of CIII md neurons, we performed these analyses on de-muscled fillets, revealing that CIII neural activity in response to cold does not depend on muscles. Exposure to noxious cold stimuli results in temperature-dependent increases in CIII neuron firing, consisting of both bursting and tonic firing (Himmel et al., 2021, Himmel et al., 2023, Maksymchuk et al., 2022, Patel et al., 2022, Himmel et al., 2022, Maksymchuk et al., 2023).

      Reviewer #3:

      Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm?

      We were similarly curious about the neuroethological and/or protective implications of cold-evoked behaviors. In Drosophila larvae, body rolling evoked by noxious mechanical stimuli allows lateral escape from predatory wasps (Hwang et al., 2007). Reducing the overall surface area exposed to cold (e.g., huddling behavior) serves as a protective strategy in many species (Canals et al., 1997, Contreras, 1984, Gilbert et al., 2006, Vickery and Millar, 1984, Hayes et al., 1992). Low temperatures can be fatal to poikilotherms (e.g., insects); however, many species have evolved the ability to cold acclimate, thereby increasing their cold tolerance. To explore the potential evolutionary benefit of the CIII-mediated contraction response to cold, we previously published work revealing a neural basis for cold acclimation in Drosophila larvae that implicates these neurons (Himmel et al., 2021). We demonstrated that cold-evoked CT behavior is evolutionarily conserved across 11 different drosophilid species and that other cold-induced behaviors (e.g., tail raise) were also observed. Furthermore, drosophilid species adapted to rapid temperature swings were more likely to retain the ability to locomote even at lower temperatures (Himmel et al., 2021). Next, we elucidated the role of CIII md neurons in cold acclimation. Silencing CIII md neurons resulted in an inability to cold acclimate. We additionally investigated the roles of Ch and CII md neurons; silencing either population alone did not inhibit the ability of larvae to cold acclimate. However, combinatorial silencing of CIII with CII or Ch neurons resulted in an inability to cold acclimate but did not obviously increase baseline cold tolerance. We also explored how developmental exposure to noxious cold temperatures affects the cold-evoked firing pattern of CIII md neurons. Electrophysiological analyses revealed that cold acclimation results in hypersensitization of CIII md neurons (Himmel et al., 2021). Lastly, developmental optogenetic activation of CIII md neurons led to increased cold tolerance. Therefore, CIII md neurons are necessary and sufficient for cold tolerance, and our collective evidence demonstrates that CIII-mediated cold nociception constitutes a peripheral neural basis for Drosophila larval cold acclimation (Himmel et al., 2021).

      Reviewer #3:

      It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013)… The latter makes the reported Calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (ie through deformation of the mechanosensitive dendrites)… Are the cIII calcium signals still observed in a preparation where cold induced muscle contractions are prevented?

      We agree with the reviewer that mild activation of CIII md neurons results in gentle touch-like responses. In this manuscript, and other previously published work, it has been shown that optogenetic activation of CIII neurons, or CIII neurons and other sensory neurons, using a variety of optogenetic actuators (ChR2, ChETA, and CsChrimson) promotes bilateral contraction of the larval body along the anterior-posterior axis (Shearin et al., 2013, Hwang et al., 2007, Meloni et al., 2020, Turner et al., 2016, Patel and Cox, 2017, Patel et al., 2022, Himmel et al., 2023).

      As described above, in our initial publication documenting larval cold nociception in Drosophila, we investigated how CIII md neurons discriminate multimodal stimuli to elicit stimulus-relevant behavioral responses. We reported that higher activation of CIII md neurons results in cold-evoked behaviors, whereas lower activation results in touch-evoked behaviors. Subsequent calcium imaging revealed a greater stimulus-evoked calcium response to noxious cold and a milder calcium response to gentle touch (Turner et al., 2016).

      Although we have not performed cold-evoked Ca2+ imaging of CIII md neurons in larval preparations without muscles, we have recorded the electrical responses of CIII md neurons in the absence of muscle contractions, using de-muscled larval fillets to analyze the cold-evoked firing patterns of CIII md neurons (Himmel et al., 2021, Himmel et al., 2022, Himmel et al., 2023, Patel et al., 2022, Maksymchuk et al., 2022, Maksymchuk et al., 2023). These studies demonstrate that cold-evoked CIII neural activity does not depend on muscles.

      Reviewer #3:

      A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.

      We conducted extensive screening of interneuron populations post-synaptically connected to CIII neurons in an effort to identify post-synaptic partners sufficient to trigger a CT response. Much to our surprise, we were unable to find any individual neuron type or driver line that was sufficient to elicit a CT response. However, we provide substantial supporting evidence for our co-activation experiments, including neural silencing, EM connectivity, and calcium imaging. We also show that the reported second- and third-order neurons are necessary for cold-evoked behavioral responses: inhibiting their neural activity reduced cold-evoked behavior. These second- and third-order neurons also exhibit cold-evoked calcium responses. Lastly, optogenetic activation of CIII neurons increases calcium responses in downstream post-synaptic neurons.

      Previously published studies of CIV md neuron circuitry have implicated downstream neurons that are not sufficient on their own to elicit rolling behavior upon activation; in CIV md neuron circuit dissection, select neurons acting downstream of CIV md neurons require additional circuit components to execute rolling behavior. For example, A00c neuron activation alone does not lead to rolling behavior, whereas co-activation of A00c and Basin-4 neurons facilitates a rolling response (Ohyama et al., 2015). Similarly, co-activation of Basin-1 and Basin-4 neurons significantly enhances rolling probability relative to Basin-4 alone (Ohyama et al., 2015). Further, DnB neurons require Goro command neuron activity to promote rolling behavior (Burgos et al., 2018). Thus, there is precedent for co-activation requirements to elicit robust behavioral output in sensorimotor circuits, and we employed a similar strategy after we discovered that activation of second- or third-order neurons alone did not elicit a CT response.

      Reviewer #3:

      “Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons does support the idea activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is therefore unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons the logic of this co-activation approach is not particularly clear.”

      We agree with the reviewer that some of the cell types we investigated are repeated in every abdominal neuromere; however, only select post-synaptic neurons (Basin 1-4, DnB, mCSI, and Chair neurons) are segmentally repeated in every abdominal segment. Conversely, the other projection and ascending neurons we investigated (A09e, A00c, A05q, Goro, TePn04/05, and A08n) are not repeated in every segment. We used connectome evidence to select the neuronal populations to explore in cold-evoked behavior and, as noted above, our co-activation approach was motivated by the observation that no individual subpopulation of connected interneurons was sufficient to elicit CT behavior. That said, this does not change the finding that inhibiting neural activity in these subpopulations impairs cold-evoked behavior, nor the observation that connected interneurons exhibit cold-evoked Ca2+ responses that can also be observed with optogenetic activation of CIII neurons.

      Reviewer #3:

      “The authors argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.”

      Many studies exploring the circuit bases of behavior have applied large-scale optogenetic screens, including co-activation strategies, or silencing screens to identify circuit components involved in the specific behaviors under investigation. We employed similar methods in our circuit dissection, and our conclusions are not based solely on optogenetic analyses.

      References: BURGOS, A., HONJO, K., OHYAMA, T., QIAN, C. S., SHIN, G. J.-E., GOHL, D. M., SILIES, M., TRACEY, W. D., ZLATIC, M., CARDONA, A. & GRUEBER, W. B. 2018. Nociceptive interneurons control modular motor pathways to promote escape behavior in Drosophila. eLife, 7:e26016.

      CANALS, M., ROSENMANN, M. & BOZINOVIC, F. 1997. Geometrical aspects of the energetic effectiveness of huddling in small mammals. Acta Theriologica, 42, 321-328.

      CONTRERAS, L. C. 1984. Bioenergetics of Huddling: Test of a Psycho-Physiological Hypothesis. Journal of Mammalogy, 65, 256-262.

      GILBERT, C., ROBERTSON, G., LE MAHO, Y., NAITO, Y. & ANCEL, A. 2006. Huddling behavior in emperor penguins: Dynamics of huddling. Physiol Behav, 88, 479-88.

      HAYES, J. P., SPEAKMAN, J. R. & RACEY, P. A. 1992. The Contributions of Local Heating and Reducing Exposed Surface Area to the Energetic Benefits of Huddling by Short-Tailed Field Voles (Microtus agrestis). Physiological Zoology, 65, 742-762.

      HIMMEL, N. J., LETCHER, J. M., SAKURAI, A., GRAY, T. R., BENSON, M. N., DONALDSON, K. J. & COX, D. N. 2021. Identification of a neural basis for cold acclimation in Drosophila larvae. iScience, 24, 102657.

      HIMMEL, N. J., SAKURAI, A., DONALDSON, K. J. & COX, D. N. 2022. Protocols for measuring cold-evoked neural activity and cold tolerance in Drosophila larvae following fictive cold acclimation. STAR Protoc, 3, 101510.

      HIMMEL, N. J., SAKURAI, A., PATEL, A. A., BHATTACHARJEE, S., LETCHER, J. M., BENSON, M. N., GRAY, T. R., CYMBALYUK, G. S. & COX, D. N. 2023. Chloride-dependent mechanisms of multimodal sensory discrimination and nociceptive sensitization in Drosophila. eLife, 12:e76863.

      HWANG, R. Y., ZHONG, L., XU, Y., JOHNSON, T., ZHANG, F., DEISSEROTH, K. & TRACEY, W. D. 2007. Nociceptive Neurons Protect Drosophila Larvae from Parasitoid Wasps. Current Biology, 17, 2105-2116.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. 2022. Transient and Steady-State Properties of Drosophila Sensory Neurons Coding Noxious Cold Temperature. Frontiers in Cellular Neuroscience, 16:831803.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. S. 2023. Cold-Temperature Coding with Bursting and Spiking Based on TRP Channel Dynamics in Drosophila Larva Sensory Neurons. Int J Mol Sci, 24(19):14638.

      MELONI, I., SACHIDANANDAN, D., THUM, A. S., KITTEL, R. J. & MURAWSKI, C. 2020. Controlling the behaviour of Drosophila melanogaster via smartphone optogenetics. Scientific Reports, 10, 17614.

      OHYAMA, T., SCHNEIDER-MIZELL, C. M., FETTER, R. D., ALEMAN, J. V., FRANCONVILLE, R., RIVERA-ALBA, M., MENSH, B. D., BRANSON, K. M., SIMPSON, J. H., TRUMAN, J. W., CARDONA, A. & ZLATIC, M. 2015. A multilevel multimodal circuit enhances action selection in Drosophila. Nature, 520, 633-639.

      PATEL, A. & COX, D. 2017. Behavioral and Functional Assays for Investigating Mechanisms of Noxious Cold Detection and Multimodal Sensory Processing in Drosophila Larvae. BIO-PROTOCOL, 7(13):e2388.

      PATEL, A. A., SAKURAI, A., HIMMEL, N. J. & COX, D. N. 2022. Modality specific roles for metabotropic GABAergic signaling and calcium induced calcium release mechanisms in regulating cold nociception. Front Mol Neurosci 15:942548.

      SHEARIN, H. K., DVARISHKIS, A. R., KOZELUH, C. D. & STOWERS, R. S. 2013. Expansion of the Gateway MultiSite Recombination Cloning Toolkit. PLoS ONE, 8, e77724.

      TENEDINI, F. M., SÁEZ GONZÁLEZ, M., HU, C., PEDERSEN, L. H., PETRUZZI, M. M., SPITZWECK, B., WANG, D., RICHTER, M., PETERSEN, M., SZPOTOWICZ, E., SCHWEIZER, M., SIGRIST, S. J., CALDERON DE ANDA, F. & SOBA, P. 2019. Maintenance of cell type-specific connectivity and circuit function requires Tao kinase. Nature Communications, 10, 3506.

      TURNER, H. N., ARMENGOL, K., PATEL, A. A., HIMMEL, N. J., SULLIVAN, L., IYER, S. C., BHATTACHARYA, S., IYER, E. P. R., LANDRY, C., GALKO, M. J. & COX, D. N. 2016. The TRP Channels Pkd2, NompC, and Trpm Act in Cold-Sensing Neurons to Mediate Unique Aversive Behaviors to Noxious Cold in Drosophila. Curr Biol, 26, 3116-3128.

      TURNER, H. N., PATEL, A. A., COX, D. N. & GALKO, M. J. 2018. Injury-induced cold sensitization in Drosophila larvae involves behavioral shifts that require the TRP channel Brv1. PLoS One, 13, e0209577.

      VICKERY, W. L. & MILLAR, J. S. 1984. The Energetics of Huddling by Endotherms. Oikos, 43, 88-93.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: The current study reports a cryo-EM structure of the MFS transporter MelB trapped in an inward-facing state by a conformationally selective nanobody. The authors compare this structure to previously resolved crystal structures of outward-facing MelB. Additionally, the authors report hydrogen/deuterium exchange mass spectrometry (HDX-MS) experiments that identify accessible residues in the protein.

      Strengths: The authors overcame very significant technical challenges to solve the first inward-facing structure of the small, model MFS transporter MelB by cryo-EM. The use of conformation-trapping nanobodies (which had been reported previously by this group) is particularly nice.

      We appreciate reviewer #1’s positive comments.

      Weaknesses: Maps and coordinates were not provided by the authors, which presents a gap in this assessment.

      We were not aware that maps and coordinates were specifically requested at initial submission, but we will provide them on request.

      The authors highlight the use of HDX experiments as a measurement of protein conformational dynamics. However, this experiment does not measure the conformational dynamics of the transporter, since in these experiments exchange is not initiated by ligand addition or another trigger. The experiment instead measures the accessibility of different residues, and of course, a freely-exchanging sodium bound transporter would have more exchangeable positions than when a conformation-trapping nanobody is bound. It is not clear what new mechanistic information this provides, since this property of the nanobody has already been established.

      We thank the reviewer for this comment. We address this point together with the similar question raised by Reviewer #2 below.

      Based on the evidence presented, it is somewhat speculative that the structure represents the EIIa-bound regulatory state.

      We believe that we have presented convincing evidence, obtained by ITC and gel-filtration chromatography, to support this statement. The effects of Nb725 and EIIAGlc on MelB function are similar: little change in Na+ binding, little change in Nb725 or EIIAGlc binding in the absence or presence of EIIAGlc or Nb725, respectively, but a great reduction in sugar-binding affinity (sFigs. 2 and 3; Tables 1 and 2; see also our two published papers, J. Biol. Chem. 2014; 289:33012-33019 and 2023; 299:104967). To make this clear, we will add the related data from the two JBC papers to Table 2. Nb725 and EIIAGlc can bind to MelBSt concurrently (sFigs. 2 and 3; Tables 1 and 2). Further, we will provide a new figure showing that a complex composed of all three proteins can be isolated by gel-filtration chromatography. We have also established this finding with another nanobody from the same family, Nb733 (JBC, 2023; 299:104967). However, given that the EIIAGlc-bound structure has not yet been resolved, we will tone down the related argument.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, Hariharan and colleagues present an elegant study regarding the mechanistic basis of sugar transport by the prototypical Na+-coupled transporter MelB. The authors identified a nanobody (Nb725) that reduces melibiose binding but not Na+ binding. In vitro (ITC) experiments suggest that the conformation targeted by this nanobody is different from the published outward-open structures. They go on to solve the structure of this other conformation by cryo-EM using the nanobody grafted with a fiducial marker and enhancer and, as predicted, capture a new conformation of MelB, namely the inward-open conformation. Through MD simulations and ITC measurements, they demonstrate that this state has a reduced affinity for sugar but that Na+ binding is mostly unaffected. A detailed comparison between previously published outward-open structures and this new conformational intermediate strengthens and develops the mobile barrier hypothesis underpinning sugar transport. The conformational transition to the inward-facing state leads to the formation of a barrier on the extracellular side that directly affects the amino acid arrangement of the sugar-binding site, leading to a decreased affinity that drives the direction of transport. In contrast, Na+ binding remains the same. These structural data are complemented with dynamic insights from HDX-MS experiments conducted in the presence and absence of the Nb. These measurements highlight the overall protective effect of nanobody binding, consistent with the stabilization of one conformational intermediate.

      Strengths: The experimental strategy to isolate this elusive conformational intermediate is smart and well-executed. The biochemical and biophysical data were obtained in a lipid system (nanodiscs), which allows dismissing questions about detergent-induced artefacts. The new conformation observed is of great interest and provides a better mechanistic understanding of ion-coupled sugar transport. The comparison between the two structures and the mobile barrier mechanism hypothesis is convincingly depicted and tested.

      We appreciate the reviewer’s insightful understanding of our novel findings and the associated explanations on the cation-coupled symport mechanisms.

      Weaknesses: This is excellent experimental work. My recommendations stem mostly from concerns regarding the interpretation of the observed results. In particular, I am somewhat puzzled by the important role the authors give to the regulatory protein EIIa with little structural or biophysical data to back up their claims. The hypothesis that the conformation captured by the Nb is physiologically and functionally equivalent to that caused by EIIa binding is definitely a worthy hypothesis, but it is not an experimental result. Evidence in support could include a structure with EIIa bound. Since it does not bind at the same location as the Nb, this seems feasible. Alternatively, the authors could have performed HDX-MS in the presence of EIIa to determine whether the effect is similar to that of Nb725 binding. In the absence of these experiments, discussion about EIIa should be limited. Along the same lines, I find it misleading to put in the abstract a sentence such as "It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding, and also a structure mimicking the physiological regulatory state of MelB under the global regulator EIIAGlc of the glucose-specific phosphoenolpyruvate:phosphotransferase system." None of this is supported by the experimental work presented in this article: the Na+ is modelled (with great confidence, but still) and whether this structure mimics the physiological state of MelB bound to EIIa is not known. The results of the paper are strong and interesting enough per se, and there is no need to inflate them with hypotheses that belong in the discussion section.

      As stated in the response to Reviewer #1, we believe that we presented strong data supporting a structure that mimics the physiological regulatory state of MelB. The only missing piece of evidence is a structure of the EIIA-bound state. We will change the title and tone down the related discussion in a revised version.

      Regarding the statement in our abstract that “It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding”, we believe that this claim is supported by the resolved Na+ binding in the cryo-EM structure. To our knowledge, no experimentally determined cation at its canonical binding site has been reported for an MFS transporter to date.

      I also note that the HDX-MS experiments do not distinguish between two conformational states, but rather an ensemble of states vs one state.

      We address the related comments of Reviewers #1 and #2 together. We agree that we compared one (inward-facing) state with an ensemble of (predominantly outward-facing) states. Many published data have demonstrated that WT MelBSt predominantly populates outward-facing states, especially in the presence of Na+. The major differences in HDX-MS between the inward-facing state in the presence of the Nb and the outward-facing ensemble in its absence should therefore reflect, although not quantitatively, the conformational changes between the inward- and outward-facing states. The type of measurements we performed contains no information on the rates of conformational change, but this study identified the dynamic regions involved in this conformational switch.

      Reviewer #3 (Public Review):

      Summary: The manuscript authored by Lan Guan and colleagues reveals the structure of the cytosol-facing conformation of the MelB sodium/Li coupled permease using the nab-Fab approach and cryoEM for structure determination. The study reveals the conformational transitions in the melB transport cycle and allows understanding the role of sugar and ion specificities within this transporter.

      Strengths: The study employs a very exciting strategy of transferring the CDRs of a conformation-specific nanobody to the nab-fab system to determine the inward-open structure of MelB. The resolution of the structure is reasonable enough to support the major conclusions of the study. This is overall a well-executed study.

      Thank you for your positive comments.

      Weaknesses: The authors seem to have mixed up the exothermic and endothermic aspects of ITC binding in their description. Positive heats correspond to endothermic heat changes in ITC and negative heat changes correspond to exothermic heats. The authors seem to suggest the opposite.

      This is consistently observed throughout the manuscript.

      All of our ITC data are correctly presented. Our data were collected on a NanoITC (TA Instruments, Inc.), which directly measures heat release (enthalpy changes) and displays exothermic heat as positive values. This is in contrast to the MicroCal instrument, which detects heat changes through voltage compensation and depicts exothermic heat as negative values. We will further emphasize this convention in the related figure legends.
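      Because the disagreement here is purely one of sign convention, a tiny sketch may help readers compare the figures with MicroCal-style plots. It assumes, as stated above, that NanoITC heats are exotherm-positive and MicroCal heats exotherm-negative; the function names and the numbers in the demo are hypothetical and illustrative only, not data from the manuscript.

```python
# Sketch: converting ITC heats between the two display conventions described above.
# Assumption: input values follow the exotherm-positive (NanoITC) convention;
# the MicroCal convention shows exothermic heat as negative.
# Function names are hypothetical.


def nanoitc_to_microcal(heats):
    """Flip the sign of each heat so exothermic events become negative."""
    return [-q for q in heats]


def classify(heat_nanoitc):
    """Label one heat value under the exotherm-positive (NanoITC) convention."""
    if heat_nanoitc > 0:
        return "exothermic"
    if heat_nanoitc < 0:
        return "endothermic"
    return "no net heat"


if __name__ == "__main__":
    injections = [4.2, 3.1, 0.5, -0.3]  # illustrative values, not data
    print(nanoitc_to_microcal(injections))
    print([classify(q) for q in injections])
```

      The same binding event therefore appears as a positive peak in one convention and a negative peak in the other; only the display differs, not the sign of the enthalpy change.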

    1. Author Response

      Reviewer 1 (Public Review)

      Summary: The authors have made a novel and important effort to distinguish and include different sources of active deformation when fitting C. elegans embryo development: cyclic muscle contractions and actomyosin circumferential stresses. According to the model, the combination and synchronisation of both contributions are responsible for different elongation rates and can induce bending and torsional deformations, which are not expected a priori from purely contractile forces. The model can be applied to other growth processes in initially cylindrical shapes.

      Strengths: The model allows us to fit and deduce specific growth patterns, frequencies, and locations of contractions that yield the observed axial elongation during the 240 min of the studied process.

      The deformation gradient is decomposed according to muscle and actomyosin activity, which can be distinguished and quantified. An energy-transferring process allows for the retrieval of the necessary permanent deformations that embryo development requires.

      Weaknesses: Despite the completeness of the model, the explanation of the methodology needs to be improved. Parameters and quantities are not always explained in the main text and are, on some occasions, introduced out of order. This makes the methodology difficult to understand and follow. There are some minor comments that are listed below. The most important points are:

      How are the authors sure that there is a torsional deformation? Without tracking the muscle fibers, bending with respect to different angles for different Zs may yield a shape similar to the one in Figure 6E. Furthermore, it is unclear why the model yields torsion deformation. If material points of actomyosin rings do not change in reference configuration, no helicoidal growth should be happening.

      Our torsional deformations were obtained computationally, and the results are plotted in Figure 6 according to our formalism. In our approach, the torsional deformation results from the interaction between the vertical muscles and the circumferential actin network: the muscles bend the cylinder and the bending modifies the direction of the actin fibers, as demonstrated in the experiment.

      The triple decomposition F = F_e · G_i · G_0 seems to complicate the expressions of growth and requires the use of the angles alpha and beta due to the initial deformation G_0. Why not use a simpler decomposition F = F_e · G, where G contains all contributions from actomyosin and muscle contractions in a material frame? This would avoid considering the angles alpha and beta.

      G_0 represents the active strain during the early elongation stage and G_i the active strain during the late elongation stage. This decomposition, although not mandatory, makes the model easier to understand. In addition, during the late elongation stage both the muscle and actin networks must be considered, and their orientations change with deformation. It is therefore clearer and simpler to express the active strain in terms of the angles alpha and beta.
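      To see why the triple decomposition is a presentational choice rather than a modeling requirement, note that grouping the two active factors recovers the single-factor form the reviewer proposes. The combined factor G below is illustrative notation, not necessarily the notation used in the manuscript.

```latex
F = F_e \cdot G_i \cdot G_0 = F_e \cdot G, \qquad G \equiv G_i \cdot G_0,
\qquad F_e = F \cdot G^{-1} = F \cdot G_0^{-1} \cdot G_i^{-1}.
```

      Both forms give the same elastic factor F_e; keeping G_0 and G_i separate only makes the early- and late-stage active strains explicit, consistent with the remark that the decomposition is not mandatory.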

      The section "Energy transformation and Elongation" is unclear. Indeed, stresses need to relax; otherwise, the removal of muscle and actin activity would send the embryo back to its initial state. However, the rationale behind the energy transfer is not explained. The authors seem to impose W_c = W_r, and from this deduce the necessary actin contraction after muscle relaxation. Why should energy be maintained when muscle relaxes? Which mechanism physically imposes this energy transfer? Muscle contraction could indeed induce elongation if traction forces at the opposite side of the contracting muscle relax. In fact, an alternative approach for obtaining stress relaxation and axial elongation would be to convert part of the elastic deformation F_e into a permanent deformation F_p.

      In this section, we assume that the energy accumulated by the muscle contractions is converted into the energy necessary for elongation. As our estimate in the article shows, W_c is in fact greater than W_r, indicating that a significant fraction of W_c is converted into dissipation and friction, but also into the reorganization of the actin cables. Indeed, elongation of the cylinder would induce a significant reduction in actin cable density, but this reduction in cable density is not observed experimentally. Elongation therefore requires a reorganization of the actin network, which accounts for part of the energy consumption and explains the existence of a permanent deformation F_p.
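      The energy argument in this reply can be summarized as a simple budget. Here W_d is an illustrative label (not notation from the manuscript) for the part of W_c that is not converted into elongation, i.e. dissipation, friction, and reorganization of the actin cables.

```latex
W_d \equiv W_c - W_r \ge 0 \quad\Longleftrightarrow\quad W_c \ge W_r .
```

      The estimate that W_c exceeds W_r thus corresponds to a strictly positive W_d, which is the share of the muscle work spent on dissipation and actin reorganization rather than on elongation itself.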

      Self-contact is ignored. This may well be a shape generator and responsible for bending deformations. The convoluted shape of the embryo in the confined space deserves at least a comment on this limitation of the model.

      Thank you for your suggestion. We have considered the effect of contact between C. elegans and the eggshell in the energy dissipation section, and we agree that self-contact of the worm in confinement is also important. Here, however, we focus mainly on the active filaments (actomyosin and muscle) and restrict ourselves to a cylindrical shell, a geometry that is far from that of the real embryo.

      Reviewer 2 (Public Review)

      Summary

      During C. elegans development, embryos undergo elongation of their body axis in the absence of cell proliferation or growth. This process relies in an essential way on periodic contractions of two pairs of muscles that extend along the embryo’s main axis. How contraction can lead to extension along the same direction is unknown.

      To address this question, the authors use a continuum description of a multicomponent elastic solid. The various components are the interior of the animal, the muscles, and the epidermis. The different components form separate compartments and are described as hyperelastic solids with different shear moduli. For simplicity, a cylindrical geometry is adopted. The authors consider first the early elongation phase, which is driven by contraction of the epidermis, and then late elongation, where contraction of the muscles injects elastic energy into the system, which is then released by elongation. The authors get elongation that can be successfully fitted to the elongation dynamics of wild-type worms and two mutant strains.

      Strengths

      The work proposes a physical mechanism underlying a puzzling biological phenomenon. The framework developed by the authors could be used to explain phenomena in other organisms and could be exploited in the design of soft robots.

      Weaknesses

      1) This reviewer considers that the quality of the writing is poor. Because of this the main result of this work, how elongation is achieved by contraction, remains unclear to me. In the opinion of this reviewer, the work is not accessible to a biologist. This is a real pity because the findings are potentially of great interest to developmental biologists and engineers alike.

      We regret that, despite a general introduction and a number of figures, the work does not seem accessible to biologists.

      2) The authors assume that the embryo is elastic throughout all stages of development. Is this assumption appropriate? In my opinion, the authors need to critically discuss this assumption and provide justification. Would this still be true for the adult? If so could the adult relax back to the state prior to elongation? The embryo should be able to do that, if the contractility of the epidermis were sufficiently reduced, right?

      Soft tissues are elastic, and the modeling of soft tissues, even under large deformations, is now well established. The difference between a worm embryo and an adult lies first of all in the quality of the tissues, their low degree of heterogeneity, the weakness of the muscles, and the absence of bones. As for the question of complete stress relaxation, the fact that the different components are attached to each other limits complete relaxation. By analogy, we keep our fingerprints and cortical undulations, although they originate from an elastic instability that occurs in fetal life; they never disappear.

      3) The authors impose strains rather than stresses. Since they want to understand the final deformation, I find this surprising. Maybe imposing strain or stress is equivalent, but then you should discuss this.

      Perhaps the referee has in mind the question of active strain versus active stress and is concerned about the representation of biological forces such as those produced by actomyosin or muscle. In fact, both formulations exist in morphoelasticity and are, of course, related. Usually, the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      4) Does your mechanism need 4 muscle strands or would 2 be sufficient?

      First, the 4 muscle strands are consistent with the actual anatomy of C. elegans. Second, although we assume that the two muscles on the same side contract simultaneously, their size and position affect the deformation results. Also, the period we consider ends just before the worm hatches; after hatching, the worm has to slide on the ground, so efficient muscles are needed.

      5) It is sometimes hard to understand, whether the authors are talking about the model or the worm.

      It will be corrected in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The authors thank the reviewers for their thoughtful and constructive comments. We address each comment below and have uploaded a revised manuscript.

      Public Reviews

      1) One key point that could use further clarification is how to interpret densities in the reconstruction that do overlap with the template. If the omitted regions can be reliably reconstructed, and the density is smooth throughout, it implies the detected particles are not only (mostly) true positives but also their poses must be essentially correct. Therefore, why cannot the entire reconstruction be trusted, including portions overlapping with the template? In the "Future applications" section, the authors state that in order to obtain a reconstruction that is entirely devoid of template bias, it would be necessary to successively omit parts of the template structure through its entirety. I wonder if that is really necessary and if the presented approach of omitting template portions could be better framed as a "gold-standard" validation procedure.

      Our assumption is indeed that the entire reconstruction can be trusted if the omitted features are faithfully reproduced in the reconstruction. We have added a sentence in the discussion to clarify this. However, we think that assessing template bias will still require the omit test (see also our reply below). Also, as discussed in the manuscript, there is likely a little bias left, even if it is not directly visible in the reconstruction. Therefore, if the goal is an entirely unbiased reconstruction, the only way will be to successively omit parts of the template structure throughout the template.

      2) In other words, given the compelling evidence provided by the reconstructions in the omitted areas, I find it hard to imagine how the procedure would be "hallucinating" features in the rest of the structure, as the entire reconstruction depends on the same pose and defocus parameters. A possible experiment to test this hypothesis would be to go the opposite way, deliberately adding an unrealistic feature to the bait and checking whether it comes up in the reconstruction, while at the same time checking how it behaves in omitted parts.

      Template bias might be generated in different ways. A common situation is the presence of noise, which causes biased deviations of the best template match from their “true” match that would just align the target signal to the template. Another type of bias may occur when there is a mismatch between the template and the detected target. The target may still be detected if there is sufficient structural overlap with the template. Since there might not be a clear “correct” alignment of a mismatching target to the template, the best alignment may again be biased, generating artificial density in the reconstruction. This second case may produce bias that is more pronounced in the mismatching regions. The different origins of bias will have to be investigated more thoroughly in another study. For the present study, however, we maintain that unless there is some assessment of bias in a given location, one cannot completely rule out bias based on the absence of it elsewhere in the reconstruction.
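      The first source of bias described above, noise pulling the best match toward the template, can be illustrated with a toy 1D experiment in the spirit of the well-known "Einstein from noise" effect. This is a hedged sketch, not the authors' pipeline: the Gaussian-bump template, the pure Gaussian noise "particles", and alignment by the best circular cross-correlation are all assumptions. No particle contains any signal, yet the average of the aligned noise traces correlates with the template.

```python
# Toy 1D illustration of noise-driven template bias ("Einstein from noise").
# Assumptions: pure Gaussian noise particles, a Gaussian-bump template,
# alignment by best circular cross-correlation. Not the authors' pipeline.
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_shift(signal, template):
    """Return the circular shift of `signal` that best matches `template`."""
    n = len(signal)
    best, best_score = 0, -math.inf
    for k in range(n):
        score = sum(signal[(i + k) % n] * template[i] for i in range(n))
        if score > best_score:
            best, best_score = k, score
    return best


def average_aligned_noise(length=32, trials=500, seed=0):
    """Align pure-noise traces to the template and return (template, average)."""
    rng = random.Random(seed)
    template = [math.exp(-((i - length / 2) ** 2) / (2 * 2.0 ** 2))
                for i in range(length)]
    total = [0.0] * length
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(length)]  # contains no signal
        k = best_shift(noise, template)
        for i in range(length):
            total[i] += noise[(i + k) % length]
    return template, [t / trials for t in total]


if __name__ == "__main__":
    template, avg = average_aligned_noise()
    # A clearly positive value shows template-shaped density built from noise alone.
    print(round(pearson(template, avg), 2))
```

      In this toy setting the "reconstruction" reproduces the template shape even though no target is present, which is why some independent assessment of bias, such as an omit test, is needed before trusting template-overlapping density.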

      3) When assessing their approach on in situ data (the yeast ribosome), it is intriguing to see that the resolution degraded from 3.1 to 8 Å when refinement of the particle poses against the current reconstruction was attempted. The authors do provide some possible explanations, such as the reduced signal of the reconstruction at high resolution and the crowded background, but it leaves one to wonder whether this means that a 3.1 Å reconstruction could never be obtained from these data by conventional single-particle analysis procedures.

      The refinement results with our in situ data do indeed appear to be limited to low resolution when using the conventional single-particle pipeline and software. It might be possible to improve refinement by introducing certain priors, filters and masking functions that are optimized for the increased background and spectral properties of in situ data. Also, we have not tested all available software, and some might perform better than others. It is worth noting that in a different study using our data, by Cheng et al (2023) and cited in our manuscript, the refined reconstruction obtained with different software reached ~7 Å resolution, i.e., close to what we report here. Finally, refinement of the detected targets against a high-resolution template does work, but since it involves the template, we regard this as part of the template matching process.

      4) Furthermore, in the section "Quantifying template bias", the authors make the intriguing statement that there can still be some overfitting of noise even in true positives. I understand this overfitting would occur in the form of errors in the pose and defocus estimation, but a clarification would be helpful.

      We have added a sentence in the Discussion to clarify where this bias may come from.

      5) In the Discussion, the claim that "it is not necessary to use tomography to generate high-resolution reconstructions of macromolecular complexes in cells" is a misconception, at least in part. As demonstrated in works by the same group and others (https://doi.org/10.1016/j.xinn.2021.100166, https://doi.org/10.1038/s41467-023-36175-y, https://doi.org/10.1038/s41586-023-05831-0), 2D imaging of native cellular environments does offer a faster and better way to obtain high-resolution reconstructions compared to tomography. However, tomography provides the entire 3D context of the macromolecules, such as their localization to membranes and the cellular architecture, which can be readily visualized in a tomogram even at low resolution, so methods for structure determination from tilt series data such as subtomogram averaging remain of paramount importance. Most likely, a combination of 2D and 3D imaging approaches will be necessary to retrieve both the highest structural resolution and their cellular context to address biological questions.

      We agree and have modified our statement accordingly.

      6) The "Materials and Methods" section lacks a description of transmission electron microscopy data collection.

      We are sorry for this oversight and have added these details.

      7) Finally, the preprint version of this work posted on bioRxiv (https://doi.org/10.1101/2023.07.03.547552) contains the following competing interests statement, which is missing from the submitted version: "The authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts."

      This is correct. The statement was missing in the first version of the uploaded manuscript and was added after consultation with the eLife editorial office.

      8) Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, rather than to the introduction of false positives. That probably still represents a problem, especially because the authors then go on to show that their expectation of the number of false positives does not always match the correct number of false positives, probably due to inaccuracies in the noise model for more complicated images. This may warrant further in-depth discussion in a revised manuscript.

      We have added further thoughts regarding the mismatch between expected and actual number of false positives in the Discussion section. A full understanding of the issue likely requires further study, which is currently underway.

      9) The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions, and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field. However, its main point is to promote high-resolution 2D template matching (2DTM) as a more universal analysis method for in vitro and, more importantly, in situ data. While the experiments performed to that end are sound and well-executed in principle, I fail to draw that specific conclusion from their results.

      We do not see 2DTM as a more universal analysis method for in vitro and in situ data, but simply as another method that can be used. We have added a sentence in the introduction to clarify this.

      10) The authors correctly point out that overfitting is largely enabled by the presence of false-positives in the data set. They go on to perform their in situ experiments with ribosomes, which provide an extremely favorable amount of signal that is unrealistic for the vast majority of the proteome. This seems cherry-picked to keep the number of false-positives and false-negatives low. The relationship between overfitting/false-positive rate and the picking threshold will remain the same for smaller proteins (which is a very useful piece of knowledge from this study). However, the false-negative rate will increase a lot compared to ribosomes if the same high picking threshold is maintained. This will limit the applicability of 2DTM, especially for less-abundant proteins.

      The reviewer is correct that the lower SNR of smaller targets poses a fundamental limit to 2DTM. We have stated this in previous studies and have added a sentence in the introduction of the current manuscript to clarify this.

      11) I would like to see an ablation study: Take significantly smaller segments of the ribosome (for which the authors already have particle positions from full-template matching, which are reasonably close to the ground-truth), e.g. 50 kDa, 100 kDa, 200 kDa etc., and calculate the false-negative rate for the same picking threshold. If the resulting number of particles does plummet, it would be very helpful to discuss how that affects the utility of 2DTM for non-ribosomes in situ.

      The suggested ablation study is a good idea and was reported by Rickgauer et al (2020), cited in our manuscript. We added our own analysis for this dataset in Figure 4-figure supplement 1 and show the proportion of LSUs detected as a function of template mass, indicating a detection limit of ~300 kDa. We also added a note in the Results section to explain that the threshold we use to limit false positives means that there are also false negatives, with a rate that depends on their molecular mass.

      12) Another point of concern is the dramatic resolution decrease to 8 Å after multiple iterations of refinement against experimental reconstructions described in line 159. Was this a local search from the poses provided by 2DTM, or something more global? While this is not a manifestation of overfitting as the authors have conclusively shown, I think it adds an important point to the ongoing "But do we really need tomograms, or can we just 2D everything?" debate in the field, which is also central to the 2D part of 2DTM. Reaching 8 Å with 12k ribosome particles would be considered a rather poor subtomogram averaging result these days. Being in the "we need tilt series to be less affected by non-Gaussian noise" camp myself, I wonder if this indicates 2D images are inherently worse for in situ samples. If they are, the same limitations would extend to template matching. In that case, shouldn't the authors advocate for 3DTM instead of 2DTM? It may not be needed for ribosomes, but could give smaller proteins the necessary edge.

      We have extensively discussed the advantages and disadvantages of both tomography and 2DTM (Lucas et al, 2021) and think it is not useful to talk in terms of “better” and “worse”. Instead, each technique has its areas of application, and we maintain that a combination of the two may give the best results. The limitation of 8 Å does not apply to reconstructions aligned against high-resolution templates, as demonstrated in the present study. Regarding noise models, there is also a need for these in 3DTM, as explained in recent publications: Maurer et al (2023), bioRxiv, doi.org/10.1101/2023.09.06.556487; Cruz-León et al (2023), bioRxiv, doi.org/10.1101/2023.09.05.556310; Chaillet et al (2023), Int. J. Mol. Sci. 24, 13375.

      13) Right now, this study is also an invitation to practitioners who do not understand the picking threshold used here and cannot relate it to other template-matching programs to do a lot of questionable template matching and claim that the results are true because templates are "unoverfittable". I think such undesirable consequences should be discussed prominently.

      We have added a discussion of this point in the Discussion section.

      Recommendations for the authors

      1) Lines 58-59: What does "nominally untilted" mean? Has the lamella pre-tilt (milling angle) been taken into account or not? If yes, how?

      The lamella milling angle was not taken into account, so there is a tilt built into the sample of about 8° that was not compensated for by a counter-tilt of the microscope goniometer. We have added a note to explain this in the text of the manuscript.

      2) Lines 113-114: A brief explanation of the threshold calculation method from Rickgauer et al, 2017 to achieve an expected false positive rate of one per micrograph would be helpful here.

      We describe the equation for estimating the false discovery rate later in the manuscript. We have added a note in the text to point the reader to the relevant section of the manuscript.
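      The idea behind a threshold that yields about one expected false positive per micrograph, as referred to in the comment above (Rickgauer et al, 2017), can be illustrated with a minimal sketch. It assumes, as a simplification, that in a pure-noise image the SNR values at all searched locations (pixels × orientations × defocus planes) are independent standard-normal samples; neighbouring search locations are in fact correlated, so this is an approximation and not the manuscript's equation. The function name, its parameters and the search sizes used below are hypothetical.

```python
from statistics import NormalDist


def snr_threshold(n_pixels, n_orientations, n_defocus, expected_false_positives=1.0):
    """Return the SNR threshold t at which a pure-noise search is expected to
    yield `expected_false_positives` peaks above t.

    Hypothetical helper. Assumes every (pixel, orientation, defocus) search
    location gives an independent standard-normal SNR value, so the
    per-location probability of exceeding t is 1 - Phi(t).
    """
    n_locations = n_pixels * n_orientations * n_defocus
    p_exceed = expected_false_positives / n_locations
    if not 0.0 < p_exceed < 1.0:
        raise ValueError("expected_false_positives must be positive and "
                         "smaller than the number of search locations")
    # Solve n_locations * (1 - Phi(t)) = expected_false_positives for t.
    # By symmetry of the normal distribution, t = -Phi^{-1}(p_exceed); this
    # avoids the precision loss of evaluating Phi^{-1}(1 - p_exceed).
    return -NormalDist().inv_cdf(p_exceed)


# Hypothetical search size: image pixels x orientations x defocus planes.
t = snr_threshold(n_pixels=5760 * 4092, n_orientations=1_000_000, n_defocus=13)
print(f"threshold for ~1 false positive per image: {t:.2f}")
```

      Because the threshold grows only slowly with the number of search locations, enlarging the search (finer angular sampling, more defocus planes) raises it modestly, which is why a fixed expected false-positive rate also implies a mass-dependent false-negative rate.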

      3) For consistency, it would be interesting to include a plot of the SNR peaks found by 2DTM in the in situ dataset, that could be directly compared to Figure 1 - figure supplement 1B.

      We have added this to Figure 2 - figure supplement 1A-C, to directly compare to Figure 1 – figure supplement 1A-C.

      4) Showing model-map FSC curves between the density retrieved from the omitted areas and their respective models would provide further evidence not only that they are correct but to what extent.

      An FSC calculation would be challenging for small regions, such as side chains and drugs, due to masking artifacts. Moreover, the model was built into an in vitro determined map and was not fit into the in vivo map calculated here. Therefore, deviations between the map and model may reflect differences between the two conditions and may not reflect the agreement of the map to the in vivo structure.

      5) Lines 128-130: The figure references are wrong. Here, Figure 1B should probably be Figure 1A (or 1B), and Figure 1C clearly refers to Supplementary Figure 1F (FSC curve).

      We have corrected the incorrect figure references.

      6) Line 125: Wrong figure reference, Figure 1A here refers to Supplementary Figure 1B (cross-correlation peaks).

      We have corrected the incorrect figure references.

      7) I haven't been able to find mention of code availability in the manuscript. Given that it is a major outcome of the study, I think it should be provided.

      The code is available from the cisTEM repository, github.com/timothygrant80/cisTEM, and an executable version of the program measure_template_bias has been posted for download on the cisTEM webpage, cistem.org. We have added a note in the Methods section to point the readers to these resources.

      8) Line 50: "An additional complication of subtomogram averaging for in situ imaging is the selection of valid targets" - This is not specific to subtomogram averaging, but to in situ samples.

      We agree and have updated the text to reflect this.

      9) Line 77: "if this is true for high-resolution features, which are more susceptible to noise overfitting" - This is not intuitive to me. High-resolution features require more information to be overfitted with a constant set of model parameters, thus making their overfitting harder.

      The reviewer is correct that there is more information at high resolution, partially compensating for the low SNR. However, the overall refinement behavior is still dominated by overfitting at high resolution, as we demonstrated in an earlier publication (Stewart & Grigorieff (2004), Ultramicroscopy 102, 67–84).

      10) Line 316: "Baited reconstruction is substantially faster and a more streamlined" - To back this and other similar statements, it would be helpful if the authors provided some time measurements for the execution of their potentially very computationally expensive search.

      The current implementation of 2DTM requires 45 GPU hours per template per K3 image to search 13 defocus planes. For a fair comparison, however, the manual work for annotation, as well as the additional processing to align and classify subtomograms to generate high-resolution averages, should also be considered. These are highly project-dependent and can exceed the time required for 3DTM many times over. We have clarified this in our Discussion section.

      11) Line 319: "We expect focused classification to identify sub-populations to further improve the resolution" - How would this work if refining the 2D data without a high-resolution template resulted in significantly worse resolution even for a ribosome? Or is this meant to be done with prior knowledge of every state?

      Classification can be done using existing single particle software. To avoid alignment errors, as described above, particle alignment angles and shifts are fixed during classification. This leaves only the particle occupancy per class to be refined, which appears to lead to good classification. We have added a brief note to explain this strategy. However, since this is not shown in this manuscript, we have not added a more extensive discussion of particle classification.

      12) Line 354: "without requiring manual intervention or expert knowledge" - Previous expert knowledge was arguably provided in the form of a high-resolution structure.

      We agree with the reviewer and have clarified our statement.

    1. Author Response

      Reviewer #1:

      We thank Reviewer #1 for their review of our manuscript.

      Reviewer #1, comment #1: “The authors of this manuscript are from the Canadian, public interest open-science company YCharos.”

      It is important to state that none of the authors work for YCharOS. The YCharOS company has created an open ecosystem consisting of antibody manufacturers, knockout cell lines providers, academics, granting agencies and publishers. The Antibody Characterization Group (participating authors are affiliated with the Department of Neurology and Neurosurgery, Structural Genomics Consortium, The Montreal Neurological Institute, McGill University) works in collaboration with YCharOS to have access to commercial antibodies and knockout cell lines donated by YCharOS’ manufacturer partners.

      Reviewer #1, comment #2: In regard to ZENODO antibody characterization reports prepared by this group, Reviewer #1 wrote: “While the results are convincing, they could be more accessible. In the current format, researchers have to download reports for each target and look through all images to identify the most useful antibodies from the images. The reports I reviewed did not draw conclusions on performance. A searchable database that returns validated antibodies for each application seems necessary.”

      After careful consideration and consultation with YCharOS industry partners, we decided not to rate the performance of the antibodies tested. It was determined that antibody selection is best left to the user, who should analyze all parameters, including the type of antibody to be chosen (recombinant-monoclonal, recombinant-polyclonal, monoclonal), the species used to generate the antibody, the species predicted to react with the antibody, performance in a specific application, antigen sequences, and antibody cost.

      Reviewer #1, comment #3: “A key question is to what extent off-target binding was predictable from the WBs provided by the manufacturers. Thus, how often did the authors find multiple bands when the catalogue image showed a single band and vice versa?”

      In many cases, the antibodies were tested on cell lines other than those used by the manufacturers. Given that protein expression is specific to each cell line, we cannot answer this question reliably.

      Reviewer #1, comment #4: “Cross-reactive proteins will generally not be detected when blots are stained with an antibody reactive with a different epitope than the one used for IP. Possible solutions to overcome this limitation such as the use of mass spectrometry as readout should be discussed (Nature Methods volume 12, pages 725–731 (2015))”.

      Our protocols only inform whether an antibody can capture the intended target, without any evaluation of the extent to which it captures unwanted, cross-reactive proteins. Thus, our data can only be used to aid in selection of the best-performing antibodies for IP; our data do not inform profiling of non-specific interactions.

      IP/mass spectrometry is an excellent approach for evaluating antibody performance for IP, and authors on this manuscript are experts in proteomics and recognize the importance of this methodology. We have considered implementing IP/mass spectrometry in our platform. However, there are limitations, such as the cost of the approach and the difficulty of detecting smaller proteins or proteins with a certain amino acid composition (high presence of Cys, Arg or Lys). Ultimately, we have decided to prioritize throughput over this level of detail.

      Reviewer #1, comment #5: “Performance in immunofluorescence microscopy was performed on cells that were fixed in 4% paraformaldehyde and then permeabilized with 0.1% Triton-X100. It seems reasonable to assume that this treatment mainly yields folded proteins wherein some epitopes are masked due to cross-linking. The expectation is therefore that results from IP are more predictive for on-target binding in IF than are WB results (Nature Methods volume 12, pages 725–731 (2015)). It is therefore surprising that IP and WB were found to have similar predictive value for performance in IF (supplemental Fig. 3). It would be useful to know if failure in IF was defined as lack of signal, lack of specificity (i.e. off-target binding) or both. Again, it is important to note the IP/western protocol used here does not test for specificity.”

      The assessment of antibody performance is biased by how antibodies were originally tested by suppliers. Manufacturers primarily validate their antibodies by WB, so most antibodies immunodetect their intended target in WB. In retrospect, we therefore tested a pool of antibodies biased towards those that detect linear epitopes. Still, we observed that a large cohort of antibodies show specificity for their target across all three applications or for specific combinations of applications. This challenges, to some extent, the idea that antibodies are fit-for-purpose reagents that recognize either linear or native epitopes: a significant number of antibodies can specifically detect both types of epitope.

      Reviewer #1, comment #6: “The authors report that recombinant antibodies perform better than standard monoclonals/mAbs or polyclonal antibodies. Again, a key question is to what extent this was predictable from the validation data provided by the manufacturers. It seems possible that the recombinant antibodies submitted by the manufacturers had undergone more extensive validation than standard mAbs and polyclonals”.

      Our antibody manufacturing partners indicated that the recombinant antibodies are more recent products and have been more extensively characterized relative to standard polyclonal or monoclonal antibodies.

      The main message is that recombinant antibodies can be used in all applications once validated. Although recombinant antibodies are available for many proteins, the scientific community is not adopting these renewable reagents as widely as we believe it should. We hope that the data provided will encourage scientists to adopt recombinant technologies when available to improve research reproducibility.

      Reviewer #1, comment #7: “Overall, the manuscript describes a landmark effort for systematic validation of research antibodies. The results are of great importance for the very large number of researchers who use antibodies in their research. The main limitations are the high cost and low throughput. While thorough testing of 614 antibodies is impressive and important, the feasibility of testing hundreds of thousands of antibodies on the market should be discussed in more detail.”

      We thank the reviewer for this comment. One of our challenges is to increase the platform's throughput to succeed in our mission to characterize antibodies for all human gene products. We will continue to test antibodies using protocols that are commonly used in the laboratory and agreed upon with our partners, to ensure that ZENODO reports can serve as a guide to the wider community.

      In terms of development, our marketing efforts have been substantially accelerated by our new partnership with the journal F1000. We have begun to convert our reports into peer-reviewed papers (20 ZENODO reports have been converted into F1000 articles). This conversion allows researchers to find our work via PubMed, and easily cite any study. Producing peer-reviewed articles also further enhances the credibility of our research and our project as a whole: https://f1000research.com/ycharos

      Colleagues have published a letter to Nature explaining the problem and our technology platform: (Kahn, et al., Nature, 2023, DOI: https://doi.org/10.1038/d41586-023-02566-w).

      This project has been presented worldwide, with a presence at major antibody conferences, such as the annual Antibody Validation meeting in Bath (PSM attended the meeting in September 2023). The authors are organizing a sponsored mini-symposium on antibody validation at the next American Society for Cell Biology (ASCB) meeting in December 2023 (Boston, USA): https://plan.core-apps.com/ascbembo2023/event/6fb928f06b0d672e088c6fa88e4d77fb

      Colleagues have prepared petitions addressed to various governmental organizations (US, Canada, UK) to support characterization and validation of renewable antibodies: https://www.thesgc.org/news/support-characterization-and-validation-renewable-antibodies.

      Reviewer #2

      We thank Reviewer #2 for the review of the antibody characterization reports we have uploaded to ZENODO. A manuscript describing the full standard operating procedures of the platform, which have been used in all reports, is in preparation and should be available on a preprint server before the end of the year. Our protocols were reviewed and approved by each of YCharOS' manufacturer partners. Moreover, a recent editorial describes the platform used here and gives advice on how to interpret the data: https://doi.org/10.12688/f1000research.141719.1

      Reviewer #2, comment #1: “A discussion of how the working concentrations of antibodies are selected and validated is required. Based on the dilutions described in the reports, it seems that dilutions suggested by the manufacturer were used - For LRRK2 it seems that antibody concentrations ranging from 0.06 to over 5 µg/ml for WB were used. Often commercial antibody comes in a BSA-containing buffer making it hard to validate the concentration of the antibody claimed by the manufacturer”.

      The concentration recommended by the manufacturer is our starting point. For WB, when the signal is at the limit of detectability, we repeat the experiment with a ~5-10-fold increase in antibody concentration. For >80% of the antibodies tested, the recommended concentration led to the detection of bands (specific or not to the target protein).

      Reviewer #2, comment #2: “In the authors' experience are the manufacturer's concentrations reliable? Additionally, if the information regarding applications provided by the manufacturers is unreliable how do the authors suggest working concentrations for antibodies to be assessed”?

      We do not evaluate the concentration of antibodies internally. In the immunoprecipitation experiments, we use 2.0 µg of antibody for each IP, based on the concentration provided by the manufacturers. On Ponceau staining of membranes, we can observe the heavy and light chains of the primary antibodies used, giving an indication of the amount of antibodies added to the cell lysate. In most cases, the intensity of the heavy and light chains is comparable.

      Reviewer #2, comment #3: “We understand that it would not be feasible to test every antibody at different concentrations, but this is an issue that should at least be mentioned. An antibody might be put in the wrong performance category solely because of the wrong concentration being used. I.e., if an excellent antibody is used at too high a concentration, it may detect non-specific proteins that are not seen at lower dilutions where the antibody still picks up the desired antigen well”.

      We agree with Reviewer #2; we do not use an optimal concentration for every tested antibody. As mentioned previously, the concentration recommended by the manufacturer is our starting point. By testing multiple antibodies side-by-side against a single target protein, we can generally identify one or more specific and selective antibodies. We leave it to users of our reports to optimize the antibody concentration to suit their experimental needs.

      Reviewer #2, comment #4: “Do the authors check different WB conditions ie 2h primary antibody with BSA or milk vs. overnight at 4 degrees with BSA or Milk”?

      All primary antibodies are tested in milk overnight at 4 °C. The overnight incubation fits conveniently into the timeline of the protocol. All protocols were agreed upon after careful consultation with our partners.

      Reviewer #2, comment #5: “Do the authors provide detailed WB protocols that include the description of the electrophoresis and type of gels used, transfer buffer and transfer method and time used, and conditions for all the primary and secondary blotting including times, buffers and dilutions of all antibodies and other reagents”?

      This information is included in all ZENODO reports.

      Reviewer #2, comment #6: “Do the authors discuss detection approaches- we have noticed for some antibodies there are significant different results using LICOR, ECL and other detection methods, with certain especially weaker antibodies preferring ECL-based methods”.

      We only use ECL-based methods.

      Reviewer #2, comment #7: “For IPs the amount of antibody needed can also vary-for some we can use 1 microgram or less, but for others, we need 5 to 10 micrograms. The amount of antibody needed to get maximal IP should be stated”.

      We use 2.0 µg of antibody per IP and have found this to be adequate for both lower-abundance proteins (e.g., Parkin, https://zenodo.org/records/5747356) and higher-abundance proteins (e.g., PRDX6, https://zenodo.org/records/4730953). Abundance is based on PaxDb.com. For Parkin and PRDX6, we were able to enrich the expected target in the IP and observe its depletion in the unbound fraction. Optimization of the IP conditions is left to the antibody users.

      Reviewer #2, comment #8: “Doing IPs with commercial antibodies can be very expensive or infeasible if many micrograms are needed especially if only packages of 10 micrograms for several hundred dollars are provided”.

      This is a major advantage of the side-by-side comparison: the reader is free to choose between high-performance antibodies from different manufacturers, with varying antibody costs. We also work in partnership with the Developmental Studies Hybridoma Bank (DSHB), which supplies antibodies on a cost recovery basis.

      Reviewer #2, comment #9: “For IPs it is important to determine the percentage of antigen that is depleted from the supernatant for each IP. We think that this should be calculated and recorded in the Zenodo data. Some antibodies will only IP 10% of antigen whereas others may do 50% and others 80-90%. One rarely sees 100% depletion. For IPs the buffer detergent and salt concentration might also strongly influence the degree of IP and therefore these should be clearly stated”.

      In Box 1, we define criteria of success. For IP, “under the conditions used, a successful primary antibody immunocaptures the target protein to at least 10% of the starting material”. Colleagues have written an editorial on how to interpret and analyze antibody performance (https://f1000research.com/articles/12-1344).

      The cell lysis buffer is a critical reagent when considering IP experiments. We use a commercial buffer consisting of 25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1% NP-40 and 5% glycerol (Thermo Fisher, cat. #87787). This buffer is efficient to extract the target proteins we have studied thus far.

      Reviewer #2, comment #10: “Whether antibodies cross-react with human, mouse and other species of antigens is always a major question. It is always good to test human and mouse cell lines if possible. If antibodies cross-react in WB, in the authors' experience will they also cross-react for IF and IP”?

      The authors started this initiative by focusing on the 20,000 human proteins, which provides a defined end point. We and our collaborators found that most of the cherry-picked antibodies that were selective in WB for human proteins, and that manufacturers claim react with the murine versions of the target proteins, were also selective in murine tissue lysates.

      Indeed, antibodies that performed poorly in WB mostly failed in IF and IP. Conversely, antibodies that were selective in IF or specific in IP were generally (>90%) also selective in WB.

      Reviewer #2, comment #11: “Cell lines express proteins at vastly different levels and it is possible that the selected cell line does not express the antigen or expresses it at very low levels - this could be a reason for wrongly assessing an antibody not working. It would be useful to use cell lines in which MS data has defined the copy number of protein per cell and this figure could be included in the antibody data if available. This MS data is available for the vast majority of commonly used cells”.

      We agree with Reviewer #2 that MS data are useful for target protein selection. So far, our approach using transcriptomic data provided on DepMap.org has proved to be a successful mechanism for cell line selection. We have identified a specific antibody for WB for each target, enabling validation of expression in the selected cell line.

      For some protein targets, the parental line corresponding to the only commercial or academic knockout line available has weak protein expression. We thus needed to generate a KO clone in a second cell line background with high expression, and indeed found that some antibodies which failed in the first commercial line were successful in the new higher-expressing line (e.g., CHCHD10, https://zenodo.org/records/5259992).

      Reviewer #2, comment #12: “Some proteins are glycosylated, ubiquitylated or degraded rapidly making them hard to see in WB analysis”.

      We analyzed the full length of the gel/membrane when assessing antibody performance by WB, because proteins can appear as different isoforms or at molecular weights that differ from those predicted from the amino acid sequence (e.g., SLC19A1, https://zenodo.org/records/7324605).

      Reviewer #2, comment # 13: “We have occasionally had proteins that appear unstable when heated with SDS-sample buffer before WB. For these, we still use SDS-Sample buffer but omit the heating step. I often wonder how necessary the heating step is”.

      For WB, samples are heated to 65 degrees, then spun to remove any precipitate.

      Reviewer #2, comment # 14: “For IF the methods by which cells are fixed and stained, and the microscope and settings, can significantly influence the final result. It would be important to carefully record all the methods and the microscope used”.

      We agree with Reviewer #2 that many parameters influence antibody performance in imaging applications. We are progressively implementing the OMERO software to record all experimental parameters, together with information (metadata) about the microscope itself.

      Reviewer #2, comment # 15: “How do the authors recommend antibodies are stored? These should be very stable, but I have had reports from the lab that some antibodies become less good when stored and others that recommend storing at 4 degrees”.

      Antibodies are aliquoted to avoid freeze-thaw cycles and stored at -20 °C. If an antibody is recommended to be stored at 4 °C, we add glycerol to a final concentration of 50% and store it at -20 °C.
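      The glycerol step is simple dilution arithmetic. As a minimal sketch, assuming a 100% glycerol stock, an antibody solution that contains no glycerol to begin with, and additive volumes (none of which is specified above), the volume to add follows from solving stock_fraction × Vg / (Vsample + Vg) = final_fraction for Vg. The function name is hypothetical.

```python
def glycerol_volume_to_add(sample_ul: float, final_fraction: float = 0.5,
                           stock_fraction: float = 1.0) -> float:
    """Volume of glycerol stock (uL) to add to an antibody sample (hypothetical helper).

    Assumes the sample contains no glycerol and that volumes are additive.
    Solves stock_fraction * Vg / (sample_ul + Vg) = final_fraction for Vg.
    """
    if not 0.0 <= final_fraction < stock_fraction <= 1.0:
        raise ValueError("need 0 <= final_fraction < stock_fraction <= 1")
    return final_fraction * sample_ul / (stock_fraction - final_fraction)


# With undiluted (100%) glycerol, reaching 50% means adding an equal volume.
print(glycerol_volume_to_add(100.0))  # 100.0
```

      With a less concentrated stock the required volume grows quickly, which is why an undiluted stock is the usual choice.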

      Reviewer #2, comment # 16: “Would other researchers not part of the authors' team, be able to add their own data to this database validating or de-validating antibodies? This would rapidly increase the number of antibodies for which useful data would be available for. It would be nice to greatly expand the number of antibodies being used in research and this is not feasible for a single team to undertake”.

      Yes! We believe that only a community effort can resolve the antibody liability crisis. We partner with the Antibody Registry (antibodyregistry.org - led by co-author Anita Bandrowski). In the Registry, each antibody is labelled with a unique identifier, and third-party validation information can be easily tagged to any antibody. Antibody users are invited to upload information about an antibody they have characterized into the Registry.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We were pleased to see our work published as a Reviewed Preprint online so swiftly. We would now like to take the opportunity to add our responses to the reviewers' comments to the Reviewed Preprint and to submit a revised version of the manuscript that incorporates and addresses these comments.

      We believe that our revisions have significantly improved the quality of the manuscript. Specifically, we have described our results more precisely and explained more clearly certain decisions made in the analysis pipeline. For example, Figure 4 was improved substantially by incorporating a schematic representation of how ERP traces were extracted from the neural data. Furthermore, we have added three paragraphs to the Discussion in which we elaborate on 1) the two observed interaction effects between attention and drug condition, 2) the relation between behavioral, computational, and neural effects, and 3) the statistical robustness of our findings. As such, we believe that our interpretation of the results and of their robustness now more faithfully represents our observations.

      Moreover, we have moved the Supplementary Information and Figures, initially presented as a separate section of the manuscript, into the main manuscript and its accompanying supplementary figures. The structure of the paper thereby better follows the eLife format. As a result, some of the previously included supplementary figures are now described in the main text.

      Reviewer #1 comments:

      In the results section on page 6, the authors conclude that "Attention and ATX both enhanced the rate of evidence accumulation towards a decision threshold, whereas cholinergic effects were negligible." I believe "negligible" is wrong here: the corresponding effects of donepezil had p-values of .09 (effect of donepezil on drift rate), .07 (effect of donepezil on the cue validity effect on drift rate) and .09 (effect of donepezil on non-decision time), and were all in the same direction as the effects of atomoxetine, and would presumably have been significant with a somewhat larger sample size. I would say the effects of donepezil were "in the same direction but less robust" (or at the very least "less robust") instead of "negligible".

      We agree with the reviewer that ‘negligible’ may not properly capture the effects of DNP on DDM parameter estimates. Although we do feel that caution is warranted in interpreting the effects of DNP on computational parameter estimates, we have now described these effects in line with the reviewer’s suggestion: in the same direction as the effects of ATX, but not (or less) statistically robust.

      In the results section on page 8, the authors conclude that "Summarizing, we show that drug condition and cue validity both affect the CPP, but they do so by affecting different features of this component (i.e. peak amplitude and slope, respectively)." This conclusion is a bit problematic for two reasons. First, drug condition had a significant effect not only on peak amplitude but also on slope. Second, cue validity had a significant effect not only on slope but also on peak amplitude. It may well be that some effects were more significant than others, but I think this does not warrant the authors' conclusion.

      Indeed, we observed that cue validity affected both CPP peak amplitude and slope, and that some effects were more significant than others. As such, we agree with the reviewer that the conclusion that cue validity and drug condition affect different features of the CPP was too strongly formulated. We have changed this statement in the manuscript to reflect the observed data pattern more appropriately. We would, however, like to point out that this does not undermine our main conclusion. Spatial attention and drug condition showed only limited interaction effects in terms of behavior and neural data, and their effects on occipital activity were separable in terms of timing and spatial profile. Therefore, our conclusion that catecholamines and spatial attention jointly shape perceptual decision-making remains valid.

      In the discussion section on page 11, the authors conclude that "First, although both attention and catecholaminergic enhancement affected centro-parietal decision signals in the EEG related to evidence accumulation (O'Connell et al., 2012; Twomey et al., 2015), attention mainly affected the build-up rate (slope) whereas ATX increased the amplitude of the CPP component (Figure 3D-F)." As I wrote above, I believe it is not correct that "attention mainly affected the build-up rate or slope", given that the effect of cue-validity on CPP slope was also significant. Also, while the authors' data do support the conclusion that ATX increased the amplitude and not the slope of the CPP component, a previous study in humans found the opposite: ATX increased the slope but did not affect the peak amplitude of the CPP (Loughnane et al 2019, JoCN, https://pubmed.ncbi.nlm.nih.gov/30883291). Although the authors cite this study (as from 2018 instead of 2019), they do not draw attention to this important discrepancy between the two studies. I encourage the authors to dedicate some discussion to these conflicting findings.

      We thank the reviewer for spotting this error: we had cited the preprint version (from 2018) of Loughnane and colleagues rather than the published JoCN paper (from 2019). We have corrected this in the updated version of the manuscript. We further thank the reviewer for asking about the interesting discrepancy between our observation that ATX increased CPP peak amplitude in the absence of slope effects and the observation by Loughnane et al. (2019, JoCN) that ATX increased CPP slope, but not amplitude. We would first like to point out that the peak amplitude effect in Loughnane et al. (2019) was in the same direction as our reported effect, with numerically higher peak amplitudes for ATX compared to PLC (Figure 2A – right panel in Loughnane et al., 2019). However, as their omnibus main effect of drug condition on CPP peak amplitude was not significant, they did not provide statistics for a pairwise comparison of ATX and PLC in terms of CPP peak amplitude, which makes it hard to compare the effects directly. Regardless, Loughnane et al. (2019) did observe an effect on CPP slope, whereas we did not. Speculatively, this difference could be related to the behavioral tasks used in the two studies. Below we have pasted a new paragraph from the Discussion in which we elaborate on this further.

      In Discussion, page 15:

      Here, we demonstrated that response accuracy and response speed are differentially represented in the CPP: correct vs. erroneous responses were associated with a higher slope and peak amplitude, whereas fast vs. slow responses were only associated with increased slopes (Figure 3A-B). Speculatively, the specific effect of any (pharmacological) manipulation on the CPP may depend on the task setting. For example, Loughnane et al. (2019) used a visual task on which participants made few errors (hit rate > 98%, no false alarms), whereas we applied a task on which participants regularly made errors (roughly 25% of all trials). Possibly, this is why the effects of ATX reported by Loughnane et al. (2019) on behavior (RT effect, not accuracy/d’) and on the CPP (slope effect, not peak) differed from the effects of ATX we observed on behavior (d’ effect, not RT) and on the CPP (peak effect, not slope). Regardless, when we compared subjects with high and low drift rates (Figure 3C), we observed that both CPP slope and CPP peak were increased for the high vs. low drift group (independent of the drug or attentional manipulation). This indicates that both CPP slope and CPP peak were associated with drift rate from the DDM. Clearly, more work is needed to fully understand how evidence accumulation unfolds in neural systems, which could in turn inform future behavioral models of evidence accumulation.

      On page 12 and page 14 the authors suggest a selective effect of ATX on tonic catecholamine activity, but to my knowledge the exact effects of ATX on phasic vs. tonic catecholamine activity are unknown. Although microdialysis studies have shown that a single dose of atomoxetine increases catecholamine concentrations in rodents, it is unknown whether this reflects an increase in tonic and/or phasic activity, due to the limited temporal resolution of microdialysis. Thus, atomoxetine may affect tonic and/or phasic catecholamine activity, and which of these two effects dominates is still unknown, I think.

      We agree with the reviewer that the direct effects of ATX on tonic versus phasic catecholaminergic activity are not as clear as initially stated in the manuscript. Equally problematic, previous work has demonstrated that changes in tonic neuromodulation shape evoked neuromodulatory discharge (Aston-Jones & Cohen, 2005, Annu. Rev. Neurosci; Knapen et al., 2016, PLoS ONE). As such, any effect of ATX on tonic neuromodulatory drive would probably have affected phasic catecholaminergic responses as well, although this claim will have to be addressed experimentally. Because of the close relation between tonic and phasic neuromodulation, we think it may indeed be better to refrain from the simplistic interpretation that ATX (and DNP) solely and specifically affect tonic neuromodulation. We have used more neutral language in that regard in the updated version of the manuscript, for example by only mentioning elevated neuromodulator levels (without specifying tonic or phasic). Moreover, we have extended part of our previous Discussion to elaborate on this issue in more detail. An excerpt of this paragraph, consisting of previous and newly added text, can be seen below.

      In Discussion, page 14:

      In contrast with recent work associating catecholaminergic and cholinergic activity with attention by virtue of modulating prestimulus alpha-power shifts (Bauer et al., 2012; Dahl et al., 2020, 2022) and attentional cue-locked gamma-power (Bauer et al., 2012; Howe et al., 2017), the current work shows that the effects of neuromodulator activity are relatively global and non-specific, whereas the effects of spatial attention are more specific to certain locations in space. Our findings are, however, not necessarily at odds with these previous studies. Most recent work associates phasic (event-related) arousal with selective attention (for reviews see: Dahl et al., 2022; Thiele & Bellgrove, 2018). For example, cue detection in visual tasks is known to be related to cholinergic transients occurring after cue onset (Howe et al., 2017; Parikh et al., 2007). By contrast, in our work we aimed to investigate the effects of increased baseline levels of neuromodulation by suppressing the reuptake of catecholamines and the breakdown of acetylcholine throughout cortex and subcortical structures. Tonic and phasic neuromodulation have previously been shown to differentially modulate behavior and neural activity (de Gee et al., 2014, 2020, 2021; McGinley et al., 2015; McGinley, Vinck, et al., 2015; van Kempen et al., 2019). Note, however, that it is difficult to investigate causal effects of tonic neuromodulation in isolation from changes in phasic neuromodulation, mostly because phasic and tonic activity are thought to be anti-correlated, with lower phasic responses following high baseline activity and vice versa (Aston-Jones & Cohen, 2005; de Gee et al., 2020; Knapen et al., 2016). As such, pharmacologically elevating tonic neuromodulator levels may have resulted in changes in phasic neuromodulatory responses as well. Concurrent and systematic modulations of tonic (e.g. with pharmacology) and phasic (e.g. with accessory stimuli; Bruel et al., 2022; Tona et al., 2016) neuromodulator activity may be necessary to disentangle the respective and interactive effects of tonic and phasic neuromodulator activity on human perceptual decision-making.

      Reviewer #2 comments:

      The main weakness of the paper lies in the strength of evidence provided, and how the results tally with each other. To begin with, there are a lot of significance tests performed here, increasing the chances of false positives. Multiple comparison testing is only performed across time in the EEG results, and not across post-hoc comparisons throughout the paper. In and of itself, it does not invalidate any result per se, but it does colour the interpretation of any results of weak significance, of which there are quite a few. For example, the effect of Drug on d' and subsequent post-hoc comparisons, also effect of ATX on CPP amplitude and others.
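      The reviewer's point about multiple testing can be made concrete with a short calculation. As a minimal sketch, assuming independent tests, the chance of at least one false positive across m tests at level α is 1 - (1 - α)^m. The Holm-Bonferroni step-down procedure shown alongside it is one standard way to correct a family of post-hoc comparisons; it is not a procedure reported in the manuscript, and the p-values below are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm-Bonferroni step-down procedure; True marks a rejected null hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k counted from 0) is compared with alpha / (m - k).
        if p_values[i] > alpha / (m - rank):
            break  # stop at the first non-significant test
        reject[i] = True
    return reject


print(round(familywise_error_rate(0.05, 10), 3))  # 0.401
print(holm_reject([0.01, 0.04, 0.03]))            # [True, False, False]
```

      With ten uncorrected tests, the chance of at least one spurious result is already about 40%, which is why weakly significant effects warrant cautious interpretation.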

      We agree with the reviewer that the statistical evidence for some of the results presented in this study is limited. This issue mostly concerns the effects of the pharmacological manipulation (the effects of attention were strong and robust), which is unfortunately often the case given the high inter-individual variability in responses to pharmaceutical agents. We have added a paragraph to the Discussion in which we address this limitation of the current study. Furthermore, we discuss our findings in the context of previous work, thereby showing that, although not always robust, most of the reported drug effects were in the direction that could be expected based on previous literature. We have pasted that paragraph below.

      In Discussion, page 16:

      Although the effects of the attentional manipulation were generally strong and robust, the statistical reliability of the effects of the pharmacological manipulation was more modest for some comparisons. This may partly be explained by high inter-individual variability in responses to pharmaceutical agents. For example, initial levels of catecholamines may modulate the effect of catecholaminergic stimulants on task performance, as task performance is thought to be optimal at intermediate levels of catecholaminergic neuromodulation (Cools & D’Esposito, 2011). While acknowledging this, we would like to highlight that many of the observed effects of ATX were in the expected direction and in line with previous work. First, pharmacologically enhancing catecholaminergic levels has previously been shown to increase perceptual sensitivity (d’) (Gelbard-Sagiv et al., 2018), a finding that we have replicated here. Second, methylphenidate (MPH), a pharmaceutical agent that also elevates catecholaminergic levels, has been shown to increase drift rate as derived from drift diffusion modeling on visual tasks (Beste et al., 2018), in line with our ATX observations. Third, a previous study using ATX to elevate catecholaminergic levels observed that ATX increased CPP slope (Loughnane et al., 2019). Although in our case ATX increased the CPP peak and not its slope, this provides causal evidence that centro-parietal ERP signals related to sensory evidence accumulation are modulated by the catecholaminergic system (Nieuwenhuis et al., 2005). Fourth, we observed that elevated levels of catecholamines affected stimulus-driven occipital activity relatively late in time and close to the behavioral response, which resonates with previous observations (Gelbard-Sagiv et al., 2018). Finally, ATX had robust effects on physiological responses (heart rate, blood pressure, pupil size), cue-locked ERP signals and oscillatory power dynamics in the alpha band leading up to stimulus presentation. We concur, however, that more work is needed to firmly establish how (various forms of) attention and catecholaminergic neuromodulation affect perceptual decision-making.

      The lack of an overall RT effect of Drug leaves any DDM result a little underwhelming. How do these results tally? One potential avenue for lack of RT effect in ATX condition is increased drift rate but also increased non-decision time, working against each other. However, it may be difficult to validate these results theoretically.

      As the reviewer remarks, an increase in performance/d’ in the absence of any RT effects can be explained algorithmically by a combination of increased drift rate and prolonged non-decision time. This is indeed what we observed for ATX. Non-decision time is generally thought to reflect the time needed for stimulus encoding and motor execution and, as such, is seen as separate from the evidence-accumulation decision process. We deem it possible that ATX simultaneously prolonged stimulus encoding/motor execution (reflected in changes in non-decision time) and accelerated evidence accumulation (reflected in changes in drift rate). Although our neural data did not provide evidence for this claim, previous work has demonstrated that increased baseline (pupil-linked) arousal/neuromodulation is associated with a decreased build-up rate of a neural signal associated with motor execution (β-power over motor cortex, Van Kempen et al., 2019, eLife), potentially linking increased non-decision time under ATX to a slowing of motor execution processes. The same authors also report relationships between baseline (pupil-linked) arousal/neuromodulation and activity over occipital and centroparietal cortices, associated with sensory processing and sensory evidence accumulation, respectively, suggesting that baseline neuromodulation may affect all stages leading up to a decision (sensory processing, evidence accumulation and motor execution). Note also that, as one would expect, the attentional manipulation seems to simultaneously increase drift rate and shorten non-decision time in our case (Figure 2E, Figure 2 – Supplements 4&5).
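      The trade-off between drift rate and non-decision time can be illustrated with the standard closed-form results for an unbiased drift diffusion model (starting point halfway between the bounds, diffusion noise fixed to 1; the same expressions underlie the EZ-diffusion model). This is a minimal sketch: it ignores the between-trial variability parameters that a fitted DDM may include, and all parameter values are hypothetical, not estimates from the study.

```python
import math


def p_correct(drift: float, boundary: float) -> float:
    """Accuracy of an unbiased DDM (start at boundary/2, noise s = 1)."""
    return 1.0 / (1.0 + math.exp(-drift * boundary))


def mean_decision_time(drift: float, boundary: float) -> float:
    """Mean first-passage time of the same unbiased DDM (requires drift > 0)."""
    return boundary / (2.0 * drift) * math.tanh(drift * boundary / 2.0)


def mean_rt(drift: float, boundary: float, non_decision: float) -> float:
    """Mean RT = non-decision time + mean decision time."""
    return non_decision + mean_decision_time(drift, boundary)


# Hypothetical parameters, not estimates from the study.
a, v_plc, ter_plc = 1.5, 1.0, 0.30
v_atx = 1.5  # higher drift rate, same boundary separation
# Non-decision time that would leave mean RT exactly unchanged.
ter_atx = mean_rt(v_plc, a, ter_plc) - mean_decision_time(v_atx, a)

for label, v, ter in [("PLC", v_plc, ter_plc), ("ATX", v_atx, ter_atx)]:
    print(f"{label}: accuracy={p_correct(v, a):.3f}, "
          f"mean RT={mean_rt(v, a, ter):.3f}, Ter={ter:.3f}")
```

      With these values, accuracy rises from roughly 0.82 to roughly 0.90 while mean RT is unchanged (in model time units), because the shorter decision time produced by the higher drift rate is offset by a longer non-decision time.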

      There is an interaction between ATX and Cue in terms of drift rate, this goes against the main thesis of the paper of distinct and non-interacting contributions of neuromodulators and attention. This finding is then ignored. There is also a greater EDAN later for ATX compared to PLA later in the results, which would also indicate interaction of neuromodulators and attention but this is also somewhat ignored.

      There are indeed some interesting interaction effects between ATX and spatial attention (cue), as pointed out by the reviewer. However, we also observed striking differences in the effects of ATX and attention on stimulus-locked occipital activity (in timing and spatial specificity), as well as independent (main) effects on CPP amplitude and pre-stimulus alpha power. Therefore, throughout the paper we tried to carefully describe attention and ATX as largely independent influences that jointly modulate perceptual decision-making, while highlighting the interaction effects that we observed, where present. We have now highlighted the effects the reviewer refers to even more explicitly in a separate paragraph added to the Discussion, pasted below.

      In Discussion, pages 13-14:

      We did observe two striking interaction effects between the catecholaminergic system and spatial attention. First, effects of attention on drift rate were increased under catecholaminergic enhancement (Figure 2D). Although this interaction effect was not reflected in CPP slope/peak amplitude, it does suggest that catecholamines and spatial attention might together shape sensory evidence accumulation in a non-linear manner. Second, the amplitude of the cue-locked early lateralized ERP component (resembling the EDAN) was increased under ATX as compared to PLC. The underlying neural processes driving the EDAN ERP, as well as its associated functions, have been a topic of debate. Some have argued that the EDAN reflects early attentional orienting (Praamstra & Kourtis, 2010), but others have claimed it is merely a visually evoked response and reflects visual processing of the cue (Velzen & Eimer, 2003). Thus, whether this effect reflects an effect of ATX on early attentional processes or rather a modulation of early visual responses to sensory input in general is a matter for future experimentation.

      The CPP results are somewhat unclear. Although there is an effect of ATX on drift rate algorithmically, there is no effect of ATX on CPP slope. On the other hand, even though there is no effect of DNP on drift rate, there is an effect of DNP on CPP slope. Perhaps one may say that the effect of DNP on drift rate trended towards significance, but overall the combination of effects here is a little unconvincing. In addition, there is an effect of ATX on CPP amplitude, but how does this tally with behaviour? Would you expect greater CPP amplitude to lead to faster or slower RTs? The authors do recognise this discrepancy in the Discussion, but discount it by saying the relationship between algorithmic and CPP parameters in terms of DDM is unclear, which undermines the reasoning behind the CPP analyses (and especially the one correlating CPP slope with DDM drift rate).

      We thank the reviewer for pointing out this dissociation of drug effects in terms of the algorithmic (DDM) and neural (CPP) ‘implementations’ of the evidence accumulating process underlying perceptual decisions. We have added a new paragraph to the discussion where we interpret the effects of ATX on the neural and algorithmic levels of evidence accumulation. Below we have pasted that paragraph:

      In Discussion, pages 14-15:

      We reported attentional and neuromodulatory effects on algorithmic (DDM, Figure 2) and neural (CPP, Figure 3) markers of sensory evidence accumulation. Recent work has started to investigate the association of these two descriptors of the accumulation process, aiming to uncover whether neural activity over centroparietal regions reflects evidence accumulation, as proposed by computational accumulation-to-threshold models (Kelly & O’Connell, 2015; O’Connell et al., 2018; O’Connell & Kelly, 2021; Twomey et al., 2015). Currently, the CPP is often thought to reflect the decision variable, i.e. the (unsigned) evidence for a decision (Twomey et al., 2015); consequently, its slope should correspond with drift rate, whereas its amplitude at any time should correspond with the evidence accumulated so far. As, computationally, the decision is reached when evidence crosses a decision bound (the threshold), it may be argued that the peak amplitude of the CPP (roughly) corresponds with the decision boundary. This seems to contradict our observation that 1) ATX modulated drift rate, but not CPP slope, and 2) ATX did not modulate boundary separation, but did modulate CPP peak. Note, however, that previous studies using pharmacology or pupil-linked indexes of (catecholaminergic) neuromodulation have also demonstrated effects on both CPP peak (van Kempen et al., 2019) and CPP slope (Loughnane et al., 2019).
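      The simplest mapping described above (CPP slope as drift rate, CPP peak as decision bound) can be made explicit with a toy accumulator. This is a minimal sketch, assuming a single noise-free accumulation-to-bound trace rather than a trial-averaged EEG signal; the function names, step size and parameter values are hypothetical.

```python
import numpy as np


def accumulate_to_bound(drift: float, bound: float, noise_sd: float = 0.0,
                        dt: float = 0.001, max_t: float = 5.0, seed: int = 0):
    """Simulate one accumulation-to-bound trace; return (time, evidence) up to crossing."""
    rng = np.random.default_rng(seed)
    n = int(max_t / dt)
    increments = drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal(n)
    evidence = np.cumsum(increments)
    crossings = np.flatnonzero(np.abs(evidence) >= bound)
    end = crossings[0] + 1 if crossings.size else n  # keep the sample that crosses
    t = dt * np.arange(1, end + 1)
    return t, evidence[:end]


def slope_and_peak(t, evidence):
    """Build-up rate (least-squares slope) and peak |amplitude| of one trace."""
    slope = np.polyfit(t, evidence, 1)[0]
    return slope, np.max(np.abs(evidence))


# Noise-free traces: doubling drift doubles the slope but leaves the peak at the bound.
for drift in (1.0, 2.0):
    slope, peak = slope_and_peak(*accumulate_to_bound(drift, bound=0.5))
    print(f"drift={drift}: slope={slope:.2f}, peak={peak:.3f}")
```

      Under this idealization, a drug effect on drift rate should change the slope and leave the peak at the bound, and only a change in the bound should move the peak; that is the pattern the observed ATX effects do not follow.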

      The posterior component effects are problematic. The main issue is the lack of clarification of and justification for the choice of posterior component. The analysis is introduced in the context of the target selection signal the N2pc/N2c, but the component which follows is defined relative to Cue, albeit post-target. Thus this analysis tells us the effect of Cue on early posterior (possibly) visual ERP components, but it is not related to target selection as it is pooled across target/distractor. Even if we ignore this, the results themselves wrt Drug lack context. There is a trending lower amplitude for ATX at later latencies at temporo-parietal electrodes, and more positive for DNP, relative to PLA. Is this what one would expect given behaviour? This is where the issue of correct component identification becomes critical in order to inform any priors on expected ERP results given behaviour.

      We thank the reviewer for raising this issue with the occipital ERP analysis, which allows us to clarify our analysis decisions and our interpretation of the results. First, the selection of electrodes was based on, and identical to, previous studies investigating lateralized target selection signals in visual tasks containing bilateral visual stimuli (Loughnane et al., 2016; Newman et al., 2017; Papaioannou & Luck, 2020; van Kempen et al., 2019). Second, the ERPs were defined relative to both the direction of the cue and the location of the target. As cue direction and target location were not always congruent (cue validity = 80%), we could adopt a 2x2 (cue direction x stimulus identity) design for our ERP analyses (ignoring drug condition for the purpose of this explanation). For example, for validly cued target trials we extracted two ERP traces: 1) from the hemisphere contralateral to both the cue and the target stimulus (representing processing of the cued target stimulus) and 2) from the hemisphere ipsilateral to the cue and the target stimulus (representing processing of the non-cued noise stimulus). For invalidly cued trials, by contrast, ERP traces were extracted from 3) the hemisphere contralateral to the cue direction and ipsilateral to the target stimulus (reflecting processing of cued noise stimuli) and 4) the hemisphere ipsilateral to the cue direction but contralateral to the target stimulus (reflecting processing of non-cued target stimuli). Defining our ERPs in this way allowed us to gauge the effects of cue direction (reflecting general shifts in attention), stimulus identity (reflecting target vs. noise selection processes) and their interaction (reflecting cue validity) on occipito-temporal activity. Third, we did not pool data across target/noise stimuli for statistical analyses, only for visualization purposes. To clarify how we extracted ERP traces, we have changed Figure 4 substantially. The updated figure now contains a schematic of how these four distinct ERP traces (cue x stimulus identity) were extracted from neural activity. Moreover, for clarity's sake, we now show all 12 ERP traces (3x2x2: drug condition x cue direction x stimulus identity) as well as the three main effects that we observed after performing a 3x2x2 repeated measures ANOVA (rmANOVA) over time.
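      The labelling scheme for the four traces can be written down as a small lookup. This is a minimal sketch, assuming only that each hemisphere mainly processes the stimulus in the opposite visual hemifield; the function name, the side labels, and the condition list are illustrative, not the authors' analysis code.

```python
from itertools import product
from typing import Literal

Side = Literal["left", "right"]


def classify_trace(hemisphere: Side, cue_side: Side, target_side: Side) -> tuple[str, str]:
    """Label one hemisphere's ERP on one trial by cue direction and stimulus identity.

    A hemisphere is contralateral to a hemifield when the two sides differ, so it
    carries the 'cued' stimulus when it is contralateral to the cue and the 'target'
    stimulus when it is contralateral to the target.
    """
    cue = "cued" if hemisphere != cue_side else "non-cued"
    stimulus = "target" if hemisphere != target_side else "noise"
    return cue, stimulus


# The 12 condition traces: drug condition x cue direction x stimulus identity.
CONDITIONS = list(product(["PLC", "ATX", "DNP"], ["cued", "non-cued"], ["target", "noise"]))

# Valid trial (cue and target both left): the right hemisphere carries the cued target.
print(classify_trace("right", "left", "left"))   # ('cued', 'target')
# Invalid trial (cue left, target right): the left hemisphere carries the non-cued target.
print(classify_trace("left", "left", "right"))   # ('non-cued', 'target')
```

      Because cue direction and target location are crossed in this way, cue and stimulus-identity effects can be estimated separately instead of being confounded, as they would be on valid trials alone.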

      We observed robust (cluster-corrected) effects of cue direction (not validity) on early occipital activity (Fig. 4C – left panel) and of stimulus identity (target/noise) and drug condition on later occipital activity (Fig. 4C – middle and right panels). These results crucially highlight the different temporal (early/late) and spatial (lateralized/not lateralized) profiles of cue, target and drug effects on occipital activity. Moreover, we observed a specific order of drug effects on late occipital activity (DNP>PLC>ATX). The behavioral relevance of this pattern of effects remains elusive. Although the effects of drug condition coincided in time with those of target selection (i.e. when activity contralateral and ipsilateral to the target stimulus differed), the drug effects were bilateral, meaning that occipito-temporal activity related to the processing of the target (task-relevant) stimulus and of the non-target (task-irrelevant) stimulus was equally modulated by these pharmaceutical agents. One might argue that these effects show that neither ATX nor DNP modulated the signal-to-noise ratio (SNR), a feature that describes how well relevant stimulus information (signal) can be discerned from irrelevant information (noise). It may be tempting to extrapolate this finding to behavior by suggesting that, on the basis of these drug effects, neither ATX nor DNP could have modulated d’ (a behavioral measure describing how well signal is separated from noise). However, our behavioral task was specifically a discrimination task on the (orientation of the) target stimulus, in which the difference between signal and noise was only relevant for localization purposes and thus has a less direct relation with task performance. As such, it is difficult to grasp how the modulation of late occipito-temporal activity by ATX and DNP relates to their behavioral effects. Moreover, the bilateral effect of both ATX and DNP also suggests an absence of interaction effects between drug condition and visuo-spatial attention, as the effects of ATX/DNP were similar across all cue and target identity conditions.
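      For reference, d’ in its standard yes/no form is the difference between the z-transformed hit and false-alarm rates. This is a minimal sketch of that textbook formula using the Python standard library; how hits and false alarms were defined for the orientation discrimination task in this study is not stated here, so the rates below are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        # z is infinite at 0 and 1; a correction (e.g. log-linear) would be needed there.
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


print(round(d_prime(0.8, 0.2), 2))  # 1.68
```

      Because d’ depends only on how separable the signal and noise distributions are, a manipulation that scales target-related and non-target-related activity equally need not change it, which is the intuition behind the SNR argument above.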

    1. Author Response

      The following is the authors’ response to the original reviews.

      Cook, Watt, and colleagues previously reported that a mouse model of Spinocerebellar ataxia type 6 (SCA6) displayed defects in BDNF and TrkB levels at an early disease stage. Moreover, they have shown that one month of exercise elevated cerebellar BDNF expression and improved ataxia and cerebellar Purkinje cell firing rate deficits. In the current work, they attempt to define the mechanism underlying the pathophysiological changes occurring in SCA6. For this, they carried out RNA sequencing of cerebellar vermis tissue in 12-month-old SCA6 mice, a time when the disease is already at an advanced stage, and identified widespread dysregulation of many genes involved in the endo-lysosomal system. Focusing on BDNF/TrkB expression, localization, and signaling, they found that in 7-8-month-old SCA6 mice, early endosomes are enlarged and accumulate BDNF and TrkB in Purkinje cells. Curiously, TrkB appears to be reduced in the recycling endosome compartment, despite the fact that recycling endosomes are morphologically normal in SCA6. In addition, the authors describe a reduction in late endosomes in SCA6 Purkinje cells associated with reduced BDNF levels and a probable deficit in late endosome maturation.

      We would like to thank the reviewers for their careful reading of the paper; their feedback has helped us add information and experiments to the paper that enhance the clarity of the findings.

      Strengths:

      The article is well written, and the findings are relevant for the neuropathology of different neurodegenerative diseases where dysfunction of early endosomes is observed. The authors have provided a detailed analysis of the endo-lysosomal system in SCA6 mice. They have shown that TrkB recycling to the cell membrane in recycling endosomes is reduced, and the late endosome transport of BDNF for degradation is impaired. The findings will be crucial in understanding underlying pathology. Lastly, the deficits in early endosomes are rescued by chronic administration of 7,8-DHF.

      We thank the reviewers for their positive feedback on this work.

      Weaknesses:

      The specificity of BDNF and TrkB immunostaining requires additional controls, as it has been very difficult to detect immunostaining of BDNF. In addition, in many of the figures, the background or outside of Purkinje cell boundaries also exhibits a positive signal.

      We agree with the reviewers that the performance of the BDNF and TrkB antibodies is an important concern. We have ourselves had difficulties with the performance of many antibodies and the images in this paper are the result of many years of optimization. We have therefore added further detail about the antibody optimization to the methods section of this paper, and have carried out new staining experiments with additional controls. We have added 2 new figure panels in supplementary figures 3 and 4 to demonstrate these tests.

      In the case of anti-BDNF antibodies, we have tested several antibodies and staining protocols and found that, in our hands, the only antibody that reliably stained BDNF with a good signal-to-noise ratio was the one used in this paper (abcam ab108319). Even for this antibody, the staining was greatly enhanced by the use of a heat-induced epitope retrieval (HIER) step, which allowed the visualization of BDNF within intracellular structures such as endosomes. When we quantified the intensity of this staining in our previous paper, the results agreed with those from a BDNF ELISA used to measure levels of BDNF in the cerebellar vermis of WT and SCA6 mice (Cook et al., 2022), which corroborates these results. As the staining was carried out in tissue sections and not dissociated cells, we also see positive signal from the BDNF staining outside of the Purkinje cells, since BDNF acts on cell-surface receptors and is thus released into the extracellular space around cells (Kuczewski et al., 2008) and is detectable in the extracellular matrix (Lam et al., 2019) and presynaptic terminals around neurons (Camuso et al., 2022; Choo et al., 2017). This is in contrast to studies that image BDNF mRNA with in situ hybridization, which labels BDNF mRNA found predominantly within cells and cannot tell us about the subcellular or extracellular localization of BDNF protein. Together, these factors explain why we observe staining that is not cell-limited but extends into the space around the cells of interest.

      We have added an additional supplemental figure to demonstrate the importance of using HIER when staining slices with anti-BDNF (Supplementary figure 3). We tested HIER protocols that involved heating the slices to 95°C in a variety of buffers: sodium citrate buffer (10 mM sodium citrate, 0.05% Tween 20, pH 6), Tris buffer (10 mM TBS, 0.05% Tween 20, pH 10), EDTA buffer (1 mM EDTA, 0.05% Tween 20, pH 8) and neutral PBS. PBS produced the best result, enhancing the staining of both anti-BDNF and anti-EEA1 antibodies (Supplementary figure 3). Therefore, all slices stained with those antibodies were heated to 95°C in PBS for 10 minutes using a heat block or thermocycler, then allowed to cool before staining proceeded.

      The antibody we use (abcam ab108319) has been used in hundreds of other publications, including Javed et al., 2021 who ectopically expressed BDNF and noted colocalization between the antibody staining and the GFP tag of the BDNF construct, and Lejkowska et al., 2019 who overexpressed BDNF and saw a dramatic increase in antibody staining as well. The colocalization between ectopically expressed BDNF and the antibody in these studies demonstrates the specificity of the antibody.

      However, to further validate antibody specificity we used liver tissue as a negative control. In liver tissue from rodents and humans, the majority of the liver contains negligible levels of BDNF (Koppel et al., 2009; Vivacqua et al., 2014), see also the Human Protein Atlas. The exception is some cholangiocytes: epithelial cells that express BDNF at high levels (Vivacqua et al., 2014). We obtained liver tissue from a WT mouse that was undergoing surgery for an unrelated project and fixed and processed the tissue as we did for brain tissue (outlined in methods section). As we would expect, most of the cells in the liver showed BDNF immunoreactivity that was comparable to background levels (Supplementary figure 3). Interestingly, we were also able to detect sparse highly BDNF-positive cells in the liver, presumed cholangiocytes (Supp. Fig. 3). This pattern of liver BDNF expression is as predicted in the literature, and thus acts as a control for our antibody. We therefore believe that in our hands this antibody is able to stain BDNF with an appropriate degree of specificity.

      We also carried out staining experiments using a second anti-TrkB antibody that we had previously used to detect TrkB via Western blotting. We carried out immunohistochemistry as previously described using tissue sections from a WT mouse. The staining with the two antibodies was carried out at the same time, and all other reagents were kept constant. We found that both antibodies labelled TrkB in a similar pattern of localization, including in the early endosomes of the Purkinje cells (Supplementary figure 4). The second antibody, however, had a lower signal-to-noise ratio, so we believe that the original anti-TrkB antibody used in this manuscript (EMD Millipore ab9872) is optimal for staining cerebellar tissue sections in our hands.

      One important concern about the conclusions is that the RNAseq experiment was conducted in 12-month-old SCA6 mice, suggesting that the defects in the endo-lysosomal system may be caused by other pathophysiological events and, likewise, the impairment in BDNF signaling may also be indirect, as also noted by the authors. Indeed, Purkinje cells in SCA6 mice have an impaired ability to degrade other endocytosed cargo beyond BDNF and TrkB, most likely because of trafficking deficits that result in a disruption in the transport of cargo to the lysosomes and lysosomal dysfunction.

      We agree with the reviewers that the defects in the endo-lysosomal system may be caused by other events occurring in the course of disease progression. As mentioned by the reviewers, we have noted this possibility in the text. Detailed investigation into the sequence of events and the root causes of signaling disruption in SCA6 merits future study and we aim to address this in future work. We have expanded this explanation in the text.

      Moreover, the beneficial effects of 7,8-DHF treatment on motor coordination may be caused by properties of 7,8-DHF other than its putative agonist role on TrkB. Indeed, many reservations have been raised about using 7,8-DHF as an agonist of TrkB activity. Several studies have now debunked this role (Todd et al. PLoS ONE 2014, PMID: 24503862; Boltaev et al. Sci Signal 2017, PMID: 28831019) or at the very least questioned it (Lowe D, Science 2017; see Discussion: https://www.science.org/content/blog-post/those-compounds-aren-t-what-you-think-they-are; Wang et al. Cell 2022, PMID: 34963057). Another interpretation is that 7,8-DHF possesses antioxidant activity and protects against cytotoxicity in HT-22 and PC12 cells, neither of which expresses TrkB (Chen et al. Neurosci Lett 201, PMID: 21651962; Han et al. Neurochem Int. 2014, PMID: 24220540). Thus, while this flavonoid may have a beneficial effect on the pathophysiology of SCA6, it is most unlikely that this occurs mechanistically through a TrkB agonistic effect, considering the potent antioxidant and anti-inflammatory roles of flavonoids in neurodegenerative diseases (Jones et al. Trends Pharmacol Sci 2012, PMID: 22980637).

      We thank the reviewers for raising this important point. We have noted in our previous paper (Cook et al., 2022) that 7,8-DHF may not be acting as a TrkB agonist in SCA6 mice, and are in agreement that other explanations are possible. We have now added information to the text of this paper to highlight this possibility. We did show in our previous paper that 7,8-DHF administration activates Akt signaling in the cerebellum of SCA6 mice, a signaling event that is known to take place downstream of TrkB activation. Additionally, 7,8-DHF treatment led to an increase in TrkB levels in the cerebellum of SCA6 mice (Cook et al., 2022), implicating TrkB in the mechanism of action, even if, mechanistically, this is not via direct TrkB activation alone. However, even if the mechanism is currently incompletely explained, we believe that 7,8-DHF remains a valuable treatment strategy for SCA6. We have tried to rewrite the Discussion to highlight what we think is the most important takeaway: that 7,8-DHF can rescue endosomal and other deficits in SCA6, even if we do not currently know the full mechanism of action. We have therefore amended the text to add more detail about other potential explanations for the mechanism of action of 7,8-DHF.

      References

      Camuso S, La Rosa P, Fiorenza MT, Canterini S. 2022. Pleiotropic effects of BDNF on the cerebellum and hippocampus: Implications for neurodevelopmental disorders. Neurobiol Dis. doi:10.1016/j.nbd.2021.105606

      Choo M, Miyazaki T, Yamazaki M, Kawamura M, Nakazawa T, Zhang J, Tanimura A, Uesaka N, Watanabe M, Sakimura K, Kano M. 2017. Retrograde BDNF to TrkB signaling promotes synapse elimination in the developing cerebellum. Nat Commun 8:195. doi:10.1038/s41467-017-00260-w

      Cook AA, Jayabal S, Sheng J, Fields E, Leung TCS, Quilez S, McNicholas E, Lau L, Huang S, Watt AJ. 2022. Activation of TrkB-Akt signaling rescues deficits in a mouse model of SCA6. Sci Adv 8:3260. doi:10.1126/sciadv.abh3260

      Javed S, Lee YJ, Xu J, Huang WH. 2021. Temporal dissection of Rai1 function reveals brain-derived neurotrophic factor as a potential therapeutic target for Smith-Magenis syndrome. Hum Mol Genet 31:275–288. doi:10.1093/HMG/DDAB245

      Koppel I, Aid-Pavlidis T, Jaanson K, Sepp M, Pruunsild P, Palm K, Timmusk T. 2009. Tissue-specific and neural activity-regulated expression of human BDNF gene in BAC transgenic mice. BMC Neurosci 10:68. doi:10.1186/1471-2202-10-68

      Kuczewski N, Porcher C, Ferrand N, Fiorentino H, Pellegrino C, Kolarow R, Lessmann V, Medina I, Gaiarsa JL. 2008. Backpropagating action potentials trigger dendritic release of BDNF during spontaneous network activity. J Neurosci 28:7013–7023. doi:10.1523/JNEUROSCI.1673-08.2008

      Lam D, Enright HA, Cadena J, Peters SKG, Sales AP, Osburn JJ, Soscia DA, Kulp KS, Wheeler EK, Fischer NO. 2019. Tissue-specific extracellular matrix accelerates the formation of neural networks and communities in a neuron-glia co-culture on a multi-electrode array. Sci Rep 9. doi:10.1038/s41598-019-40128-1

      Lejkowska R, Kawa MP, Pius-Sadowska E, Rogińska D, Łuczkowska K, Machaliński B, Machalińska A. 2019. Preclinical Evaluation of Long-Term Neuroprotective Effects of BDNF-Engineered Mesenchymal Stromal Cells as Intravitreal Therapy for Chronic Retinal Degeneration in Rd6 Mutant Mice. Int J Mol Sci 20:777. doi:10.3390/IJMS20030777

      Vivacqua G, Renzi A, Carpino G, Franchitto A, Gaudio E. 2014. Expression of brain derivated neurotrophic factor and of its receptors: TrKB and p75NT in normal and bile duct ligated rat liver. Ital J Anat Embryol 119:111–129. doi:10.13128/IJAE-15138

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their thoughtful and careful evaluation of our manuscript. We appreciate your time and effort and have incorporated many of these suggestions to improve our revised manuscript.

      Reviewer #1 (Public Review):

      Summary: Cullinan et al. explore the hypothesis that the cytoplasmic N- and C-termini of ASIC1a, not resolved in x-ray or cryo-EM structures, form a dynamic complex that breaks apart at low pH, exposing a C-terminal binding site for RIPK1, a regulator of necrotic cell death. They expressed channels tagged at their N- and C-termini with the fluorescent, non-canonical amino acid ANAP in CHO cells using amber stop-codon suppression. Interaction between the termini was assessed by FRET between ANAP and colored transition metal ions bound either to a cysteine-reactive chelator attached to the channel (TETAC) or to metal-chelating lipids (C18-NTA). A key advantage to using metal ions is that they are very poor FRET acceptors, i.e. they must be very close to the donor for FRET to occur. This is ideal for measuring small distances/changes in distance on the scales expected from the initial hypothesis. In order to apply chelated metal ions, CHO cells were mechanically unroofed, providing access to the inner leaflet of the plasma membrane. At high pH, the N- and C-termini are close enough for FRET to be measured, but apparently too far apart to be explained by a direct binding interaction. At low pH, there was an apparent increase in FRET between the termini. FRET between ANAP on the N- and C-termini and metal ions bound to the plasma membrane suggests that both termini move away from the plasma membrane at low pH. The authors propose an alternative hypothesis whereby close association with the plasma membrane precludes RIPK1 binding to the C-terminus of ASIC1a.

      Strengths: The findings presented here are certainly valuable for the ion channel/signaling field and the technical approach only increases the significance of the work. The choice of techniques is appropriate for this study and the results are clear and high quality. Sufficient evidence is presented against the starting hypothesis.

      Weaknesses: I have a few questions about certain controls and assumptions that I would like to see discussed more explicitly in the manuscript.

      My biggest concern is with the C-terminal citrine tag. Might this prevent the hypothesized interaction between the N- and C-termini? What about the serine to cysteine mutations? The authors might consider a control experiment in channels lacking the C-terminal FP tag.

      While it is certainly possible that the C-terminal citrine tag is preventing the hypothesized interaction between the intracellular termini, there are a few things that mitigate (but do not eliminate) this concern. First, previous work looking at the interaction between the intracellular termini used FPs on both the N- and C-termini and concluded that there is in fact an interaction (PMID:31980622). Our channels have only a single FP, and we use a higher-resolution FRET approach. Second, we attach our citrine tag with an 11-residue linker, allowing for enhanced flexibility of the region and, we hope, more space for an interaction that was posited to be between the very proximal part of the C-terminus (near the membrane and away from the tag) and the untagged N-terminus. Third, we previously showed that Stomatin, a much larger protein than the NTD, could bind the distal C-terminus of rASIC3 with a large fluorescent protein connected by the same linker on the C-terminus. In the case of Stomatin, the interaction involved residues at the distal portion of the C-terminus close to the bulky FP. Interestingly, in unpublished experiments without this flexible linker, Stomatin could not regulate the channel and likely did not bind.

      Despite this, we agree that this is possible and have added a statement in our limitations section explicitly saying this.

      Figure 2 supplement 1 shows apparent read-through of the N-terminal stop codons. Given that most of the paper uses N-terminal ANAP tags, this figure should be moved out of the supplement. Do N-terminally truncated subunits form functional channels? Do the authors expect N-terminally truncated subunits to co-assemble in trimers with full-length subunits? The authors should include a more explicit discussion regarding the effect of truncated channels on their FRET signal in the case of such co-assembly.

      The positions that show readthrough (E6, L18, H515) were not used in the study. We eliminated them largely on the basis of these westerns. We elected to put the bulk of the blots in the supplement simply because of how many there were. We believe this is the best compromise. It allows us to show representative blots for all our positions without making an illegible figure with 7 blots.

      The N-terminally truncated subunits would create very short peptides that are not able to form functional channels. A premature stop at, say, E8 would create a 7-mer. Our longest N-terminal truncation would only create a protein of 32 amino acids. These do not contain the transmembrane segments and thus cannot form functional channels.

      As the epitope used for the western blots in Figure 2 and supplements is part of the C-terminal tag, these blots do not provide an estimate of the fraction of C-terminally truncated channels (those that failed to incorporate ANAP at the stop codon). What effect would C-terminally truncated channels have on the FRET signal if incorporated into trimers with full-length subunits?

      Alternatively, C-terminally truncated subunits would be able to form functional channels because they contain the full N-terminus, the transmembrane domains, the extracellular domain and a portion of the C-terminus. We don’t think this is a major contaminant in our experiments. The only two C-terminal ANAP positions we use are 464 and 505. In each of these cases, they are only used for memFRET. The ones that do not contain ANAP are essentially “invisible” to the experiment. Since we are measuring their proximity to the membrane, having some missing should not matter. However, there is some chance that truncations in some subunits could allosterically affect the position of the CT in other subunits. We have added a discussion of this in the manuscript.

      Some general discussion of these results in the context of trimeric channels would be helpful. Is the putative interaction of the termini within or between subunits? Are the distances between subunits large enough to preclude FRET between donors on one subunit and acceptor ions bound on multiple subunits?

      Thank you for this comment. We did not directly test whether the distances are within or between subunits. We considered using a concatemer to do this, however, the concatemeric channels do not express particularly well. Then, UAA incorporation hurts the expression as well. It was unlikely we would be able to get sufficient expression for tmFRET.

      However, the Maclean group has previously tested this using FRET between concatenated subunits and determined that FRET is stronger within than between subunits. We have updated the manuscript to reflect a more thorough discussion of our results in the context of their trimeric assembly.

      The authors conclude that the relatively small amount of FRET between the cytoplasmic termini suggests that the interaction previously modeled in Rosetta is unlikely. Is it possible that the proposed structure is correct, but labile? For example, could it be that the FRET signal is the time average of a state in which the termini directly interact (as in the Rosetta model) and one in which they do not?

      The proposed Rosetta model does not include the reentrant loop of the channel, so it is probable that this model would change if it were redone to include this new feature of the channel.

      However, in our discussion section we do discuss the limitation of FRET as a method that measures a time average weighted towards the closest approach. The termini are most certainly dynamic, and it is possible that they spend some time in close proximity. Given that FRET is biased towards the closest approach, we actually think this strengthens our argument that the termini do not spend a great deal of time in complex. In addition, our MST data suggest that the termini do not bind. We have added some commentary on this to the discussion section for clarity.
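      The point that FRET is weighted towards the closest approach can be made concrete with the Förster relation E = 1/(1 + (r/R0)^6). The sketch below assumes a simple two-state mixture, with the labels spending a fraction of time "bound" at a short distance and the rest "free" at a long one, and averages the two efficiencies by population. Every distance, fraction and R0 value in it is hypothetical and chosen only for illustration. Even a small bound fraction pushes the apparent efficiency far above the efficiency expected at the population-mean distance.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster efficiency for a single donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def two_state_efficiency(frac_bound: float, r_bound: float, r_free: float, r0: float) -> float:
    """Population-weighted efficiency for a hypothetical bound/free mixture."""
    if not 0.0 <= frac_bound <= 1.0:
        raise ValueError("frac_bound must lie in [0, 1]")
    return (frac_bound * fret_efficiency(r_bound, r0)
            + (1.0 - frac_bound) * fret_efficiency(r_free, r0))


# Hypothetical values in angstroms, chosen only to illustrate the r**-6 weighting.
R0, R_BOUND, R_FREE = 12.0, 8.0, 25.0
for frac in (0.0, 0.1, 0.2):
    mean_r = frac * R_BOUND + (1 - frac) * R_FREE
    print(f"bound {frac:.0%}: E = {two_state_efficiency(frac, R_BOUND, R_FREE, R0):.3f}, "
          f"E at mean distance = {fret_efficiency(mean_r, R0):.3f}")
```

      The linear population average is itself an assumption. If the states exchanged quickly relative to the donor lifetime, transfer rates rather than efficiencies would average, which weights short distances even more heavily. Either way, a substantial bound population would be expected to produce large quenching.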

      Reviewer #2 (Public Review):

      Summary:

      The authors use previously characterised FRET methods to measure distances between intracellular segments of ASIC and with the membrane. The distances are measured across different conditions and at multiple positions in a very complete study. The picture that emerges is that the N- and C-termini do not associate.

      Strengths:

      Good controls, good range of measurements, advanced, well-chosen and carefully performed FRET measurements. The paper is a technical triumph. Particularly, given the weak fluorescence of ANAP, the extent of measurements and the combination with TETAC is noteworthy.

      The distance measurements are largely coherent and favour the interpretation that the N and C terminus are not close together as previously claimed.

      Weaknesses:

      One difficulty is that we do not have a positive control for what binding of something to either N- or C-terminus would look like (either in FRET or otherwise).

      We acknowledge that this is a challenge for the approach. Having a positive control for binding would be great, but we are not sure such a thing exists. You could certainly imagine a complex between two domains in which the two labels (ANAP and TETAC) point away from one another (giving comparatively modest quenching) or one in which they are very close (giving comparatively large quenching), both of which could still be bound. This is essentially a less severe version of the problem with using FPs to measure proximity: they are not very good proxies for the position of the termini. These small labels are certainly better proxies but still not perfect. Our conclusion here is based more on the totality of the data. We tried many combinations and saw no sign of distances closer than ~20 Å at resting pH. We think the simplest explanation is that the termini are not close to one another, but we have tried to lay out the limitations in the discussion.

      One limitation that is not mentioned is the unroofing. The concept of interaction with intracellular domains is being examined. But the authors use unroofing to measure the positions, fully disrupting the cytoplasm. Thus it is not excluded that the unroofing disrupts that interaction. This should be mentioned as a possible (if unlikely) limitation.

      Thank you for your comment. We discuss unroofing as a potential limitation because it exposes both sides of the plasma membrane to changes in pH. We have updated this section to include acknowledgement of the possibility that unroofing disrupts the interaction via washout of other critical proteins.

      Reviewer #3 (Public Review):

      Summary: The manuscript by Cullinan et al., uses ANAP-tmFRET to test the hypothesis that the NTD and CTD form a complex at rest and to probe these domains for acid-induced conformational changes. They find convincing evidence that the NTD and CTD do not have a propensity to form a complex. They also report these domains are parallel to the membrane and that the NTD moves towards, and the CTD away, from the membrane upon acidification.

      Strengths:

      The major strength of the paper is the use of tmFRET, which excels at measuring short distances and is insensitive to orientation effects. The donor-acceptor pairs here are also great choices as they are minimally disruptive to the structure being studied.

      Furthermore, they conduct these measurements over several positions with the N and C tails, both between the tails and to the membrane. Finally, to support their main point, MST is conducted to measure the association of recombinant N and C peptides, finding no evidence of association or complex formation.

      Weaknesses:

      While tmFRET is a strength, using ANAP as a donor requires the cells to be unroofed to eliminate background signal. This causes two problems. First, it removes any possible low affinity interacting proteins such as actinin (PMID 19028690). Second, the pH changes now occur to both 'extracellular' and 'intracellular' lipid planes. Thus, it is unclear if any conformational changes in the N and CTDs arise from desensitization of the receptor or protonation of specific amino acids in the N or CTDs or even protonation of certain phospholipid groups such as in phosphatidylserine. The authors do comment that prolonged extracellular acidification leads to intracellular acidification as well. But the concerns over disruption by unroofing/washing and relevance of the changes remain.

      We acknowledge that unroofing is a limitation of our approach and noted it in the discussion. However, we have updated the section to include the possibility that the act of unroofing and washing could also disrupt the potential interaction between the intracellular domains as well as between these domains and other intracellular proteins. This was the best approach we could use to address our questions and it required that we unroof the cells. However, we look forward to future studies or new techniques that do not require the unroofing of the cells.

      The distances calculated depend on the R0 between donor and acceptor. In turn, this depends on the donor's emission spectrum and quantum yield. The spectrum and yield of ANAP is very sensitive to local environment. It is a useful fluorophore for patch fluorometry for precisely this reason, and gating-induced conformational changes in the CTD have been reported just from changes in ANAP emission alone (PMID 29425514). Therefore, using a single R0 value for all positions (and both pHs at a single position) is inappropriate. The authors should either include this caveat and give some estimate of how big an impact changes spectrum and yield might have, or actually measure the emission spectra at all positions tested.

      This is a reasonable concern and one we considered. Measuring the quantum yield would be quite difficult. However, we have measured spectra at a number of positions and see a relatively minimal shift in the peak: most positions peak between 481 and 484 nm. If you calculate the difference in R0 using theoretical spectra with a blue shift of 20 nm, the difference in R0 is only ~1.5 Å. A shift of 20 nm is on the higher side of anything we have seen in the literature (PMID 30038260), and since even with a shift that large the difference is minimal, we do not think measuring spectra for each position would change the overall conclusions presented. As you noted, though, the quantum yield also changes. Assuming a change in yield from 0.22 to 0.47, the largest we found reported in the literature (PMID:29923827), R0 would increase by 2 Å. This same paper showed that the blue-shifted position was the one with the higher extinction coefficient, so these changes would work in opposition to one another, making the difference in R0 even smaller. It is important to note that, while tmFRET is a much more powerful measure of distance than standard FRET, these distances, as you point out, are quite challenging to measure precisely. Our conclusions are based less on the absolute distances and more on the observation that no positions show large quenching and that, if there is any change upon acidification, it is in the wrong direction.
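      The size of the quantum-yield effect follows from the standard Förster expression, in which R0^6 is proportional to κ²·Φ·J/n⁴ (orientation factor, donor quantum yield, spectral overlap integral, refractive index). With everything else held fixed, R0 therefore scales as the sixth root of the quantum yield. The sketch below applies this to the 0.22 to 0.47 change quoted above. The baseline R0 in the example is a hypothetical value, because this response does not state the one used, so the absolute shift it prints is illustrative only.

```python
def r0_scale_factor(qy_old: float, qy_new: float) -> float:
    """Ratio R0_new / R0_old when only the donor quantum yield changes.

    Förster: R0**6 is proportional to kappa^2 * QY * J / n**4, so with
    kappa^2, J (spectral overlap) and n held fixed, R0 ~ QY**(1/6).
    """
    if qy_old <= 0 or qy_new <= 0:
        raise ValueError("quantum yields must be positive")
    return (qy_new / qy_old) ** (1 / 6)


factor = r0_scale_factor(0.22, 0.47)
print(f"R0 grows by {100 * (factor - 1):.1f}%")  # -> R0 grows by 13.5%

# Hypothetical baseline R0 in angstroms, chosen only for illustration.
r0_baseline = 15.0
print(f"Shift at R0 = {r0_baseline} A: {r0_baseline * (factor - 1):.1f} A")
```

      The sixth-root dependence is why even a roughly twofold change in quantum yield moves R0 by only about an eighth of its value.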

      Overall, the writing and presentation of figures could be much improved with specific points mentioned in the recommendations for authors section.

      See below.

      The authors argue that the CTD is largely parallel to the plasma membrane, yet appear to base this conclusion on ANAP to membrane FRET of positions S464 and M505. Two positions is insufficient evidence to support such a claim. Some intermediate positions are needed.

      We do not see where in the paper we suggest that the CTD is parallel to the membrane. However, your point that we could try to determine whether this is the case is correct. We attempted to create several other CTD TAG mutants but struggled with readthrough and poor expression of these mutants, so we opted to include only S464 and M505. Our point from these data is only that the distal CTD (505) must spend significant time near the membrane to explain our FRET data.

      Upon acidification, NTD position Q14 moves towards the plasma membrane (Figure 8B). Q14 also gets closer to C515 or doesn't change relative to 505 (Figures 7C and B) upon acidification. Yet position 505 moves away from the membrane (Figure 8D). How can the NTD move closer to the membrane, and to the CTD but yet the CTD move further from the membrane? Some comment or clarification is needed.

      This is a reasonable question and one that is hard to answer definitively. Our goal here was to test the hypothesis that the termini are bound at rest. Mapping the precise positions of the termini is difficult, for reasons we enumerate in our response to the question asking why we did not make a model. There are potentially multiple explanations, but the simplest is that the CTD could move away from the membrane but closer to Q14, for instance if the distal terminus rotated towards the NTD. This would move 505 closer to Q14 independently of whether the NTD and CTD moved away from or toward the membrane.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns

      The authors show the spectrum of ANAP attached to beads and use this spectrum to calculate R0 for their FRET measurements. Peak ANAP fluorescence is dependent on local environment and many reports show ANAP in protein blue-shifted relative to the values reported here. How would this affect the distance measurements reported?

      This is an important point. See above for the answer.

      Could the lack of interaction between the N- and C-terminal peptides in Figure 7 arise from the cysteine to serine mutations or from a lack of structure in the synthetic peptides? How were peptide concentrations measured/verified for the experiment?

      It is possible that cysteine to serine mutations could prevent the interaction. It is also possible that these peptides are not capable of adopting their native fold without the presence of the plasma membrane or due to being synthetically created. However, the termini are thought to be largely unstructured. We received these peptides in lyophilized form at >95% purity and resuspended to our desired stock concentration (3 mM C-terminus, 1 mM N-terminus). Even if our concentration was off, we see no signs of interaction up to quite a high concentration.

      How was photobleaching measured for correcting the data?

      We performed several mock experiments at various TAG positions at either pH 8 or pH 6, in which we carried out the experiments as usual but with a mock solution exchange at the point where we would normally add the metal. We normalized the L-ANAP fluorescence to the first image and averaged these values for pH 8 and pH 6. We then corrected using Equation 2 in the manuscript.

      We have updated the methods to include how we adjusted for bleaching.
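      The bleaching control described above (normalize each mock trace to its first image, then average the normalized traces) can be sketched as follows. The sketch assumes the averaging is done separately for each pH. Equation 2 of the manuscript is not reproduced in this response, so the final division step is a hypothetical stand-in for it and is labeled as such in the code; the traces in the example are invented.

```python
from statistics import mean
from typing import List, Sequence


def normalize_to_first(trace: Sequence[float]) -> List[float]:
    """Scale a fluorescence trace so that its first image equals 1."""
    if not trace or trace[0] == 0:
        raise ValueError("trace must be non-empty with a non-zero first value")
    return [value / trace[0] for value in trace]


def mean_bleach_curve(mock_traces: Sequence[Sequence[float]]) -> List[float]:
    """Average normalized mock-exchange traces image by image (one pH at a time)."""
    normalized = [normalize_to_first(t) for t in mock_traces]
    if len({len(t) for t in normalized}) != 1:
        raise ValueError("all mock traces must have the same number of images")
    return [mean(values) for values in zip(*normalized)]


def correct_for_bleaching(trace: Sequence[float], bleach_curve: Sequence[float]) -> List[float]:
    """HYPOTHETICAL correction: divide the normalized trace by the bleach curve.

    This is an illustrative stand-in, not necessarily the manuscript's Equation 2.
    """
    return [v / b for v, b in zip(normalize_to_first(trace), bleach_curve)]


# Invented mock traces at one pH (arbitrary units), for illustration only.
curve_ph8 = mean_bleach_curve([[100, 90, 81], [200, 180, 162]])
print(correct_for_bleaching([50, 45, 40.5], curve_ph8))
```

      Keeping a separate curve per pH means any pH dependence of ANAP bleaching is not mixed into the metal-induced quenching.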

      The authors may wish to make it more explicit that their Zn2+ controls also preclude the possibility that a changing FRET signal between ANAP and citrine may affect their data.

      Thank you for this comment. We agree, it would strengthen the manuscript to include this statement. We have now included this.

      It might be useful to the reader if the authors could include (as a supplement) plots of their data (like in Figure 6), in which FRET efficiency has been converted to distance.

      We considered this idea as well, but felt that showing the actual data in the figures and the distances in a table was best.
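      For readers who want to convert the FRET efficiencies in the figures to distances themselves, the standard Förster relation E = 1 / (1 + (r/R0)^6) can be inverted to r = R0 (1/E - 1)^(1/6). The sketch below is illustrative only: the R0 value is a placeholder, not the value calculated in the manuscript, and the function treats each efficiency as arising from a single fixed distance, which is a simplification for flexible termini.

```python
def fret_efficiency_to_distance(efficiency, r0):
    """Invert the Förster equation E = 1 / (1 + (r / R0)**6) to obtain r.

    efficiency: FRET efficiency, strictly between 0 and 1.
    r0: Förster distance; the result is returned in the same units.
    """
    if not 0.0 < efficiency < 1.0:
        raise ValueError("FRET efficiency must be strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Placeholder R0 of 12 Angstrom, chosen for illustration only.
EXAMPLE_R0 = 12.0
for e in (0.2, 0.5, 0.8):
    print(f"E = {e:.1f} -> r = {fret_efficiency_to_distance(e, EXAMPLE_R0):.1f} Angstrom")
```

      Because of the sixth-power dependence, distances are most sensitive to efficiency near E = 0.5 (where r = R0) and become poorly constrained as E approaches 0 or 1.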

      Figure 5D is mentioned in the text before any other figures. This is unconventional. Could this panel be moved to Figure 1 or the mention moved to later?

      Changed

      western blot is not capitalized.

      Changed.

      Figure 1, the ANAP structure shown is the methyl ester, which is presumably cleaved before ANAP is conjugated to the tRNA. The authors may wish to replace this with the free acid structure.

      This is a fair point. We originally used the methyl ester structure to indicate the version of ANAP we chose to use. However, you are correct that the methyl ester is cleaved before conjugation to the tRNA. We replaced the methyl ester with the free acid structure to clarify this.

      Figures 1 and 4 should have scale bars for the images.

      Scale bars have been added to Figures 1, 4, and 5.

      In Figure 3, the letters in the structures (particularly TETAC) are way too small. Please increase the font size.

      Changed

      In Figure 3 and Figure 3 supplement 1, the axes are labeled "Absorbance (M⁻¹ cm⁻¹)." Absorbance is dimensionless. The authors are likely reporting the extinction coefficient.

      Thank you for catching this. We relabeled the axes as extinction coefficient.

      In Figures 5 B and C, it might be clearer if the headers read "Initial, +Cu2+/TETAC, DTT" rather than "Initial, FRET, Recovery."

      Changed

      The panel labels for Figure 8 seem to be out of order.

      Changed

      The L for L-ANAP should be rendered, by convention, in small caps.

      This is a good example of learning something new from the review process. This is the first I have ever heard of small caps. We can find no other papers that use small caps for L-ANAP, so I am not 100% sure which convention this refers to and do not want to change the wrong thing in the paper. We are happy to change it if the editorial staff at eLife agree, but have left it as is for now.

      Reviewer #2 (Recommendations For The Authors):

      With so many distances measured, why was not even a basic structural model attempted?

      We certainly considered it, but a number of factors led us to conclude that a model might imply more certainty about the structure of these termini than we intend to convey. 1) Because FRET reports a time average of positions, these distance constraints would not do much constraining. 2) The termini are likely unstructured and flexible, which makes problem 1 worse. 3) There is no structural information to use as a starting point for a model. 4) The flexibility of the linkers for each FRET pair also introduces uncertainty. This can, in theory, be modeled as is done in EPR, but all of this together led us not to build a model. What we hope readers take home is that the overall picture from the data is not consistent with the original RIPK1 hypothesis.

      Maybe it would be good to draw a band on the graphs in Figure 6 for the FRET signal expected for interaction (and thus, disfavoured by these data). This would at least give context.

      We agree this could be helpful, but it is not easy to do. What distance would we choose? We could put a line at ~5 Å (the model-predicted distance). As noted above, however, a number of distances could be compatible with an interaction. Even so, we think it unlikely that, if a complex formed, none of our measurements would show a distance closer than 20 Å at rest, and that an unbinding event would then lead to a decrease in distance. This, to us, is the take-home message.

      Minor points:

      "After unroofing the cells, only fluorescence associated with the "footprint", or dorsal surface, of the cell membrane is left behind."

      The authors use dorsal and ventral in this section to describe parts of an adherent cell. But in the first instance, they remove the dorsal part of the cell, and then in this phrase, the dorsal part is left behind... I am a bit confused.

      Thank you for pointing out this mistake; we have fixed it. It is indeed the ventral surface that is left behind.

      "bind at rest an" - and?

      Changed

      "One previous study used a different approach to try and map the topography of the intracellular termini of ASIC1a comparable to our memFRET experiments." I think a citation is due.

      Citation added

      "great deal of precedent" even if this result is from my own lab, I would prefer that the authors note that it's one study from one lab! I think best just to delete "great deal of".

      “Great deal of” deleted

      I think the column "Significance" in the tables is unnecessary when the P value is given.

      Thank you for this suggestion. We agree and have made the change.

      Figure 7a Q14TAG has a clearly bimodal distribution at pH 8. What could be the meaning of this result? The authors do not mention it that I could find. Perhaps there is no meaning. The authors should state what they think is (or is not) going on.

      This is a good question, and we do not have a good answer. It appears to be experimental variability. The “low FRET” data in this experimental condition all came from the same day, so something was different that day. We considered excluding them as outliers but thought showing all of our data was the better path. We repeated the ANOVA with the “outlier” day separated out, and nothing of substance changed: both populations were still different, with P < 0.001.

      Typo: Lumencore

      Changed

      Maybe just a matter of taste, but the panel created with Biorender in Figure 8 is not attractive and depicts the channel differently from Figure 5D, which is again different from Figure 1A. Surely one advantage of using computer-generated artwork is consistency.

      We agree and have used the same cartoon for all of our images with the one exception being the schematics that are just meant to show the positions that are present in each bar graph.

      Figure 4A was squashed to fit (text aspect ratio is wrong).

      Fixed

      Reviewer #3 (Recommendations For The Authors):

      Citrine is used to report incorporation. Yet citrine has a strong tendency to dimerize (PMID 27240257). Did the authors use mCitrine or just Citrine? This is quite important in interpreting their data.

      Thank you for pointing out this important distinction. We used mCitrine, which we now specify in the methods.

      The manuscript has numerous instances of imprecise language. For example, page 10, last para, first line, "previous studies have looked at..." or page 7, final paragraph "tell a similar story". Relatedly, the figures could be much better. For example, in Figure 1, the authors depict the ANAP chemical in red, as opposed to the blue one might expect of a blue-emitting fluorophore. In Figure 6, ANAP is also in red with the quenching group in green. This is opposite to how one typically thinks of FRET, with the warmer color being the acceptor, not the donor. Moreover, the pH 6 condition is also colored the same shade of red as the ANAP. Labels of Cys positions would again be useful here. In Figure 3, the heteroatoms of TETAC and C18-NTA are very small and difficult to see. It would also be good to label these structures, and the spectra below, so the reader can tell at a glance, without looking at the caption, what the structures and spectra arise from. Also, how are the absorption spectra normalized? This is not discussed in the methods. The lack of attention to presentation mars an otherwise nice study.

      Thank you for these points. We have made modifications to the manuscript to address these comments.

      Abstract, second-to-last line, "After prolonged acidification, ...": 'prolonged' could be interpreted as 'it takes a while for the domain to move' or 'the movement only happens after a while'. This is not what the authors intend to convey. Consider modifying to just 'After acidification,'.

      We updated the main text to indicate that "prolonged acidification" refers to acidification on a timescale of minutes.

      PDF page 6, bottom para on ANAP incorporation not altering channel function: What is meant by 'steady state pH dependence of activation'? This implies the authors applied a pH stimulus, then waited until equilibrium was achieved, i.e., until desensitization was complete, and measured the current at that point. It seems more likely they simply applied different pH stimuli and measured the peak response, and that the use of 'steady state' here is a typo.

      We removed the phrase steady state.

      Same section, controls of electrophysiology allude to 485, 505 and 515 ANAP-containing channels. In fact, the authors have no way of determining what fraction (if any) of the pH evoked currents arise from channels containing Anap in those positions versus from simply having a translation stop but still functioning. This should be mentioned.

      This is correct. We cannot be sure that the CTD TAG positions are not a mixture of ANAP-containing channels and truncated channels. See above for why we do not think this is a big concern for the FRET experiments. Functionally, though, you are correct that we cannot tell. We now mention this in the paper.

      Methods, the abbreviation for SBT should be defined somewhere.

      Added.

      Methods, unroofing section, middle paragraph, the authors use nM not nm to list wavelengths of light.

      Changed.

      Figure 3C-D: There's an unexpected blip in the Anap emission spectra at ~500 nm. Are the grating efficiency of the spectrograph and quantum efficiency of the camera accounted for in these spectra?

      This is a good question. The data are not corrected for either camera quantum efficiency or grating efficiency. We do not have easy access to the underlying efficiency data (although we can see a PDF version of each curve). There is a little blip in the grating efficiency curve that could partly explain the blip in our spectra.

      Figure 5C, were recovery experiments routinely done? If so, would be good to show more than n = 1 in the plot to get an idea of reproducibility.

      Recovery experiments were done in every experiment but are not shown for simplicity. We have included all FRET and recovery data for position Q14TAG-C469 at pH 6 in Figure 5C to show the reproducibility of our FRET and recovery data.

      Table 1, consider adding a Δ distance column (pH 8 versus 6) so the magnitude of changes is more easily seen.

      This is a reasonable suggestion, but we decided not to include a Δ distance column. The distances are reported as whole numbers, so readers can easily determine the Δ distance. We felt that including that column would bring too much focus to what we think are pretty small changes. Our hope is that readers take away that the data are not consistent with complex formation between the termini, and focus less on absolute distances.

      Figure 7A, the Q14TAG pH 8 condition has quite a bit of spread and, likely, two populations. These data, as well as G11, are unlikely to be normally distributed, and hence ANOVA is inappropriate. A normality test, and likely a Kruskal-Wallis test, is called for.

      After testing for normality, we found that the data for Q14TAG C485 pH 8 are non-normally distributed. However, the Kruskal-Wallis test is a non-parametric alternative to a one-way ANOVA and is not applicable to our two-way design. We therefore separated the data into populations 1 and 2 and repeated the two-way ANOVA. When Q14TAG pH 8 is split into two populations, the statistics hardly change. When the data are not separated, Q14TAG pH 8 relative to pH 6 has a p-value <0.0001. When the two populations are separated, both populations relative to Q14TAG pH 6 still have a p-value <0.0001.

    1. Author Response

      In this paper, we examine the behavioral context that generates foraging decisions at the boundaries of food patches in the nematode C. elegans. By analyzing animal locomotion at high spatial and temporal resolution, we identify discrete behavioral responses to encountering the edge of a food patch that can be understood as a decision: either to remain inside the food patch or to leave it. We find that the decision to leave a food patch is associated with increased behavioral arousal that unfolds on long and short timescales. The coupling of increased arousal to lawn leaving decisions is preserved across genetic, neuronal, and environmental manipulations that alter global arousal levels. However, genetic inactivation of a set of chemosensory neurons disrupts the coupling of arousal and lawn leaving, revealing a potential site of integration between internal signals and external sensation that governs foraging.

      We appreciate the reviewers’ thoughtful engagement with this work. In addition to modifications in the text to address minor concerns and ambiguities, we have conducted new analyses and made text and figure edits to strengthen or explain our conclusions. We have also investigated possible confounding explanations to our interpretation of the data.

      In newly added analysis, we show that increased arousal does not result in increased proximity to the lawn boundary, which would be a trivial reason why roaming animals leave more than dwelling ones (new Figure 2-Supplement 1E).

      We also addressed the concern that classifying the brief speed acceleration motif as a roaming state would inflate the apparent coupling of roaming to leaving. By measuring the duration of roaming states prior to leaving, we in fact found the opposite: roaming states that precede leaving are slightly longer than other roaming states, not short acceleration events (new Figure 2-Supplement 4).

      The reviewers also asked reasonable questions about variability between batches of experiments. In particular, reviewers pointed out high levels of roaming in wild type controls accompanying npr-1 mutants. Indeed, the simultaneously-tested wild type animals roamed more than usual in this experiment (Fig. 4C,K) and less than usual in other panels (Fig. 4A,B,I,J) in these small datasets. There is more to do here, but the results support the general point that roaming and leaving are correlated in several neuromodulatory mutants that regulate roaming. We have included a new sentence in the Figure 4 legend to draw the reader’s attention to the potential limitations of these results, and to explicitly state that results should not be compared across panels. Similarly, there is more to be done to understand tax-4, as we did not test all tax-4-expressing sensory neurons for their effects on roaming and leaving.

      In private comments, reviewers also asked about experimental design and statistics and were concerned that certain assays conducted on just a few days may not represent independent experiments. We have updated the Methods section to improve the description of the behavioral experiments, including more information about the behavioral chambers and imaging conditions. We note that for all experiments we tested all relevant genotypes in the same batches and days, enabling comparisons of experimental animals with matched controls conducted at the same time.

      Reviewers asked us to compare our results to those generated by Rhoades, et al. (2019) and Cermak, et al. (2020). To the best of our knowledge, our results are fully consistent with those studies. The study by Rhoades and co-authors is primarily concerned with behavioral slowing upon first encountering a food patch, and thus does not include data regarding roaming or lawn leaving (Rhoades et al., 2019). As we mention in the text, we were initially surprised that tph-1 did not eliminate regulation of roaming by feeding, but there are straightforward explanations (redundant transmitters, other neurons). tph-1 did have a significant, albeit small, effect. The study by Cermak and co-authors presents an alternative Hidden Markov Model that uses whole animal postures to segment on-food behavior into 9 states including 8 dwelling states and a single roaming state (Cermak et al., 2020); we refer to this analysis in the discussion. Cermak’s paper and ours differ in experimental conditions, the behaviors measured, and the models used to analyze them. The animals in the Cermak paper are exposed to a large bacterial lawn of uniform density, whereas animals in our study are recorded on small bacterial lawns with thick edges. The analysis tools also differ in their use of animal posture (Cermak only) and autoregressive dynamics (our work only). Further studies of the neurons and molecules involved may help to fully harmonize these models.

      References

      Cermak, N., Yu, S.K., Clark, R., Huang, Y.C., Baskoylu, S.N., and Flavell, S.W. (2020). Whole-organism behavioral profiling reveals a role for dopamine in state-dependent motor program coupling in C. elegans. eLife 9, 1–34.

      Rhoades, J.L., Nelson, J.C., Nwabudike, I., Yu, S.K., McLachlan, I.G., Madan, G.K., Abebe, E., Powers, J.R., Colón-Ramos, D.A., and Flavell, S.W. (2019). ASICs Mediate Food Responses in an Enteric Serotonergic Neuron that Controls Foraging Behaviors. Cell 176, 85-97.e14.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      We now make clear throughout the manuscript that our proposal, which holds the fast cassette as central to control over the powerful movements governed by PMns, remains a hypothesis. However, we provide additional rationale for thinking that this is the case, based on functional distinctions between PMns and SMns. Reviewers 1 and 2 both questioned why so few synaptic and ion channel genes are seen for the SMn type. As the reviewer pointed out, small differences in birthdate between Mn types seem an unlikely explanation, and we have removed this idea. Instead, we now better develop the idea that the low expression of both ion channel and synaptic genes in SMns is consistent with electrophysiological findings pointing to greatly reduced transmitter release compared with PMns. Additionally, to identify all synaptic and ion channel genes shared equally between Mn types, we re-examined the transcriptome. Figure 7A & B now show all genes in these two categories detected above threshold in PMn and SMn types, not just examples.

      Reviewer 2

      We have added cell types in mammalian circuits shown to express the ion channel cassette members. Examples include the calyx of Held in the auditory circuit and cerebellar Purkinje neurons. Like zebrafish PMns, these mammalian neurons form fast, reliable circuits. Notably, our proposal is the first to link all three as functional partners in fast AP firing and high-fidelity synaptic transmission. We consider it highly unlikely that pancreatic cells are represented in our data, as our technique separated out the spinal cords prior to dissociation. Finally, as suggested, we added the disclaimer that we cannot exclude the possibility that clusters sharing both glial and neuronal markers may represent cell doublets. All other minor corrections were made.

      Reviewer 3

      First, we agree that the role of PMns is not restricted to escape behavior. They have been shown to participate in the highest speed of swimming as well. We have made this clear throughout the paper.

      Second, we disagree with this reviewer over Type I and Type II V2a recruitment during high-speed swimming. We agree that both V2a interneuron types are involved in high-speed swimming and likely escape, as both directly innervate the PMns, as the reviewer pointed out with Figure 2c of Menelaou and McLean 2019. However, the reviewer interprets Figure 2c to show that Type I, not Type II, V2a interneurons are more highly recruited over the range of higher swimming speeds, whereas we conclude just the opposite. We have strengthened the text on these data, along with the other papers we cite, to support a central role for Type II.

      Third, the reviewer recommends we remove Figures 6b and 6c relating to our two newly discovered SMn markers, fox1b and alcamb. Our data in Figure 6a show that these markers label SMn somas in two distinct layers along the dorsal-ventral axis of the spinal cord. The reviewer objects to Figures 6b and 6c, which compare the location of our two markers to the distributions of two well-studied SMn-labeling transgenic lines, islet:GFP and gata2:GFP. The correspondence is not absolute but suggests that fox1b labels islet SMns and alcamb labels gata2 SMns. In the previous version of the paper, we suggested that this correspondence might further signal different dorsal-ventral projections. This suggestion was based solely on reports that the islet and gata2 transgenic lines preferentially label SMns with different projections. We do not view this particular point as important and, in light of the controversy surrounding these projections noted by the reviewer, we removed all reference to muscle target areas. We focus instead on our finding of two new markers that label different dorsal-ventral soma layers, which MAY correspond to previously described SMn types. This reasoning is made clear in the manuscript and, because of its potential importance, we elected to retain Figures 6b and 6c as a call for future testing.

      The reviewer makes other suggestions that were all incorporated. The CoLo estimates indeed were too high, as questioned by the reviewer, because, early on, we inadvertently counted two clusters rather than the single cluster that was later authenticated. This has been corrected to reflect 1.1% in Table 1. The evx1 and evx2 data have been added to Figure 4C. Nomenclature is corrected for KA neurons. We make clear that the axonal projections for CoLo were made with mCherry expression not the in-situ label. The Hayashi reference was added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors)

      MAJOR CONCERNS

      1) Not addressed, but perhaps relevant, is that most of the postembryonic fish growth results from stem cells located in the ciliary marginal zone that make new neurons and Muller glia throughout the fish's life. Thus, Muller cell heterogeneity may result from the central to the peripheral gradient of Muller glial cell maturation.

      1a. Müller glial cell heterogeneity needs to be confirmed using, for example, in situ hybridization with gene-specific probes identified in the scRNAseq data that distinguish these 2 populations. An additional approach could be the use of transgenic lines harboring a tagged endogenous gene or a transgene that reflects the promoter activity of a Müller glia subtype-specific gene.

      We thank the reviewer for the insightful comments and agree that it is important to substantiate the Müller glia heterogeneity in our manuscript. Ours is not the only study that provides evidence for Müller glia heterogeneity. In particular, we would like to refer to a recent publication (Krylov et al., 2023). Using single cell RNA sequencing, Krylov et al. detect Müller glia heterogeneity in the uninjured retina, as well as upon selective, genetic ablation of distinct subtypes of photoreceptors (e.g. long and short wavelength sensitive cones, as well as rods). They observe six distinct clusters of quiescent Müller glia that show differential spatial distribution along the dorsal/ventral retinal axis. For instance, they report a ventral quiescent Müller glia population that shares some marker genes (aldh1a3, rdh10a, smoc1) with our non-reactive Müller glia 2 (cluster 2, supplementary files 1 and 2). Moreover, the authors report that Müller glia located at different positions along the dorsal/ventral axis exhibit distinct patterns of pcna upregulation as well as subsequent re-activation upon photoreceptor ablation. We have added the supporting information from Krylov et al. to the discussion section (lines: 781-789) of our manuscript.

      2) Most interesting, but also least substantiated, is the authors' report of 2 different quiescent Muller glial cell populations in the uninjured retina and 2 different reactive Muller cell populations in the injured retina. If these populations exist independently of each other, it would be important to investigate if they differentially impacted retina regeneration.

      2a. CRISPR knockdown in F0 of factors thought to be involved in specific Müller glia-derived progenitor trajectories would be important to lend some functional significance to the data.

      We fully agree with the reviewer that the addition of functional data would enrich the manuscript with valuable information. However, we do not believe that the suggested CRISPR knockdown of selected genes in F0 animals (also known as crispants) is a suitable approach. Crispants have been used successfully to investigate genetic contributions in embryonic-to-larval stages (the first few days) of zebrafish development, as injection of multiple gRNAs targeting the same gene is sufficient to achieve a bi-allelic knockout rate of up to 90% (Kroll et al., 2021). However, unless both alleles of the target gene(s) are mutated early on with nearly 100% efficiency, it is unlikely that gRNA-mediated inactivation would work equally well during subsequent development into adult stages (several months later, and after exponential growth and volume increase of the animal). Even if bi-allelic inactivation in the crispants does work early on, it remains unclear whether and how crispants survive to adulthood, which is necessary to address gene function in the context of retina regeneration. Moreover, since we observe that the genetic events during adult retina regeneration are highly similar to those during retina development, we would expect the crispants to already display developmental phenotypes, which would further hamper the study of potential regeneration-specific phenotypes in adult animals. We are convinced that only ‘clean’ conditional gene inactivation studies will be suitable to address the impact of Müller glia and derived progenitor trajectories on retina regeneration. In this respect, we have recently developed the new conditional Cre-Controlled CRISPR mutagenesis system (Hans et al., Nature Comm 2021). We are currently establishing stable lines to enable controlled and specific gene inactivation, but have only obtained preliminary results so far; the final analysis will take much more time and is, therefore, beyond the scope of this work.

      3) The discussion should be modified to relate the data here presented with those described in Hoang et al., 2020.

      We followed the suggestions of the reviewer and compared our single cell RNA sequencing dataset to that described in Hoang et al., 2020. We show the results of this comparison in an additional main figure (new Figure 9) and supplementary figure (new Figure 9-figure supplement 1). To compare our newly obtained scRNAseq dataset of MG and MG-lineage-derived cells of the regenerating zebrafish retina to the previously published dataset of light-lesioned retina (Hoang et al., 2020), we employed the ingest method (Scanpy, https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html) and mapped the clusters identified by Hoang and colleagues to our clusters (new Figure 9). While we applied a short-term lineage tracing strategy and only sequenced the enriched population of FAC-sorted MG and MG-derived cells of the regenerating zebrafish retina, Hoang and colleagues sequenced all retinal cells in the light-lesioned retina. Consequently, the comparison between the two datasets uncovered similarities, but also significant differences, due to the different experimental set-ups (Figure 9A). Consistent with this, the cluster annotated as resting MG in Hoang et al. mapped to clusters annotated as non-reactive MG 1 and 2 in our dataset (new Figure 9B). The cluster annotated as activated MG in Hoang et al. mapped to clusters annotated as reactive MG 1 and 2, as well as to the cluster with hybrid MG/progenitor identity in our dataset. Interestingly, some cells annotated as activated MG in Hoang et al. also mapped to the neurogenic progenitor 2 and 3 clusters in our dataset (Figure 9B). The cluster annotated as progenitors in Hoang et al. mapped to the progenitor area in our dataset, which included neurogenic progenitors 2 and 3 as well as photoreceptor and horizontal cell precursors (new Figure 9B). Likewise, retinal ganglion cells, cones, GABAergic amacrine cells and bipolar cells annotated in Hoang et al. perfectly mapped to retinal ganglion cells, cones, amacrine and bipolar cells in our dataset (new Figure 9B). While we did not detect a mature horizontal cell cluster, Hoang and colleagues annotated a horizontal cell cluster whose cells mapped to reactive MG 2, MG/progenitors 1 and part of progenitors 3 in our dataset (new Figure 9B). Moreover, Hoang and colleagues annotated rod photoreceptors that mapped to progenitors 3, photoreceptor precursors, red and blue cones, horizontal cell precursors and bipolar cells in our dataset (new Figure 9B). Finally, due to the different cell isolation protocol, Hoang and colleagues annotated additional cell clusters, including oligodendrocytes, pericytes, retinal pigmented epithelial cells and vascular/endothelial cells, that did not map to any cluster in our more selective dataset (new Figure 9B). Next, we selected representative marker genes per cluster from our scRNAseq dataset and checked their expression in the dataset of Hoang and colleagues (Figure 9-figure supplement 1). The dot plot showing the expression of selected gene candidates per cluster further corroborated the large overlap between clusters annotated in the present study and those annotated by Hoang and colleagues. These new comparisons to the data of Hoang et al. are now included in the resubmitted version, and are described and discussed in an additional paragraph in the results (lines: 482-517) as well as discussion (lines: 766-807) sections.

      MINOR CONCERNS

      1) Fig 1C is difficult to interpret. I am also confused by the color coding which is not presented in the figure legend - why 3 shades of red and two of blue? Please define each (for example, what's the difference between red, purple, and light red in the 6dpl panel?). What are the white areas outlined by blue and red circles/cells (looks like a topography plot)? It appears that there is a fairly large amount of pcna:EGFP expression in the uninjured retina - what are these cells?

      We have replaced Figure 1C with a clearer version and rephrased/extended the explanation of the figure in the results (lines: 192-195). Figure 1C depicts contour plots, which represent the relative frequency of data. Each contour line encloses an equal percentage of events (that is, cells), and closely packed contour lines indicate a high concentration of events. In flow cytometry, contour plots are used to represent highly frequent events, as this kind of plot is independent of sample size.

      Concerning the pcna:EGFP-expressing cells observed in the uninjured retina, we interpret them as proliferating cells from the ciliary marginal zone and from Müller glia of the central retina, which represent progenitors and Müller glia that have re-entered the cell cycle to generate rod progenitors, respectively. Consistent with this, we observe pcna:EGFP-positive cells in the ciliary marginal zone as well as the central retina using immunofluorescence, as shown in Figure 1-figure supplement 1.

      2) Results, lines 186-188 are not presented clearly: EGFP+ cells may persist for some time after they leave the cell cycle, so stating EGFP+ cells are proliferating may not be correct. How long does PCNA promoter activity and EGFP expression remain after Muller cells exit the cell cycle? mCherry+/EGFP- cells may be non-reactive Muller glia or reactive Muller glia that have not entered the cell cycle. It seems likely that Muller glia start reprogramming before undergoing cell division.

      We agree with the reviewer that EGFP persists for some time after cells have left the cell cycle, which we describe and exploit in our study. We do not know exactly how long the pcna promoter is active within the cell cycle, but EGFP is known to have a half-life of approximately 24 hours (Li et al., 1998). Even though we cannot make a statement about EGFP persistence in Müller glia, we note that previous reports (Lahne et al., 2015; Nagashima et al., 2013; Nelson et al., 2013; Thummel et al., 2008) and our study (Figure 3-figure supplement 2) show PCNA protein in Müller glia between 24 and 48 hpl, including our sampled 44 hpl time point (lines: 69-73). We also agree with the reviewer that Müller glia most likely become reactive to the injury before the pcna promoter is activated (lines: 67-69), meaning that Müller glia at this early stage are still EGFP-negative because newly translated EGFP has not yet matured. However, we are confident that our data also capture this cell state (the early phase of Müller glia activation), because we sampled proliferating (EGFP- and mCherry-double positive) as well as non-proliferating (mCherry-only positive) Müller glia at all time points (lines: 213-215 and Figure 1C). We interpret the early phase of Müller glia activation as Müller glia transitioning from a non-reactive to a reactive state. In our UMAP, this cell state maps to the top left part of cluster 1, abutting cluster 3, the reactive Müller glia 1 (Figure 2B).

      3) I am concerned by the observation that microglia were identified by scRNAseq as a contaminating cell population. Since FACS was based on gfap:mCherry expression, why did microglia end up in the mix? Also, what are the ‘...low-quality cells expressing many ribosomal transcripts...’ and why, if they are low-quality cells, did they pass the sequencing quality control as stated on lines 208-209?

      The reviewer is right that microglia should not end up in the sample when using the gfap:mCherry line. However, microglia always displayed a certain level of autofluorescence in our experimental set-up (possibly because they had ingested cell debris), which may have contributed to their presence in the FACS samples. Unlike the reviewer, we were not concerned about this ‘contamination’, because the microglia could be easily identified and removed bioinformatically. This is supported by the supplementary figure, in which microglia separate from the core of clusters containing Müller glia and Müller glia-derived cells in the UMAP of the full dataset (Figure 2-figure supplement 1).

      We also acknowledge that ‘low quality cells’ is not an appropriate term for cells in the cluster expressing ribosomal mRNAs at high levels, as ribosomal enrichment does not give any information about their quality. We referred to them as ‘low quality’ because the enrichment in ribosomal transcripts considerably masks their identity. To correct this, we have renamed cells in this cluster descriptively as ‘ribosomal gene-enriched’ cells (Figure 2-figure supplement 1, line: 226).

      4) Fig. 2: please list in the text or fig legend the specific genes used to identify each cell cycle state. Why is cluster 3 considered a reactive Muller population when expressing S phase markers and PCNA? These features seem to distinguish cluster 3 from 4 and may suggest cluster 3 is a progenitor population. Further explanation is necessary to understand the assignments.

      Information about the specific genes used to identify each cell cycle state is provided in the paragraph “Bioinformatic analysis” (lines: 925-934) in the Materials and Methods section. We considered also listing all the markers in the results or the figure legends, but decided against it, as in our opinion it impairs readability. Nevertheless, we now state in the results (line: 261) that the list of cell cycle markers is available in the Materials and Methods section.

      We understand the reviewer's point that cluster 3 represents progenitors and not Müller glia when PCNA expression is considered the sole marker of progenitors or of Müller glia reprogrammed to a progenitor state (Hoang et al., 2020). However, we disagree with this view for three reasons. First, although PCNA is used as a marker of progenitors and of Müller glia reprogrammed to a progenitor state in Hoang et al., 2020, PCNA-positive Müller glia are present in the central retina already in uninjured conditions, when regeneration-associated, Müller glia-derived progenitors are not detectable. Second, cluster 3 is evident only at 44 hpl, a time point at which Müller glia are about to divide or have undergone their first and only cell division. In this regard, we refer to the discussion of Müller glia and Müller glia-derived progenitors as distinct populations in Lenkowski and Raymond, 2014. Third, we performed in situ hybridization for starmaker (stm), a marker gene highly specific for cells in cluster 3 (supplementary files 1 and 3), combined with immunohistochemistry for GFAP and PCNA. The results are shown in a new Figure 3-figure supplement 2. In strong agreement with our sequencing results, we observe stm expression only at 44 hpl, whereas no signal is detected in the uninjured retina or at 4 and 6 dpl (Figure 3-figure supplement 2). Virtually all stm-positive cells at 44 hpl are also PCNA- and GFAP-double positive and display a clear Müller glia morphology (Figure 3-figure supplement 2). Hence, we interpret cells in cluster 3 as reactive Müller glia, which indicates that pcna marks progenitors, but not exclusively so, and does so mainly at later stages. At 44 hpl, Müller glia express pcna in order to undergo asymmetric cell division, giving rise to the renewed Müller glia and to the multipotent progenitor that will continue dividing.

      5) I am confused by the crlf1a scRNAseq data indicating it is associated with proliferating PCNA+ reactive Muller glia Cluster 3 and PCNA- reactive Muller glia Cluster4 at 44 hpl (Fig. 3), yet in Fig. 4 crlf1a in situ signal is exclusively associated with proliferating Muller glia at 44 hpl. Why don't we observe the crlf1a+/PCNA- cell population?

      We highlight that crlf1a expression is also detected at 4 dpl (Fig. 3). We also note that immunofluorescence in Fig. 3 shows crlf1a mRNA and PCNA protein, whereas single cell RNA sequencing detects crlf1a and pcna transcripts. In this context, it is possible that crlf1a-, PCNA-double positive cells detected at 4 dpl still carry PCNA protein but no longer express the pcna transcript. Double in situ hybridization for pcna and crlf1a would be needed to fully address whether crlf1a-positive cells are still pcna-positive at 4 dpl. It is also possible that crlf1a-, GFAP-double positive, PCNA-negative Müller glia are few and simply masked by the crowd of crlf1a-, PCNA-double positive, GFAP-negative progenitors at 4 dpl (Raymond et al., 2006). We amended the discussion section with this information (lines: 634-654).

      6) scRNAseq cluster 3 is a proliferating population that is assigned "reactive Muller glia", whereas cluster 5 is assigned Muller glia/progenitor and in the Discussion referred to as MG about to go or already underwent asymmetric division to generate a progenitor (lines 568-571). I don't understand why cluster 3 is not referred to as the one harboring reactive MG/progenitors that underwent or are undergoing asymmetric cell division - The timing is right, as are the markers.

      We refer the reviewer to our response to point 4, including the changes we introduced in the Materials and Methods (lines: 925-934). As mentioned above, we do not consider PCNA an exclusive marker of progenitors, but rather a marker of cells undergoing proliferation. Moreover, we note that Müller glia undergo their first and only division between 31 and 48 hpl. Finally, as mentioned above, stm is a unique marker of cluster 3, which is only evident at 44 hpl, and not of cluster 5, which is evident at 4 dpl.

      It seems cluster 5 might better fit the amplifying progenitor stage where some MG markers are retained but diluted by cell division. Please clarify the reasoning behind the labeling of this cluster. It is not clear why this cluster has to contain self-renewed Muller glia - why wouldn't these Muller cells partition to quiescent MG clusters 1 and 2 or reactive Muller glia in clusters 3 and 4?

      We partially agree with the reviewer that cluster 5 might better fit the amplifying progenitor state, which is why we describe this cluster as a “crossroad in the trajectory” in the discussion (lines: 613-631). However, we cannot entirely exclude that cluster 5 contains self-renewed Müller glia (differential gene expression analysis also highlights glial markers, see Figure 3A, supplementary file 6). Cells that we interpret as self-renewing Müller glia do not partition back to quiescent Müller glia (clusters 1 and 2) because they are on their way to becoming quiescent Müller glia again but have not yet reached that state, possibly because of our choice of sampling time points. Unfortunately, our short-term lineage tracing strategy ends at 6 dpl. We also speculate in the discussion (lines: 679-682) that, had we sampled at later time points (e.g. at 14 dpl), we might have detected the density of cells in the glial area moving back to clusters 1 or 2 (cell density plots, Figure 2B).

      I also have trouble understanding cluster 4's assignment. The Discussion states it represents cells at the crossroad of glial and neurogenic trajectory containing self-renewed Muller glia as well as first-born MG-derived progenitors. However, it is populated by cells after 44 hpl (Fig. 2B) which is when reactive Muller glia are detected and lacks proliferative markers.

      We think that there is a misunderstanding here. We never refer to cluster 4 as a crossroad in the glial and neurogenic trajectory. We state that cluster 5 is actually the crossroad between the two trajectories (line 629). We further propose that self-renewed MG close the cycle via late reactive MG (cluster 4) and return into non-reactive Müller glia (clusters 1 and 2, red, dashed line in Figure 10) (now described in lines 631-633). The cell density plots support the direction of the cycle closing towards non-reactive Müller glia, in particular at 4 and 6 dpl (Figure 2B).

      Might cluster 4 represent a population of reactive MG remaining at 4 dpl that never entered the cell cycle and therefore would be devoid of Muller glia-derived progenitors?

      As stated in the manuscript, we think that marker expression as well as the cell density plots support our assignment of cluster 4 as self-renewed Müller glia closing the cycle to non-reactive Müller glia. Our assignment also fits well with the expected events following asymmetric cell division. However, as we cannot entirely rule out the reviewer's idea, we included the suggestion in the updated discussion (lines 651-654).

      7) Results, lines 163-164; Please provide a reference for "..... consistent with the previously described....."

      We thank the reviewer for this observation and we added the appropriate references (Fimbel et al., 2007; Lenkowski and Raymond, 2014; Thummel et al., 2008) in the updated version of the manuscript (lines: 171-172).

      Reviewer #2 (Recommendations For The Authors):

      Overall, this very thorough study provides interesting and unexpected results. The published data set will be useful for many subsequent studies. I have only a few remarks that the authors may consider discussing. Their cluster analysis revealed most of the expected cell clusters with some interesting surprises. One relates to photoreceptors where the authors describe well-separated clusters for red and green cones, while rods, UV and blue cones do not form clusters. For rods, this is discussed, but I miss a brief discussion on the "missing" cone subtypes.

      We thank the reviewer for the insightful comments. Indeed, we detect only red and blue cones, as indicated by their expression of the red-sensitive opsin gene (opn1lw2) and the blue-sensitive opsin gene (opn1sw2), respectively. It is possible that the missing cone subtypes are born later than 6 dpl. As the reviewer suggested, we amended the discussion and added information about the missing cone subtypes (lines: 724-726).

      I am also intrigued by the two, quite separated amacrine cell clusters, while bipolar cells cluster in one cluster, without separation in (say) ON and OFF bipolar cells. This may also merit a discussion. What are their ideas on the small and quite separated amacrine cell cluster (cluster 14).

      Bipolar cells in cluster 15 are very sparse in our dataset, with only 40 cells in total. Hence, the sample size might be too small to separate them into ON and OFF subtypes. Alternatively, the cells might still be immature, as 6 dpl is our latest sampled time point. Concerning cells in cluster 14, we think they are starburst amacrine cells, as indicated by their simultaneous expression of gad1b and chata (Figure 8-figure supplement 2B), which is a characteristic feature of starburst amacrine cells in mouse (O'Malley et al., 1992). We added this observation to the discussion (lines: 706-712).

    1. Author Response

      The following is the authors’ response to the original reviews.

      The Authors wish to thank the Reviewers for their detailed and insightful comments. By properly addressing these critiques, we sincerely believe our finished product will be substantially improved and provide greater insight to the academic community.

      Both Reviewers noted the importance of identifying the limitations of our study with particular emphasis on embedded implant heating due to switching gradient coils. Understanding the limitations of any model and/or simulation process is critical when adopting its use, especially when estimating the safety of embedded devices. For this reason, we have included the following text and corresponding references in our Discussion section:

      While the workflow presented herein establishes a validated approach to estimate RF heating due to the presence of a passive implant within a human subject undergoing an MR procedure, certain limitations and proper use stipulations of this methodology should be identified. These include:

      1) The approach of embedding a given passive implant must be carefully considered and supervised by an orthopaedic subject matter expert, preferably an orthopaedic surgeon. While the procedures described above focus on insertion and registration of an implant to make it numerically suitable for simulation, relevant anatomic and physiological considerations must also be addressed to ensure a physically realistic and appropriate result. This ensures a proper simulated fit without empty spaces or unintended tissue deformations.

      2) Temperature changes presented are due only to RF energy deposition. The results do not take into account the impact of low-frequency induction heating of metallic implants naturally caused by the switching gradient coils. Important work on this subject matter has recently been reported in [21],[22],[23],[24],[25],[26],[27]. Unless an orthopaedic implant has a loop path, heating due to gradient fields is typically less than heating due to RF energy deposition. The present testbed would be applicable to the induction heating of implants (and the expected temperature rise of nearby tissues), after switching from Ansys HFSS (the full wave electromagnetic FEM solver) to Ansys Maxwell (the eddy current FEM solver). Two examples of this kind have already been considered in [25],[45].

      3) The procedures presented in this work have been based on the response of a single human model of advanced age and high morbidity.

      4) Finally, validation was achieved using available published data [42]-[44] and relies upon the legitimacy and veracity of that data. Coil geometry, power settings, and other relevant parameters were taken explicitly from these sources and modeled to enable a faithful comparison.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Comments on the original submission:

      Trypanosoma brucei undergoes antigenic variation to evade the mammalian host's immune response. To achieve this, T. brucei regularly expresses different VSGs as its major surface antigen. VSG expression sites are exclusively subtelomeric, and VSG transcription by RNA polymerase I is strictly monoallelic. It has been shown that T. brucei RAP1, a telomeric protein, and the phosphoinositol pathway are essential for VSG monoallelic expression. In previous studies, Cestari et al. (ref. 24) have shown that PIP5pase interacts with RAP1 and that RAP1 binds PI(3,4,5)P3. RNAseq and ChIPseq analyses have also been performed previously in PIP5pase conditional knockout cells (ref. 24). In the current study, Touray et al. performed similar analyses, except that a catalytically dead PIP5pase mutant was used and the DNA and PI(3,4,5)P3 binding activities of RAP1 fragments were examined. Specifically, the authors examined the transcriptome profile and performed RAP1 ChIPseq in the PIP5pase catalytic dead mutant. The authors also expressed several C-terminal His6-tagged RAP1 recombinant proteins (full-length, aa 1-300, aa 301-560, and aa 561-855). The DNA binding activities of these fragments were examined by EMSA, and their phosphoinositide binding activities were examined by affinity pulldown of biotin-conjugated phosphoinositides. As a result, the authors confirmed that VSG silencing (of both BES-linked and MES-linked VSGs) depends on PIP5pase catalytic activity, but the overall knowledge improvement is incremental. The most convincing data come from the phosphoinositide binding assay, as it clearly shows that the N-terminus of RAP1 binds PI(3,4,5)P3 but not PI(4,5)P2, although this is only assayed in vitro, while the in vivo binding of full-length RAP1 to PI(3,4,5)P3 has already been published by Cestari et al. (ref. 24). Considering that many phosphoinositides exert their regulatory role by modulating the subcellular localization of the proteins they bind, it is reasonable to hypothesize that binding to PI(3,4,5)P3 can remove RAP1 from the chromatin. However, no convincing data have been shown to support the authors' hypothesis that this regulation occurs through an "allosteric switch".

      Comments on revised manuscript:

      In this revised manuscript, Touray et al. have responded to reviewers' comments with some revisions satisfactorily. However, the authors still haven't addressed some key scientific rigor issues, which are listed below:

      1) It is critical to clearly state whether the observations are made for the endogenous WT protein or the tagged protein. It is good that the authors currently clearly indicate the results observed in vivo are for the RAP1-HA protein. However, this is not as clearly stated for in vitro EMSA analyses. In addition, in the discussion, the authors simply assumed that the C-terminally tagged RAP1 behaves the same as WT RAP1, and all conclusions were made about WT RAP1.

      There are two choices here. The authors can validate that RAP1-HA still retains RAP1's essential function as a sole allele in T. brucei cells (as was recommended previously). Indeed, HA-tagged RAP1 has been studied before, but it is the N-terminally HA-tagged RAP1 that has been shown to retain its essential functions. Adding the HA tag to the C-terminus of RAP1 may well cause certain defects to RAP1. For example, N-terminally HA-tagged TERT does not complement the telomere shortening phenotype in TERT null T. brucei cells, while C-terminally GFP-tagged TERT does, indicating that HA-TERT is not fully functional while TERT-GFP likely has its essential functions (Dreesen, RU thesis). Although RAP1-HA behaves similar to WT RAP1 in many ways, it is still not fully validated that this protein retains essential functions of RAP1. By the way, it has been published that cells lacking one allele of RAP1 behave as WT cells for cell growth and VSG silencing (Yang et al. 2009, Cell; Afrin et al. 2020, mSphere). In addition, although RAP1 may interact with TRF weakly, the interaction is direct, as shown in yeast 2-hybrid analysis in (Yang et al. 2009, Cell).

      Alternatively, if the authors do not wish to validate the functionality of RAP1-HA, they need to add one paragraph at the beginning of the discussion to clearly state that RAP1-HA may not behave exactly as WT RAP1. This is important for readers to better interpret the results. In addition, the authors need to tune down the current conclusions dramatically, as all described observations are made on RAP1-HA but not the WT RAP1.

      The results with RAP1-HA are consistent with previous knowledge of RAP1 interactions with telomeric proteins and DNA. Hence, the C-terminal HA-tagged RAP1 seems, by all measures, functional. Nevertheless, to make it clear for the reader, we added a note in the discussion, lines 244-246: “Although we showed that C-terminal HA-tagged RAP1 protein has telomeric localization (Cestari et al. 2015, PNAS) and interactions with other telomeric proteins (Cestari et al. 2019 Mol Cell Biol); we cannot rule out potential differences between HA-tagged and non tagged RAP1.”

      For a similar reason, the current EMSA results truly reflect how C-terminally His6-tagged RAP1 and RAP1 fragments behave. If the authors choose not to remove the His6 tag, it is essential that they use "RAP1-His6" to refer to these recombinant proteins. It is also essential for the authors to clearly state in the discussion that the tagged RAP1 fragments bind DNA, but the current data do not reveal whether WT RAP1 binds DNA. In addition, the authors incorrectly stated that "disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo" (lines 165-166). In ref 29, deletion of Myb domain did not abolish RAP1-telomere association. However, point mutations in MybL domain that abolish RAP1's DNA binding activities clearly disrupted RAP1's association with the telomere chromatin. Therefore, the current observation is not completely consistent with that published in ref 29.

      We stated in lines 149-150: “…we expressed and purified from E. coli recombinant 6xHis-tagged T. brucei RAP1 (rRAP1)”. To make this clear to readers, we replaced rRAP1 with rRAP1-His throughout the manuscript and figures. As for the statement that “disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo” (lines 165-166), we removed it from the manuscript.

      2) There is no evidence, in vitro or in vivo, that binding PI(3,4,5)P3 to RAP1 causes conformational change in RAP1. The BRCT domain of RAP1 is known for its ability to homodimerize (Afrin et al. 2020, mSphere). It is therefore possible that binding of PI(3,4,5)P3 to RAP1 simply disrupts its homodimerization function. The authors clearly have extrapolated their conclusions based on available data. It is therefore important to revise the discussion and make appropriate statements.

      We did not state that PI(3,4,5)P3 causes RAP1 conformational changes. We discussed the possibility. We stated in lines 199-201: “PI(3,4,5)P3 inhibition of RAP1-DNA binding might be due to its association with RAP1 N-terminus causing conformational changes that affect Myb and MybL domains association with DNA.” This is a reasonable discussion, given the data presented in the manuscript.

      Reviewer #2 (Public Review):

      In this manuscript, Touray et al investigate the mechanisms by which PIP5Pase and RAP1 control VSG expression in T. brucei and demonstrate an important role for this enzyme in a signalling pathway that likely plays a role in antigenic variation in T. brucei. While these data do not definitively show a role for this pathway in antigenic variation, the data are critical for establishing this pathway as a potential way the parasite could control antigenic variation and thus represent a fundamental discovery.

      The methods used in the study are generally well-controlled. The authors provide evidence that RAP1 binds to PI(3,4,5)P3 through its N-terminus and that this binding regulates RAP1 binding to VSG expression sites, which in turn regulates VSG silencing. Overall their results support the conclusions made in the manuscript. Readers should take into consideration that the epitope tags on RAP1 could alter its function, however.

      There are a few small caveats that are worth noting. First, the analysis of VSG derepression and switching in Figure 1 relies on a genome which does not contain minichromosomal (MC) VSG sequences. This means that MC VSGs could theoretically be mis-assigned as coming from another genomic location in the absence of an MC reference. As the origin of the VSGs in these clones isn't a major point in the paper, I do not think this is a major concern, but I would not over-interpret the particular details of switching outcomes in these experiments.

      We agree with the reviewer and thus made no speculations on minichromosomes. The data analysis must rely on the available genome, and the reference genome used is well-assembled with PacBio sequences and Hi-C data (Muller et al. 2018, Nature).

      Another aspect of this work that is perhaps important, but not discussed much by the authors, is the fact that signalling is extremely poorly understood in T. brucei. In Figure 1B, the RNA-seq data show many genes upregulated after expression of the Mut PIP5Pase (not just VSGs). The authors rightly avoid claiming that this pathway is exclusive to VSGs, but I wonder if these data could provide insight into the other biological processes that might be controlled by this signaling pathway in T. brucei.

      We have previously published that the inositol phosphate pathway also plays a role in T. brucei development (Cestari et al. 2018, Mol Biol Cell; reviewed by Cestari I 2020, PLOS Pathogens).

      Overall, this is an excellent study which represents an important step forward in understanding how antigenic variation is controlled in T. brucei. The possibility that this process could be controlled via a signalling pathway has been speculated for a long time, and this study provides the first mechanistic evidence for that possibility.

      Reviewer #1 (Recommendations For The Authors):

      Please see the public review for recommendations.

      1) It is critical to clearly state whether the observations are made for the endogenous WT protein or the tagged protein. It is good that the authors currently clearly indicate the results observed in vivo are for the RAP1-HA protein. However, this is not as clearly stated for in vitro EMSA analyses. In addition, in the discussion, the authors simply assumed that the C-terminally tagged RAP1 behaves the same as WT RAP1, and all conclusions were made about WT RAP1.

      There are two choices here. The authors can validate that RAP1-HA still retains RAP1's essential function as a sole allele in T. brucei cells (as was recommended previously). Indeed, HA-tagged RAP1 has been studied before, but it is the N-terminally HA-tagged RAP1 that has been shown to retain its essential functions. Adding the HA tag to the C-terminus of RAP1 may well cause certain defects to RAP1. For example, N-terminally HA-tagged TERT does not complement the telomere shortening phenotype in TERT null T. brucei cells, while C-terminally GFP-tagged TERT does, indicating that HA-TERT is not fully functional while TERT-GFP likely has its essential functions (Dreesen, RU thesis). Although RAP1-HA behaves similarly to WT RAP1 in many ways, it is still not fully validated that this protein retains the essential functions of RAP1. By the way, it has been published that cells lacking one allele of RAP1 behave as WT cells for cell growth and VSG silencing (Yang et al. 2009, Cell; Afrin et al. 2020, mSphere). In addition, although RAP1 may interact with TRF weakly, the interaction is direct, as shown by yeast 2-hybrid analysis in (Yang et al. 2009, Cell).

      Alternatively, if the authors do not wish to validate the functionality of RAP1-HA, they need to add one paragraph at the beginning of the discussion to clearly state that RAP1-HA may not behave exactly as WT RAP1. This is important for readers to better interpret the results. In addition, the authors need to tune down the current conclusions dramatically, as all described observations are made on RAP1-HA but not the WT RAP1.

      The results with RAP1-HA are consistent with previous knowledge of RAP1 interactions with telomeric proteins and DNA. Hence, the C-terminal HA-tagged RAP1 seems, by all measures, functional. Nevertheless, to make it clear for the reader, we added a note in the discussion, lines 244-246: “Although we showed that C-terminal HA-tagged RAP1 protein has telomeric localization (Cestari et al. 2015, PNAS) and interactions with other telomeric proteins (Cestari et al. 2019 Mol Cell Biol); we cannot rule out potential differences between HA-tagged and non tagged RAP1.”

      For a similar reason, the current EMSA results truly reflect how C-terminally His6-tagged RAP1 and RAP1 fragments behave. If the authors choose not to remove the His6 tag, it is essential that they use "RAP1-His6" to refer to these recombinant proteins. It is also essential for the authors to clearly state in the discussion that the tagged RAP1 fragments bind DNA, but the current data do not reveal whether WT RAP1 binds DNA. In addition, the authors incorrectly stated that "disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo" (lines 165-166). In ref 29, deletion of Myb domain did not abolish RAP1-telomere association. However, point mutations in MybL domain that abolish RAP1's DNA binding activities clearly disrupted RAP1's association with the telomere chromatin. Therefore, the current observation is not completely consistent with that published in ref 29.

      We stated in lines 149-150: “…we expressed and purified from E. coli recombinant 6xHis-tagged T. brucei RAP1 (rRAP1)”. To make this clear to readers, we replaced rRAP1 with rRAP1-His throughout the manuscript text. As for the statement that “disruption of the MybL domain sequence did not eliminate RAP1-telomere binding in vivo” (lines 165-166), we removed it from the manuscript.

      2) There is no evidence, in vitro or in vivo, that binding PI(3,4,5)P3 to RAP1 causes conformational change in RAP1. The BRCT domain of RAP1 is known for its ability to homodimerize (Afrin et al. 2020, mSphere). It is therefore possible that binding of PI(3,4,5)P3 to RAP1 simply disrupts its homodimerization function. The authors clearly have extrapolated their conclusions based on available data. It is therefore important to revise the discussion and make appropriate statements.

      We did not state that PI(3,4,5)P3 causes RAP1 conformational changes. We discussed the possibility. We stated in lines 199-201: “PI(3,4,5)P3 inhibition of RAP1-DNA binding might be due to its association with RAP1 N-terminus causing conformational changes that affect Myb and MybL domains association with DNA.” This is a reasonable discussion, given the data presented in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the valuable and constructive review of our manuscript. The reviewers’ comments have helped us to improve the quality of the paper. Here we provide detailed responses to the reviewers’ comments and discuss the new experiments we performed.

      Reviewer #1

      Summary:

      In this study, the authors generate a Drosophila model to assess disease-linked allelic variants in the UBA5 gene. In humans, variants in UBA5 have been associated with DEE44, characterized by developmental delay, seizures, and encephalopathy. Here, the authors set out to characterize the relationships among 12 disease-linked variants in UBA5 using a variety of assays in their Drosophila Uba5 model. They first show that human UBA5 can substitute for all essential functions of the Drosophila Uba5 ortholog, and then assess phenotypes in flies expressing the various disease variants. Using these assays, the authors classify the alleles into mild, intermediate, and severe loss-of-function alleles. Further, the authors establish several important in vitro assays to determine the impacts of the disease alleles on Uba5 stability and function. Together, they find a relatively close correlation between the in vivo and in vitro effects of the Uba5 alleles and establish a new Drosophila model to probe the etiology of Uba5-related disorders.

      Strengths:

      Overall, this is a convincing and well-executed study. There is clearly a need to assess disease-associated allelic variants to better understand human disorders, particularly for rare diseases, and this humanized fly model of Uba5 is a powerful system to rapidly evaluate variants and relationships to various phenotypes. The manuscript is well written, and the experiments are appropriately controlled.

      Recommendations For The Authors:

      1) It would seem of value to determine in which tissue(s) the essential function of Uba5 resides. The authors nicely detail the expression of Uba5 in a subset of neurons and glia, and indicate it is expressed in a variety of other tissues. Null mutants are embryonic lethal, suggesting an essential function. From the mouse study cited, it appears Uba5 functions early in development in the hematopoietic system. The authors can express their UAS-Uba5 rescue construct using a variety of tissue-specific Gal4 lines to determine whether the essential function of Uba5 is in the nervous system or other tissues, which would be of interest in understanding key functions of Uba5.

      We thank the reviewer for the suggestion. We tried to rescue the lethality of the Uba5 mutants by expressing the human UBA5 reference protein in different tissues. We found that ubiquitous expression of UBA5 (da-GAL4 or act-GAL4) successfully rescues the lethality; however, expression of UBA5 in neurons (elav-GAL4), glia (repo-GAL4), or both neurons and glia does not. In addition, expression of UBA5 in the fat body (SPARC-GAL4) or muscles (Mef2-GAL4) does not rescue the lethality either. These results suggest that Uba5 is required in multiple tissues in flies. These data are included in the revised manuscript.

      2) Do intermediate Uba5 alleles impact synaptic function or growth? The etiology of the disease is linked with epilepsy and neurodevelopmental disorders, and the interesting parallels the authors note between Uba5 and Para expression perhaps indicate shared roles in neurons that drive firing activity. Together, these lines of evidence suggest that the Uba5 alleles may affect synaptic growth, morphology, and/or function. It would be of interest to examine the larval neuromuscular junction (NMJ), assess NMJ growth and morphology, and perform some basic electrophysiology to determine if there are any functional defects.

      Following the reviewer’s suggestion, we examined the morphology of NMJs in the humanized flies. We did not observe any obvious changes in the number or size of the synaptic boutons caused by the Group II variants. Hence, we conclude that the Uba5 variants do not cause an obvious defect in synaptic growth. The results are included in Figure S3.

      More generally, can the authors comment on the expression pattern of Uba5? One might consider something like Uba5 to be a "housekeeping" gene and expressed/required in most if not all cell types. From the data presented in Fig. 2, it appears expression is more sparse, perhaps, as the authors point out, because of roles in mature neurons that actively fire (like Para). Are neuronal targets of Uba5 known, which might suggest key pathways it modulates?

      We showed that Uba5 is broadly expressed in third instar larvae. The FlyAtlas2 and FlyCellAtlas datasets show that Uba5 is broadly expressed, but not in all cells. In the larval CNS and adult brain, Uba5 is not expressed in all cells either. Hence, we cannot say Uba5 is a “housekeeping” gene. Regarding the neuronal targets of Uba5, we do not know which types of neurons express Uba5 or which pathways Uba5 modulates. This could be studied in the future.

      3) Does strong overexpression of the various Uba5 alleles in otherwise wild-type flies cause any phenotypes? This might support possible antimorphic/dominant negative functions of some of the variants. Is it plausible that any of the alleles could impact oligomerization of Uba5?

      We have not observed compromised viability or any obvious phenotype in flies overexpressing human reference UBA5 or UBA5 variants. So, our results do not support a dominant negative effect of any of the variants.

      To our knowledge, too little is known about UBA5 dimerization to speculate on whether some variants could play a dominant negative role. There is one variant, V260M, that lies at the dimer interface. We showed that the V260M variant biochemically affects ATP binding as well as UFM1 activation, but we do not have evidence that it causes dominant negative effects by affecting UBA5 dimerization.

      Minor points:

      1) Page 5 line 45: It seems a reference is missing about the temperature dependence of Gal4 activity.

      We apologize for the missing reference. We have incorporated a reference to PMID 25824290.

      2) It might be of interest to assay the various transgenic rescue alleles at a higher temperature (say 29 °C) in addition to the nice work looking at survival at 18 and 25 °C. Perhaps some of the alleles display temperature sensitivity at low (18 °C) and high (29 °C) temperatures.

      We now include the survival rate data at 29 °C. The enzyme-dead and severe LoF variants fail to rescue the lethality at 29 °C, while the mild (Group IA and IB) variants fully rescue. For the three Group II variants, the survival rate at 29 °C is higher than that at 25 °C and 18 °C. The results support a dosage-sensitive effect of UBA5 overexpression, but do not indicate that any variant is temperature-sensitive within this range.

      Reviewer #2

      The relative simplicity and genetic accessibility of the fly brain make it a premier model system for studying the function of genes linked to various diseases in humans. Here, Pan et al. show that human UBA5, whose mutations cause developmental and epileptic encephalopathy, can functionally replace the fly homolog Uba5. The authors then systematically express in flies different versions of the gene carrying clinically relevant SNPs and perform extensive phenotypic characterization, such as survival rate, developmental timing, lifespan, locomotor and seizure activity, as well as in vitro biochemical characterization (stability, ATP binding, UFM1 activation) of the corresponding recombinant proteins. The biochemical effects are well predicted by (or at least consistent with) the location of affected amino acids in the previously described Uba5 protein structure. Most strikingly, the severity of biochemical defects appears to closely track the severity of phenotypic defects observed in vivo in flies. While the paper does not provide many novel insights into the function of Uba5, it convincingly establishes the fly nervous system as a powerful model for future mechanistic studies.

      One potential limitation is the design of the expression system in this work. Even though the authors state that "human cDNA is expressed under the control of the endogenous Uba5 enhancer and promoter", it is in fact the Gal4 gene that is expressed from the endogenous locus, meaning that the cDNA expression level would inevitably be amplified in comparison. The fact that different effects were observed when some experiments were performed at different temperatures (18 vs. 25 °C) is also consistent with this. While I do not think this caveat weakens the conclusions of this paper, it may affect the interpretation of future experiments that use these tools, and thus should be clearly discussed in the paper. Especially considering that the authors argue that most disease variants of UBA5 are partial loss-of-function alleles, the amplification effect could potentially mask the phenotypes of milder hypomorphic alleles. If the authors could also show that the T2A-Gal4 expression pattern in the brain matches well with that of endogenous RNA or protein (e.g. using HCR-FISH or an antibody), it would help to alleviate this concern.

      We thank the reviewer for pointing out the issue.

      Regarding the humanization strategy we used in the study, we agree that this is a binary system that could induce overexpression of the target protein. However, as the reviewer also points out, this temperature-sensitive system also enables us to flexibly adjust the expression level of the target protein (PMIDs 34113007, 35348658, 36206744), which is especially useful for studying partial LoF variants. In our study, we have successfully compared the relative allelic strength of most of the variants.

      We agree with the reviewer that a masking effect may exist in our system due to its gene overexpression nature. However, we cannot conclude that this masking effect really affects the three Group IA variants in our tests. The three variants are mild LoF, which is supported by our biochemical assays. Individuals homozygous for one of the Group IA variants, p.A371T, do not have any obvious phenotype, which is also consistent with our findings in flies.

      Regarding the expression pattern of the T2A-GAL4, the Bellen lab has generated T2A-GAL4 lines for more than 3,000 genes. The expression patterns of many of these GAL4 lines faithfully reflect those of the endogenous genes, as shown in our previous publications (PMIDs 25824290, 29565247, 31674908).

      Recommendations For The Authors:

      As related to the expression pattern comment in the public review, I think the authors could also take advantage of Fly Cell Atlas or other available scRNA-seq atlases of the fly brain to present a much more detailed description of the Uba5 expression profile with minimal additional effort. If the cells that express it share other features or genes (other than the para that the authors mention), this could lead to further insights about the gene's neuronal or glial functions.

      In response to the reviewer, we show the expression pattern of Uba5 documented in FlyCellAtlas and another adult brain single-cell RNA seq profile (PMID 29909982) in the revised manuscript.

      In addition, one of the mutants (assuming the same one) is referred to as Leu254Pro in some parts of the manuscript while in some other parts (including tables 1-2) it is Lys254Pro.

      We apologize for the mistakes. The variant should be Leu254Pro and we have made these corrections in the revised manuscript.

      Reviewer #3

      Summary:

      Variants in the UBA5 gene are associated with a rare developmental and epileptic encephalopathy, DEE44. This research developed a system to assess in vivo and in vitro genotype-phenotype relationships across a UBA5 allelic series using humanized UBA5 fly models and biochemical activity assays. This study provides a basis for evaluating current and future individuals afflicted with this rare disease.

      Strengths:

      The authors developed a method to measure the enzymatic activity of UBA5 mutants over time by applying the UbiReal method, which can monitor each reaction step of ubiquitination in real time using fluorescence polarization. They also classified fruit flies carrying humanized UBA5 variants into groups based on phenotype. They found a correlation between biochemical UBA5 activity and phenotype severity.

      Weaknesses:

      In the case of human DEE44, compound heterozygotes with both loss-of-function and hypomorphic forms (e.g., p.Ala371Thr, p.Asp389Gly, p.Asp389Tyr) may cause disease states. The presented models have failed to evaluate such cases.

      We agree with the reviewer that our current system has a limitation that it evaluates one variant at a time rather than any combination of variants. However, our biochemical data do show that the three Group IA variants are mild LoF variants rather than benign variants. One of these variants, p.A371T, does not cause any obvious phenotype in homozygous individuals, which is also consistent with our findings in flies. The modeling of variant combinations, especially the Group IA/Group III combinations could be carried out in future studies.

      Recommendations For The Authors:

      Figure 3G. Typo. "ContonS" should be replaced by "CantonS."

      We apologize for the spelling mistake. We have corrected the typo in the revised manuscript.

      Figure 5. The labels should be in uppercase instead of lowercase.

      We have corrected the panel labels in the revised manuscript.

      Figure 6A. Is the molecular weight of the UBA5~UFM1 intermediate (99 kDa) in the model figure correct? In Fig. 6B, the molecular weight of the UBA5~UFM1 intermediate seems to be 70-75 kDa.

      Both are correct. The molecular weight depicted in the schematic of Figure 6A is based on the UBA5 dimer, which dissociates in the SDS-PAGE gel shown in Figure 6B. We have reconfigured the schematic to make this more apparent.

      Figure 6E, F, H, and I. The time points for quantification in these figures should be specified.

      We apologize for the confusion. The details on data quantification are now more thoroughly explained in the Methods.

    1. Author Response

      We thank the reviewers and editorial team for their positive and thoughtful comments and recommendations for our paper. We will provide a detailed point-to-point response accompanying a revised version of our paper to carefully incorporate all the recommendations and clarify several confusing points. Here we provide a brief provisional response to summarize the key points.

      1) Are the two factors in the enslavement patterns after stroke, changes in shape (loss of complexity) and magnitude (intrusion of flexor bias), dissociable? Our results show both a loss of shape (Fig. 5) and an increase in magnitude (Fig. 7) in enslavement patterns in the paretic hand. We agree with the reviewers that the key measures for these two factors, Angular (Cosine) and Euclidean Distances, are not mathematically orthogonal because, while Angular Distance is influenced only by shape, Euclidean Distance is influenced by both magnitude and shape changes of the enslavement patterns. However, our LME results show that increased flexor bias in the paretic hand strongly predicts Euclidean Distance but not Angular Distance (Fig. 9), suggesting that the change in pattern shape cannot be fully accounted for by flexor intrusion. This analysis was also recommended by Reviewer 1. In the revised version, we will further clarify the dissociation of the two components.

      2) Can biomechanical factors be ruled out as contributors to the enslavement patterns in the paretic hand? We agree with the reviewers that resting hand posture measures alone cannot fully assess biomechanical factors, because they do not capture biomechanical constraints during action or abnormal postures resulting from neural loss after stroke. In the paper, however, we used three analyses to address this point. In the first analysis, we showed that resting hand posture (Mount Distance and Mount Angle) could not account for the Biases in any group (healthy, paretic, non-paretic). In the second analysis, we showed that resting hand posture could not account for Enslavement in any group. In the third analysis, we showed that Biases in the non-paretic hand could not predict Biases or Enslavement in the paretic hand within the same patients. The third analysis was motivated by existing literature indicating that secondary biomechanical changes after stroke are likely not the major contributor to hand impairment: passive muscle stimulation could evoke a similar level of fingertip force in both stroke and control hands (Hoffmann et al. 2016), and median nerve stimulation could significantly reduce the intrusion of finger flexion (Kamper et al. 2003). The resting hand posture and non-paretic hand biases would include both biomechanical and neural factors, but since none of these measures could predict enslavement patterns, we maintain that biomechanical factors do not contribute to the enslavement in the paretic hand.

      3) Neural correlates of behavioral changes were not tested; therefore, claims such as "low-level," "subcortical," and "top-down cortical" contributions are not fully justified. We agree with the reviewers, and we will remove references to these neural correlates from the Results section in the revised version of the paper. These neural correlates will only be discussed in the Discussion section.

      4) RDM construction for "by-Target Direction" was not clearly explained. We agree with the reviewer that the diagram in Fig. 4D was somewhat confusing. To construct these matrices, we analyzed differences in the coactivation patterns of the non-instructed fingers when two fingers moved in the same target direction. A cleaner comparison should exclude both of the instructed fingers being compared from the enslavement matrices. This will be clarified in the revised version.
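      Regarding point 1, why Euclidean Distance mixes magnitude and shape while Angular Distance tracks shape alone can be illustrated with a minimal Python sketch. The four-element vectors are hypothetical coactivation patterns, not data from the study; Angular Distance is computed here as the angle between vectors (a 1 - cosine variant behaves the same way under scaling), and modelling flexor bias as a uniform additive shift is an assumption made only for illustration.

```python
import math


def angular_distance(a, b):
    """Angle (radians) between two pattern vectors; unchanged by positive scaling."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = dot / (norm_a * norm_b)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to both magnitude and shape."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical enslavement pattern of the non-instructed fingers.
healthy = [0.4, 0.1, -0.2, 0.3]
# Same shape, doubled magnitude.
scaled = [2.0 * x for x in healthy]
# Assumed uniform flexor shift: adds magnitude AND alters the direction (shape).
biased = [x + 0.5 for x in healthy]

for name, pattern in (("scaled", scaled), ("biased", biased)):
    print(name,
          "angular:", round(angular_distance(healthy, pattern), 3),
          "euclidean:", round(euclidean_distance(healthy, pattern), 3))
```

      The scaled pattern sits at essentially zero angular but nonzero Euclidean distance from the reference, whereas the shifted pattern moves on both measures, which is why the two distances are not orthogonal and why a regression of each on flexor bias is needed to separate the components.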

      References

      Hoffmann, Gilles, Megan O. Conrad, Dan Qiu, and Derek G. Kamper. 2016. “Contributions of Voluntary Activation Deficits to Hand Weakness after Stroke.” Topics in Stroke Rehabilitation 23 (6): 384–92. https://doi.org/10.1179/1945511915Y.0000000023.

      Kamper, D G, R L Harvey, S Suresh, and W Z Rymer. 2003. “Relative Contributions of Neural Mechanisms versus Muscle Mechanics in Promoting Finger Extension Deficits Following Stroke.” Muscle & Nerve 28 (3): 309–18. https://doi.org/10.1002/mus.10443.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Editorial comments:

      Comment 1 - Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      We appreciate the feedback from the three Reviewers and the Editor. We have enumerated each Reviewer comment and provide a detailed response. We endeavoured to incorporate each suggestion into the revised manuscript. All changes in the manuscript are indicated in red font. In instances in which we respectfully disagree with the Reviewer, we have provided a fair rebuttal. We feel the comments from the Reviewers have significantly improved the clarity and quality of the manuscript.

      Comment 2 - The revision process has demonstrated the value of your work, highlighting both its strengths and shortcomings. Importantly, it provides detailed and achievable suggestions for improving the current version of your contribution.

      We thank the Reviewers and Editor for their time and expert input on our manuscript. We feel the Reviewers' suggestions for addressing the shortcomings have resulted in a significantly improved manuscript.

      Comment 3 - There is a general consensus among the reviewers on three key aspects. Firstly, the article would greatly benefit from a clearer layout of the experimental design and methodology, potentially including schematics to help readers comprehend the complexity and details of the study.

      We appreciate the feedback from Reviewer 2 in particular. We have added a new schematic for Experiment 3 (see PUBLIC REVIEWS Reviewer #2 Comment 2). We have also revised the Results section by including subheadings and additional text to help explain the methods.

      Comment 4 - Secondly, conducting a more comprehensive analysis of the available dataset, utilizing tools such as WGCNA to explore gene co-expression networks beyond specific genes, is recommended. Additionally, it is advised to exercise greater caution when discussing the limitations of the employed methods.

      The suggestion for the WGCNA is excellent and very much appreciated. The revised manuscript includes WGCNA for both the MBH and the pituitary gland (see Figure S3, Table S6, and lines 166-182 and 497-505).

      Comment 5 - Thirdly, expanding the results section to create a more engaging narrative that guides readers through the numerous findings, and extending the discussion and conclusions to emphasize the ecological relevance of learning photoperiodic/seasonal responses and highlighting the presented model, would be valuable.

      These were excellent suggestions that significantly improved the clarity and quality of the manuscript. The Results section now includes several subheadings to mark the transitions between experiments. We have also substantially revised the Introduction and Discussion to address the ecological relevance and the importance of considering sex as a factor in the interpretations.

      Comment 6 - Finally, please pay close attention to the comment on the statistical analysis provided by Rev#2.

      It is unclear to us why the Benjamini-Hochberg FDR analysis was suggested. This statistical test is a version of the Bonferroni test but is less stringent. We prefer to use conservative tests (i.e., the Bonferroni correction). Moreover, the Bonferroni correction is the statistical test commonly used in the field. To be consistent with the field and careful in our statistical approach, we did not change the post-hoc correction in the revised manuscript.

      PUBLIC REVIEWS:

      Reviewer #1:

      Comment 1 - The authors investigated the molecular correlates in potential neural centers in the Japanese quail brain associated with photoperiod-induced life-history states. The authors simulated photoperiods to induce winter- and summer-like physiology, sampled neural tissues at spring and autumn life-history states, examined daily rhythms in transcripts at the solstices and equinoxes, and, lastly, studied FSHb transcripts in the pituitary. The experiments are based on a series of changes in photoperiod and gave some interesting results. The experiment did not include a control without any change in photoperiod, so endogenous rhythms, another aspect of seasonal rhythms, may be missing from this study. The short-day group does not explain the endogenous seasonal response.

      We thank the Reviewer for the fair assessment of the manuscript. The statement ‘the experiment did not have a control for no change in photoperiod’ is not clear to us. We think the Reviewer is arguing that a prolonged constant photoperiod was not used to examine circannual timing in avian reproduction. The constant short photoperiod in Exp3 does provide the ability to examine the initial stages of interval timing, a different endogenous mechanism used by animals. The revised manuscript clarifies the different physiological responses.

      Comment 2 - The manuscript would benefit from further clarity in synthesizing different sections. Additionally, there are some instances of unclear language and numerous typos throughout the manuscript. A thorough revision is recommended, including addressing sentence structure for improved clarity, reframing sentences where necessary, correcting typos, conducting a grammar check, and enhancing overall writing clarity.

      We have incorporated the suggestions from both Reviewer 1 and Reviewer 2 that aimed to increase the clarity of the manuscript. We have provided detailed responses to each comment below and state how each comment was incorporated in the revised manuscript. We also had the manuscript reviewed by a colleague to help identify issues associated with sentence structure, grammar, and spelling.

      Comment 3 - The data analysis needs more clarity, particularly on how the transcriptome data explain different physiological measures across seasonal life-history states. It seems the discussion is built around a few genes that have been studied in other published literature on the quail seasonal response. Extending the results to the promoters of DEGs and building the discussion on them extrapolates from limited evidence and seems redundant.

      A new statistical analysis (i.e., WGCNA) was conducted to identify relations between photoperiod, physiology and transcripts. The focus on the few photoperiodic genes was kept in the discussion because their expression highlights the differences from the prevailing hypotheses and reveals novel patterns of expression across seasonal timescales (see Figure S3, Table S6, and lines 166-182 and 497-505).

      Comment 4 - Last, I wondered if it would be possible to add an ecological context for the frequent changes in the photoperiod schedule, which do not take into account the endogenous annual response. Adding a discussion of ecological relevance would make more sense.

      This is an excellent suggestion. The introduction and discussion were substantially revised to include the ecological relevance.

      Reviewer #2:

      Comment 1 - This study is carefully designed and well executed, including a comprehensive suite of endpoint measures and large sample sizes that give confidence in the results. I have a few general comments and suggestions that the authors might find helpful.

      We appreciate the Reviewer's support for our manuscript. We have endeavoured to incorporate all suggestions in the revised manuscript.

      Comment 2 - I found it difficult to fully grasp the experimental design, including the length of light treatment in the three different experiments (which appears to extend from 2 weeks up to 8 weeks). A graphical description of the experimental design along a timeline would be very helpful to the reader. I suggest adding the respective sample sizes to such a graphic, because this information is currently also difficult to keep track of.

      We have created a new figure panel to address the Reviewer’s concern (see Figure S4a). The new schematic was designed to illustrate the similarity in experimental design between Experiment 1 and Experiment 2, while clearly showing the extended short photoperiod manipulation (4 weeks, not 8 weeks). We added the sample sizes to initial drafts but felt the added text hindered the clarity of the schematic (particularly for Fig. 1a). The sample sizes for each experiment and treatment are given with the raw data in Supplementary Table 1. For this reason, we have opted not to add the sample sizes to each diagram. We hope that the Reviewer will understand our perspective.

      Comment 3 - The authors use a lot of terminology that is second nature to a chronobiologist but may be difficult for the general reader to keep track of. For example, what is the difference between "photoinducibility" and "photosensitivity"? Similarly, "vernal" and "autumnal" should be briefly explained at the outset, or maybe simply say "spring equinox" and "fall equinox."

      This is a very helpful suggestion, and we thank the Reviewer. Two changes were made to the manuscript to address this comment. First, we revised the second introductory paragraph to describe the photoperiodic response and the terms used. Second, we have replaced all references to ‘vernal’ with ‘spring’. We opted to keep ‘autumn’, as changing it to ‘fall’ made the seasonal state less clear in some statements (because ‘fall’ can also denote a downward direction).

      Comment 4 - What was the rationale for using only male birds in this study? The authors may want to include a brief discussion on whether the expected results for females might be similar to or different from what they found in males, and why.

      We agree with the Reviewer’s position that studies should include, or at least describe, male and female biology. We have revised the text to address this comment. In the Methods, we provide two sentences stating that the photoperiodic response is the same for both males and females, and why males were selected (see lines 352-355). Then, in the Discussion, we describe why females will be important for studying how other supplementary environmental cues affect the seasonal timing of reproduction (see lines 312-330 and 334-339).

      Comment 5 - The authors used the Bonferroni correction method to account for multiple hypothesis testing of measures of testes mass, body mass, fat score, vimentin immunoreactivity and qPCR analyses in Study 1. I don't think Bonferroni is ever appropriate for biological data: these methods assume that all variables are independent of each other, an assumption that is almost never warranted in biology. In fact, the data show clear relationships between these endpoint measures. Alternatively, one might use Benjamini-Hochberg's FDR correction or various methods for calculating the corrected alpha level.

      This concern is not clear to us. The Benjamini-Hochberg FDR is a slight modification of the Bonferroni correction. Moreover, the FDR is a less stringent statistical test than the Bonferroni correction. We prefer to keep the Bonferroni approach to correct for multiple tests for two reasons. First, this test is commonly used in the field of chronobiology, and second, the Bonferroni correction is more conservative. We hope the Reviewer will appreciate our wish to remain consistent with the research field and to maintain higher stringency in our statistical approach.
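      For readers weighing the two procedures, the minimal Python sketch below contrasts them on a hypothetical set of five p-values (illustrative values only, not data from this study). Bonferroni compares every p-value with alpha/m and bounds the family-wise error rate; Benjamini-Hochberg is a step-up procedure that ranks the p-values, finds the largest rank k with p(k) ≤ (k/m)·alpha, rejects every hypothesis up to that rank, and bounds the false discovery rate instead.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up procedure controlling the false discovery rate."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, in original order.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values from five endpoint comparisons.
pvals = [0.001, 0.008, 0.012, 0.03, 0.2]
print(bonferroni(pvals))          # [True, True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

      With these illustrative values, Bonferroni rejects two hypotheses and Benjamini-Hochberg rejects four, which shows the difference in stringency discussed in this exchange.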

      Comment 6 - The graphical interpretations of the results shown in Figure 1n and Figure 3e, along with the hypothesized working model shown in Figure S5, might best be combined into a single figure that becomes part of the Discussion. As is, I do not think these interpretative graphics (which are well done and super helpful!) are appropriate for the Results section.

      We appreciate the Reviewer’s suggestion. During the revision, we developed a single figure to show the graphical representations for the respective experiments. Unfortunately, we found it very difficult for a single figure to provide a clear description and overview of the findings. We feel that the interpretations (admittedly unusual for a Results section) are best placed in the respective figures that correspond to the different experiments.

      Reviewer #3:

      Comment 1a - It is well known that as seasonal day length increases, molecular cascades in the brain are triggered to ready an individual for reproduction. Some of these changes, however, can begin to occur before the day length threshold is reached, suggesting that short days similarly have the capacity to alter aspects of phenotype. This study seeks to understand the mechanisms by which short days can accomplish this task, which is an interesting and important question in the field of organismal biology and endocrinology.

      We thank the Reviewer for their positive feedback.

      Comment 1b - The set of studies that this manuscript presents is comprehensive and well-controlled. Many of the effects are also strong and thus offer tantalizing hints about the endo-molecular basis by which short days might stimulate major changes in body condition. Another strength is that the authors put together a compelling model for how different facets of an animal's reproductive state come "on line" as day length increases and spring approaches. In this way, I think the authors broadly fulfill their aims.

      We thank the Reviewer for the positive support of our research and manuscript.

      Comment 1c - I do, however, also think that there are a few weaknesses that the authors should consider, or that readers should consider when evaluating this manuscript. First, some of the molecular genetic analyses should be interpreted with greater caution. By bioinformatically showing that certain DNA motifs exist within a gene promoter (e.g., FSHbeta), one is not generating robust evidence that corresponding transcription factors actually regulate the expression of the gene in question. In fact, some may argue that this line of evidence only offers weak support for such a conclusion. I appreciate that actually running the laboratory experiments necessary to generate strong support for these types of conclusions is not trivial, and doing so may even be impossible. I would therefore suggest a clear admission of these limitations in the paper.

      We agree with the Reviewer’s position. The transcription factor binding analyses were used as a means to identify potential factors involved in the regulation of transcript expression. We have written a new paragraph to address this comment. In the Discussion, we highlight the links between the well-characterised circadian regulation of photoperiodic transcripts (e.g., D- and E-box elements) and the photoperiodic control of TSHβ. We also indicate that our bioinformatic approach identified potentially new transcription factor binding motifs, clearly acknowledge this limitation, and state that functional analyses are required to determine the necessity of these pathways (e.g., MEF2). See lines 293-295.

      Comment 2 - Second, I have another issue with the interpretation of data presented in Figure 3. The data show that FSHbeta increases in expression in the 8Lext group, suggesting that endogenous drivers likely act to increase the expression of this gene despite no change in day length. However, more robust effects are reported for FSHbeta expression in the 10v and 12v groups, even compared to the 8Lext group. Doesn't this suggest that both endogenous mechanisms and changes in day length work together to ramp up FSHbeta? The rest of the paper seemed to emphasize endogenous mechanisms and gloss over the fact that such mechanisms likely work additively with other factors. I felt like there was more nuance to these findings than the authors were getting into.

      We agree with the Reviewer; a similar concern was raised by Reviewer 1. Our aim was to highlight that FSH expression increased under a constant short photoperiod. We have revised the manuscript to address the Reviewer’s concern by adding two sentences to the Results that highlight the additive roles of endogenous timing and photoperiodic effects on FSH expression (see lines 223-226). We have kept the text that describes endogenous increases in expression (e.g., FSH/GnRH) in response to short photoperiod, as this observation is not influenced by long photoperiod.

      Comment 3 - Third, studies 1 - 3 are well controlled; however, I'm left wondering how much of an effect the transitions in day length might have on the underlying molecular processes that mediate changes in body condition. While the changes in day length are themselves ecologically relevant, the transitions between day length states are not. How do we know, for example, that more gradual changes in day length that occur over long timespans do not produce different effects at the levels of the brain and body? This seemed especially relevant for study 3, where animals experience a rather sudden change in day length. I recognize that these experimental methods are well described in the literature, and they have been used by endocrinologists for a long time; nonetheless, I think questions remain.

      There are two points raised in this comment. First, the effect of transition in day length on body condition. We are investigating the impact of photoperiodic transitions on body condition. The ongoing project has examined the changes in tissue lipid content and conducted transcriptomic analyses of multiple peripheral tissues involved in energy balance. Although we made an initial attempt to combine all the findings into a single manuscript, the large datasets resulted in an overwhelming manuscript that lacked clarity. Instead, we have opted for two manuscripts that focus on the respective physiological systems. Those data should be published shortly. We did expand the discussion by developing a single paragraph that focused on the pattern of POMC expression and changes in quail body mass and adipose tissue. See lines 300-311.

      Second, the Reviewer raised the issue of more gradual changes in day length over longer timespans. The day lengths and durations of exposure were selected to replicate previously used photoperiod manipulations, to ensure reproducibility across research programmes, and to reduce the impact of photoperiod history (see lines 367-369). The present manuscript is the first study in birds to examine multiple intermediate day length conditions (i.e., between the extreme long and short photoperiods), and we feel this is a major and novel contribution to the field. We agree that other time points (e.g., 13L:11D), or quicker or longer timespans, could provide additional insight into the molecular mechanisms that govern seasonal transitions in reproduction and energy balance. The question raised by the Reviewer requires studies that use natural conditions with wild-caught animals (or semi-natural laboratory settings) and is beyond the focus of the current manuscript.

      Recommendations For The Authors:

      Reviewer #1

      Comment 1 - Abstract: Overall, the abstract needs more clarity in rationale, hypothesis, and result outcomes. How does this study advance our knowledge of the seasonal/photoperiodic regulation of reproduction in birds? In particular, what knowledge gap do the FSHb results fill?

      We have substantially revised the abstract considering the Reviewer’s suggestions. The revised abstract clarifies the rationale, hypothesis and outcomes. We have also added new introductory and concluding statements that place the work into a wider ecological context (as suggested below).

      Comment 2 - In general the introduction needs more clarity and doesn't seem to cover the ecological relevance of learning photoperiodic/seasonal response.

      We agree with the Reviewer that the introduction could be improved. We have substantially revised the introduction with the aim of increasing its clarity. This involved the addition of ecological context, clarification of the photoperiodic states in birds, and a description of the general and specific objectives. Note that we did not include an introduction to ‘learning’ of the photoperiodic response, as the term implies that a cognitive component is involved, which is incorrect. See lines 61-67, 71-74, 80-86, and 100-105.

      Comment 3 - Line 58: What does the author mean by "future seasonal environment" Is it to introduce change in climate or future seasonal events? This sentence needs rephrasing and more clarity.

      In response to Comment 2, we have revised the introductory paragraph and the sentence was removed from the text.

      Comment 4 - Line 63: I would recommend the use of circannual rhythms with caution for the kind of experiments authors have proposed. The approach used here is beyond the scope of addressing circannual endogenous rhythms, which can be tested only independent of photoperiod change.

      We agree with the Reviewer’s concern. The use of circannual rhythms was limited to the first paragraph (lines 56-63), only to introduce the concept of endogenous rhythmicity. We were careful not to use the term ‘circannual’ in the rest of the manuscript, as, as the Reviewer has indicated, it would be inappropriate. We have retained the use of ‘endogenous program’ to refer to the molecular and physiological changes that can occur independently of photoperiod change (i.e., Experiment 3). In this case, the use of ‘endogenous’ is appropriate as this form of timing adheres to an interval timer. We also provided a definition of an interval timer and ecological examples to illustrate the difference between circannual rhythms and an annual interval timer (see lines 71-74). We also reviewed the entire manuscript to ensure the distinction for the endogenous program was clear.

      Comment 5 - Another aspect the authors missed is that quail are not absolutely photorefractory (Robinson and Follett, 1982).

      We agree with the Reviewer that quail are not absolute photorefractory (but instead relative photorefractory). As our photoperiod manipulations do not address criterion 1, or criterion 2 of the avian photoperiodic response (MacDougall-Shackelton et al., 2009; see https://doi.org/10.1093/icb/icp048), we feel that adding the type of photorefractory response would be a distraction and reduce the clarity of the concepts/experimental design described in the manuscript.

      Comment 6 - Line 223-234: "Chicks were raised under constant light and constant heat lamp". Constant photoperiod experienced during development raises concern on how this pretreatment would shape the adult seasonal response, which could be different in the seasonal response of birds raised in natural photoperiod. If this is correct, the results shown are not tenable for birds inhabiting the natural environment.

      The light schedule used in our experiment is the most appropriate for laboratory-reared chicks. The light schedule and the use of an incubator and hatchery are common practice in research laboratories. The procedure serves to increase the hatch rate and the welfare of chicks. Undoubtedly there will be some early developmental programming effects in quail. However, the gonadal response across all three experiments was consistent with the vast scientific literature on the avian photoperiodic response in both laboratory and wild birds. As the robust gonadal response clearly replicated previous studies, we are confident the results are tenable for birds inhabiting natural environments.

      Comment 7 - Numerous studies done in mammals suggest that the photoperiod experienced in early life affects the circadian and seasonal response in adults (Ciarleglio et al., 2011, Perinatal photoperiod imprints the circadian clock, Nat Neuroscience; Stetson M., et al., 1986, Maternal transfer of photoperiodic information influences the photoperiodic response of prepubertal Djungarian hamsters).

      We agree with the Reviewer that developmental programming in mammals is important for the photoperiodic response. However, there are vast differences between the avian and mammalian photoperiodic response. Critically, in mammals, the maternal transfer of information to the offspring is achieved via the melatonin hormone. Conversely, in birds, melatonin is not necessary, nor sufficient for photoperiodic time measurement (Juss et al., 1993; see https://doi.org/10.1098/rspb.1993.0121). It is not scientifically tenable to relate the mammalian and avian photoperiodic responses in adulthood based on early developmental programs. For this reason, we did not introduce or discuss developmental programming in our manuscript.

      Comment 8 - Please give details on the month in which these birds were exposed to different short and long photoperiods. It is not clear in the method section. The birds experience long to short day transition and then back to long day in 16 weeks (~ 4 months). The annual cycle is ~12 months long in nature. Again, what is the ecological relevance of such an experimental paradigm. This could give some idea on photoperiodic response, but not on how the endogenous annual cycle would respond.

      Birds were delivered in September 2019 and 2020; we have added these details to the manuscript (see lines 351-352). We agree with the Reviewer that the ecological relevance of the experimental design is limited. Our focus was to use laboratory conditions and well-characterised photoperiodic manipulations to examine the role of the initial predictive environmental cue in timing seasonal transitions in reproduction. The 2-week duration of each photoperiod state in Experiment 1 makes it possible to eliminate the impact of photoperiodic history (see lines 367-369; Stevenson et al., 2012a) and reduces the time necessary for the research project. As described above in response to Comment #4, we did not examine the endogenous annual cycle but instead focused on an endogenous interval timer. Experiment 3 was designed to best examine an endogenous interval timer.

      Comment 9 - Line 251: "A jugular blood sample" Please rephrase this sentence and add 50 ul heparinized tubes

      We thank the Reviewer for identifying this oversight. The text was changed accordingly.

      Comment 10 - Line 259: The scale.....fat pads" - The sentence doesn't read correctly.

      The sentence was revised accordingly.

      Comment 11 - Line 274: Male.....six weeks. It is not clear from this sentence what photoperiod the birds were exposed to before transfer to 2 long days: 16 or 14 LD?

      The birds were held in 16L. The text has been revised accordingly.

      Comment 12 - Line 276: It is not clear what is Home Office approved schedule 1. This may be a commonly used term for animal sacrifice protocol in UK and Europe. But it is not familiar jargon for the rest of the globe.

      We apologise for the jargon. The text was revised to include the exact methods (decapitation followed by exsanguination).

      Comment 13 - Line 277-284: Birds under SD for 4 weeks (8 Lext) is a bit confusing and particularly in the context of studying endogenous rhythm. Needs more clarity.

      The text was revised to improve the clarity. The manuscript now states: ‘A subset of birds (n=6) was maintained in short day photoperiods for four more weeks (8Lext). This group of birds provided the ability to examine whether an endogenous increase in FSHβ expression would occur in constant short day photoperiod condition.’

      Comment 14 - Line 322-323: Give the RIN (RNA integrity number) here, which is a very common parameter for assessing RNA degradation in RNAseq experiments. I guess the MinION is a portable sequencer that sequences one sample at a time. If this is true, the authors should consider any batch effect in sequencing and use it as a covariate in the model.

      Our extraction protocol reliably produces RIN values >9.0. The text now states: ‘Isolated RNA reliably has RIN values >9.0 for both the mediobasal hypothalamus and pituitary gland.’ Our RIN values are well above the recommended limit of 7.0. The Reviewer is correct that the MinION is portable; however, more than one sample can be run at a time. We stated in the text (lines 454-460) that birds were counterbalanced across flow cells so that each sequencing run had 9 samples, one from each treatment group. Our counterbalancing approach and quality control steps prevented batch effects.

      Comment 15 - Line 397-398: Adding quail or chicken-specific vimentin peptide pre-incubation with primary Ab will serve more confirming control. Omitting primary Ab doesn't address cross-reactive/ nonspecific binding issues.

      We agree that a positive control (i.e., primary Ab) is the gold standard to support the specificity of the antibody. Unfortunately, we have not found a supplier of the epitope for quail/chicken vimentin. We have conducted an additional in silico analysis and established that the sequence recognised by the vimentin antibody is specific to vimentin. The next closest sequence alignment (only 68%) is for a protein that is not expressed in the brain. The immunoreactive pattern observed in our histology reproduces work from mammalian models in which the epitope is available. Therefore, we are confident that our immunoreactive signal for vimentin is specific. We have added the in silico analysis to the manuscript (lines 535-538).

      Comment 16 - Line 430: Was the GLM model used for testing all variables? Running a statistical model to explain Differentially expressed genes, photoperiod, and physiological variables together will give a more conclusive outcome to explain the photoperiod effect and seasonal state.

      A similar comment was raised by Reviewer 2. We have conducted a WGCNA analysis to examine the relationship between photoperiod, physiological variables and DEGs (see Figure S3, Table S6, and lines 166-182 and 497-505).

      Comment 17 - It is a bit unclear why the author used cherry-picking approach by talking about only a few genes that have been studied as key regulators of photoperiodic response in quail. What was the purpose of transcriptome? A better approach would have been to use a model to reduce the data (PCA) and explain the physiological response by regression against different PCs.

      We agree with the Reviewer that other statistical approaches could be conducted, and other genes could be discussed. However, we focussed on the key regulators of the photoperiodic response in quail as these are the well characterised genes. It is important that our discussion focused on these transcripts as most do not conform to the predicted patterns of expression. We feel it is best that we keep the focus on these genes.

      Comment 18 - TSHb result is inconsistent with past studies, where TSHb is the first responder gene on photoinduction. The author did not pay attention to explaining it further in the discussion.

      We respectfully disagree with the Reviewer. Our results are consistent with past studies and show that TSHβ expression is a molecular marker of long day photoperiod. Our study did not examine photoinduction, which precludes a direct comparison between our study and previous work (e.g., Nakao et al., 2008; see doi: 10.1038/nature06738). We have revised the text in consideration of the concern raised by the Reviewer. The text now states: ‘Previous reports established that TSHβ expression is significantly increased during the period of photoinducibility in quail (Nakao et al., 2008). Although the present study did not directly examine photoinduction, TSHβ expression was consistently elevated in long day photoperiod (i.e., 16L).’ (see lines 262-265).

      Comment 19 - The PRL result seems interesting, and there could be more discussion of the rise in PRL transcript levels in relation to the termination of breeding. Elaborating on PRL expression and breeding termination could add more information to the discussion.

      This comment is not clear to us, and we would be happy to incorporate a clarified comment in a revised manuscript. The increased expression of prolactin does not occur during the termination of breeding; it occurs during the vernal increase in photoperiod (i.e., 14L) but does not have a clear link with gonadal growth.

      Comment 20 - Line 217-219: Based......respectively. Sounds like a big claim with less evidence.

      We have removed the sentence from the discussion.

      Comment 21 - Line 220-223: The .....Bird. The sentence is not clear about how this study would add to ecological studies. Need more clarity on the importance of such data.

      The sentence was removed from the text.

      Comment 22 - I think that it would be helpful to add a couple of caveats to provide more ecological context. First, the model is only based on males, and responses in females could be different.

      We agree with the Reviewer that there are undoubtedly sex differences in the timing of seasonal biology. However, the photoperiodic response (growth and regression) is similar in males and females; sex differences exist in the response to supplementary environmental cues (e.g., temperature). Males were used in these studies because the gonadal response to photoperiod manipulations is much larger than the ovarian response in females. The focus on males allows fewer animals to be used in the experiments and provides greater statistical power. To address the Reviewer’s concern, we have added a paragraph to the Discussion that describes the similarity in photoperiodic responses in males and females, and the importance of supplementary cues for full reproductive development in female birds. We also provide a couple of sentences in the Methods that justify the use of only males in the present study. See lines 352-355 (Methods), and 312-330 and 334-339 (Discussion).

      Comment 23 - Last, I wondered if it would be possible to add an ecological context for the frequent change in the photoperiod schedule and not take account of the endogenous annual response. Would the procedure simulate a similar kind of underlined molecular response for a bird under natural conditions responding to changing daylight cycles on an annual time frame?

      The Discussion was considerably revised to address the ecological relevance of the study and its findings. We have added a sentence at the beginning of the Discussion to highlight that the laboratory-based approach and photoperiodic manipulations reliably replicate previous findings obtained under semi-natural conditions (Robinson and Follett, 1982) (see lines 248-250). We have already reduced the focus on the endogenous annual response.

      Reviewer #2:

      Comment 1 - The writing is very terse and could benefit from a more narrating style, which would make it a lot easier for the reader to get through some of the very data-heavy text. Breaking up the Results with subheadings would also be helpful.

      We appreciate the suggestion to add subheadings to the Results. We added three descriptive headings, one for each of the studies conducted in the manuscript. We feel the added revisions (e.g., ecological context) have improved the narrative and made the manuscript accessible to a wider readership.

      Comment 2 - The transcriptome analyses could be developed a bit more. First, using the limma package would allow the authors to apply a more complete model to the DEG analyses, which would likely be superior to EdgeR. Second, the authors may want to consider WGCNA or a similar approach to discover gene co-expression modules, and then examine whether any of the resulting module eigengenes co-vary with any morphological or physiological measures and/or vary rhythmically.

      This is an excellent suggestion, and the new analyses were incorporated into the revised manuscript. Using the WGCNA package (Langfelder and Horvath, 2008), we conducted module-trait analyses to examine co-variation in our findings. These data are presented in Figure S# and lines 476-484. We agree that other DEG analyses would be useful; however, our main objective was to use BioDare2.0 to identify rhythmic transcription in the seasonal transcriptomes. edgeR provides an excellent approach to identify differentially expressed transcripts and is commonly used.

      Comment 3 - In the Data and code availability statement (lines 226ff) the authors state that "all raw data are available in Extended data Table 1." However, they should be submitted to the GEO database or a similar public repository along with all relevant metadata. Also, and maybe I overlooked this, I did not see anywhere that the "R code used in Study 1 is freely available" (I was not sure what "the methods reference list" was supposed to refer to). Instead of stating that "the full R code used is available upon request" I suggest making all scripts available via GitHub or Dataverse, along with all non-omics data. The advantage of the latter platform is that a citable DOI is assigned to each upload.

      The data are now available in the GEO database under accession GSE241775. We have added this information to the text. The R code is now provided as Table S11 so that the reader can directly access the script.

      Comment 4 - Line 191: Delete the extra "that"

      We thank the Reviewer for identifying the oversight. We have revised the text accordingly.

      Comment 5 - Line 24f: What does "pseudo-randomly" mean? Maybe "haphazardly" would be more appropriate here?

      The term pseudo-randomly describes the organized manner in which subjects are assigned to each treatment group. The aim is to ensure that a particular physiological variable, such as body mass, is evenly distributed across treatment groups, thereby reducing bias that could be introduced when treatment groups are assigned. (Note that the term derives from the field of psychology.) We are reluctant to replace pseudo-randomly with haphazardly, as the latter does not imply a logical organization. We have added text to clarify the reason. The text now states: At the end of each photoperiodic treatment, body mass was used to pseudo-randomly select a subset of quail (n=12) for tissue collection, which served to reduce the potential for unintentional bias.
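
      One common way to implement this kind of pseudo-random allocation is blocked assignment on a covariate: birds are ranked by body mass, and each consecutive block of ranks is shuffled and dealt one bird per group, so every group receives one bird from every mass stratum. The sketch below assumes hypothetical masses, three groups, and a hypothetical helper named blocked_assignment; it illustrates the general idea and is not the exact procedure used in the study.

```python
import random
from statistics import mean


def blocked_assignment(masses, n_groups, seed=1):
    """Assign birds to groups so that body mass is balanced across groups.

    Birds are ranked by mass; each consecutive block of n_groups birds is
    shuffled and dealt one bird per group (hypothetical helper).
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    ranked = sorted(range(len(masses)), key=lambda i: masses[i])
    labels = [None] * len(masses)
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]
        groups = list(range(n_groups))
        rng.shuffle(groups)  # random order within the mass stratum
        for bird, group in zip(block, groups):
            labels[bird] = group
    return labels


# Hypothetical body masses (g) for 12 birds and three treatment groups.
masses = [210, 195, 230, 205, 220, 188, 240, 215, 199, 225, 208, 235]
labels = blocked_assignment(masses, n_groups=3)
for g in range(3):
    group_masses = [m for m, lab in zip(masses, labels) if lab == g]
    print(g, len(group_masses), round(mean(group_masses), 1))
```

      Because each group draws exactly one bird from each mass stratum, group mean masses stay close, while the shuffle within each stratum keeps the allocation from being fully deterministic.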

      Comment 6 - Figure 1e,j: The text indicates that 398 and 130 genes were "rhythmically expressed" in the MBH and pituitary, respectively, but considerably fewer genes are shown in the heatmaps in Figure 1e,j. How were these genes selected, and what was the rationale for doing so? Also, some autumnal and vernal expression patterns show some strong similarities (e.g., 16a and 16v in the MBH), which could be discussed. Consider showing the two heatmaps with the columns also hierarchically clustered in a supplementary figure.

      We agree with the Reviewer that the full heatmap for the transcripts should be provided. The heat maps in Figure 1 are based on the transcripts with the most significant changes, and were selected to provide a graphical representation that would be easily digested by the wider readership. We have created a new figure (i.e., Fig. S1) that provides all the transcripts in heat maps for both the MBH and pituitary gland.

      Reviewer #3:

      Comment 1 I do not have too much to add to this section of my review. Broadly speaking, I would suggest that the authors address some of the concerns I highlight above, and integrate their thoughts into the paper more than they currently do. I think this is particularly important with respect to the limitations of many of the bioinformatic analyses.

      We thank the reviewer for their input and time assessing the manuscript. We have revised the manuscript in many sections incorporating the suggestions by Reviewer 3 above, and Reviewers 1 and 2.

      Comment 2 Some of the methods are also a little scant. For example, the qPCR analyses are not described in sufficient detail to replicate the study. What are the efficiencies? Were samples run in duplicate? What was the housekeeping control gene used? Was there only one, or were multiple housekeeping genes used?

      We apologise for the oversight; the absence of this information was a mistake that was missed in our earlier revisions. The revised manuscript includes all the requested information. Line 333 states that all samples were run in duplicate. The efficiency for each transcript was within the MIQE guidelines (indicated on line 342), within the 0.7 to 1.0 range. Actin and glyceraldehyde 3-phosphate dehydrogenase were used as the reference transcripts. The most stable reference transcript was used to calculate the fold change in target gene expression (lines 343-345).
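
      As an illustration of how a fold change is typically derived from duplicate Ct values and the more stable of two reference transcripts, the sketch below uses the efficiency-corrected ratio of Pfaffl (2001), which reduces to the Livak 2^-ΔΔCt method when both amplification factors are 2.0. The Ct values, the gene labels, the helper names, and the use of the standard deviation of Ct as the stability criterion are assumptions for illustration; dedicated tools such as geNorm or NormFinder rank reference genes more rigorously.

```python
from statistics import mean, stdev


def most_stable_reference(ref_cts):
    """Return the reference gene whose Ct values vary least across samples.

    A simple standard-deviation criterion (an assumption for illustration);
    geNorm or NormFinder provide more rigorous stability rankings.
    """
    return min(ref_cts, key=lambda gene: stdev(ref_cts[gene]))


def relative_expression(target_ctrl, target_trt, ref_ctrl, ref_trt,
                        amp_target=2.0, amp_ref=2.0):
    """Efficiency-corrected expression ratio (treated vs control), after Pfaffl (2001).

    Each argument is a list of Ct values (e.g., means of duplicate wells).
    amp_* is the per-cycle amplification factor: 2.0 means 100% efficiency,
    in which case the result equals the Livak 2^-ddCt fold change.
    """
    d_target = mean(target_ctrl) - mean(target_trt)  # dCt of target (control - treated)
    d_ref = mean(ref_ctrl) - mean(ref_trt)           # dCt of reference (control - treated)
    return (amp_target ** d_target) / (amp_ref ** d_ref)


# Hypothetical Ct values for two candidate reference genes across four samples.
ref_cts = {
    "ACTB": [18.1, 18.3, 18.2, 18.4],
    "GAPDH": [20.0, 21.1, 19.6, 20.8],
}
best = most_stable_reference(ref_cts)
print(best)

# Samples 1-2 are controls and samples 3-4 are treated (hypothetical layout).
fold = relative_expression(
    target_ctrl=[25.0, 25.2], target_trt=[23.0, 23.2],
    ref_ctrl=ref_cts[best][:2], ref_trt=ref_cts[best][2:],
)
print(round(fold, 2))
```

      Passing a measured amplification factor below 2.0 (for example, 1.9 for 90% efficiency) in place of the defaults is what distinguishes the efficiency-corrected ratio from the plain 2^-ΔΔCt calculation.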

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, the authors report a link between brumation and tissue size in frogs, summarizing convincing evidence that extended brumation is associated with smaller brain size and increased investment in reproduction-related tissues. The research will be of broad interest to ecologists, evolutionary biologists, and those interested in global change biology. While the dataset involves significant field work and advanced statistical analyses, the manuscript would benefit from more explanation of the models, including why frogs are a good model in which to address these questions, and from general improvement in the structure and conciseness.

      We highly appreciate your positive assessment and that you considered our paper important and convincing.

      Reviewer #1 (Public Review):

      The authors have conducted extensive field work, lab work and statistical analysis to explore the effect of brumation on individual tissue investments, the evolutionary links between relatively costly tissue sizes, and the complex, non-dependent processes of brain and reproductive evolution in anurans. The topic fits well within the scope of the journal, and the manuscript is generally well written. The different parameters used in the present study will attract a broad readership across ecology, zoology, evolutionary biology, and global change biology.

      Thank you for your positive and supporting feedback.

      Reviewer #2 (Public Review):

      The authors set out to show how hibernation is linked to brain size in frogs. If there were broader aims it is hard to decipher them. The authors present an extremely impressive dataset and a thorough set of cutting-edge analyses. However not all details are well explained. The main result about hibernation and brain size is fairly convincing, but it is hard to think of broader implications for this study. Overall, the manuscript is very confusing and hard to follow.

      Thank you for your compliments on our paper. As for your concerns, we have greatly revised our paper and, as we hope, improved its clarity. We have also added a few sentences to the conclusions to draw attention to potentially broader implications. Specifically, we describe how the focal traits of our study may all be affected by climate change. Differential constraints in necessary investments could be one of several reasons for the varying resilience to climate change between species in the same habitat.

      Reviewer #1 (Recommendations For The Authors):

      There are no issues on the availability of data and code.

      Thank you.

      Line 15: in the author contribution section, it seems that C.L.M. and J.P.Y are not in the author list.

      These two authors are not part of this study; listing them was a mistake.

      Line 24: I don't think it is vital or logical to address or compare birds or mammals too much, as they are not the focal taxa of the present study. Instead, it would be better to clarify why frogs and toads are an ideal model taxon for this study.

      The reason for comparisons with birds and mammals was that all hypotheses related to the various trade-offs tested here had been developed in these taxa. One of the points of our paper was that these needed validation beyond the two taxa, in addition to being tested against one another (each prediction had been developed in a specific group and typically in isolation of all other hypotheses).

      Line 25-26: as the authors are shooting for eLife, as a general journal, it is not essential to provide the detailed methods in the abstract. But I think the authors need to strengthen the novelty of the work in the field here.

      The strength of our study was that all traits were measured directly in our species, including estimates of hibernation duration. Prior studies used various proxies, categorical classifications or datasets assembled from multiple sources. To us, this seemed like a sufficiently important advance in the field to mention it, but considering the reviewer’s comment, we have now removed it.

      Line 28: "protracted brumation reduces brain size and instead promotes reproductive investments", as a correlative study, it is much more precise to change this sentence to a similar description as "protracted brumation is negatively correlated with brain size but is positively correlated with reproductive investments" here and related statements throughout the whole text.

      We agree that, strictly speaking, a path analysis can only point toward possible causality and cannot provide hard evidence as experimental manipulation might. The wording may have been a bit too strong here in our attempt to minimize wordiness and because all our analyses combined very strongly pointed in this direction. However, we have now changed this as suggested, even though it now reads almost as if we had done no more than conduct a simple correlation. We have further paid attention to the wording of our interpretations throughout the paper.

      Line 32-33: it needs a stronger ending linking your main findings with their implications for understanding species responses to sustained environmental change.

      We have reworded the ending of the abstract to: “Our results provide novel insights into resource allocation strategies and possible constraints in trait diversification, which may have important implications for the adaptability of species under sustained environmental change.”

      Line 63-68: this sentence is too long to follow; please simplify it.

      We have split the sentence into two sentences.

      Line 125-130: it is known that frogs have various reproductive modes (Crump et al. 2015), involving, for example, trade-offs between clutch size and egg size or different numbers of breeding events per year. These different reproductive modes may also influence brain size evolution in relation to food availability and seasonal variation. Please clarify this.

      Yes, anurans do have varying reproductive modes, but to us, there is no a priori reason to assume that such variation would have a direct effect on brain evolution. Rather, in our opinion, different reproductive modes would have indirect effects by affecting the environment in which reproduction occurs. For example, larvae developing under different environmental conditions (substrate, larval density, egg provisioning etc.) might follow different developmental trajectories that could influence how resources are available and allocated to different organs, including the brain. Alternatively, reproductive modes could influence the choice of environment for reproduction, thereby possibly affecting mating strategies and ultimately the trait investments associated with these strategies. Given that we were asked to shorten our paper, we believe that ‘environmental effects’ remains broad enough to encompass such variation, so that it is not necessary to disentangle the different, and likely primarily indirect, ways in which reproductive modes could be linked to brain evolution. However, if the reviewer finds it important to go into such detail in the paper, we will be happy to do so.

      Line 186-187: it is necessary to mention here that the authors also conducted sensitivity analyses applying thresholds 2°C or 4°C below their experimentally derived values to test the robustness of the results to data uncertainty.

      We have added “(details on methodology and various sensitivity analyses for validation in Material and Methods)” to indicate the different types of sensitivity analyses, which included more than simply a 2 or 4°C difference.

      Line 188: please change "In phylogenetic regressions" to "after controlling for phylogenetic autocorrelation/pseudo-replication" or similar sentence here.

      Our focus here was the phylogenetically informed GLS model rather than phylogenetic control itself. In the latter case, it would still not be clear what type of model was conducted with such phylogenetic control. To avoid any shorthand, we have reworded for more precision: “We employed phylogenetic generalized least-squares (PGLS) models, …”

      Line 177-287: please provide the exact variance explained by the different predictor variables in brumation duration, individual tissue investments, and brain evolution. I also suggest that the authors consider conducting a multi-model inference-based model averaging analysis to test the relative importance of different variables. In addition, the present analyses did not include interaction terms among variables, which may be more important than the effect of each individual factor.

      There may be some misunderstanding as these models represent separate analyses for each predictor as indicated by the associated λ values (never more than one value per model). We conducted separate models to determine which variables might even play a role in explaining variation in the corresponding response variables. Based on relevant predictors, we then conducted path analyses rather than general multi-predictor analyses. The relative effect sizes are represented by the correlation coefficients (r values) in the tables.

      Reviewer #2 (Recommendations For The Authors):

      Why exactly are the pairwise comparisons positively correlated (fig. S5) and then negatively correlated (fig. 3). What is actually driving this difference? For the phylogenetic path analyses 26 candidate models are chosen without explanation. What theory or hypotheses are these based on?

      We assume the reviewer is referring to the brain-body fat association. The two ‘pairwise’ analyses they mention were not the same. The correlation in Fig. S5 was a standard (albeit phylogenetically informed) partial correlation between the two focal tissues, controlling for SVL. By contrast, as described when introducing the analyses, the negative associations were derived when additionally controlling for testes and hindlimb muscles, all of which deviated from isometry against body size. Here, the mass of each of the four main tissues was expressed as its proportional contribution to their total mass in each species, and these proportions were then standardized for comparison across species. Since the total mass of these four tissues scaled directly with body size, larger-bodied species did not invest a larger proportion of their body in these tissues than smaller-bodied species, essentially rendering body size irrelevant for this analysis. However, the relative representation of the four tissues changed between species such that more resources devoted to body fat were associated with a smaller brain, hence the negative relationship. Likewise, the multivariate analysis and the PCA suggested similar trends when all four tissues were considered rather than only pairwise comparisons.

      Regarding the second comment: we indeed used 28 pre-defined predictions for our larger path analysis.

      The authors haven't really provided much additional context either, and the discussion is almost entirely a rehash of the results section. I can't see the analysis code but this may be of use to people performing similar analyses.

      It is true that the traits and core message of the Discussion relate directly to our results, but we believe that our Discussion provides the essential biological context for our findings and for how they are connected. We tried not to go off on tangents or to speculate too much, as the many results provided enough material to discuss, including the several ways in which we expanded the prior state of the art in the field. However, we have now expanded the concluding paragraph to place our findings in the context of climate change, given that this could affect anurans and the different traits examined in many ways that are directly related to the current study. Yet, we decided to keep this short because such extrapolation of our findings

      We indeed held off on making the code publicly available in case the reviewers requested dramatic changes to the paper. However, it will be published.

      Additional recommendations from the Reviewing Editor:

      • One of the reviewers and I found the text a little difficult to follow. I suggest simplifying the paper by being more concise. For example, the introduction could be shortened to 3-4 paragraphs of relevant text so as not to overwhelm the reader. One of the reviewers wanted a better explanation of the statistical models and I agree. The discussion could benefit from some structure; consider adding subheadings that would guide the reader as to the topic. Finally, the figures are difficult to see and should be made larger. For example, the graphs in Figure 1c could be placed in a panel below A and B so that readers can interpret them. In Figure 3, the legend is far too small; please put it above or below the graphs. In summary, I hope you consider a major rewrite that would make your paper more accessible to a broad audience.

      We have substantially shortened the paper despite adding further details on the models and a broader context to the Discussion. We also condensed the Introduction to about two thirds of the original word count. However, we did not think that shortening it even further or condensing it into 3-4 paragraphs would improve readability. We still considered it important to introduce, with sufficient context, all major hypotheses that were tested against one another, and to provide at least some information on what was or was not known about the evolution of the focal traits and their links to one another or to the environmental variables. We also found it important to touch on the differences between our study organisms and those typically studied in the context of hibernation or brain evolution, as this could affect the predictions. Given the number of hypotheses and traits, cutting the number of paragraphs would have meant merging some of them into very long ones, which we did not consider helpful.

      We further added short subheadings to the Discussion and adjusted the figures as requested.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We very much appreciate the constructive comments provided by the reviewers. We have incorporated many of their suggestions and believe the manuscript is much improved.

      In brief, we updated the text as suggested and have included three additional panels in supplementary fig. S2E-G. This additional data provides further support that the ectopically persisting neuroblasts are actively dividing and that cell cycle defects alone do not account for temporal patterning phenotypes.

      Reviewer #1 (Public Review):

      In this manuscript, the authors are building on their previous work showing Delta-Notch regulates the entrance and exit from embryo-larval quiescence of neural stem cells of the central brain (called CB neuroblasts (NB) (PMID: 35112131)). Here they show that continuous depletion of Notch in NBs from early embryogenesis leads to cycling NBs in the adult. This - cycling NBs in the adult - is not seen in controls. The assumption here is that these Notch-RNAi NBs in adults are those that did not undergo terminal differentiation in pupal development. The authors show that Notch is activated by its ligand Delta which is expressed on the GMC daughter cell and on cortex glia. They determine that the temporal requirement for Notch activity is 0-72 hours after larval hatching (ALH) (i.e., 1st instar through mid-3rd instar at 25C). In NBs/GMCs depleted for Notch, early temporal markers were still expressed at time points when they should be off and late markers were delayed in expression. These effects were observed in ~20-40% of NBs (Figures 5 and 6). Through mining existing data sets, they found that the early temporal factor Imp - an RNA binding protein - can bind Delta mRNA. They state that Delta transcripts decrease over time (without any reference to a Figure or to published work), leading to the hypothesis that Delta mRNA is repressed by the late temporal factors. Over-expressing late factors Syp or E93 earlier in development leads to downregulation of a Delta::GFP protein trap. These results lead to a model in which Notch regulates expression of early temporal factors and early temporal factors regulate Notch activity through translation of Delta mRNA.

      There are several strengths of this study. The authors report rigorous measurements and statistical analyses throughout the study. Their conclusions are appropriate for the results. Data mining revealed an important mechanism - that Imp binds Delta mRNA - supporting the model that early temporal factors promote Delta expression, which in turn promotes Notch signaling.

      There are also several weaknesses:

      1) The activation of Notch in NBs by Delta in GMCs was already shown by this group in their Dev 2022 paper, reducing some of the impact of this study.

      In our previous work, we reported that Delta-expressing GMCs transactivate Notch in neuroblasts during the embryonic to larval transition. In the current manuscript, we show that Delta is expressed in GMCs and cortex glia and both sources transactivate Notch in neuroblasts during later developmental stages. This is in agreement with work published by others and while not novel per se, is a necessary first step for understanding which neighboring cell types control Notch pathway activity. During the embryonic to larval transition, glia do not contribute likely because they have not yet grown to ensheath CB NBs and their recently born progeny.

      2) The authors do not explain their current results in the context of their prior paper (2022 Dev) until the Discussion, but this would be useful to read in the Introduction. Similarly, it would be good to mention that in the 2022 paper, they find a significant number of wor>Notch RNAi NBs at 2 hours ALH that are cycling. Are the adult Notch RNAi NBs in this study descended from those NBs at 2 hours ALH in the 2022 study? In other words, how does the early requirement for Notch between 0-72 hours ALH reported in the current study relate to the Notch-depleted NBs identified in the 2022 paper?

      We have now included the following text in the intro: “We recently reported that Notch signaling regulates CB NB quiescence during the embryonic to larval transition (Sood et al., 2022). When Notch is knocked down, some CB NBs continue dividing during this transition. We also reported that Notch activity becomes attenuated in quiescent CB NBs because CB NBs are no longer dividing and producing Delta-expressing GMC daughters for Notch pathway transactivation. Moreover, low Notch is necessary for CB NBs to reactivate from quiescence in response to dietary nutrients (Sood et al., 2022).

      Here we report that Notch signaling also regulates neurogenesis termination during pupal stages. When Notch is knocked down, CB NBs maintain early temporal factor expression longer resulting in a delay of late temporal factor expression with prolonged neurogenesis into late pupal stages and early adulthood. This defect in temporal patterning (switching from early to late) occurs after CB NB exit from quiescence suggesting that Notch is required at multiple times throughout development in controlling CB NB proliferation decisions.”

      We do not know whether the neuroblasts that fail to enter quiescence are the same ones that fail to terminate divisions during pupal stages; however, many more neuroblasts fail to terminate divisions during pupal stages.

      3) Most of the experiments rely upon continuous depletion of Notch from embryonic stage 8 until adulthood using the wor-GAL4 driver. There is no lineage tracing of this driver and there is no citation about the published expression pattern of this driver. The inclusion of these details is important for a broad audience journal.

      The reference for the driver is included in supplementary data, under the heading “Experimental model:Drosophila melanogaster”. This GAL4 driver is widely used and one of the most accepted in the field.

      4) Most of the experiments utilize a single RNAi transgene for Notch, Delta, Imp, Syp, E93. There are no experiments demonstrating the efficacy of the RNAi lines and no references to prior use and/or efficacy of these lines.

      All RNAi lines used in these studies have been published previously, by our group as well as others, and sources for the lines are listed in supplementary data, under the heading “Experimental model:Drosophila melanogaster”. The efficiency of these lines has been verified using antibody labeling (data not shown) and by assaying Notch activity reporters (shown in Fig. 2).

      An appraisal: The authors use temperature shifts with Gal80TS to show that Notch is required between 0-72 hours ALH. They show with the use of known markers of the temporal factors and Delta protein trap, that Imp promotes Delta protein expression and the later temporal factors reduce Delta, although the molecular mechanisms are not clearly delineated. Overall, these data support their model that the reduction of Delta expression during larval development leads to a loss of Notch activity.

      As noted in the Discussion, this study raises many questions about what Notch does in larval CB NBs. For example, does it inhibit Castor or Imp? Is Notch required in certain neural lineages and not others? These studies will be of interest to the community of developmental neurobiologists.

      Reviewer #2 (Public Review):

      Embryonic stem cells extensively proliferate to generate the necessary number of cells that are required for organogenesis, and their proliferation must be timely terminated to allow for proper patterning. Thus, timely termination of stem cell proliferation is critical for proper development. Numerous studies have suggested that cell-extrinsic changes in the surrounding niche environment drive the termination of stem cell proliferation. By contrast, cell-intrinsic mechanisms that terminate stem cell proliferation remain poorly understood. Fruit fly larval brain neuroblasts provide an excellent model for mechanistic investigation of intrinsic control of stem cell proliferation due to the wealth of information on molecular marks, gene functions and lineage hierarchy. Sood et al. conducted a genetic screen to identify genes that are required for the termination of neuroblast proliferation in metamorphosis and found that Notch and its ligand Delta contribute to their exit from cell cycle. They showed that knocking down Notch or delta function in larval neuroblasts allows them to persist into adulthood and remain proliferative when no neuroblasts can be detected in wild-type adult brains. By carrying out a well-designed temperature-shift experiment, the authors showed that Notch is required early during larval development to promote timely exit from cell cycle in metamorphosis. The authors went on to show that attenuating Notch signaling prolongs the expression of temporal identity genes castor and seven-up perturbing the switch from Imp to Syp/E93. Finally, they showed that knocking down Imp function or overexpressing E93 can restore the elimination of neuroblasts in Notch/delta mutant brains.

      Overall, the experiments are well conceived and executed, and the data are clear. However, the data reported in this study represent incremental progress in improving our mechanistic understanding of the termination of neuroblast proliferation.

      We respectfully disagree with this statement. Because Notch signaling is implicated in neurogenesis termination and Notch activity is regulated by GMCs and glia, it strongly suggests that NB proliferation and timing cues are controlled in a non-autonomous manner through direct interactions with NBs and their neighbors. This is in contrast to temporal patterning during embryogenesis which is largely believed to be controlled NB-autonomously. In addition, to our knowledge, no one has yet reported that CB NBs fail to terminate cell divisions on time when Notch activity is reduced during normal development. In fact, reported NB phenotypes associated with Notch loss of function have been surprisingly subtle until now.

      Some of the data seem to represent more careful analyses of observations previously published in the Zacharioudaki et al., Development 2016 paper, while others seem to contradict the results in this study.

      The Zacharioudaki et al., Development 2016 paper is terrific. One key difference between our work and theirs is that we look at Notch pathway knockdown and loss-of-function phenotypes, whereas the Zacharioudaki 2016 paper reports phenotypes associated with constitutive Notch activation. It has been known for some time that constitutively active Notch leads to tumorigenic phenotypes, particularly in type II lineages. Zacharioudaki and colleagues further determined that some of the classically known temporal transcription factors were ectopically expressed in these stem cell tumors. Here we show that under normal developmental conditions, Notch pathway activity controls CB NB temporal patterning.

      Gaultier et al., Sci. Adv. 2022 suggested that Grainyhead is required for the termination of neuroblast proliferation in a neuroblast tumor model, and grainyhead is a direct target of Notch signaling. Thus, Grainyhead should be a key downstream effector of Notch signaling in terminating castor and seven-up expression. Like Notch signaling, Grainyhead is also active throughout larval development. Grainyhead can function as a classical transcription factor as well as a pioneer factor, raising the possibility that temporal regulation of neurogenic enhancer accessibility allows Notch signaling in early larval development to set up the termination of castor and seven-up expression in metamorphosis. Diving deeper into how dynamic changes in chromatin at neurogenic enhancers affect the termination of neuroblast proliferation would significantly improve our understanding of the termination of stem cell proliferation in diverse developing tissues.

      Reviewer #3 (Public Review):

      In this study, the authors investigate the effects of Notch pathway inactivation on the termination of Drosophila neuroblasts at the end of development. They find that termination is delayed, while temporal patterning progression is slowed down. Forcing temporal patterning progression in a Notch pathway mutant restores the correct timing of neuroblast elimination. Finally, they show that Imp, an early temporal patterning factor, promotes Delta expression in neuroblast lineages. This indicates that feedback loops between temporal patterning and lineage-intrinsic Notch activity fine-tune the timing of early-to-late temporal transitions and are important for scheduling NB termination at the end of development.

      The study adds another layer of regulation that fine-tunes temporal progression in Drosophila neural stem cells. This mechanism appears to be mainly lineage-intrinsic (Delta being expressed by NBs and their progeny), but also partly niche-mediated (Delta also being expressed in glia, but with a minor influence). Together with a recent study (PMID: 36040415), this work suggests that Notch signaling is a key player in promoting temporal progression in various temporal patterning systems. As such, it is of broad interest to the neuro-developmental community.

      Strengths

      The data are based on genetic experiments that are clearly described and mostly convincing. The study is interesting, adding another layer of regulation that fine-tunes temporal progression in Drosophila neural stem cells. This mechanism appears to be mainly lineage-intrinsic (Delta being expressed by NBs and their progeny), but also partly niche-mediated (Delta also being expressed in glia, but with a minor influence). A similar mechanism has recently been described, although in a different temporal patterning system (medulla neuroblasts of the optic lobe; PMID: 36040415). It is overall of broad interest to the neuro-developmental community.

      Weaknesses

      The mechanisms by which Notch signaling regulates temporal patterning progression are not investigated in detail. For example, it is not clear whether Notch signaling directly regulates temporal patterning genes, or whether the observed phenotypes are indirect (for example, through regulation of cell-cycle speed). The authors could have investigated whether temporal patterning genes are directly regulated by the Notch pathway via ChIP-seq of Su(H) or by identifying potential Su(H) binding sites in enhancers.

      This is already known for svp and cas, and we have now included this information in the discussion. Thank you.

      “Whether Notch pathway activity curtails both Cas and Svp or just Cas remains an open question, however it has been reported that both cas and svp are associated with at least one enhancer that is responsive to Notch activity (Zacharioudaki et al., 2016).”

      A similar approach has been recently undertaken by the lab of Dr Xin Li, to show that Notch signaling regulates sequential expression of temporal patterning factors in optic lobes neuroblasts (PMID: 36040415), which exhibit a different temporal patterning system than central brain neuroblasts in the present study. As such, the mechanistic insights of the study are limited.

      Reviewer #1 (Recommendations For The Authors):

      1) There are missing controls

      a) Fig. 1F and Fig. 6A - The authors should generate and show images of control clones (FRT19A) stained with the same markers as Notch clones.

      Fig. 1F is at 48 hours APF. In control clones, there are no Dpn positive cells present, as stated in the text and therefore no confocal images are shown. Same for Fig. 6A, there are no Dpn positive cells in control clones in the brain at this time, therefore nothing to double label.

      2) This result is incorrectly described in the Results

      a) P. 5 "Ectopically persisting N RNAi CB NBs expressed the NB transcription factor Deadpan (Dpn), the S-phase indicator pcnaGFP, and were small on average, similar in size to control CB NBs at earlier pupal stages (Fig. 1B,C,E)." The Notch RNAi NBs were larger (not smaller) than controls in Fig. 1E at 30, 48, 72 h APF and in adults.

      Thank you for this comment. We have changed the language in the main text as follows:

      “Ectopically persisting N RNAi CB NBs (CB NBs at 48 hours APF and beyond) expressed the NB transcription factor Deadpan (Dpn) and the S-phase indicator pcnaGFP, and were small on average compared to control CB NBs during earlier developmental stages (L3 control, average diameter 10-15 μm) (Fig. 1B,C,E). However, at 30 hours APF, when control CB NBs are still present, N RNAi CB NBs were larger on average (Fig. 1B,C,E).”

      3) This sentence needs clarification/editing

      a) P. 4: " Independent of neurogenesis timing and the mechanism by which CB NB stop divisions, temporal patterning plays a key role". A key role in what?

      Thank you again. We have changed the text to the following:

      “Independent of neurogenesis timing and the mechanism by which CB NBs stop dividing, temporal patterning plays a key role in controlling the numbers and types of neurons made within each of the NB lineages (Maurange et al., 2008; Tsuji et al., 2008; Bahrampour et al., 2017; Yang et al., 2017; Pahl et al., 2019).”

      4) Some sentences need references or data to support them.

      a) P. 9 Please provide a reference to support the statement that Delta is a known Notch target

      We have included a reference.

      b) P. 9 - please provide a reference or data to support the statement that Delta transcripts decrease over time in larval CB NBs.

      This result is shown in Fig. 7B.

      5) Fig. 7A - it is difficult to appreciate the purple highlighting.

      We have changed the colors as suggested.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 4C, why does late knockdown of delta lead to ectopic persistence of NBs but late knockdown of Notch has no effect?

      This could be due to many things including differences in efficiency of UAS-RNAi lines. The point is that Delta/Notch is required early, but not late. Although some DeltaRNAi CB NBs are still present, the number compared to 48 hours APF is greatly reduced.

      2) It is surprising that Delta expression in NBs/GMCs appears to play a more important role in activating Notch signaling in neuroblasts than Delta expression in cortex glia. Please explain how Delta can cell autonomously activate Notch signaling.

      We are not proposing that Delta activates Notch cell-autonomously; rather, we propose that Delta in GMCs transactivates Notch in NBs. After NBs divide, Delta is partitioned to GMCs. Quiescent NBs have low to no Notch pathway activity, likely because they are not producing Delta-expressing GMC daughters (Sood, 2022).

      Please also reconcile the difference in gene expression induced by delta[RNAi] in this study and the delta-mutant allele used in the Zacharioudaki et al study.

      We are unsure what the reviewer is asking here and therefore cannot reconcile any differences in gene expression between the dlRNAi line and the mutant allele. Which gene expression needs to be reconciled? Zacharioudaki is listed as first author on four manuscripts. Which paper is being referred to?

      3) In Fig. 2J-L, why does knocking down delta in glia lead to loss of Scrib expression in neuroblasts and their surrounding progeny?

      We are not sure if it does or not. We only use Scrib as a membrane marker to identify and locate cells and neuropil regions of interest.

      4) The phrase "Notch is active early" is misleading when multiple labs have shown that Notch signaling is active in neuroblasts throughout larval development.

      Good point! We have rewritten the statement: “Somewhat paradoxically, we find that early Notch activity is required to terminate CB NB divisions late.”

      5) Neuroblasts that persist into adulthood are "smaller and Dpn-positive/PCNA-GFP-positive". Are they really neuroblasts? Can the authors verify the identity of these "persistent neuroblasts" with other molecular markers as well as functional assessment by inducing lineage clones?

      We have no doubt that these cells are NBs. Because we examine brains over time, these cells can be tracked using the markers Scrib, Dpn, and pcna. These cells also undergo asymmetric cell division (refer to Fig. S2F) and express other markers characteristic of CB NBs (mir and insc; not shown). We have made clones and see the same phenotype (ectopic persistence) in both MARCM clones and “flip-out” clones.

      Reviewer #3 (Recommendations For The Authors):

      I have a few issues that need to be addressed to reinforce some of the conclusions:

      1) It is unclear whether NBs that persist in late pupal or adult stages have just failed to differentiate or whether they continue to divide, leading to supernumerary progeny (as shown for NBs that are stalled in temporal patterning like in svp mutant NBs (Maurange et al. 2008)). EdU or PH3 staining could be done in adults to clarify this point.

      In this manuscript, we make use of pcna:GFP, a reporter for E2F activity as an indicator of cell proliferation. We certainly observe Dpn positive cells that only weakly express the reporter, suggesting that these cells are not actively dividing or dividing at a reduced rate. However, by far most of the ectopically persisting CB NBs strongly express the reporter and generate pcnaGFP expressing progeny, indicating that these cells are dividing. We have also stained tissues with PH3 and have included an image of a telophase dlRNAi expressing CB NB at 48 hours APF (Fig. S2F).

      2) It is unclear whether Notch signaling directly or indirectly regulates temporal transitions. One possibility is that knockdown of Notch signaling decreases cell-cycle speed leading to delayed temporal transitions. The authors should test whether Notch KD affects cell cycle speed using EdU incorporation or PH3 staining. This could be done best using Notch mutant MARCM clones as wt NBs can be used as controls.

      We have quantified the number of PH3-positive CB NBs during wandering L3 stages in control and dlRNAi animals. We find that dlRNAi CB NBs are indeed proliferating at reduced rates compared to controls. To test whether slower cell cycles cause the termination delay, we expressed a constitutively active form of PI3-kinase in dlRNAi animals to drive cell growth and proliferation. We found that CB NBs still ectopically persist (Fig. S2E-G).

      We have included the following in the text:

      “Defects in timing of temporal transitions could be due to defects in cell cycle progression, although embryonic NBs still transition independent of cell division (Grosskortenhaus et al., 2005). We used PH3 to assay CB NB mitotic activity. In Delta knock down animals, the percentage of PH3 positive CB NBs was reduced compared to control (Fig. S2E). At 48 h APF however, Delta knock down CB NBs were still dividing based on PH3 expression (Fig. S2F). To determine whether CB NBs ectopically persist due to defects in cell cycle rate, we co-expressed dp110 to constitutively activate PI3-kinase in Delta knock down animals. A significant number of pcnaGFP expressing, Dpn positive CB NBs were still observed, suggesting that defects in cell cycle timing and growth rates alone cannot account for ectopic persistence of CB NBs into later developmental stages and adulthood (Fig. S2G).”

      3) Cas is expressed in NBs during quiescence and shortly after quiescence exit. It is possible that the maintenance of Cas in Figure 5D, E is due to NBs that have not re-entered the cell cycle or have exited quiescence with a strong delay.

      Knockdown of the Notch pathway has no effect on CB NB reactivation from developmental quiescence. In fact, low levels of Notch are required for CB NBs to reactivate in response to dietary nutrients (Sood, 2022).

      Indeed, the authors have previously shown that Notch signaling is important for NB cell cycle reentry during early larval stages (PMID: 35112131). Are Cas and Svp also maintained in late larval N-/MARCM clones (MARCM clones are made after quiescence exit)?

      We have not assayed Cas or Svp expression past 48 hours ALH.

      4) The authors have revisited some previously published RNA-seq data showing that Delta is temporally regulated in NB lineages. However, they do not clearly show that the same is true at the protein level.

      Moreover, they find that mis-expression of late temporal factors or Imp knockdown in early larval brains appear to decrease Delta expression. Such semi-quantitative analysis of gene expression by immunostainings in different conditions can be a bit complicated and not very convincing because variations on intensity levels can be due to slight variations in antibody concentration, or different parameters of image acquisition.

      We fully agree, but in this case the difference compared to controls was so readily apparent that we felt it was not necessary to carry out experiments in clones. All images were acquired with the same confocal settings, experiments were repeated, and we consistently observed the same results. The data shown in Fig. 7D-G are representative.

      I suggest that the authors use clonal analysis rather than pan-neuroblast manipulation in order to have internal controls. For example, blocking temporal progression in Syp-RNAi clones (MARCM or Flp-out) and/or svp MARCM clones should lead to maintenance of Imp expression in late larval clones and maintenance of high levels of Delta, which would be easily assessed compared to surrounding NBs.

      Minor points:

      Fig 5: the sequential expression of Cas and Svp expression in larval NBs was first described by Maurange et al. 2008. Please cite appropriately.

      We have now added the requested citation to the following:

      “Over time, the percentage of Cas expressing CB NBs declined, while Svp expressing CB NBs modestly increased (Fig. 5B). Fewer than 1% of CB NBs co-expressed Cas and Svp at any stage and expression of both factors was absent by 48 hours ALH (Fig. 5B,C). This is consistent with work published previously (Isshiki et al., 2001; Maurange et al., 2008; Tsuji et al., 2008; Chai et al., 2013; Ren et al., 2017; Syed et al., 2017).”

      Fig 6A: Please indicate which immunostainings are shown in the overlay panels.

      Good catch! We have modified the figure.

      P9: "Delta co-immunoprecipitated with Imp.": Add "Delta mRNA co-immunoprecipitated with Imp in RIP-seq experiments" Otherwise, it suggests that you are talking about the protein.

      Done

      The scheme in Figure 7H is rather complicated to understand. In my opinion, it does not clearly convey the idea that Notch signaling favors the Imp-to-Syp transition.

      We have made a new model figure.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The precise mechanism of how tetraspanin proteins engage in the generation of discs is still an open question in the field of photoreceptor biology. This question is of significance as the lack of photoreceptor discs or defects in disc morphogenesis due to mutations in tetraspanin proteins is a known cause of vision loss in humans. The authors of this study combine TEM and mouse models to tease out the role of the tetraspanin proteins peripherin and Rom1 in the genesis of the photoreceptor discs. They show that the absence of Rom1 leads to an increase in peripherin and changes in disc morphology. A further rise in peripherin alleviates some of the defects observed in Rom1 knockout animals, leading to the conclusion that peripherin can substitute for the absence of Rom1.

      Strengths:

      A mouse model of Rom1 generated by the McInnes group in 2000 predicted a role for Rom1 in rim closure. They also showed enlarged discs in the absence of Rom1. This study confirmed this finding and showed the compensatory changes in peripherin, maintaining the total levels of tetraspanin proteins. Lack of Rom1 leads to an excess of open discs, demonstrated by darkly stained tannic acid-accessible areas in TEM. Interestingly, increased peripherin expression can rescue some morphological defects, including maintaining normal disc diameters and incisures. Overall, these observations lead the authors to propose a model in which ROM1 can be replaced by peripherin.

      Thank you for your kind summary of our work.

      Weaknesses:

      The compensatory increase in peripherin and the morphological rescue in the absence of ROM1 are expected, given previous work from the authors showing that i) the absence of peripherin increases ROM1 levels and ii) "Eliminating Rom1 also increased levels of Prph2/RRCT: mean Prph2/RRCT levels in P30 Prph2+/R retinas were 34% of WT, while levels in Prph2+/R/Rom1−/− retinas were 59% of WT" from Conley, 2019. The current study provides a comprehensive quantitative analysis. However, the mechanism underlying this compensation is unclear and warrants discussion.

      We referenced the result from the 2019 paper by Conley and colleagues in revision. As noted by the reviewer, new information in the current study consists of the precise quantification of the compensatory increase by a technique more accurate than semi-quantitative Western blotting. The nature of these compensatory increases is currently unknown and beyond the scope of experiments described in the current study. While this is an intriguing area for future investigation, we prefer not to speculate on the underlying mechanisms to avoid any appearance of data overinterpretation.

      Photoreceptor morphology appears better when peripherin is overexpressed. Is there a rescue of rod function (assessed by ERG or equivalent measures) in peripherin OE/Rom1-/- mice? Given the extensive work in this area and the implications the authors allude to at the end, it is important to investigate this aspect.

      It is indeed an interesting and potentially translationally relevant direction to address whether PRPH2 overexpression can rescue the long-term degeneration and functional defects caused by the loss of ROM1. Unfortunately, our work in this direction remains severely hindered by the fact that mice of the current ROM1 knockout line are notoriously poor breeders, allowing us to obtain only a handful of animals for each year of breeding. Therefore, we decided to limit our current study to addressing the structural roles of ROM1 and PRPH2 in supporting disc formation.

      Reviewer #1 (Recommendations For The Authors):

      Line 210: "ROM1 is able to form disc rims in the absence of PRPH2" is not demonstrated. The data shows that the tetraspanin domains are interchangeable similar to Conley, 2019. Similar concern for lines 225-226.

      We agree with the point regarding the interchangeable tetraspanin domains and clarified it in the text by referring to the tetraspanin body of PRPH2 where applicable. However, the 2019 paper by Conley and colleagues did not show any ultrastructural images of disc rims in a mouse without at least one copy of WT PRPH2 being expressed. The presence of normal-looking disc rims in the complete absence of the tetraspanin body of PRPH2 is an original observation of the present study.

      Line 234: it is unclear what is meant by "they are normally processed in the biosynthetic membranes." How does the lack of ER localization lead to this conclusion?

      We clarified this point by replacing “normally processed” with “not trapped”.

      Lines 306-308: it is difficult to follow the rationale. How will a shift in the trafficking pathway affect disulfide bonds since these are formed in ER?

      The reviewer makes a good point that at least the bulk of S-S bridge formation takes place during protein maturation in the ER, and whether additional intramolecular S-S bonds can form in the Golgi is questionable. We have therefore removed this speculation from the Discussion.

      Given the poor development of OS, the authors could provide an estimate of how many OS-like structures were observed, with and without rims, in RRCT animals.

      The gross development of outer segment structures in RRCT homozygous mice was part of the 2019 paper by Conley and colleagues. We prefer not to repeat experiments from the previous study and instead focused specifically on disc rim formation, which was not analyzed in RRCT homozygous mice in the previous study.

      The term "function" is loosely defined throughout this manuscript. Specifically, the excess peripherin can resolve some of the morphological defects observed in Rom1 -/-, and these functional changes in morphology are the focus of this work.

      We removed the word “function” on three occasions where its meaning may be ambiguous, as noted by the reviewer.

      Lines 115/116: Reference is missing for the statement that photoreceptor cell degeneration begins at P30.

      These lines reference Figures 1A,B, which include quantification of the number of photoreceptor nuclei. These results show that ROM1 knockout retinas exhibit a modest but statistically significant degeneration at P30. The text is modified to eliminate any ambiguity.

      Lines 143-144 are speculation and could be moved to the discussion section. "Prolonged delivery of disc membrane delivery to each disc" Any reference or experiments to support this statement?

      We respectfully disagree with moving this short speculative sentence to Discussion. We believe that it helps the reader to follow the flow of the data, while being clearly presented as a potential explanation rather than a conclusion.

      Lines 245-246: Results explained in the following paragraph (247-254) do not answer the question "whether disc rim formation in PRPH2 C150S/C150S knockin mice was driven by disulfide-linked ROM1 molecules", which is a valid and intriguing question. However, the results explained in 247-254 answer the question "if C150S PRPH2 can form discs in the absence of ROM1".

      We changed the text to replace “To address this question” with “To explore whether disc rims can be formed in the absence of any disulfide-linked tetraspanin molecules”, which precisely reflects what was addressed.

      Reviewer #2 (Public Review):

      In this study, Lewis et al seek to further define the role of ROM1. ROM1 is a tetraspanin protein that oligomerizes with another tetraspanin, PRPH2, to shape the rims of the membrane discs that comprise the light-sensitive outer segment of vertebrate photoreceptors. ROM1 knockout mice and several PRPH2 mutant mice are reexamined. The conclusion reached is that ROM1 is redundant to PRPH2 in regulating the size of newly forming discs, although excess PRPH2 is required to compensate for the loss of ROM1.

      This replicates earlier findings while adding rigor using a mass spectrometry-based approach to quantitate the ratio of ROM1 and PRPH2 to rhodopsin (the protein packed in the body of the disc membranes) and careful analysis of tannic acid labeled newly forming discs using transmission electron microscopy.

      In ROM1 knockout mice PRPH2 expression was found to be increased so that the level of PRPH2 in those mice matches the combined amount of PRPH2 and ROM1 in wildtype mice. Despite this, there are defects in disc formation that are resolved when the ROM1 knockout is crossed to a PRPH2 overexpressing line. A weakness of the study is that the molar ratios between ROM1, PRPH2 and rhodopsin were not measured in the PRPH2 overexpressing mice. This would have allowed the authors to be more precise in their conclusion that a 'sufficient' excess of PRPH2 can compensate for defects in ROM1.

      Thank you for these kind comments about our work. Regarding the stated weakness that we did not measure the molar ratios between PRPH2, ROM1 and rhodopsin in the ROM1 knockout line with PRPH2 overexpression: this is one experiment that we really hoped to do but were limited by the poor breeding of the ROM1 knockout line described above. With the current breeding rate, we estimate that we would need to wait for another year to get enough material to do this experiment, which we cannot do in the context of this manuscript revision. We hope, however, that eventually this may be a part of one of our future papers.

      Reviewer #2 (Recommendations For The Authors):

      The p-value threshold for statistical significance is not listed. Readers will assume the most commonly used value of 0.05, but this should still be defined, especially since only asterisks summarizing the p-value range are provided in place of the actual p-values.

      The definitions of various numbers of asterisks of significance (including p<0.05 as a minimal measure of significance) are provided in the Methods section, whereas the exact p-values are stated in figure captions.

      There are 3 phrasing issues that are potentially misleading.

      1) While PRPH2 and ROM1 are the most abundant tetraspanins in photoreceptors, they are not the only ones. It would be more precise if, for example, the Table 1 title was changed to 'molar ratio of outer segment tetraspanins and rhodopsin'.

      We have changed the title of Table 1 to “Quantification of molar ratios between PRPH2, ROM1 and rhodopsin in WT and Rom1-/- outer segments” to be more accurate.

      2) The protein expressed in RRCT mice is described as the 'tetraspanin core', while the cartoon (and original paper) shows the protein as simply being ROM1 with a different cytoplasmic C-terminus (from PRPH2). Elsewhere, 'tetraspanin core' is used to mean just the transmembrane bundle or that bundle with the EC loops.

      We agree that the term “tetraspanin core” may be confusing. We modified the text to not use this term and, when needed, refer to this main part of the tetraspanin molecule as a “body”.

      3) Lines 203-205: the 'somewhat restored' qualifier should be removed. If the authors think there is an effect that is different from chance, they should use a different alpha and justify that choice.

      We removed this line, as suggested.

      Reviewer #3 (Public Review):

      In this manuscript, Lewis et al. investigate the role of tetraspanins in the formation of discs, the key structure of vertebrate photoreceptors essential for light reception. Two tetraspanin proteins play a role in this process: PRPH2 and ROM1. The critical contribution of PRPH2 has been well established; loss of its function is not tolerated and results in gross anatomical pathology and degeneration in both mice and humans. However, the role of ROM1 is much less understood and has been considered somewhat redundant. This paper provides a definitive answer to the long-standing uncertainty regarding the contribution of ROM1, firmly establishing its role in outer segment morphogenesis. First, using an ingenious quantitative proteomic technique, the authors show a compensatory increase in PRPH2 in ROM1 knockouts, explaining the apparent redundancy of ROM1 function. Second, they uncover that despite this compensation, ROM1 is still needed: its loss delays disc enclosure and results in the failure to form incisures. Third, the authors use a transgenic mouse model to show that deficits seen in ROM1 KO can be completely compensated by the overexpression of PRPH2. Finally, they analyze yet another mouse model combining ROM1 loss with expression of a PRPH2 mutant unable to form dimerizing disulfide bonds, further arguing that PRPH2-ROM1 interactions are not required for disc enclosure. To top it off, the authors complement their in vivo studies with a series of biochemical assays done upon reconstitution of tetraspanins in transfected cultured cells as well as fractionations of native retinas. This report is timely, addresses significant questions in the cell biology of photoreceptors, and pushes the field forward in a classical area of photoreceptor biology and in the mechanics of membrane structure as well. The manuscript is executed at the highest technical standard, is exceptionally well written, and leaves little to be desired. It also raises the standards of the field; one such domain is the quantitative analysis of EM images, which is notoriously open to alternative interpretations, yet this study does an exceptional job of removing bias from this approach.

      According to my expertise in photoreceptor biology, there is nothing wrong with this manuscript either technically or conceptually and I have no concerns to express.

      Thank you for these incredibly kind comments.

      Reviewer #3 (Recommendations For The Authors):

      I have no recommendations to make.

    1. Author Response

      We would like to thank you and the reviewers for evaluating this manuscript and providing constructive recommendations. Please see our provisional response to the major comments made by the reviewers.

      Reviewer #1 (Public Review):

      "…the authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well"

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We agree this is an important experiment which we will do.

      Reviewer #2 (Public Review):

      “…The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions."

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (ex. Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch).

      If our interpretation of the concern is correct, we think this is unlikely to be the case. The first test, the subsequent HFS protocol, and the second test (Fig. 1a, top branch) were all performed in the same chamber. For both the first and the second tests, animals received two 30-second recall trials separated by 2 minutes (the data are presented as the average of the two trials). We did not see a difference in freezing between the first and the second recall trials within each session (data not shown). It was only after the HFS protocol that we observed an increase in freezing.

      This shows that in our paradigm the first recall does not affect the next recall in terms of the animals’ freezing levels. It must be noted that in cases where we did not do any testing prior to the HFS protocol, we still observed an increase in freezing after the HFS protocol (ex. Fig. 1a, middle branch and the corresponding data in Fig. 1b, the bar labeled as Wth+HFSth). Also relevant are the data shown in Fig. 3c. Here, although animals were tested twice (Fig. 3a, top branch), there was no increase in freezing in the second test (Fig. 3c, middle panel, Wth+24HFSCtx). That is, in the absence of effective LTP, there is no significant difference between the two tests.

      To further confirm this, in a new group of mice, 24 hours after weak conditioning, we will induce the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol).

      “The final experiment (Fig. 5a-c, extended data 5c) combines behavioral assessments with in vivo LFP recordings before and 24 h after hetero-HFS. While this experiment is demanding, it seems a bit underpowered”

      We agree with the reviewer that the number of mice used in this experiment is on the lower side. However, this is not unusual for such an experimental configuration. As the reviewer mentioned, this is a demanding experiment for multiple reasons. For example, to confidently demonstrate that our HFS protocol produces long-lasting synaptic changes in addition to long-lasting behavioral changes, we must see a significant increase in evoked LFP after the manipulation, which is predicted to last at least 24 hours. We must also be sure that this change in evoked LFP is not caused by unrelated fluctuations, such as movement of the recording probe. For this reason, we measured evoked LFPs daily for 3-4 days prior to conditioning. Only those mice that had a stable evoked LFP during this time were used for further conditioning. We will provide exclusion criteria for this experiment in the revised manuscript.

      “It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS,..”

      We will perform an experiment where mice undergo a weak conditioning protocol and will record the evoked LFP 1-2 hours following the conditioning protocol, as well as the next day.

      “…the slice experiments (Fig. 5d-f) are not well aligned with the in vivo experiments (juvenile animals, electrical vs. opto stimulation, different HFS protocols, timescale of hours).”

      Our aim in this part was to demonstrate that the pathways we chose for our study can undergo heteroLTP. For this purpose, we used an already established protocol, which uses electrical stimulation (Fonseca, 2013). For clarification, I have tried to induce optical LTP with a high-frequency stimulation protocol in slices, but I did not succeed. I am not aware of a work that successfully induced optical LTP with a high-frequency protocol.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank Reviewer 1 for their time reviewing our revised manuscript and appreciate their thoughtful suggestions for further clarity. In regard to the public review statement, "However, parts of the methods (e.g. assessment of blanks and data filtering) and results (e.g. visualization of plant community data) could still be polished, and the figures should be improved to increase the clarity of the manuscript", we have made small modifications in the text and figures during production of the Version of Record to address these important suggestions.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript examines the colonization of shrubs during the Late Pleistocene in North America and Europe by comparing plant sedimentary ancient DNA (sedaDNA) records from different published lake sediment cores, and it also adds two new datasets from Iceland. The major findings of this work aim to illuminate the colonization patterns of woody shrubs (Salicaceae and Betulaceae) in these sediment archives to understand this process in the past and evaluate its importance under future deglaciation and warming of the Arctic.

      We greatly appreciate the time and detailed consideration of our manuscript by Reviewer 1. Our responses to individual comments are highlighted in blue, with the original comments provided by the reviewer in black.

      The strength of evidence is solid, as the methods (sedimentary DNA) and data analyses broadly support the claims: the authors use an established metabarcoding approach with PCR replicates (supporting the replicability of PCR and thereby confirming the occurrence of Salicaceae and Betulaceae in the samples) and quantitative estimation of plant DNA with qPCR (which defines the number of cycles used for each PCR amplification to prevent overamplification). However, the extraction methods need more explanation, and the bioinformatic pipeline is not well known and also needs some further description in the main text (not only references to other publications).

      Thank you for bringing this to our attention. We have now provided greater detail on our extraction methods and bioinformatic pipeline.

      The authors compare their own data with previously published data to indicate the different timing of shrubification at the selected sites and show that Salicaceae always occurs as a pioneer shrub after deglaciation, followed by Betulaceae with a variable time lag. The successive colonization of Salicaceae followed by Betulaceae is explained by their differences in environmental tolerance, and the time lag of colonization in the compared records is explained, e.g., by varying distances to source areas.

      However, there are some weaknesses in the strength of evidence because full sedaDNA plant DNA assessment, quality of the sedaDNA data (relative abundance and richness of sedaDNA plant composition) and results from Blank controls (for sedaDNA) are not fully provided. I think it is important to show how the plant metabarcoding in general worked out, because it is known that e.g. poor richness can be indicative of less preserved DNA and a full plant assessment (shown in the supplement) would be more comprehensive and would likely attract a larger readership.

      Thank you for bringing these important points to our attention. The DNA dataset including the full taxa assemblage will be included with the manuscript upon publication, and we apologize for not including it during the review process. This dataset will also include information on positive and negative blanks used for quality control. Following suggestions from Reviewer 2, we have now also calculated some recently proposed DNA quality metrics (Rijal et al., 2021), which collectively support our earlier conclusions that our record is of sufficient quality to draw the current conclusions. We hope that the inclusion of the complete DNA dataset will indeed draw a larger readership.

      Further, it would allow us to see changes in the relative abundance of plants and would make it easier to understand whether the families Salicaceae and Betulaceae are a major component of the community signal. Further, the possibility of reaching higher taxonomic resolution with sedaDNA compared to pollen, or of producing a continuous record (unlike macrofossils), is not discussed in the manuscript but should be added. Also, the taxonomic resolution within these families in the discussed datasets would be of interest, also at the sequence type level if taxonomic assignments are similar.

      Thank you for these suggestions. We have focused on these two families because numerous pollen records and floras show that they are the major components of the vascular plant communities in the regions investigated. Betula (birch) and Salix (willow) are indeed the dominant woodland shrubs of the tundra biome, which covers expansive areas of the Arctic. For example, in Iceland, natural woodlands, which cover 1.5% of the total land area, are composed of 80% birch shrubs (Snorrason et al. 2016, Náttúrufræðingurinn 86). Salix mixes in with Betula, especially around wet sites. Species from both genera are common and widespread throughout Iceland, but dwarf and cold-tolerant species thrive best in the highlands or at glacial sites, while shrub-like species are more common in the lowlands, in coastal areas, and in sheltered valleys. Flora of Iceland (http://www.floraislands.is/PDF-skjol/Checklist-vascular.pdf) lists Betula as the only genus of Betulaceae native to Iceland (pages 79-80) and Salix as the major genus of Salicaceae (pages 82-85); Populus tremula (Salicaceae) also grows in the wild but is rare (perhaps only a handful of trees/shrubs in the whole country). The point is that, for Iceland, Betulaceae is Betula and Salicaceae is Salix, meaning that our sedaDNA method has genus-level taxonomic resolution. Moreover, with the help of pollen analysis of the Ytri-Áland site (Karlsdóttir et al. 2014) near Stóra Viðarvatn (the novel sedaDNA record of the present paper), it is possible to interpret our results even to the species level, which we have only mentioned in the discussion. It has been suggested that matching sedaDNA results with botanical knowledge about the study site and its vegetation history (a local reference database) is one way to increase the taxonomic resolution of the sedaDNA approach (e.g., Elliott et al. 2023, Quaternary 6,7). In the same way, we find that our sedaDNA analysis has sufficient resolution to answer the questions asked in the present study. In the future, although we do not include this point in the discussion, it should be possible to increase the taxonomic resolution of plant metabarcoding by targeting multiple genes simultaneously, as described in a proof of concept by Foster et al. (2021, Front Ecol Evol 9: 735744). In the revised version of the manuscript, we have now expanded on the power of sedaDNA in terms of increased taxonomic resolution and application in continuous lake sediment records in the introduction. Following Reviewer 2’s suggestion, we have now included the sequences used for taxonomic assignment in the supplementary information.

      Another important aspect is how the abundance/occurrence of Salicaceae is discussed. Many sedaDNA studies report an overrepresentation of this family due to better preservation in the sediment, long-distance transport along rivers, or primer preferences during amplification, etc. As this family is the major objective of this study, such a discussion should be added to the manuscript and the data should be presented accordingly.

      Thank you for raising this point. The reviewer is indeed correct that Salicaceae is typically overrepresented in read abundance compared to other vascular plant taxa in sedaDNA studies. However, as we mention in the Results and Interpretation section for Stóra Viðarvatn, “As PCR amplification results in sequence read abundances that may not reflect original relative abundances in a sample (Nichols et al., 2018), we focus our discussion on taxa presence/absence,” we do not place weight on the greater relative abundance of Salicaceae in our own dataset. As such, these differences in the relative read abundance of plant taxa should not influence the conclusions drawn in the manuscript.

      I would also like more clarity on how the authors defined the source areas (refugia) of the shrubs. If these source areas are described in other literature, I suggest showing them on a map. Further, the specific environmental preferences of these families should be discussed and explained in more detail; this is too short and too unspecific in the introduction. Also, it would be beneficial to show relative abundances rather than just highlighted areas in the figures, which would allow us to see whether Salicaceae is replaced by Betulaceae after colonizing or whether both families persist together, which might be important for understanding the future development of shrubs in these areas.

      Thank you for allowing us to clarify. As the regions studied with the lake sediment records shown in this manuscript were all covered by extensive ice sheets during the Last Glacial Maximum (LGM, Fig. 1), plant refugia and source areas must have been located somewhere south of the ice sheet margins. Thus, we calculate our distance to source as the minimum distance from a lake site to land beyond the extent of the ice sheet during the LGM. This has now been clarified in the text and highlighted in Fig. 1. We have also added in the discussion molecular results from Thórsson et al. (2010, J Biogeogr 37) on possible source origins of Betula in Iceland. Details on taxa environmental preferences have now been expanded upon in the Discussion section where we explore the various trait-based factors that may influence the relative differences in colonization timing between Salicaceae and Betulaceae. We have now also edited Figs. 3 and 4 to include PCR replicates instead of highlighted bars to better compare the DNA and pollen datasets from Iceland.

      The authors started a discussion about shrubification in the future, but a more defined evaluation and discussion of how such paleo datasets can be used to predict future shrubification and its consequences for the Arctic would give more significance to the work.

      Thank you for this suggestion and allowing us to expand on potential future changes. We have now edited this final section of the paper to provide a little more detail on how we envision these records being used to predict future shrubification and climate change.

      Reviewer #1 (Recommendations For The Authors):

      I list some more specific details here.

      You speak about "read counts", I guess you used relative abundance of read counts, you should state it like this.

      Thank you for allowing us to clarify. The data that we refer to in terms of read counts are from the previously published studies in the circum North Atlantic. The data provided by these studies are raw read counts, not relative abundances.

      Line 100: What do you mean here: "temperature changes in prior warm periods"?

      Thank you for allowing us to clarify. We have rephrased the sentence to “higher temperature in prior warm periods”, which we hope is clearer for the reader.

      Line 134: How is DNA diluted by minerogenic sediment? Did the sedimentation rate increase? Typically minerogenic input should be beneficial for DNA preservation.

      Thank you for allowing us to clarify. These samples were composed primarily of tephra glass with minimal organic content. While we agree that minerogenic sediment is generally beneficial for DNA preservation, the predominantly inorganic material (tephra) in these samples fell from the sky rather than being washed into the lake from the landscape, and therefore would not have carried organic material with it. We have rephrased the sentence to make this clearer.

      I would suggest adding more citations to the text (for example, for the statements in lines 106, 110, and 368).

      Thank you for the suggestion. The manuscript has been edited accordingly.

      Better divide your discussion: dispersal mechanisms are discussed in both sections. Maybe you could divide it into environmental factors for colonization and trait-based factors (only an idea).

      Thank you for the suggestion. We have now renamed the second dispersal section “Environmental dispersal mechanisms” to be clearer about our focus on factors such as wind, sea ice, and birds that may transport seeds across the North Atlantic. The previous section retains the trait-based factors that may influence the relative timing of colonization between Salicaceae and Betulaceae.

      Which type of sequencing did you use? Paired-end 76 bp is unknown to me.

      Methods have now been edited to clarify this, along with details related to extraction methods as requested in the Public Review.

      Reviewer #2 (Public Review):

      Harding et al. have analysed 75 sedaDNA samples from Store Vidarvatn in Iceland. They have also revised the age-depth model of earlier pollen, macrofossil, and sedaDNA studies from Torfdalsvatn (Iceland), and they review sedaDNA studies for the first detection of Betulaceae and Salicaceae in Iceland and surrounding areas. Their Store Vidarvatn data are potentially very interesting, with 53 taxa detected in 73 of the samples, but results are presented for only two taxa. Their revised age-depth model casts new light on earlier studies from Torfdalsvatn, which allows a more precise comparison to the other studies. The main result from both the sedaDNA and the review is that Salicaceae arrives before Betulaceae in Iceland and the surrounding area. This is a well-known fact from pollen, macrofossil, and sedaDNA studies (Fredskild 1991 Nordic J Bot, Birks & Birks QSR 2014, Alsos et al. 2009, 2016, 2022) and is as expected, as the northernmost Salix reach the Polar Desert zone (zone A, 1-3 °C July temperature) whereas the northernmost Betula rarely goes beyond the Southern Tundra (zone D, 8-9 °C July temperature, Walker et al. 2005 J. Veg. Sci., Elven et al. 2011 http://panarcticflora.org/).

      We greatly appreciate the time and detailed consideration of our manuscript by Reviewer 2. Our responses to individual comments are highlighted in blue, with the original comments provided by the reviewer in black.

      While we agree that previous studies have indeed indicated a relative delay in Betula colonization relative to Salix, most of these have relied on pollen and macrofossil evidence, which are complicated to use as proxies for the first appearance of a given taxon (see our Introduction in the main manuscript). A few studies have also shown this with sedaDNA (e.g., Alsos et al., 2022), which is a more robust proxy for a plant taxon’s presence, but these have been geographically limited (e.g., northern Fennoscandia). In our study, we show that this pattern is reflected in 10 different lakes across the North Atlantic, emphasizing the broad nature of Betula’s delayed colonization relative to other woody shrubs, such as Salix.

      My major concern is their conclusion that a lag in shrubification may be expected based on the observations that there is a time gap between deglaciation and the arrival of Salicaceae and between the arrival of Salicaceae and Betulaceae. A "lag" in biological terms is defined as the time from when a site becomes environmentally suitable for a species until the species establishes at the site (Alexander et al. 2018 Glob. Change Biol.). The climate requirement for Salicaceae depends highly on the species. In the three northernmost zones (A-C), it appears as a dwarf shrub; it only appears as a shrub in the Southern Tundra (D) and Shrub Tundra (E) zones, and further south it commonly occurs as trees. Thus, Salicaceae cannot be used to distinguish between the Shrub Tundra and the more northern zones, and therefore cannot be used as an indicator of arctic shrubification. Betulaceae, on the other hand, rarely reach zone C, and are common in zone D and further south. Thus, if we assume that the first Betulaceae to arrive in Iceland is Betula nana, this is a good indicator of the expansion of shrub tundra. If they could estimate when the climate became suitable for B. nana, they would therefore have a good indicator of colonisation lags, which can provide valuable information about time lags in shrub expansion (especially to islands). They could use either independent proxies or information from the other species recorded in sedaDNA to reconstruct minimum July temperature (see e.g. Parducci et al. 2012a+b Science, Alsos et al. 2020 QSR).

      We appreciate the reviewer’s insight into the implications of our use of the word “lag”. Indeed, as we do not have site-specific climate time series for each lake record, we have adjusted our wording to “delay”, which we believe is more general and descriptive of our observations. We recognize the importance of independent paleotemperature records for each lake, but these are not yet available for all records, so we prefer to keep our study focused on the delay instead. In addition, we prefer not to derive temperature records from the vegetation sedaDNA records, as these are not independent and will incorporate changes driven by additional factors, such as soil and light (e.g., Alsos et al., 2022). We have added some text to the final section on Future Outlook that elaborates on the need for complementary records of past climate to pair with paleoecological records of colonization. We hope that this motivates the community to pursue these lines of research that we agree are needed.

      The study gives a nice summary of current knowledge, and the new sedaDNA data generated are valuable for anyone interested in the post-glacial colonisation of Iceland. Unfortunately, neither raw nor final data are given. Providing the raw data would allow re-analysis with a more extensive reference library, and providing the final data used in their publication will surely interest many botanists and palaeoecologists, especially as 73 samples provide high time resolution compared to most other sedaDNA studies.

      The raw and final data, including blank controls, used in our study for Stóra Viðarvatn will be provided with the manuscript’s publication. We apologize for not including them with the original submission.

      Reviewer #2 (Recommendations For The Authors):

      Line 112-113: Difference in northward expansion rate is not the same as lag. Thus, your conclusion "As a result, the biospheres role in future high latitude temperature amplification may be delayed." does not derive directly from the data you present.

      Thank you for allowing us to clarify our wording. We have rephrased the sentence to align with our results more closely as stated in the Abstract of the manuscript.

      Line 133: From Figure S3, it looks like three or possibly four samples failed.

      Thank you for pointing this out. First, we realized that the DNA reads originally included in Figure S3 were from after filtering. We have now updated the figure to include the total raw reads, which is a better indicator of DNA reliability (Rijal et al., 2021). Based on the total raw reads, only two samples failed with total reads of 2 and 5.

      Line 141: You say you focus on presence/absence, but you do show quantitative results for Salix and Betula (0-5 PCR repeats) in Figure 2.

      Thank you for allowing us to clarify. Fig 2 shows the number of replicates that meet our criteria for taxa presence, where a higher number of replicates corresponds to a higher likelihood of presence.

      Line 142: Where are the other 51 taxa shown?

      We are providing the full DNA record in the supplement, which will be published alongside the main manuscript. We have also now included a plot of species richness against sample depth in Fig. S2.

      Line 178-179: Note that the revised date of first detection is close to what has been previously published (Salix ~10300 vs. 10227, Betula ~9500 vs 9680), so it does not make any changes to previous interpretation.

      Yes, this is true. However, we still believe it is important to always consider improvements in age models to best correlate the timing of events between different paleo records.

      Line 191-194 and Figure S2: I leave the evaluation of revised age-depth model to the geologist.

      As this aspect was not commented on, we assume that both reviewers are satisfied.

      Line 197: "Delay" is a more correct word than "lag".

      Thank you, edited.

      Line 210: Where do 1700 and 2500 come from? If your revised age of ice retreat is 11 800, and your revised date of Salix and Betula arrival are ~10 300 and ~9500, I make this 1500 and 2300.

      Yes, this is correct. Thank you for pointing out this error.

      Line 215-217: To be more certain about any bias caused by low DNA quality, I suggest you explore your data using the tools presented in Rijal et al. 2021 Science Advances. As you do not provide your data, I cannot evaluate the quality of them.

      Thank you for the suggestion. We have now calculated the various DNA quality indices developed by Rijal et al. (2021). These have been added to the methods and results sections for the Stóra Viðarvatn record, as well as to Fig. S3. The MTQ and MAQ scores are known to correlate with species richness when richness is low (n<30, Rijal et al., 2021), which is likely an artifact of requiring the 10 best-represented barcode sequences to calculate these scores. As this correlation is observed in our dataset and given that our species richness is low (n<30, Fig. S2), the low MTQ and MAQ scores are not likely indicative of low-quality DNA. We therefore judge the quality of our DNA on total raw reads and CT values, which remain relatively constant through time (Fig. S2).

      Line 226: Do you mean TDV?

      In our original manuscript, we intended to avoid unnecessary abbreviations, such as lake names. We have now changed TORF, which we had used as the lake’s abbreviation, to the full lake name, Torfdalsvatn.

      Line 282-283: Given that the basal sediments of Nordivatnet are marine (Brown et al. 2022 PNAS Nexus), even a low detection may be a strong indication of local presence.

      Thank you for this point. However, to standardize the records and compare across a wide range of geographical and depositional settings, we prefer to apply the same criteria for the taxa’s presence to each lake as outlined in our Methods.

      Line 289: See the definition of "lag"

      Changed to “delayed” per your earlier suggestion. Thank you.

      Line 298-303: I agree that the late appearance of Betula at Langfjordvatnet (10 000 cal BP) is anomalously long and a bit unexpected given that it is found at five other lakes in the region 13000-10200 cal BP (Alsos et al. 2022). However, a likely explanation is the lack of area with stable soil - B. nana requires a greater degree of soil development compared to other heath shrubs (Whittaker 1993) and Langfjordvatnet is surrounded by steep scree slopes (Otterå 2012 master thesis Univ. Bergen). At Jøkelvatnet, Salix appears in the four available samples from 10453 to 9811 whereas Betula arrives 9663. Here, the arrival of Betula is just at the drop of local glacier activity and at the temperature rise, suggesting that it arrives immediately after the climate becomes suitable (Elliott et al. 2023 Quaternary). Thus, based on N Fennoscandia where we have more data available, it does not show lags and does not support delayed shrubification (which contrasts with what we have shown for many other species including common dwarf shrubs, see Alsos et al. 2022). Would be very interesting to have similar data from Iceland, which has a large dispersal barrier.

      Thank you for these further considerations. We have incorporated those related to Langfjordvannet into the manuscript accordingly. We also appreciate the point regarding Jøkelvatnet. However, as stated in our Methods section for “Published sedaDNA datasets”, we do not include Jøkelvatnet in our comparison due to the impact of glacier activity as the reviewer notes: “Finally, both Jøkelvatnet and Kuutsjärvi were impacted by glacial meltwater during the Early Holocene when woody taxa are first identified (Wittmeier et al., 2015; Bogren, 2019), and thus the inferred timing of plant colonization is probably confounded in this unstable landscape by periodic pulses of terrestrial detritus.” Due to the glacier’s presence in the lake catchment, it is not possible to discern whether delay in Betulaceae would have occurred if the glacier were not present. Therefore, we prefer to keep this record excluded from our comparisons.

      Line 316-319 and 344: Based on contemporary genetic patterns, Alsos et al. analyse the relative importance of adaptation to dispersal compared to other factors.

      Thank you for bringing up this important point. We have now expanded our discussion to include these analyses from Alsos et al. (2022).

      Line 342+350: Original publication is Alsos et al. 2007 Science

      Thank you, edited.

      Line 343: The Alsos et al. 2009 Salix study is the wrong citation here. Eidesen et al. 2015 Mol. Ecol. shows the phylogeography of Greenland populations but not Baffin; I am not aware of any contemporary genetic studies of Betula from Baffin.

      Thank you for pointing this out. We will also include the Eidesen et al. (2015) citation for reference to Greenland. However, there is one data point included for southern Baffin Island in Alsos et al. (2009), so we will retain this citation here as well.

      Line 351-353: See comment about Betula from Baffin above. Also, I am not sure I follow here: what do you mean by "these populations", the Svalbard ones or Iceland? Eidesen et al. 2015 is the wrong citation for Salix; use Alsos et al. 2009. Alsos et al. 2009 suggest Iceland (and E Greenland) was colonized from north Scandinavia, although this was uncertain as no data were available from Faroe/Shetland. Svalbard was colonized from N Fennoscandia (Alsos et al. 2007).

      Regarding Baffin Island sources, we refer the reviewer to our response to their previous comment. We have clarified the wording of our sentence from “these populations” to “the modern populations from these locations [Baffin Island, Greenland, and Svalbard]”. We have removed reference to Eidesen et al. (2015), as this is for Betula rather than Salix. Finally, we have added a citation for Alsos et al. (2007) here for Svalbard.

      Line 354-355: AFLP suggest that Baffin and W Greenland were colonised from a refugia south of the Wisconsin Ice Sheet, see Alsos et al. 2009.

      Yes, we are aware, thank you. Our reference to “mid-latitude North America” in the sentence acknowledges this refugia, but we have now added “south of the Laurentide Ice Sheet” for further clarification.

      Line 363-381: See comment above; your Store Vidarvatn data do not currently demonstrate a lag between environmental suitability and climate, but using the rest of the DNA record, potentially they could. It would also be good to reflect on the distance to the source area for shrubs in the Late Glacial/Early Holocene compared to now.

      Thank you for these suggestions. We have edited this section of the manuscript to elaborate on the need for independent climate reconstructions as well as the fact that distances to plant refugia are shorter now than during the last postglacial period.

      Line 396-416: I am not an expert on tephra so I will not comment on this part.

      As this aspect was not commented on, we assume that both reviewers are satisfied.

      Line 459-457: Please provide results of how much data is lost at each step of filtering.

      We added the read loss following each filtering step as a table in the supplemental information (Table S4).

      Throughout the manuscript, you go only to species level although DNA in most cases is able to distinguish to genus level within Salicaceae and Betulaceae - which sequences did you identify?

      Sequences are now provided in the supplemental for Salicaceae and Betulaceae. Based on our bioinformatic pipeline, reference library and requirement for 100% match between sequence and taxonomy, we were only able to distinguish between species level.

      Figure 2: The detection of Betulaceae is very sporadic in Stóra Vidarvatn, with occurrence in only seven samples and hardly ever in all 5 repeats, suggesting that if you apply a statistical model to estimate first arrival (see Alsos et al. 2022), you will have a large confidence interval. Thus, these uncertainties should be considered when estimating the delayed arrival of Betula compared to Salix. The data from Torfdalsvatn (which I assume are from Alsos et al. 2021, although this is not specified in the figure legend) show detection in all samples from the first appearance and mostly in 8 of 8 repeats (shown in the original publication; you could do the same here), thus providing a more accurate estimate of the time gap between the arrival of Salix and Betula.

      Thank you for bringing up this important point. The detection of Betulaceae is indeed sporadic, but we believe it reflects the genuine nature of its presence/absence during the Holocene in Northeast Iceland. This is supported by Betula pollen from a nearby peat record that shows a similar history (Fig. 4, Karlsdóttir et al., 2014), which we have now elaborated on in the Results and Interpretation section. As for the timing of Betulaceae colonization at this site, the first appearance in the DNA record should be a close minimum estimate, as shown by comparisons of modern DNA and plant surveys (e.g., Sjögren et al., 2017; Alsos et al., 2018). The true first appearance could be biased by small numbers of plants being present in the early stages of colonization and not registering in the sedimentary record until enough dead plant material is transported to the depocenter of the lake. However, this bias is likely smaller than the age model uncertainties and therefore not likely relevant on geologic timescales such as those in this study; our age models and those published for the other records indicate that these uncertainties are generally on the order of several hundred years. In addition, we have now added the Alsos et al. (2021) reference for Torfdalsvatn. Unfortunately, this Torfdalsvatn study does not provide the number of PCR repeats, so we will keep the figure as is, as it best represents the available data.

      Figure 5: I suggest adding lake names to the figure. Is there a dot missing for lake 5 for Salicaceae?

      Thank you for the suggestion, we have added lake names to the figure. There is a dot marked for Salicaceae for lake 5, however, not for Betulaceae as this taxon was not identified. We refer the reviewer to the Discussion Section “Postglacial sedaDNA records from the circum North Atlantic” and the lake’s original publication (Volstad et al., 2020).

      Figure 6: I find it more relevant to plot colonization time versus distance to the LGM ice sheet margin; lake number is just an arbitrary number.

      We appreciate the suggestion and have modified the figure accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the present manuscript, Abele et al. use Salmonella strains modified to robustly induce one of two different types of regulated cell death, pyroptosis or apoptosis, in all growth phases and cell types to assess the role of pyroptosis versus apoptosis in systemic versus intestinal epithelial pathogen clearance. They demonstrate that in systemic spread, which requires growth in macrophages, pyroptosis is required to eliminate Salmonella, while in intestinal epithelial cells (IEC), extrusion of the infected cell into the intestinal lumen induced by apoptosis or pyroptosis is sufficient for early pathogen restriction. The methods used in these studies are thorough and well-controlled and lead to robust results that mostly support the conclusions. The impact on the field is considered minor, as the observations are somewhat redundant with previous observations and not generalizable, due to cited evidence of different outcomes in other models of infection and a relatively artificial study system that does not permit the assessment of later time points in infection due to rapid clearance. This excludes the study of later effects of differences between pyroptosis and apoptosis in IEC, such as IL-18 and eicosanoid release, which are only observed in the former and can have effects later in infection.

      We thank the reviewer for their time and effort in assessing our manuscript.

      We agree with the reviewer’s overall assessment. One minor clarification is that the engineering used does not express the proteins in “all growth phases”, but rather only when the SPI2 T3SS is expressed; we used the sseJ promoter, which is a SPI2 effector.

      Reviewer #2 (Public Review):

      In this study, Abele et al. present evidence to suggest that two different forms of regulated cell death, pyroptosis and apoptosis, are not equivalent in their ability to clear infection with recombinant Salmonella strains engineered to express the pro-pyroptotic NLRC4 agonist, FliC ("FliC-ON"), or the pro-apoptotic protein, BID ("BID-ON"). In general, individual experiments are well-controlled, and most conclusions are justified. However, the cohesion between different types of experiments could be strengthened and the overall impact and significance of the study could be articulated better.

      We thank the reviewer for their time and effort in assessing our manuscript. We agree with the reviewer’s overall assessment.

      Reviewer #1 (Recommendations For The Authors):

      Abstract: While new terms are sometimes useful for the visualization of concepts and I appreciate the "bucket list" analogy, it is not yet an accepted term in cell death research, and using it twice in the abstract seems out of order.

      We opted to keep the term, but reduce its use to once in the abstract with a specific comment on the recent coining of the term: “We recently suggested that such diverse tasks can be considered as different cellular “bucket lists” to be accomplished before a cell dies.” We recently coined this term in a review in Trends in Cell Biology, where three reviewers had quite positive comments about the concept. Time will tell whether this is a useful term for the cell death field or not.

      “In figure 2C-F Caspase 1 and Gsdmd deficient animals have higher levels of vector control strain than WT or Nlrc4. Could this be due to the redundancy with Nlrp3 in systemic infection described by Broz et al? Please mention in the description of the results.”

      The reviewer correctly points out a trend in the data. However, our experiments are not powered to show that this difference is statistically significant. Nevertheless, we now make note of the trend, and cite prior papers that have observed NLRC4 and NLRP3 redundancy against non-engineered S. Typhimurium strains.

      “The observation that apoptosis does not affect Salmonella systemically would be strengthened if the experiments using the BIDon strain could be taken out to a later time point, i.e. 72 or 96 h.”

      Indeed, we wanted to extend our studies to these timepoints. However, although expression of the SspH1 translocation signal is benign for 48 h, by 72 h this causes mild attenuation (regardless of whether the BID-BH3 domain is attached as cargo). We think that the degree of difficulty for SPI2 effectors to reprogram the vacuole increases over time, and that only beyond 48 h does SPI2 need to function at peak efficiency. This observation will be reported in a second manuscript that is written and will be submitted within this month. We are happy to supply this manuscript to reviewers if they would like to see the results. We also added text to the discussion to alert the reader to the caveats of engineering S. Typhimurium at later timepoints.

      “Discussion: The authors claim that pyroptotic and apoptotic signaling in IEC have the same outcome and that IEC only have extrusion as a task. However, upon pyroptosis, IEC also release IL-18 and eicosanoids, which is not the case during apoptosis. While the initial extrusion makes all the difference in early infection, Müller et al. 2016 showed that lack of IL-18 has an effect on Salmonella dissemination at a 72 h time point. The FliC-ON model cannot test later time points as the bacteria will be cleared by then, but this caveat should be discussed.”

      We revised the text in the discussion to make it clear that extrusion is not the only bucket list item for IECs, and that IL-18 and eicosanoids are included in the bucket list for IECs after caspase-1 activation, and we added a citation to Müller et al. (2016).

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript is written in a rather colloquial style. Additional editing is recommended.

      We edited the abstract to limit the use of the bucket list term and to make more clear that this is a new term that our lab has proposed in a recent review in Trends in Cell Biology. The managing editor for the current manuscript at eLife commented that the prose was lively and thoughtful. We would be happy to make edits if the reviewer has more specific suggestions.

      2) It is not obvious from the Results section that all mouse infections were, in fact, mixed infections. This should be stated more clearly. Additionally, there is a minor concern regarding in vivo plasmid loss over time.

      We added text to the results to make this clearer at the beginning of each in vivo figure in the paper. Our experiments are intentionally blind to any Salmonella that have lost the plasmid. These bacteria essentially convert to a wild-type phenotype, and thus are no longer representative of FliCON or BIDON bacteria. We also verify the long-established equal competition between pWSK29 (amp) and pWSK129 (kan) in Supplemental Figure 2A-B. Prior experiments from the laboratory of Sam Miller and others in the 1990s showed that plasmid loss occurs at a rate of less than 1%.

      3) Results shown in Figure 4 are difficult to interpret. Essentially, the experiment is aimed at comparing the two engineered Salmonella strains (FliC-ON and BID-ON). However, these strains are very different from one another, which may have a confounding effect on the interpretation of the data.

      The reviewer has interpreted the experiment correctly. We wanted to make clear to the reader that the two strains induce apoptosis with different kinetics. Indeed, it would be very surprising if two different engineering methods created strains that caused apoptosis with identical kinetics. We made two text edits to the results to make this clearer, concluding with “Overall, both ways of achieving apoptosis are successful in vitro, but with slightly different kinetics.”

      4) What new insights into mechanisms of bacterial pathogenesis and host response are gained by using recombinant Salmonella (over)expressing a pro-apoptotic protein is not clearly stated.

      We modified the introduction to make this clearer, stating: “Here, we investigate whether apoptotic pathways could be useful in clearing intracellular infection. Because S. Typhimurium likely evades apoptotic pathways, we again use engineering in order to create strains that will induce apoptosis. This allows us to study apoptosis in a controlled manner in vivo.”

      5) The Discussion section, while provocative, seems speculative and should be revised. Concepts of "backup apoptosis" and crosstalk between pyroptosis and apoptosis are intriguing, but it seems implausible to this reviewer that a cell might "know" that it will die, might "choose" how to die, and might aim to complete a "bucket list" before it loses all functional capacity. The usage of these types of terms does not help bolster the authors' central conclusions.

      We agree that cells do not “choose” pathways for regulated cell death. We had over-anthropomorphized the concepts surrounding these interconnected cell death pathways that are created by evolution. We edited the introduction and discussion to remove the “choose” term. However, we kept the second phrase using “know” in the discussion with an added clarifier: “Once a cell initiates cell death signaling, it “knows” that it will die (or rather evolution has created signaling cascades that are predicated upon the initiation of RCD).” Anthropomorphizing scientific concepts can sometimes be a useful tool to facilitate understanding of complex ideas. For example, the “Red Queen hypothesis” clearly anthropomorphizes the concept of continuous evolution to maintain an evolutionary equilibrium. We have found that scientists in the cell death field often think that modes of cell death are, or should be, interchangeable. We hope that the idea of the “bucket list” will help to crystallize the idea that distinct processes leading up to different types of regulated cell death can have very different consequences during infection.

      Additional Comments from the Reviewing Editor:

      1) The authors show that FliC-ON is not cleared from the spleen of Casp1 KO or Gsdmd KO mice. The conclusion is that the backup apoptosis pathways that should be present in these mice are insufficient to clear the bacteria from the spleen. However, although it is shown that bone marrow macrophages undergo apoptosis in vitro, I believe it is not shown that the apoptotic pathways are actually activated in the spleen. This seems like an important caveat. Could it be shown (or has it previously been shown) that the cells infected in the spleens of Casp1 KO or Gsdmd KO are activating apoptosis? If not, it seems possible that the reason the bacteria are not cleared is due to a lack of apoptosis activation rather than an ineffectiveness of apoptosis, and the authors could consider explicitly acknowledging this.

      We agree, and added to the discussion “A final possibility is that our engineered strains are not successfully triggering apoptosis within splenic macrophages. This could be due to intrinsic differences between BMMs and splenic macrophages or could be due to bacterial virulence factors that fail to suppress apoptosis only in vitro. It is quite difficult to experimentally prove that apoptosis occurs in vivo due to rapid efferocytosis of the apoptotic cells.”

      2) Both reviewers were somewhat unhappy about some of the new terminology/metaphors that are introduced in the manuscript. I understand the reviewers' concerns but also feel that the writing is lively and thoughtful. It is up to the authors to decide whether to retain their new terminology, but the response of two expert reviewers might give the authors some pause. At a minimum, to address the concern about an unfamiliar term being used in the abstract, perhaps explicitly state that you are introducing "bucket list" as a new concept to help explain the results. The introduction of this concept may indeed be one of the novel contributions of the manuscript.

      We opted to keep the term, but reduce its use to once in the abstract with a specific comment on the recent coining of the term: “We recently suggested that such diverse tasks can be considered as different cellular “bucket lists” to be accomplished before a cell dies.” We recently coined this term in a review in Trends in Cell Biology, where three reviewers had quite positive comments about the concept. Time will tell whether this is a useful term for the cell death field or not.

      3) Perhaps this is implied in the discussion already, but it might make sense to state the obvious difference between IECs and splenic macrophages, which is that the death of the former results in the removal of the cell and its contents (i.e., Salmonella) from the tissue, whereas the death of the latter does not. This seems like the simplest explanation for why apoptosis restricts bacterial replication in IECs but not macrophages, and I am not sure if introducing the concept of a "bucket list" improves the explanation or not.

      We agree that this narrative nicely distills the differences between these cell types. We edited the final paragraph of the discussion to include this narrative.

      4) Lastly, some minor comments

      -- p.2 "hyperactivate" instead of "hyperactive"?

      Corrected.

      -- the authors may also want to mention Shigella, as it might provide another example that apoptotic C8-dependent backup protects IECs

      Yes, indeed, this is a good comparison to make. We added this to the discussion.

      -- p.8, in case readers are unfamiliar with the concept of a PIT, the authors should perhaps cite their own work when they first mention this concept (at the top of the page)

      Indeed, citation added.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Note to reviewer and editor:

      In the previous version of the manuscript, we referred to ‘prevalent’ disease at baseline (e.g., prevalent cardiovascular disease). We have since changed this throughout the manuscript to ‘past or prevalent’ disease. This is a more accurate description as we ascertained diseases which occurred prior to baseline but may have been resolved by the time of the accelerometry study.

      Responses to reviewer 1:

      • I assume that not every participant provided data on all 7 nights. Did the authors exclude those who had fewer nights with accelerometer data (e.g., only 2-3 days), as the SRI based on fewer nights may not reliably reflect sleep regularity compared with the SRI based on all 7 consecutive nights?

      It is correct that not every participant provided complete accelerometry data. Most participants (88%) provided complete data. We only included participants who provided at least 2 valid measurements of the SRI (requiring valid data for at least 2 pairs of contiguous 24-hour periods). This is described in the appendix, but we have additionally now added this detail to the main text:

      “Most participants (88%) provided complete accelerometry data. Participants with fewer than two valid SRI measurements (i.e., less than 2 contiguous 24-hour wear periods; <1%) were excluded.”
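As a minimal sketch of the inclusion rule above, the code below computes one SRI value per pair of contiguous 24-hour periods and keeps participants with at least two valid values. It assumes the commonly used SRI definition (the percentage of epochs in the same sleep/wake state 24 hours apart, rescaled to run from -100 to 100). The epoch representation (1 = asleep, 0 = awake, `None` = missing) and the rule that a pair is valid only when both days are fully observed are hypothetical simplifications, not the validity criteria used in the study.

```python
from typing import Optional, Sequence

# Hypothetical representation: one entry per epoch, 1 = asleep, 0 = awake, None = missing.
Day = Sequence[Optional[int]]


def sri_for_pair(day_a: Day, day_b: Day) -> Optional[float]:
    """SRI for two contiguous 24-hour periods, or None if the pair is invalid.

    Hypothetical validity rule: both days have the same length and no missing epochs.
    """
    if len(day_a) != len(day_b) or not day_a:
        return None
    if any(s is None for s in day_a) or any(s is None for s in day_b):
        return None
    # Fraction of epochs whose state matches the same clock time on the next day.
    agreement = sum(a == b for a, b in zip(day_a, day_b)) / len(day_a)
    # Rescale: 100 = identical days, -100 = completely reversed days.
    return -100.0 + 200.0 * agreement


def valid_sri_values(days: Sequence[Day]) -> list[float]:
    """SRI for every pair of consecutive days that passes the validity rule."""
    values = []
    for day_a, day_b in zip(days, days[1:]):
        sri = sri_for_pair(day_a, day_b)
        if sri is not None:
            values.append(sri)
    return values


def meets_inclusion(days: Sequence[Day], min_valid: int = 2) -> bool:
    """Inclusion rule: at least `min_valid` valid SRI measurements."""
    return len(valid_sri_values(days)) >= min_valid
```

With seven complete days, a participant contributes six SRI values (one per consecutive pair); a participant with only two usable days contributes one and would be excluded under this rule.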

      We would also like to note that our statistical analysis accounted, to some extent, for the lower reliability of SRI estimates in those with fewer nights of data. In those with sparse data, their estimated average SRI value would be pulled towards the overall sample average relatively more than for those with complete data. This is a consequence of the ‘partial pooling’ of the linear mixed effects model.
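The ‘partial pooling’ described above can be illustrated with the standard random-intercept shrinkage formula, in which a participant's predicted mean is pulled toward the sample mean by an amount that depends on how many measurements they contributed. This sketch assumes the between-participant variance (`tau2`) and within-participant variance (`sigma2`) are known; in the actual analysis they are estimated by the mixed model, and all numbers below are made up for illustration.

```python
def shrunken_mean(person_mean: float, n_obs: int, grand_mean: float,
                  tau2: float, sigma2: float) -> float:
    """Random-intercept (partial pooling) prediction of a participant's mean.

    tau2   -- between-participant variance of true means (assumed known)
    sigma2 -- within-participant variance of single measurements (assumed known)
    """
    # Reliability of the participant's own average; approaches 1 as n_obs grows.
    weight = tau2 / (tau2 + sigma2 / n_obs)
    return grand_mean + weight * (person_mean - grand_mean)


# Illustrative (made-up) values: same observed average, different amounts of data.
grand, tau2, sigma2 = 60.0, 100.0, 100.0
sparse = shrunken_mean(40.0, n_obs=2, grand_mean=grand, tau2=tau2, sigma2=sigma2)
full = shrunken_mean(40.0, n_obs=6, grand_mean=grand, tau2=tau2, sigma2=sigma2)
```

With identical observed averages, the participant with two SRI values is pulled closer to the sample mean (about 46.7) than the participant with six (about 42.9).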

      • The primary analysis and results were based on restricted cubic spline models that allow assessment of nonlinearity. This is different from the usual strategy that starts with the simpler linear relationship and further explores potential nonlinear relationships. Did the authors have a strong rationale for a nonlinear dose-response relationship between sleep regularity and mortality, so that the assessment of linear relationships was skipped?

      We chose to model the SRI with a restricted cubic spline for two reasons. Firstly, we expected non-linearity a priori. Partly this was because other sleep exposures (especially sleep time) have known non-linear relationships with health outcomes. We also thought it was plausible that a ‘plateau’ might be present, which we wanted to capture. Secondly, we decided that our primary model should be sufficiently flexible from the outset so that we would not need to make data-driven adjustments to the model specification (e.g., adding non-linear terms depending on the results of hypothesis tests). We believe this approach to be generally safer, as data-driven changes can undermine the validity of standard errors and p-values.¹

      • Was the proportional hazards assumption violated in the Cox modeling? Were discrete-time hazard models used to address the violation of the modeling assumption? Please clarify.

      Yes, the proportional hazards assumption was violated for all models except the cardiovascular disease death model. This was the rationale for using the discrete-time hazards model, which allowed for the inclusion of a flexible time-by-SRI interaction so that the hazard ratio could vary over the follow-up period. We have made this clearer in our revision. The following text has been added to the statistical methods:

      “In addition to Cox models, discrete-time hazards models, including an interaction between SRI and time (aggregated into 3-month intervals and modeled with a restricted cubic spline with knots at the 5th, 35th, 65th, and 95th percentiles), were fitted to relax the assumption of proportionality and allow hazard ratios (HRs) to vary over time. The SRI by time interaction in this model provided a test of proportionality (a small p value would indicate strong evidence against the proportional hazards assumption).”
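To make the quoted model specification concrete, the sketch below builds a restricted cubic spline basis for follow-up time, in the unnormalized form popularized by Harrell (reference 1), with knots at the 5th, 35th, 65th, and 95th percentiles, and forms the SRI-by-time interaction columns for one row of a person-period data set. The function names are hypothetical; the SRI enters linearly here only for brevity, whereas the authors describe modeling the SRI itself with a restricted cubic spline; and fitting the resulting discrete-time (logistic) hazard model with its covariates would be done in a statistics package.

```python
import statistics
from typing import Sequence


def rcs_knots(values: Sequence[float]) -> list[float]:
    """Knots at the 5th, 35th, 65th and 95th percentiles of `values`."""
    cuts = statistics.quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return [cuts[0], cuts[6], cuts[12], cuts[18]]


def rcs_basis(x: float, knots: Sequence[float]) -> list[float]:
    """Restricted cubic spline basis: [x, nonlinear_1, ..., nonlinear_{k-2}].

    Cubic between knots and constrained to be linear beyond the outer knots.
    """
    k = len(knots)
    t_last, t_penult = knots[-1], knots[-2]

    def pos3(u: float) -> float:
        # Truncated cubic: (u)_+^3
        return max(u, 0.0) ** 3

    basis = [x]
    for j in range(k - 2):
        t_j = knots[j]
        term = (pos3(x - t_j)
                - pos3(x - t_penult) * (t_last - t_j) / (t_last - t_penult)
                + pos3(x - t_last) * (t_penult - t_j) / (t_last - t_penult))
        basis.append(term)
    return basis


def person_period_row(sri: float, interval: int, knots: Sequence[float]) -> list[float]:
    """Design row for one participant in one 3-month interval.

    Columns: SRI, spline(time), and SRI x spline(time). The interaction columns
    let the SRI log-hazard ratio change smoothly over follow-up; a joint test of
    their coefficients against zero is a test of proportional hazards.
    """
    time_terms = rcs_basis(float(interval), knots)
    return [sri] + time_terms + [sri * t for t in time_terms]
```

With four knots the time spline has three columns, so each person-period row carries one SRI column, three time columns, and three interaction columns.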

      • Please provide correlations between different sleep regularity measures. Although different measures lead to the same conclusion, it is interesting that SRI appeared to provide stronger signals with mortality than the other two SD measures. In addition to what was discussed by the authors, another possibility is that SRI also captures the regularity of napping during the day which is common in older populations.

      Thank you for this helpful suggestion. We have added a correlation matrix for the different sleep regularity measures (Table S1). We have additionally added the following text to the Results:

      “The SRI was modestly negatively correlated with the sleep duration SD (-0.32) and sleep onset time SD (-0.42; see correlation matrix in Table S1).”

      Regarding napping during the day, the algorithm we used to make determinations of sleep and wake unfortunately is not able to identify napping. This is because, in the absence of a sleep diary, it is very difficult to distinguish napping from inactivity in accelerometry data. The algorithm that we used requires the estimation of a ‘sleep period time window’, defining the period from the beginning to the end of the main sleep bout, in which sleep can be identified. Any sleep outside of this window is treated as inactivity. While some methods have been developed to estimate napping time from accelerometry without a sleep diary, we are not aware of any that are validated for adults using wrist worn accelerometers.

      This was not sufficiently clear in the previous version of the manuscript. We have added the following text to make this clear in the revised version.

      Methods:

      “To distinguish sleep from sustained periods of inactivity without reference to a sleep diary (not available in the UKB), GGIR uses an algorithm to determine a daily ‘sleep period time window’ for each participant.11 This defines the time window between the onset and end of the main daily sleep period, during which periods of sustained inactivity are interpreted as sleep. The algorithm does not, by default, detect bouts of sleep outside of this window and hence is not able to identify naps.”

      Discussion:

      “In addition, sleep diaries in the UKB were not available. Consequently, the algorithm we used to determine sleep and wake relied on the identification of a main ‘sleep period time window’ and did not identify napping.”

      • Table 1 - I would suggest adding additional columns showing the variable distributions across quantiles of the SRI, which can help understand the confounding structure and the covariate associations with SRI.

      We agree that this is a good idea and we have adjusted Table 1 accordingly.

      • Figure 1 and related supplemental Figures: it would be good to label in the figure the specific HR estimate and 95% CI mentioned in the manuscript.

      Thank you for this suggestion. We agree that this would be helpful. After some consideration, however, we have decided to leave the figures as they are, for one primary reason: we want to avoid overemphasising the 5th and 95th quantiles. As discussed above, we chose to present HRs for these quantiles as a complement to the Figures to assist communication (for some readers, the key results might be easier to glean from these numeric summaries than from the Figures). However, we do not wish to overemphasise these quantiles when it is the full ‘dose-response’ function that we believe to be of greatest interest.

      • Additional stratified analyses by main sociodemographic factors (age, sex, SES, etc) and sleep variables (sleep duration and sleep quality) would be informative to understand the population heterogeneity in the association between sleep regularity and mortality

      Thank you for this suggestion. We have assessed effect modification across a range of key background variables (age, sex, household income, sleep duration, moderate to vigorous physical activity, prevalent CVD, and prevalent cancer). This has been added to the results. Where meaningful evidence of effect modification was noted, we have presented results within strata of the effect modifier.

      • Some brief discussion on socioeconomic aspects of sleep is needed (the authors focused on the biological mechanisms underlying the observed association), as emerging evidence suggests that sleep health is not only a biological but also a social construct. For example, a recent study in the US found that sleep regularity is the most important contributor to racial/ethnic disparities in sleep health (see PMID: 34498675).

      We agree that sleep health is both a biological and social construct. We have added the following text to the discussion to address this comment:

      Discussion:

      “Furthermore, identifying the determinants of poor sleep regularity may be of import, not only considering biological factors, but broader social determinants that impact circadian rhythmicity (e.g., racial/ethnic disparities32, neighbourhood factors33) and consequently overall health.”

      References

      1. Harrell FE. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. vol 608. Springer; 2001.
  3. Oct 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the critical review of our manuscript. We believe that we have addressed the questions and concerns raised by the reviewers to the best of our ability. As part of the revision, we conducted two new experiments to enhance the rigor of the conclusions and to provide more insight into the mechanism of STEAP proteins, and, as suggested by the reviewers, we reorganized the Results section to follow a clearer logical thread. The new data are briefly summarized below.

      1) Reduction of L230G STEAP1 by reduced FAD. We made the Leu230Gly (L230G) STEAP1 mutant and measured the rate of heme reduction by reduced FAD. We found that the rate of heme reduction in L230G STEAP1 is slower than that in wild-type STEAP1. Since Leu230 is solvent accessible only from the intracellular side, this result supports the conclusion that reduced FAD binds to STEAP1 on the intracellular side and reduces the heme. This result also indicates that this leucine, which is found at the equivalent position in STEAP1, 2, and 3, and Phe359 in STEAP4, has a significant role in mediating electron transfer from FAD to the bound heme.

      2) Reduction of STEAP2 by reduced FAD. We showed that STEAP2 can be reduced when supplied with reduced FAD, and that the rate of heme reduction is significantly slower than that of reduction of STEAP1 by reduced FAD. This result is consistent with the presence of the oxidoreductase domain (OxRD)† in STEAP2, which hampers direct entry of the isoalloxazine ring of FAD into its binding pocket in the transmembrane domain (TMD). On the other hand, the rate of heme reduction by reduced FAD is much faster than that of heme reduction in the presence of NADPH and FAD, indicating that reduction of FAD by NADPH is rate-limiting in the electron transfer chain in STEAP2.

      †: To be consistent with literature, we adopted the nomenclature “oxidoreductase domain (OxRD)” for the N-terminal soluble domain in STEAP proteins. We used the term “reductase domain (RED)” in the previous version of our manuscript.

      Reviewer #1 (Public Review):

      This important study reveals the structure of human STEAP2 for the first time and suggests the electron transport pathway, but some questions remain regarding the interpretation of the in vitro electron transport experiments, the lack of available redox couples, and potential alternative hypotheses that would, if addressed, strengthen the claims in the manuscript.

      Strengths

      One of the clear strengths of the manuscript that stands out is the determination of the structure of human STEAP2. The structures of some other homologs are known, but STEAP2's structure was not, and STEAP2 seems to have an unusually low activity towards certain metal chelates. The approach of producing the human STEAP2 in insect cells with the supplementation of cofactor biogenesis components nicely results in cofactor-replete protein. The structure of STEAP2 reveals a domain-swapped trimer, with the NADPH-binding domain of the neighboring protomer within electron-transport distance of the FAD-heme axis. The FAD has an interesting and somewhat unusual extended conformation and abuts a Leu residue that may regulate electron transport. Another strength of the manuscript is the demonstration that STEAP1, which does not have the internal NADPH binding domain, can interact modestly and shuttle electrons to the heme in STEAP1 through FAD. These experiments nicely expand information on the function of STEAP1 and provide a structural basis for electron transport in STEAP2.

      Weaknesses

      A major weakness in the manuscript lies with the kinetics data and how the data, as presented, are unclear to the reader regarding their impact and their connection to the purported electron transport scheme. While multiple sets of data are reported, the analysis in all cases is simply the reduction of a hexacoordinate heme and its related spectra and kinetic parameters. In most cases, it's unclear to the reader which part of the electron pathway is being tested in which experiment. Simple diagrams would be helpful in each case. However, it's also unclear if all of the potential order of addition experiments were actually performed; i.e., flavin but no NADPH; NADPH but no flavin; flavin before NADPH; flavin after NADPH, etc. As there are multiple permutations that should be tested to demonstrate the electron transport pathway, presenting the data in a way that makes it clear to the reader is challenging. Particularly missing are the determined redox potentials of the hemes in both STEAP1 and STEAP2. Could differences in these heme redox potentials be drivers of the difference in metal reduction rates?

      We re-structured the manuscript to follow a clearer logical thread. We provided explanations for which electron transfer steps are being examined in each experiment.

      We cannot reliably determine the heme midpoint potentials (Em) because of the insufficient amount of purified protein. We are inclined to think that the hemes bound to STEAP1 and STEAP2 have similar Em values, based on their similar coordination geometry and nearly identical UV-Vis and MCD spectra. Thus, the different rates of Fe3+-NTA reduction by STEAP1 and STEAP2 are likely due to differences in the substrate binding site rather than to different Em values.

      Also, the text indicates that STEAP2 does not show a reduction rate dependence on [Fe3+-NTA], but Figure 1A shows a difference in rates dependent on [Fe3+-NTA], and the shape of the reduction curve also changes based on [Fe3+-NTA]. This discrepancy should be rectified.

      We fixed this error. The reduction of Fe3+-NTA by ferrous STEAP2 shows multiple phases and the reaction rates within the initial 2 seconds are weakly dependent on [Fe3+-NTA].

      A second major weakness is the lack of any verification of the relevance of the STEAP2 oligomerization to its in vivo function. Is the same domain-swapped trimer known to exist in vivo? If the protein were prepared in other detergents, is the oligomerization preserved? It is alluded to in the text that another STEAP protein is also a trimer. Was this oligomerization verified in vivo?

      The domain-swapped assembly is an interesting phenomenon, and it seems to provide a solution for bringing the FAD closer to heme. The same domain swapped trimeric assembly is also observed in the structure of STEAP4, which was purified in a different detergent (Nat Commun (2018), 9, page 4337). It is likely that this feature is shared by STEAP2, 3, and 4, and preserved during the purification process.

      Could this oligomerization be disrupted to impede or abrogate electron transport to underscore the oligomerization relevance? This point is germane, as it would further suggest that the domain-swapped trimer observed in the STEAP2 cryo-EM structure is physiologically relevant, especially given how far the distance between the NADPH and the FAD would otherwise be to support electron transport.

      We agree with the reviewer’s reasoning that the oligomeric assembly is required for proper function of STEAPs and thus could potentially be utilized for functional regulation. However, we are not aware of any physiologically relevant stimuli or treatment that would allow regulation of STEAP functions by inducing or forming different oligomeric states. Our experience with STEAP proteins is that the trimeric assembly is stable and well-preserved during the purification process and can only be disrupted under denaturing conditions such as SDS-PAGE.

      Beyond these two areas in which the manuscript could be improved there are a couple of minor considerations. First, the modest resolution of the STEAP2 structure prevents assigning the states of NADP+/NADPH and FAD/FADH2 with confidence. An orthogonal measure would be useful for modeling the accurate states in the structure.

      We agree. We clarified the ambiguity and stated in the main text that the cryo-EM structure of STEAP2 was determined in the presence of NADP+ and FAD.

      Finally, the BLI b5R/STEAP1 binding/unbinding fits seem somewhat poor, especially at high concentrations of b5R in the dissociation regime, which likely influences the derived value of Kd. A different fitting equilibrium might yield better agreement between the experimental and theoretical results. Moreover, whether this binding strength is influenced by the reduction state of the NADPH would be helpful in understanding and contextualizing the weak binding affinity.

      We think that non-specific binding likely causes deviations from the simple binding model at higher b5R concentrations, and we have added a comment on this to the main text. We agree with the reviewer that the interactions between b5R and STEAP1 could be redox dependent; for example, reduced FAD on b5R may enhance the affinity. We could test this by performing the binding experiments in an anaerobic chamber, but this is beyond the scope of the current study.

      Reviewer #2 (Public Review):

      The manuscript provides new insight into a family of human enzymes. It demonstrates that STEAP2 can reduce iron and STEAP1 can be promiscuous regarding the source of electron donors that it can use. The quality of the kinetics experiment and the structural analysis is excellent. I am less enthusiastic about the interpretation of data and the take-home message that the manuscript intends to deliver. Above all, the work combines data on STEAP2 and STEAP1 and it remains unclear which questions are ultimately addressed. A second critical point is about the interpretation of the experiment demonstrating that STEAP1 can be reduced by cytochrome b5 reductase. The abstract states that "We show that STEAP1 can form an electron transfer chain with cytochrome b5 reductase" whereas the main text discusses the data by suggesting that "we speculate that FAD on b5R may partially dissociate to straddle between the two proteins". The two statements seem to be partly contradictory. Cytochrome b5 reductases do not easily release FAD but rather directly donate electrons to heme-protein acceptors (PMID: 36441026). According to the methods section, no FAD was added to the reaction mix used for the cytochrome b5 reductase experiment. Overall, the data seem to indicate that the reductase might reduce the heme of STEAP1 directly. Would it be possible to check whether FAD addition affects the kinetics of the process?

      We agree with the reviewer on this point. We do not have evidence indicating that FAD fully or partially dissociates from b5R to interact with STEAP1. We removed the statement in the revision.

      We have not tried to add free reduced FAD to the mixture of STEAP1/b5R/NADH, because we feel that this would increase the complexity of the system and complicate data interpretation. We are working on determining the structure of b5R in complex with STEAP1 to visualize the electron transfer pathway, and we hope that such a structure would provide a framework for understanding electron transfer between the two proteins.

      And to perform a control experiment to check that NAD(P)H does not directly reduce the heme of STEAP1 (though unlikely)?

      We performed the control experiment and included the data in Fig. S3A. NADH alone did not reduce the heme.

      A final point concerns the "slow Fe3+-NTA reduction by STEAP2". The reaction is not that slow, as the initial phase is 2 per second. The reaction does not show dependence on the substrate concentration, suggesting "poor substrate binding". I am not convinced by this interpretation. Poor substrate binding would give rise to substrate dependency as saturation would not be achieved. A possible interpretation could be that substrate binding is instead tight and the enzyme is saturated by the substrate. Can it be that the reaction is limited by non-productive substrate-binding and/or by interconversions between active and non-active conformations?

      We agree with the reviewer on this point. We re-analyzed the data and found that the reaction rates within the first 2 seconds are weakly dependent on [Fe3+-NTA], while the rates beyond 2 seconds do not show dependence on [Fe3+-NTA]. We revised the Results and Discussion accordingly. More studies are required to unravel the mechanism that leads to the complicated kinetic data.

      Reviewer #3 (Public Review):

      The six-transmembrane epithelial antigen of the prostate (STEAP) family comprises four members in metazoans. STEAP1 was identified as an integral membrane protein highly upregulated on the plasma membrane of prostate cancer cells (PMID: 10588738), and it later became evident that other STEAP proteins are also overexpressed in cancers, making STEAPs potential therapeutic targets (PMID: 22804687). Functionally, STEAP2-4 are ferric and cupric reductases that are important for maintaining cellular metal uptake (PMIDs: 16227996, 16609065). The cellular function of STEAP1 remains unknown, as it cannot function as an independent metalloreductase. In recent years, structural and functional data have revealed that STEAPs form trimeric assemblies and that they transport electrons from intracellular NADPH, through membrane-bound FAD and heme cofactors, to extracellular metal ions (PMIDs: 23733181, 26205815, 30337524). In addition, numerous studies (including a previous study from the senior authors) have strongly suggested a potential metalloreductase function for STEAP1 (PMIDs: 27792302, 32409586).

      This new study by Chen et al. aims to further characterize the previously established electron transport chain in STEAPs in high molecular detail through a variety of assays. This is a well-performed, highly specialized study that provides some useful extra insights into the established mechanism of electron transport in STEAP proteins. The authors first perform a detailed spectroscopic analysis of Fe3+-NTA reduction by STEAP2 and STEAP1, confirming that both purified proteins are capable of reducing metal ions. A cryo-EM structure of STEAP2 is also presented. It is then established that STEAP1 can receive electrons from cytochrome b5 reductase, and the authors provide experimental evidence that the flavin in STEAP proteins becomes diffusible.

      The specific aims of the study are clear, but it is not always obvious why certain experiments are performed only on STEAP2, on STEAP1, or on both isoforms. A better justification of the performed experiments through connecting paragraphs and proper referencing of the literature would improve readability of the manuscript. Experimentally, the conclusions are appropriate and mostly consistent with the experimental data, although one important finding would benefit from further clarification. Namely, the observation that STEAP1 can form an electron transfer chain with cytochrome b5 reductase in vitro is an exciting finding, but its physiological relevance remains to be validated. The metalloreductase activity of STEAP1 in vitro has been described previously by the authors and by others (PMIDs: 27792302, 32409586). However, when overexpressed in HEK cells, STEAP1 by itself does not display metal ion reductase activity (PMID: 16609065), and it was also found that STEAP1 overexpression does not impact iron uptake and reduction in Ewing's sarcoma (cancer) cells (PMID: 22080479). Therefore, the physiological relevance of metal ion reduction by STEAP1 remains controversial. The current work establishes an electron transfer chain between STEAP1 and cytochrome b5 reductase in vitro with purified proteins. However, confirmation of this metalloreductase activity of the STEAP1/cytochrome b5 reductase complex in a cell line will be required to prove that the two proteins indeed form a physiologically relevant complex and that the results are not just an in vitro artefact from using high concentrations of purified proteins.

      The work will be interesting for scientists working within the STEAP field. However, some of the presented data are redundant with previous findings, moderating the study's impact. For instance, the new structural insights into STEAP2 are limited because the structure is virtually identical to the published structures of STEAP4 and STEAP1 (PMIDs: 30337524, 32409586), which is not surprising because of the high sequence similarity between the STEAP isoforms. Moreover, the authors provide experimental evidence to prove the previous hypothesis (PMID: 30337524) that the flavin in STEAP proteins becomes diffusible, but the molecular arrangement of a STEAP protein in which the flavin interacts with NADPH remains unknown. Based on the manuscript title, I would also expect an in-depth characterization of STEAP1/STEAP2 heterotrimers (first identified by the authors), but this is only briefly mentioned. Taken together, this study by Chen et al. strengthens and supports previously published biochemical and structural data on STEAP proteins, without revealing many prominent conceptual advances.

      We thank the reviewer for this information and the broader context. We have revised the manuscript to follow a clearer logical thread.

      Reviewer #1 (Recommendations For The Authors):

      Please see the "Public Review" for recommendations.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions

      -The introduction should more clearly state which questions are being addressed and why STEAP1 and STEAP2 are investigated.

      We have revised the Introduction to make that clearer.

      -The manuscript should discuss more extensively and provide possible explanations for the substrate-independent kinetics of iron-reduction by STEAP2.

      We re-analyzed the data and found that the rate constants of the reactions before 2 s are weakly [Fe3+-NTA]-dependent. We ascribe this weak [Fe3+-NTA] dependence to substrate binding being partially rate-limiting. However, we do not have a good interpretation for the reaction kinetics after 2 s, which do not show [Fe3+-NTA] dependence.

      -"The rate of STEAP1(Fe(II)) oxidation by Fe3+-NTA is similar to those by Fe3+-EDTA or Fe3+-citrate, but the rates are significantly faster than STEAP2(Fe(II)) re-oxidation by Fe3+-NTA (Fig. 1B)." The rates for STEAP1 should be given to substantiate this statement.

      We added Table S1 to the supplementary materials, which lists the second-order association (kon) and first-order dissociation (koff) rate constants of iron substrates for STEAP1 and STEAP2. Data on Fe3+-EDTA and Fe3+-citrate for STEAP1 are from our previous study (Biochemistry, 2016). We also calculated the KD values of the different iron substrates for STEAP1 and STEAP2.

      • "Our results indicate that STEAP2 can supply reduced FAD to initiate electron transfer in STEAP1." As discussed above, this statement should be discussed and analyzed.

      We mixed 0.9 μM STEAP1, 1.1 μM STEAP2, and 2.2 μM FAD, added 60 μM NADPH to the system, and found that the hemes on both STEAP1 and STEAP2 were reduced. Since adding NADPH to STEAP1 plus FAD alone does not reduce the heme (Fig. S3B), we reasoned that reduction of the heme on STEAP1 is achieved by the reduced FAD produced on STEAP2. The reduced FAD likely dissociates from STEAP2 and then binds to STEAP1.

      -Experiments on "STEAP1 reduction by STEAP2": The experiments show that "STEAP2 can supply reduced FAD to initiate electron transfer in STEAP1." Could these results be explained by heterotrimer formation in agreement with the previous data published by the authors?

      In our experience, STEAP1 and STEAP2 homotrimers are stable and do not form heterotrimers when mixed. STEAP1/2 heterotrimers form only when the two proteins are co-expressed in cells (Biochemistry (2016) 55, 6673-6684).

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      1) As a very general point: the order in which the results are presented could be greatly improved to increase the readability for non-experts. To elaborate: The manuscript starts with the spectroscopic characterization of STEAP2, then suddenly the reductase activities of STEAP1 and STEAP2 are compared; subsequently, experiments are described involving STEAP1 and cytochrome b5 reductase; then the cryo-EM structure of STEAP2 is presented etc. As a non-expert reader, this presentation of the results is confusing, especially because the paragraphs are not always connected well, and there is a lot of switching between STEAP1 and STEAP2 data. A more logical order would be to first present the STEAP2 spectroscopy and structural data, then write a connecting paragraph on why it is important to also study the electron transfer chain in STEAP1, followed by the comparison of the STEAP isoforms and the data on STEAP1 alone. The authors should include sentences on why they performed each experiment. For example, why did they determine the structure of STEAP2? What were they after that they could not retrieve from the homologous STEAP4 and STEAP1 structures? Justification of the performed experiments will make it easier for the reader, and will establish a better connection between the paragraphs.

      We reorganized the data presentation in Results per the reviewer’s suggestions.

      2) The physiological relevance of metal ion reduction by STEAP1 remains controversial. Because the current work establishes an electron transfer chain between STEAP1 and cytochrome b5 reductase, could the authors perform an easy experiment where they overexpress both STEAP1 and cytochrome b5 reductase in a cell line? If such an experiment would reveal STEAP1-dependent metal-ion reduction, it would greatly improve this part of the manuscript. If no activity is observed, the established electron transfer chain could just represent an in vitro artifact from using high concentrations of purified proteins.

      This is an excellent point. We are not set up to perform the proposed experiment but will do so in the future.

      3) The manuscript states that metal ion reduction of purified STEAP2 is slow, and the authors explain this by the absence of density for the extracellular region between helices 3 and 4 that are present in the structures of STEAP4 and STEAP1, resulting in a less-well defined substrate-binding site. Can the authors exclude that the less-well defined substrate-binding site is a result of the detergent extraction of STEAP2? Would it be possible to measure the reductase activity of STEAP2 in purified membranes?

      Detergent mostly interacts with the transmembrane domains, and since the TMD region of STEAP2 aligns well with those of STEAP1 and STEAP4, we do not think that the disordered substrate-binding region in STEAP2 is a consequence of detergent solubilization. It is difficult to conduct pre-steady-state kinetic experiments using STEAP2 in membrane fractions.

      4) The manuscript would greatly benefit from citing the literature more comprehensively to acknowledge insightful findings from authors in the field; for example, the important work by the Lawrence lab from 2015 (PMID: 26205815), which biochemically proved that STEAPs bind a single heme and that FAD bridges the TMD and RED, is not cited. The authors also mention that STEAP proteins belong to the same family as NOX proteins and cite some NOX structure papers. However, they fail to cite the first NOX structure paper (PMID: 28607049), as well as the manuscript that structurally compares NOXs and STEAPs (PMID: 32815713). Similarly, the authors use SerialEM for their cryo-EM data collection but cite an old paper instead of the more recent (and relevant) SerialEM publication (PMID: 31086343).

      We agree and added the references.

      5) Generally, the data presented in the manuscript appear of good technical quality. However, a 'Table 1' with all relevant cryo-EM data collection and refinement statistics is completely missing as far as I can see. The authors should definitely add this to allow for the judgement of structural data quality. Without it, the manuscript is not suitable for publication.

      We added Table S2 that includes relevant cryo-EM statistics.

      Minor points:

      6) The authors write in the abstract: 'STEAP2 - 4, but not STEAP1, have an intracellular domain that binds to NADPH and FAD'. This is not correct, because it has clearly been established that FAD also majorly binds to the transmembrane domain (this is even shown by the authors in the current manuscript as well).

      We agree and have corrected that in the revision.

      7) Sentences from the abstract and introduction state: 'It is also unclear whether STEAP1 has metal ion reductase activity' and 'it is unclear whether STEAP1 can form a competent electron transfer chain from NADPH'. The authors should definitely add "physiologically relevant" to these sentences. Namely, the senior authors themselves concluded in their 2016 Biochemistry paper (PMID: 27792302) that STEAP1 is capable of reducing metal ion complexes. Further indications that the transmembrane domain of STEAP1 displays metalloreductase activity were published by the Gros lab (PMID: 32409586), and it was also shown that fusing the RED of STEAP4 to the TMD of STEAP1 yields a functional protein in cells that reduces metal ions.

      Good point; we revised the text and included the references.

      8) Why is scheme 1 not just a summarizing figure?

      We could change Scheme 1 to a Figure if required by the copy editor.

      9) What is the purpose of Fig. 6? A larger depiction of Fig. 5e would likely be more relevant and should be considered as a replacement. Alternatively, the structure of STEAP1 (pdb 6y9b) could be shown in combination with Fig. 7, as the mutation is performed in STEAP1.

      We agree and made changes to the structural figures to enhance clarity.

      10) The manuscript now contains many, single panel figures. Certain main figures could easily be combined, for example, Fig. 1 and 2 and/or Fig. 3 and 4.

      We agree and made changes to reduce single panel figures.

      11) In Fig. 2, 3 and 4, the spectra show changes in peak heights as a result of the ferric to ferrous heme transition. However, a time component is missing in the legend. How long do these transitions take?

      We added the reaction times to the figure legends.

      12) The last part of the discussion states: 'The assembly of an intracellular RED with a membrane-embedded TMD observed in NOX, DUOX, and STEAPs naturally led to the notion that NADPH, FAD, and heme form an uninterrupted rigid electron-transfer chain that shuttles electron from the intracellular cellular NADPH to the extracellular substrates. While this may be true for NOX and DUOX, in which rapid supply of electrons to their extracellular substrates are essential to their biological functions, it may not apply similarly to STEAPs since it has only one heme in the TMD, and their electron transfer relies on shuttling of FAD.' The authors should mention here that the activity of NOX and DUOX is tightly regulated by accessory proteins, Ca2+ etc. Similarly, do the authors expect that the large distance between NADPH and FAD in the structures could represent a way to regulate/dampen the metal ion reduction rates of STEAPs in vivo?

      We agree. We mentioned the regulation of NOX and DUOX in Discussion. We remain puzzled by the large distance between NADPH and FAD in STEAPs and are in pursuit of a structure in which the two cofactors are “in touch” for electron transfer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript represents an elegant bioinformatics approach to addressing causal pathways in vascular and liver tissue related to atherosclerosis/coronary artery disease, including those shared by humans and mice and those that are specific to only one of these species. The authors constructed co-expression networks using bulk transcriptome data from human (aorta, coronary) and mouse (aorta) vascular and liver tissue. They mapped human CAD GWAS data onto these modules, mapped GWAS SNPs to putatively causal genes, identified pathways and modules enriched in CAD GWAS hits, assessed those shared between vascular and liver tissues and between humans and mice, determined key driver genes in CAD-associated supersets, and used mouse single-cell transcriptome data to infer the roles of specific vascular and liver cell types. The overall approach used by the authors is rigorous and provides new insights into potentially causal pathways in vascular tissue and liver involved in atherosclerosis/CAD that are shared between humans and mice as well as those that are species-specific. This approach could be applied to a variety of other common complex conditions.

      The conclusions are largely supported by the analyses. Some specific comments:

      1) It appears that GWAS SNPs were mapped to genes solely through the use of eQTLs. Current methods involve a number of other complementary approaches to map GWAS SNPs to effector genes/transcripts and there is the thought that eQTLs may not necessarily be the best way to map causal genes.

      We agree with the reviewer that multiple approaches can be used to map GWAS SNPs to genes, and eQTLs are only one way to do so. We focused on eQTLs mainly because we aimed to address tissue specificity and because eQTLs are relatively more abundant than other tissue-specific functional genomics data, such as pQTLs and epiQTLs. We now acknowledge this limitation in the discussion section of our revised manuscript and point to future studies utilizing other approaches to map GWAS signals to downstream effectors.

      2) Given the critical causal role of circulating apoB lipoproteins in atherosclerosis in both mice and humans and the central role of the liver in regulating their levels, perhaps the authors could use the 'metabolism of lipids and lipoproteins' network in Fig 3B as a kind of 'positive control' to illustrate the overlap between mice and humans and the driver genes for this network.

      We appreciate the reviewer’s excellent suggestion and now elaborate on the findings in Fig 3B as a positive control in the results section.

      3) Is it possible to infer the directionality of effect of key driver genes and pathways from these analyses? For example, ACADM was found to be a KD gene for a human-specific liver CAD superset pathway involving BCAA degradation. Are the authors able to determine or predict the effect of genetically increased expression of ACADM on BCAA metabolism and on CAD? Or the directionality of the effect of the hepatic KD gene OIT3 on hepatic and plasma lipids and atherosclerosis.

      The Bayesian networks only contain information on which genes likely regulate others, not on the direction of regulation (up or down), and the network key driver analysis only considers the enrichment of GWAS candidate genes in the neighborhood of each key driver. Therefore, it is not possible to directly infer from our current analysis whether increasing or decreasing a key driver will lead to up- or downregulation of the downstream pathways. We could, however, examine correlations of key driver genes with downstream genes or disease traits in relevant datasets. For instance, we checked the mouse atherosclerosis HMDP datasets for correlations between select key drivers and clinical traits and found that various shared and species-specific key drivers in aorta and liver correlate significantly with aortic lesion area and other traits of interest, such as LDL levels and inflammatory cytokines. We have added these new findings to the results section and supplemental tables.
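      To make the idea of a post hoc correlation check concrete, the sketch below computes a plain Pearson correlation between a key driver's expression across mouse strains and a clinical trait such as aortic lesion area; the sign of r hints at the direction of association but, as noted above, not at causality. This is an illustration under stated assumptions only: the function name and the toy values are hypothetical, and the correlation measure, covariate handling, and multiple-testing correction actually used on the HMDP data are not specified here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Hypothetical helper: e.g. x = key-driver expression per strain,
    y = a clinical trait (such as lesion area) for the same strains.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values for illustration only, not HMDP measurements.
    expression = [5.1, 6.3, 4.8, 7.2, 6.0]
    lesion_area = [110.0, 150.0, 95.0, 180.0, 140.0]
    r = pearson_r(expression, lesion_area)
    print(f"r = {r:.2f}")  # positive r: higher expression tracks larger lesions
```

A rank-based (Spearman) correlation would be a reasonable alternative when trait distributions are skewed, and any genome-wide screen of this kind would need correction for the number of key drivers and traits tested.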

      4) While likely beyond the scope of this manuscript, the substantial amount of publicly available plasma proteomic and metabolomic data could be incorporated into this multiomic bioinformatic analysis. Many of the pathways involve secreted proteins or metabolites that would further inform the biology and the understanding of how these pathways relate to atherosclerosis.

      We appreciate the reviewer’s valuable suggestion. Here we focused on liver and aorta gene regulatory networks to understand tissue-specific mechanisms at the gene level. Plasma proteomic and metabolomic data could indeed be incorporated in future studies to identify pathways reflected in the circulation, which can capture cross-tissue interactions mediated by proteins and metabolites secreted from different tissues. We have addressed this as a future direction in the discussion section.

      The findings here will motivate the community of atherosclerosis investigators to pursue additional potential causal genes and pathways through computational and experimental approaches. It will also influence the approach around the use of the mouse model to test specific pathways and therapeutic approaches in atherosclerosis. In addition, the computational approach is robust and could (and likely will) be applied to a variety of other common complex conditions.

      Reviewer #2 (Public Review):

      Summary:

      Mouse models are widely used to determine key molecular mechanisms of atherosclerosis, the underlying pathology that leads to coronary artery disease. The authors use various systems biology approaches, namely co-expression and Bayesian Network analysis, as well as key driver analysis, to identify co-regulated genes and pathways involved in human and mouse atherosclerosis in artery and liver tissues. They identify species-specific and tissue-specific pathways enriched for the genetic association signals obtained in genome-wide association studies of human and mouse cohorts.

      Strengths:

      The manuscript is well executed with appropriate analysis methods. It also provides a compelling series of results regarding mouse and human atherosclerosis.

      Weaknesses:

      The manuscript has several weaknesses that should be acknowledged in the discussion. First, there are numerous models of mouse atherosclerosis; however, the HMDP atherosclerosis study uses only one model of mouse atherosclerosis, namely hyperlipidemic mice, due to the transgenic expression of human apolipoprotein E-Leiden (APOE-Leiden) and human cholesteryl ester transfer protein (CETP). Therefore, when drawing general conclusions about mouse pathways not being identified in humans, caution is warranted. Other models of mouse atherosclerosis may be able to capture different aspects of human atherosclerosis.

      We appreciate the reviewer’s valuable insight! Indeed, the specific HMDP atherosclerosis model may miss important mouse pathways that could have overlapped with the human pathways. We have added this important point to the limitations section under the discussion to caution the interpretation of the human-specific pathways, as they could be present in mice but are missed by the specific HMDP atherosclerosis dataset used.

      Second, the mouse aorta tissue is atherosclerotic, whereas the atherosclerosis status of the GTEX aorta tissues is not known. Therefore, it is possible that some of the human or mouse-specific gene modules/pathways may be due to the difference in the disease status of the tissues from which the gene expression is obtained.

      We agree with the reviewer that GTEx vascular tissues have unclear atherosclerosis status. However, in addition to GTEx, we also included the human STARNET dataset which contains vascular tissues from human patients with CAD. Therefore, we believe the comparability of the human and mouse vascular tissue datasets is reasonable.

      Third, it is unclear how the sex of the mice (all female in the HMDP atherosclerosis study and all male in the baseline HMDP study) and the sex of the human donors affected the results. Did the authors regress out the influence of sex on gene expression in the human data before performing the co-expression and preservation studies? If not, this should be acknowledged.

      We acknowledge that the effect of sex in the mouse and human datasets was not regressed out in our analysis. We have added this under the limitations section.

      Fourth, some of the results are unexpected, and these should be discussed. For example, the authors identify that the leukocyte transendothelial migration pathway and PDGF signaling pathway are human-specific in their vascular tissue analysis. These pathways have been extensively described in mouse studies. Why do the authors think they identified these pathways as human-specific? Similarly, the authors identified gluconeogenesis and branched-chain amino acid catabolism as human- and mouse-shared modules in the vascular tissue. Is there evidence of the involvement of these pathways in atherosclerosis in vascular cells?

      We agree with the reviewer that these unexpected findings warrant further discussion. As pointed out by this reviewer, it is possible that the mouse HMDP atherosclerosis dataset cannot fully represent all mouse atherosclerosis biology and therefore missed the leukocyte migration and PDGF pathways that were identified in the human datasets. Regarding the surprising findings of pathways such as BCAA catabolism in vascular tissues, we acknowledge that future studies will need to further investigate such pathway predictions but also highlight that these pathway terms have many shared genes with more commonly known pathways such as the TCA cycle, which may indicate the involvement of energy metabolism in vascular tissues in CAD development. We have added these points to the discussion section under limitations and concluding remarks.

      Overall, acknowledging these drawbacks and adding points to the discussion will strengthen the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) Could the authors comment on why MEGENA produces so many more co-expression modules per tissue than WGCNA?

      As described in the methods section, MEGENA uses a multi-scale clustering structure to generate network modules at different scales, with each scale representing a different compactness level of the modules. At lower compactness scales larger modules are generated; at higher compactness scales, smaller modules are generated. By using all modules obtained from different scales, the total number of modules is much larger than WGCNA which only generates a network at one scale.

      2) Much of the results section involves repeating in the text lists of pathways, modules, and genes that are also listed in Figures 2 and 3. The text in this part of the results could be used more productively to focus on illustrative examples or potential new biology.

      We have revised the results section to reduce repeating long lists of pathways, modules, and genes as suggested.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the weaknesses I mentioned in the public review comments, there are a few minor issues that I outline below:

      1) The authors should introduce atherosclerosis as the underlying cause of CAD in the Introduction. In fact, I believe there are many places in the manuscript where the authors mean atherosclerosis instead of coronary artery disease, especially when presenting and discussing mouse results since the HMDP study did not examine the coronary arteries of mice. I believe the authors should make the appropriate changes throughout the manuscript.

      We have made the changes as suggested.

      2) The authors state in the introduction, "For example, mice tend to develop atherosclerotic lesions in the aorta and carotids, while humans often develop lesions in coronary arteries (Ma et al., 2012)." This is not entirely correct, so this sentence should be revised. Several models of mice show coronary artery atherosclerosis development, but most researchers study lesions in larger aortas. Further, humans develop lesions throughout the arterial tree, but perhaps what the authors meant was the most consequential plaque development is in the coronary arteries. Please rephrase.

      We have rephrased the statement as suggested.

      3) Last line of page 5 should read "...which will drive modules and pathways that are more likely..." not "derive"

      Typo corrected.

    1. Author Response

      We appreciate the editor's and reviewers' time to review our manuscript. We will work on the suggestions and have provided an initial assessment of what we can do for our revised submission.

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc) while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region using a viral vector to deliver channelrhodopsin (CamKII promoter). In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion: the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We considered two approaches initially. The first approach was to look at specific projections to the motor regions, focusing on the MLR. The second approach was to utilize a whole-brain analysis that is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that a reasonable starting point was to examine the full connectome. The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are known to be complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were from the Allen Brain Atlas terminology and were presented as abbreviations. We have looked at other ways to present it, including a greater emphasis on raw numbers and highlighting motor-related subareas. We will rewrite the connectomics section to make it more accessible, reflecting the change in the figures.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point and could help simplify the whole-brain results. We can present the motor-related inputs and outputs as part of a new figure in the main paper and add accompanying text in the results section. This will help highlight possible therapeutic pathways. We can also enhance our discussion of these motor-related pathways. We will retain the entire dataset and present it in a supplementary table for those who are interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and a new perspective on the preservation of dopaminergic cells in A13, despite the SNc degeneration, and the plastic changes to input/output matrices. To gain inspiration for a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon from the David Kleinfeld laboratory." This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice, and as pointed out, Kleinfeld’s group had a nice, focused presentation of their data. For the connectomic piece, we can certainly adopt their reporting style, which, as you point out, may highlight key motor-related regions. There are a few ideas here that we can explore further, as mentioned above.

      2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, total distance, and even kinematic aspects during stimulation could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The Whelan laboratory has deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive to infer insights on the neural circuits driving movement. Along the same lines, examining features like the frequency or power of stimulation related to walking patterns may help elucidate whether A13 is engaging with the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We will revise as suggested. We will provide additional and/or updated data in revised figures and text. We will also move Supplementary Figures S1 and S2, which present additional locomotor data, into the main text to partly address the reviewers' points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that, although the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that while optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we will add this reference. It is useful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. The area targeted by Chen et al. (2023) is in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, the behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, although some YFP and ChR2-YFP neurons within the A13 region express tyrosine hydroxylase (TH) (Fig. 2), there is also a large population of transduced neurons within and outside the A13 region that do not, suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII-driven transduction of the A13 region was extremely effective in promoting locomotor activity, which was critical for exploring its possible therapeutic potential. We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while useful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate and dopamine, for example) may be one solution, because we do not expect a single transmitter-specific pathway to work as well as broad targeting of the A13 region. Recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than controlling ongoing locomotor parameters. This contrasts with recent work showing a positive valence effect of dopaminergic A13 activation on motivated food-seeking behavior, which differs from the consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 neurons revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests that GABAergic cell types have different effects from dopaminergic and/or glutamatergic ones, consistent with the effects we observed when stimulating CamKII neurons.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Papers using tissue-clearing imaging approaches generally have low sample sizes because of their technical complexity. The technical challenges of obtaining these data were substantial in both collection and analysis. Dual injections (at the A13 and MFB coordinates) add complexity, and the A13 region is difficult to target correctly because it spans only around 300 µm along the anterior-posterior axis. Clearing the brain takes weeks and light-sheet imaging also takes time, but whole-brain quantification is particularly labor intensive, especially given the lack of a standardized analysis pipeline for atlas registration, signal segmentation, and quantification. The field is still relatively new, and pipelines require additional time to refine.

      Correlation matrices are often used to analyze connectivity patterns on a brain-wide scale because they can reveal patterns within large amounts of data. We used correlation matrices to display the estimated pairwise correlation coefficients between the afferent and efferent proportions of each brain subregion and every other, across 251 brain regions in total (for display, not for hypothesis testing). We provided descriptive statistics (mean and error bars) in Figure 5C and G. As mentioned in comments for Reviewer 1, we will also present data in a revised Figure 5 and/or a new figure that focuses specifically on motor-related pathways to provide information on possible therapeutic pathways. As suggested, absolute values will be shared in a supplemental table.

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. Our aim is to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome after 6-OHDA injection and nigrostriatal degeneration is in and of itself important to document.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to initially examine both afferent and efferent pathways using a brain-wide approach. For instance, without this methodology, the potential significance of cortical interconnectivity with the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will highlight specific motor-related targets for future exploration, using optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      We acknowledge that our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (see our earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not yield a pure transmitter phenotype. For instance, there is evidence that dopamine neurons co-localize with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work as a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be used is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations non-specifically. Although optogenetic stimulation therapy is a longer-term prospect, CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020).

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on the pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling of its afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They use an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but several general issues, as well as a number of more minor specific ones, reduce the value of the paper in its current form. Some suggestions that may improve the paper also come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we will be highlighting motor-related A13 pathways in a revised Figure 5 and/or a new figure. We hope that our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to label the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was used to photostimulate the A13 region; hence, combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with the connectomics. Fewer animals were used for the whole-brain work because of the technical limitations described earlier. We will provide further details on the number of animals used in each experiment, both in a table and in the text.

      While the authors provide evidence that photoactivation of the A13 is sufficient to drive locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find a difference between Sham and 6-OHDA mice in the effects of A13 photoactivation. Given these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than a disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      We agree with the reviewer that this aspect needs to be highlighted more. Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. We can say that the lack of locomotion observed in 6-OHDA models can be reversed by A13 region stimulation. We have discussed some aspects of the gain of function possibility but will augment this in other areas of the paper as well, as suggested.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular, the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review by Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS has so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al., Science 2021)? Please comment.

      Thank you for the useful comments - we will update our discussion accordingly.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be driven sufficiently to rescue the hypolocomotor pathology observed in the OFT and to overcome bradykinesia and akinesia. Photoactivation of the ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of the overall motor asymmetry against the lesioned side. The effects of DBS are complex, spanning micro-, meso-, and macro-scales and involving activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among studies (see the review by Ossowska 2019). Also, DBS studies of the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue in general are always limited by current spread and by the lower activation thresholds of axons (e.g., axons of passage), both of which can reduce specificity for the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged to overcome some of the targeting limitations of conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advances: they devised burst stimulation to achieve population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, many more preclinical experiments will be necessary, and the tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions along the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc). The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization of the inputs and outputs of GABAergic (VGAT) populations across the four ZI subregions. Because the A13 region encompasses a small portion (the medial aspect) of both the rostral and the medial/central ZI (three of the four ZI subregions) and its neurons coexpress VGAT, the A13 region likely falls within the rostral and intermediate medial ZI dataset of Yang et al. (2022). Our data would not capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Reviewer #1 (Recommendations For The Authors):

      1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      Agreed - we will add quantification and create graphs to present the data in Figure 2.

      2) Refer to Figure 3. In the main text (page 5), when describing the animals with 6-OHDA, the wrong panels are indicated. The text cites Figure 2A-E, but this should be replaced with 3A-E. Please correct this.

      Will be done

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      We believe that, overall, there is considerable evidence that the zona incerta is a hub for afferent and efferent connections. Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang et al. (2022) show that GABAergic neurons of the zona incerta receive multiple inputs and send outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions, such as hunting and exploratory locomotion, and to integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). We will update our paper to reflect these references.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of rostromedial ZI. We will update the introduction.

      Page 2, para 3: Li et al. (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn (2022) transiently blocked dopaminergic transmission; they never used 6-OHDA. Please correct this throughout the text.

      We will correct this.

      Page 2, para 4: "the A13 connectome encompasses the cerebral cortex,... MLR." The MLR is a functional region; replace it with the CNF and PPN.

      Thank you, we will correct this.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      We will correct this.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. As mentioned previously, we will provide more information on viral spread and optrode location. Viral spread is complex and depends on factors including the tissue type and the serotype and promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters and injected 150 nL, whereas we injected 36.8 nL of AAV-DJ. AAV-DJ is a hybrid capsid derived from multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ can effectively transduce non-human primate neurons (Watanabe et al. 2020). We addressed the question of which neurons are recruited earlier, and we will provide c-Fos quantification to illustrate the extent of co-localization with TH.

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We will provide additional information regarding viral spread, ferrule tip placement, and c-Fos cell counts.

      Page 5, 1st para: Double-check the references, as not all of them used 6-OHDA injections in the MFB.

      Will correct.

      Page 5, 1st para, 4th line: Replace "ferrule" with "optical cannula" or "optical fiber."

      Will correct.

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Will correct.

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1; one-way RM ANOVA: F(5,25) = 0.486, P = 0.783). We will update this.

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off-on boxes are 1 minute long. We will correct the error in the legend. Great suggestion for F-I - we will move them ahead of the summary figures.

      Fig. 3L-M: what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber but without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of the statistics in Fig. 3B-E and 3J-M. What do time points a, b, and c refer to?

      We will do this.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      The data did not follow a normal distribution and were therefore plotted as box-and-whisker plots, with the horizontal line through the box indicating the group median, the limits of the box indicating the interquartile range, and the whiskers indicating the group minimum and maximum. Indeed, a non-parametric equivalent of the paired t-test (the Wilcoxon signed-rank test) was used.
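For readers less familiar with these summaries, the sketch below shows one way to compute the box-and-whisker quantities (median, interquartile range, minimum and maximum) and an exact two-sided Wilcoxon signed-rank test on paired measurements. It is a minimal illustration using only the Python standard library, not the authors' analysis code. The function names, the quartile convention (the `statistics.quantiles` default) and the example values are assumptions for illustration only.

```python
from itertools import product
from statistics import median, quantiles


def box_summary(values):
    """Quantities drawn in a box-and-whisker plot (whiskers = min/max).

    Quartiles use statistics.quantiles' default 'exclusive' method;
    other software may use a different quartile convention.
    """
    q1, _, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median(values),
            "q3": q3, "max": max(values)}


def average_ranks(abs_diffs):
    """1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(abs_diffs)), key=lambda i: abs_diffs[i])
    ranks = [0.0] * len(abs_diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs_diffs[order[j + 1]] == abs_diffs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped. The null distribution is enumerated
    over all 2**n sign assignments, so this is only practical for small n.
    Returns (W, p), where W = min(W+, W-).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w + 1e-9:  # at least as extreme as observed
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Hypothetical paired values (e.g., the same mice with light off vs. on).
off = [10, 12, 9, 14, 11, 13]
on = [13, 14, 15, 13, 18, 16]
print(box_summary(off))
print(wilcoxon_signed_rank(off, on))
```

Enumerating every sign assignment keeps the exact test transparent for the small group sizes typical of these experiments. For larger samples, a statistics package would normally be used instead.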

      Fig. S2B: add the statistical marker.

      Will do

      Page 7, para 1, line 8: to add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Will do

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Will do

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      We will add that time frame. Agreed, it is shorter than the behavioral work, which was started 3 weeks after 6-OHDA injection.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Will do

      Page 8, para 3, line 4. Double-check the reference.

      Will correct and update

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We will explore alternative methods to present the data.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      As mentioned earlier, we will provide additional information regarding viral spread, ferrule tip placement, and c-Fos cell counts. We will state the time of analysis after 6-OHDA injection and update Figure 1a to include it.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We will explore alternative methods of presenting the data, as suggested in a previous comment. Should we retain the correlation matrix, we will incorporate the reviewer’s suggestions.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Will do

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Will do

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only information at 1 month.

      Will do

      Page 10, para 2: the section should be written in the past tense.

      Will do

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Will do

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5C and G. As mentioned in comments for Reviewers 1 and 2, we will revise Figure 5 to present data focusing on motor-related pathways to provide clarity. In addition, absolute values will be shared in a supplemental table.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a very small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labeling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each individual animal.
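The normalization described above can be sketched as follows. This sketch assumes that each animal's labeled cells (or fibers) have already been counted per atlas region. Each regional count is divided by that animal's total count, the resulting proportions are averaged within each group, and the group difference is expressed in percentage points. The region names, counts and function names are hypothetical, and reporting spread as a sample standard deviation is an assumption; this is not the authors' pipeline.

```python
from statistics import mean, stdev


def to_proportions(counts):
    """Divide one animal's regional counts by that animal's total count."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def group_summary(animals):
    """Mean and sample SD (across animals) of each region's proportion.

    Assumes every animal was quantified over the same set of regions.
    """
    props = [to_proportions(a) for a in animals]
    summary = {}
    for region in sorted(props[0]):
        values = [p[region] for p in props]
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[region] = (mean(values), sd)
    return summary


def change_in_points(sham, lesioned):
    """Lesioned minus sham mean proportion, in percentage points."""
    s, l = group_summary(sham), group_summary(lesioned)
    return {region: 100 * (l[region][0] - s[region][0]) for region in s}


# Hypothetical counts: the second sham animal has twice the overall labeling
# of the first, but identical proportions after normalization.
sham = [{"CTX": 50, "STR": 30, "HY": 20}, {"CTX": 100, "STR": 60, "HY": 40}]
lesioned = [{"CTX": 60, "STR": 20, "HY": 20}, {"CTX": 120, "STR": 40, "HY": 40}]
print(group_summary(sham))
print(change_in_points(sham, lesioned))
```

In this toy example, the doubling of total labeling in the second sham animal disappears after normalization. This illustrates why differences in overall labeling efficiency between animals do not, by themselves, change the regional proportions.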

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we will provide absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Agree - it is too simplistic. We will remove it and replace it with one outlined in comments to Reviewer 1.

      Discussion

      Although interesting, the discussion is too long.

      We will make it more concise in the revised paper.

      Page 12, para 2: Although the A13 region has a pro-locomotor effect and therapeutic potential, the claim about its plasticity relies on Figs. 4 and 5, which have a limited scope in the current analysis and presentation (see comments above).

      We will revise the paper per the comments above and then update this accordingly.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      We will include this information.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January): 1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December): 102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April): 144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv : The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study provides a framework bearing on the role of Eph-Ephrin signaling mechanisms in the clinical condition of amyotrophic lateral sclerosis. It provides compelling evidence for the roles of glial cells in this condition. This novel astrocyte-mediated mechanism may help identify future therapeutic targets.

      Drs. Huang and Zaidi: Thank you for considering this revision of our manuscript for potential publication in eLife. We have addressed the excellent comments of the two reviewers, including the addition of new data. We have included detailed response-to-reviewer comments below to address each specific point, and we have highlighted all the changes in the manuscript text (using a red font color) made in response to these comments. Based on the reviewers’ critiques, we feel our re-working of the manuscript has made for a greatly improved study.

      Reviewer #1 (Public Review):

      In the manuscript by Urban et al., the authors attempt to further delineate the role which non-neuronal CNS cells play in the development of ALS. Toward this goal, the transmembrane signaling molecule ephrinB2 was studied. It was found that there is an increased expression of ephrinB2 in astrocytes within the cervical ventral horn of the spinal cord in a rodent model of ALS. Moreover, the reduction of ephrinB2 reduced motoneuron loss and prevented respiratory dysfunction at the NMJ. Further driving the importance of ephrinB2 is an increased expression in the spinal cords of human ALS individuals. Collectively, these findings present compelling evidence implicating ephrinB2 as a contributing factor towards the development of ALS.

      We thank Reviewer #1 for the very helpful critique. We address each of the specific comments below (in the “Recommendations for the Authors” section of this Response to Reviewer Comments document), and have made changes to the manuscript based on these excellent points.

      Reviewer #2 (Public Review):

      The contribution of glial cells to the pathogenesis of amyotrophic lateral sclerosis (ALS) is of substantial interest and the investigators have contributed significantly to this emerging field via prior publications. In the present study, the authors use a SOD1G93A mouse model to elucidate the role of astrocyte ephrinB2 signaling in ALS disease progression. Erythropoietin-producing human hepatocellular receptors (Ephs) and the Eph receptor-interacting proteins (ephrins) signaling is an important mediator of signaling between neurons and non-neuronal cells in the nervous system. Recent evidence suggests that dysregulated Eph-ephrin signaling in the mature CNS is a feature of neurodegenerative diseases. In the ALS model, upregulated EphA4 expression in motor neurons has been linked to disease pathogenesis. In the present study, the authors extend previous findings to the B-class ligand ephrinB2. Urban et al. hypothesize that upregulated ephrinB2 signaling contributes to disease pathogenesis in ALS mice. The authors successfully test this hypothesis and their results generally support their conclusion.

      Major strengths of this work include a robust study design, a well-defined translational model, and complementary biochemical and experimental methods such that correlated findings are followed up by interventional studies. The authors show that ephrinB2 ligand expression is progressively upregulated in the ventral horn of the cervical and lumbar spinal cord from pre-symptomatic to end stages of the disease. This novel association was also observed in postmortem lumbar spinal cord samples from human ALS donors with a SOD1 mutation. Further, they use a lentiviral approach to drive knockdown of ephrinB2 in the central cervical region of SOD1G93A mice at the presymptomatic stage. Interestingly, in spite of using a non-specific promoter, the authors note that lentiviral expression was preferentially driven in astrocytes.

      Since respiratory compromise is a leading cause of morbidity in the ALS population, the authors proceed to characterize the impact of ephrinB2 knockdown on diaphragm muscle output. In mice approaching the end stage of the disease, electrophysiological recordings from the diaphragm muscle show that animals in the knockdown group exhibited a ~60% larger CMAP amplitude. This functional preservation of the diaphragm was also accompanied by the preservation of diaphragm neuromuscular innervation. However, it must be noted that this cervical ephrinB2 knockdown approach had no impact on disease onset, disease duration, or animal survival. Furthermore, there was no impact of ephrinB2 knockdown on forelimb or hindlimb function.

      We thank Reviewer #2 for the very helpful critique. We address each of the specific comments below, and have made changes to the manuscript based on all of these excellent points.

      The major limitation of the manuscript as currently written is the conclusion that the preservation of diaphragm output following ephrinB2 knockdown in SOD1 mice is mediated primarily (if not entirely) by astrocytes. The authors present convincing evidence that a reduction in ephrinB2 is observed in local astrocytes (~56% transduction) following the intraspinal injection of the lentivirus. However, the proportion of cell types assessed for transduction with the lentivirus in the spinal cord was limited to neurons, astrocytes, and oligodendrocyte lineage cells. Microglia comprise a large proportion of the glial population in the spinal grey matter and have been shown to associate closely with respiratory motor pools. This cell type, amongst the many others that comprise the ventral gray matter, have not been investigated in this study. Thus, the primary conclusion that astrocytes drive ephrinB2-mediated pathogenesis in ALS mice is largely correlative.

      This is an excellent point. While the majority of transduced cells were astrocytes, we did not identify the lineage of a portion of the transduced cells, which could consist of cell types such as microglia, endothelial cells and others, some of which have been linked to ALS pathogenesis. Nevertheless, we find that the cells expressing high levels of ephrinB2 in ventral horn of SOD1G93A mice are all astrocytes (as seen in Figure 1O-Q), strongly suggesting – though not definitively demonstrating – that astrocyte ephrinB2 is the pathogenic source in this model (even if our viral transduction did not solely target astrocytes).

      In the revised version of the manuscript, we now include an extensive paragraph in the Discussion section dedicated to this point.

      Importantly, we have toned down our conclusion by modifying the title by removing “…in spinal cord astrocytes…”. We changed the title from “EphrinB2 knockdown in spinal cord astrocytes preserves diaphragm innervation in a mutant SOD1 mouse model of ALS" to “EphrinB2 knockdown in cervical spinal cord preserves diaphragm innervation in a mutant SOD1 mouse model of ALS”.

      Further, it is interesting to note that no other functional outcomes were improved in this study. The C3-C5 region of the spinal cord contains many motor pools that innervate forelimb muscles. CMAP recordings conducted at the diaphragm are a reflection of intact motor pools. This type of assessment of neuromuscular health is hard to recapitulate in the kind of forelimb task that is being employed to test motor function (grip strength). Thus, it would be interesting to see if CMAP recordings of forelimb muscles would capture the kind of motor function preservation observed in the diaphragm muscle.

      We did perform forelimb grip strength analysis on these animals and found no effect of focal ephrinB2 knockdown. However, this functional assay is impacted more by distal forelimb muscle groups controlled by motor neuron pools located at more caudal locations of the spinal cord (i.e. low cervical and high thoracic), likely explaining the lack of effect on grip strength.

      Unfortunately, we did not perform this CMAP recording on forelimb muscle, and these mice have all already been sacrificed. We have added discussion of this point to the revised manuscript.

      On a similar note, the functional impact of increased CMAP amplitude has not been presented. An increase in CMAP amplitude does not necessarily translate to improved breathing function or overall ventilation. Thus, the impact of this improvement in motor output should be clearly presented to the reader.

      This is a very important point. While CMAP recording is a powerful assay of functional innervation of diaphragm muscle by phrenic motor neurons, it does not directly measure respiratory function. There are assays to test outcomes such as ventilatory behavior and gas exchange (e.g. whole-body plethysmography, blood gas measurements). We did not, however, perform these analyses. Respiratory function involves the contribution of a number of other muscle groups, and these muscles are innervated by various motor neuron pools located across a relatively large expanse of the CNS neuraxis. As we focally targeted ephrinB2 knockdown to only a small area, we would not expect effects on these other functional assays, which is why we restricted our testing to CMAP recording, since this can be used to specifically study the phrenic motor neuron pool (and can be combined with detailed histological analyses in the cervical enlargement and at the diaphragm NMJ).

      Importantly, this is why we chose to use “preserves diaphragm innervation” in the manuscript title, as opposed to wording such as “preserves diaphragm function”. In addition, we have added this point to the Discussion section in the revised manuscript.

      Further, to the best of my knowledge, expression of Eph (or EphB) receptors has not been explicitly shown at the phrenic motor pool. It is thus speculative at best that the mechanism that the authors suggest in preserving diaphragm function is in fact mediated through Eph-EphrinB2 signaling at the phrenic motor pool. This aspect of the study would warrant a deeper discussion.

      We address this important comment with multiple pieces of data showing that Eph receptors are expressed in the phrenic motor neuron pool. EphrinB2 binds and activates EphBs, as well as EphAs such as EphA4. Importantly, previous work has linked expression of EphA4 in motor neurons to the rate of ALS progression (Van Hoecke, et al. Nature Medicine. 2012). Consistent with these studies, single-nucleus RNAseq on mouse cervical spinal cord shows that alpha motor neurons of cervical spinal cord express various EphA and EphB receptors (http://spinalcordatlas.org/; Blum et al., Nature Neuroscience, 2021; Alkaslasi et al., Nature Communications, 2021). In addition, this dataset identifies a phrenic motor neuron-specific marker (ErbB4); when we specifically look at the expression profile of only the ErbB4-expressing alpha motor neurons, the data reveal that phrenic motor neurons express a number of EphA and EphB receptors, including EphA4.

      To validate expression specifically of EphA4, we performed IHC for phosphorylated EphA4 (a marker of activated EphA4) on C3-C5 spinal cord sections from SOD1G93A mice injected with shRNAephrinB2 or control vector. We find that large ventral horn neurons are positive for phosphorylated EphA4. The ventral horn at these cervical spinal cord levels includes other motor neuron pools in addition to phrenic motor neurons; therefore, this result by itself does not conclusively show that phrenic motor neurons express EphA4, though they likely do, since we find EphA4 expression in most ventral horn neuron cell bodies in C3-C5. A representative image is included in Supplemental Figure 1.

      In the revised manuscript, we added a paragraph to the Discussion section to address this important comment from the reviewer, including describing these data on Eph receptor expression.

      Lastly, although the authors include both male and female animals in this investigation, they do not have sufficient power to evaluate sex differences. Thus, this presents another exciting avenue of investigation, given that ALS has a slightly higher prevalence in males than in females.

      As the reviewer notes, our studies are under-powered with respect to examining possible sex-specific effects. We now include a brief discussion of this issue in the revised manuscript.

      In summary, this study by Urban et al. provides a valuable framework for Eph-Ephrin signaling mechanisms imposing pathological changes in an ALS mouse model. The role of glial cells in ALS pathology is a very exciting and upcoming field of investigation. The current study proposes a novel astrocyte-mediated mechanism for the propagation of disease that may eventually help to identify potential therapeutic targets.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors.

      Both reviewers were enthusiastic about your paper. Reviewer (1) had some technical queries (see his/her items 2 and 4). Reviewer (2) had some questions about principles (items 1 and 2) with the remaining points being technical queries.

      We have addressed all comments of both reviewers. We detail our responses in this Response to Reviewer Comments document and have made the associated modifications to the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Questions and/or Recommendations:

      There is convincing evidence that there is increased expression of ephrinB2 over time in the mouse model of ALS. Is there a corresponding increase in astrocytes in this animal model?

      We previously published data showing quantification of astrocyte numbers within the spinal cord of this same SOD1G93A mouse model. Specifically, we performed this quantification in the ventral horn of the lumbar spinal cord following disease onset. We found that there was a modest increase in the number of GFAP+ astrocytes at this location and disease time point.

      [ Lepore et al. Selective ablation of proliferating astrocytes does not affect disease outcome in either acute or chronic models of motor neuron degeneration. Experimental Neurology. 211 (2): 423-32, 2008. ]

      One could speculate that the increase in ephrinB2 expression we observe across the ventral horn in the mutant SOD1 mice was solely due to this modest increase in astrocyte number. However, this is highly unlikely to be the case, as in wild-type mice and in mutant SOD1 mice prior to disease onset, astrocytes (and all other cell types) express very low levels of ephrinB2. Throughout the disease course in these mutant SOD1 mice, the ephrinB2 expression level in individual astrocytes dramatically increases (including across most or all astrocytes), suggesting that the total increase in ephrinB2 expression across the ventral horn was not due to just this modest increase in astrocyte numbers but was instead due to the dramatically elevated ephrinB2 expression in most or all astrocytes. We have added this point to the Discussion section in the revised manuscript.

      It would help the reviewer and readers to show a lower magnification image of Figure 2, panels O and P to demonstrate the reduction of ephrin B2 in the ventral horns.

      We have added the lower magnification images to Figure 2.

      It is commended that not all data were "positive". Figure 4 especially shows some of the limitations of ephrinB2 knockdown. This provides a realistic image (strengths and limitations) of this approach. Very well done!

      Thank you! In future work, we could employ alternative vector-based strategies to restrict transduction/knockdown to only astrocytes. With such experiments, it is possible that the impact of ephrinB2 knockdown would not be the same, if ephrinB2 actions in non-astrocytes also play a role in disease pathogenesis. We have added discussion of this same point to the revised manuscript in response to a similar comment above from Reviewer #2.

      Reviewer comment 4: Fig 6 - if possible can you please add demographic (age/sex) with each band?

      We have added this information to the Legend. For aesthetic reasons, we chose not to add this information directly to the figure itself and instead included all of this information for each sample/band in the Legend.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the manuscript addresses a novel aspect of the role of astrocytes in mediating ALS pathogenesis. I commend the authors for a well thought-out and clearly presented study. However, a few concerns limit the enthusiasm and deserve attention to improve the clarity of the report.

      The biggest limitation of this study is that microglia or other cell types (endothelial cells) have not been explored in this study. They constitute a big proportion of cell types in the spinal cord and to conclude that only astrocytes mediate ephrinB2 signaling in the ALS model would be a stretch without the proper stains.

      Please see our comments above to address this same point from Reviewer #2.

      A clear premise for the investigation of ephrinB2 ligands has not been presented in the introduction. The authors provide a good background on the emerging role of Eph-ephrin interactions in neurodegenerative diseases. But it is unclear how the authors landed on this sub-class of ephrins.

      We have added this premise to the Introduction section of the revised manuscript. In published work, ephrinB2 has been shown to be upregulated in reactive astrocytes and to be involved in disease pathogenesis in other neurological disease models (e.g. traumatic spinal cord injury).

      There are several acronyms that have not been defined in the manuscript, e.g. GPI.

      We now define “GPI” and all other abbreviations in the revised manuscript.

      While the authors state that males and females had been included in the study, their individual n's for various outcomes have not been presented in the results section. Further, n's are missing from the figure legends, which will aid the clarity of the presentation. Further, please clarify the ages of the mice in the methods section.

      (1) We now provide the n’s for males versus females for all analyses in the figure legends. (2) We also now include the total n for each experimental condition in all of the figure legends. (3) We also now state the ages of the mice for the various analyses in the Methods section.

      It appears that several statistical interactions have been summarized in the results section but inconsistently reported on figures.

      We now provide the exact n’s for each analysis in all figure legends. We include all of the details of the statistical analysis in the text of the Results section and do not include this text in the Legends; we do this for all figures to maintain consistency.

      I presume that when the authors write "the number of neurons with somal diameter greater than 200 μm and with an identifiable nucleolus was determined", the 200 was a typo. Mouse motor neurons do not have a diameter of 200 μm and perhaps the authors meant an area of 200 μm²?

      We have corrected this: 200 μm².

      Authors should consider adding a quantification for the human tissue immunoblots.

      We have added the quantification of these human tissue data for ephrinB2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their overall positive assessment of this work. We have carefully revised the manuscript and implemented nearly all of the reviewers’ public and confidential recommendations. We believe these modifications have strengthened the manuscript and hope it will further convince the editors and reviewers.

      Below, we provide a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      To further understand the plasticity of vestibular compensation, Schenberg et al. sought to characterize the response of the vestibular system to short-term and partial impairment using gaze stabilization behaviors. A transient ototoxic protocol affected type I hair cells and produced gain changes in the vestibulo-ocular reflex and optokinetic response. Interestingly, decreases in vestibular function occurred in coordination with an increase in ocular reflex gain at frequencies where vestibular information is more highly weighted over visual. Moreover, computational approaches revealed unexpected detriment from low reproducibility on combined gaze responses. These results inform the current understanding of visual-vestibular integration especially in the face of dysfunction.

      Strengths

      The manuscript takes advantage of VOR measurements which can be activated by targeted organs, are used in many species including clinically, and indicate additional adverse effects of vestibular dysfunction. The authors use a variety of experimental procedures and analysis methods to verify results and consider individual performance effects on the population data. The conclusions are well-justified by current data and supported by previous research and theories of visuo-vestibular function and plasticity.

      The authors thank reviewer 1 for emphasizing these positive aspects of the work.

      Weaknesses

      The manuscript describes the methodology as inducing reversible changes (lines 44 and 67), but the data show a reversible effect only in hair cell histology (Fig 3A-B), not in function, as demonstrated by the persistent aVOR gain reduction in week 12 (Fig 1C) and increase of OKR gain in weeks 6-12 (Fig 4C/D).

      Rodents exposed to IDPN in the drinking water show anywhere from complete to no reversibility of the function loss depending on the IDPN concentration and duration of exposure, and the relationship between exposure and effect varies as a function of species, strain and sex of the exposed animals (Llorens and Rodríguez-Farré, Neurotoxicol. Teratol., 1997; Seoane et al., J. Comp. Neurol. 2001; Sedó-Cabezón et al., Dis. Model. Mech., 2015; Greguske et al., Arch. Toxicol., 2019). In addition, there is individual variability. The concentration of IDPN and time of exposure used in this study were selected to result in a loss followed by complete reversion but, as noted by the referee, the reversion was complete for hair cells, while the gaze stabilizing reflexes show differential degrees of recovery depending on the functional test (complete recovery of OCR; partial recovery of aVOR and OKR). These findings make the IDPN subchronic protocol an interesting methodology to study the long-term consequences of partial/reversible inner ear impairment. To be more accurate in the description of the reversibility, we have now introduced the following changes:

      Line 43: Subchronic exposure to IDPN in drinking water at low doses allowed for progressive ototoxicity, leading to a partial and largely reversible loss of function.

      Lines 67-68: We demonstrate that despite the significant recovery in their vestibulo-ocular reflexes, the visuo-vestibular integration remains notably impaired in some IDPN-treated mice

      Line 578: Previous experiments (Greguske et al., 2021) had demonstrated that at these concentrations, ototoxic lesions produced by IDPN are largely reversible.

      Reviewer 1: The manuscript begins with the mention of fluctuating vestibular function clinically, but does not connect this to any specific pathologies nor does it relate its conclusions back to this motivation.

      Thank you. We have now added a conclusion (lines 525-552) to discuss the results in a clinical perspective.

      Reviewer 1: The conclusions of frequency-specific changes in OKR would be stronger if frequency-specific aVOR effects were demonstrated similar to Figure 4D.

      We have presented the frequency-selective effects in the Figure 1 supplement and related text; changes observed in aVOR are mostly (see below) comparable for all frequencies >0.2 Hz. However, we have edited the text to better highlight where IDPN differentially affects aVOR tested at different frequencies (see lines 97-99).

      Reviewer #2 (Public Review):

      This is a very nice study showing how partial loss of vestibular function leads to long-term alterations in behavioural responses of mice. Specifically, the authors show that VOR involving both canal and otolith afferents is strongly attenuated following treatment and partially recovers. The main result is that loss of VOR is partially "compensated" by increased OKR in treated animals. Finally, the authors show that treatment primarily affects type I hair cells as opposed to type II. Overall, these results have potentially important implications for our understanding of how the VOR is generated using input from both type I and type II hair cells. As detailed below, however, more controls as well as analyses are needed.

      The authors thank reviewer 2 for positive evaluation regarding the potential implication of the work.

      Major points:

      Reviewer 2: The authors analyze both canal and otolith contributions to the VOR which is great. There is however an asymmetry in the way that the results are presented in Figure 1. Please correct this and show time series of fixations for control and at W6 and W12. Moreover, the authors are plotting table and eye position traces in Fig. 1B but, based on the methods, gains are computed based on velocity. So please show eye velocity traces instead. Also, what was the goodness of fit of the model to the trace at W6? If lower than 0.5 then I think that it is misleading to show such a trace since there does not seem to be a significant VOR.

      Figure 1 was modified as suggested. Panel B now shows velocity traces, and goodness of fit is reported in figure legend. Panel E now shows raw OCR traces at W0, W6, W12.

      Reviewer 2: It is important to show that the loss is partial as opposed to total. It seems to me that the treatment was not effective at all for aVOR for at least some animals. What happens if these are not included in the analysis?

      The reviewer is correct; there is indeed variability in the alteration observed during the treatment, as previously described and expected from previous experiments (Llorens and Rodríguez-Farré, Neurotoxicol. Teratol., 1997; Seoane et al., J. Comp. Neurol. 2001; Sedó-Cabezón et al., Dis. Model. Mech., 2015; Greguske et al., Arch. Toxicol., 2019). It was actually one of the goals of the study to compare hair cell loss and VOR responses in individuals. The individual aVOR gain and phase responses during the IDPN treatment are all presented in the Figure 1 supplement. aVOR was reduced in all animals, although 2 of 21 showed a decrease of less than 10% of their initial gain at W6. If these were excluded from the analysis, the statistical differences between the 2 groups would be reinforced.

      Reviewer 2: Figure 2A shows a parallel time course for gains of aVOR and OCR at the population level. Is this also seen at the individual level?

      Yes, this is seen in individuals. This result is presented in Figure 2C and 2D which illustrate the similar effect of IDPN on aVOR and OCR responses at week 6 and week 12 at the individual level (each symbol represents an individual mouse). The plotted delta gain of both aVOR and OCR represents the relative loss of vestibular function for each individual mouse at W6 and W12, respectively.

      Reviewer 2: Figure 3: please show individual datapoints in all conditions.

      Figure 3 was modified to show individual datapoints in all conditions (see Figure 3 A2, A3, C2 and C3).

      Reviewer 2: Figure 4: The authors show both gain and phase for OKR. Why not show gain and phase for aVOR and OCR in Figure 1. I realize that phase is shown in sup Figures but it is important to show in main figures. The authors show a significant increase in phase lead for aVOR but no further mention is made of this in the discussion. Moreover, how are the authors dealing with the fact that, as gain gets smaller, the error on the phase will increase. Specifically, what happens when the grey datapoints are not included?

      As pointed out by the reviewer, we have included all aVOR phase results in the Figure 1 supplement and also stated them in the main text (lines 100-102). There is, however, no phase calculated for the OCR, which is a static test, as better illustrated in the new Figure 1E. Error in phase calculations increases as gain gets smaller. To take this into account, the phases corresponding to the grey points (VAF < 0.5, corresponding to gains < 0.10) were not included in the statistical analysis of the aVOR phase. This point is now made clearer in the Methods, lines 639-640.
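      The exclusion rule above (phase withheld when the fit explains less than half of the variance) can be illustrated with a minimal sketch. It assumes that table and eye velocity are each fitted by least squares with a sinusoid at the known stimulus frequency, that VAF is defined as 1 - var(residual)/var(data), and that phase is expressed relative to ideal compensation (eye velocity opposite to head velocity, positive = lead). The functions fit_sinusoid and vor_gain_phase are hypothetical names for illustration, not the analysis code used in the study.

```python
import numpy as np


def fit_sinusoid(t, v, freq_hz):
    """Least-squares fit of v(t) ~ a*sin(wt) + b*cos(wt) + c at a known frequency.

    Returns (amplitude, phase_deg, vaf), where v ~ amplitude * sin(wt + phase).
    """
    w = 2 * np.pi * freq_hz
    X = np.column_stack([np.sin(w * t), np.cos(w * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    a, b, _ = coef
    fitted = X @ coef
    # Variance accounted for: fraction of the signal variance explained by the fit
    vaf = 1.0 - np.var(v - fitted) / np.var(v)
    amplitude = np.hypot(a, b)
    phase_deg = np.degrees(np.arctan2(b, a))
    return amplitude, phase_deg, vaf


def vor_gain_phase(t, eye_vel, table_vel, freq_hz, vaf_min=0.5):
    """Hypothetical VOR gain/phase estimate; phase is None when the eye fit is poor."""
    eye_amp, eye_ph, vaf = fit_sinusoid(t, eye_vel, freq_hz)
    head_amp, head_ph, _ = fit_sinusoid(t, table_vel, freq_hz)
    gain = eye_amp / head_amp
    # Compensatory eye velocity is ideally opposite to head velocity, so compare
    # eye phase with the inverted head phase, wrapped to [-180, 180) degrees.
    phase = (eye_ph - (head_ph + 180.0) + 180.0) % 360.0 - 180.0
    if vaf < vaf_min:
        phase = None  # low VAF: phase estimate considered unreliable
    return gain, phase, vaf


if __name__ == "__main__":
    t = np.arange(0, 10, 0.001)  # 10 s sampled at 1 kHz
    table = 30 * np.sin(2 * np.pi * 0.5 * t)  # 0.5 Hz, 30 deg/s peak
    eye = -18 * np.sin(2 * np.pi * 0.5 * t + np.radians(10))
    print(vor_gain_phase(t, eye, table, 0.5))
```

      With this synthetic input the sketch recovers a gain of 0.6 and a 10° phase lead; replacing the eye trace with noise drives VAF toward zero, and the phase is then withheld rather than reported.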

      Reviewer 2: Discussion: As mentioned above, the authors should discuss the mechanisms and implications of the observed phase lead following treatment. Moreover, recent literature showing that VN neurons that make the primary contribution to the VOR (i.e., PVP neurons) tend to show more regular resting discharges than other classes (i.e., EH cells), and that such regularity is needed for the VOR should be discussed (Mackrous et al. 2020 eLife). Specifically, how are type I and type II hair cells related to discharge regularity by central neurons in VN?

      We have now added a discussion of the mechanisms and implications of the phase changes in lines 363-371. We thank the reviewer for pointing us to the Mackrous et al. 2020 eLife paper, which is now included in the updated discussion. The relation of type I and type II hair cells to discharge regularity in afferents and in central VN neurons is further discussed in lines 442-449.

      Below we provide answers to specific recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      Reviewer 1: Were hair cells counted for the whole organ? what was the control for epithelial size differences?

      The effect of the treatment on hair cells was estimated by counting the number of cells within square areas of the central and peripheral parts of the sensory epithelia. The text has been modified to better describe the method, lines 748-751.

      Reviewer 1: The title of the article leads readers to expect more emphasis on hair cell changes, while the content of the manuscript is more functional and encompassing the visual and vestibular systems.

      We have retained the original title.

      Reviewer 1: Please provide acronym definitions before they are used. Examples: HC (line 63), W6 etc (line 82-83)

      Done as suggested on lines 63, 82 and 107.

      Reviewer 1: Please describe the ages of animals used in the study.

      The animals used in the study were 6 weeks old at the beginning of the protocol and 20 weeks old at the end. The text has been modified accordingly, line 564.

      Reviewer 1: Consider changing "until" to "through" when describing time ranges (initially line 88), as the following time mentioned is included in the statement. E.g., line 216-217 sounds as if gain was insignificantly different at W12.

      Done as suggested, lines 88 and 219.

      Reviewer 1: Line 162: lower case for "immunostaining".

      Done, line 164.

      Reviewer 1: Consider regrouping or renumbering panels of Figure 3 for more clarity.

      Panels in Figure 3 were regrouped as suggested, with the canal-related data first in panels A-B, followed by the utricle-related data in panels C-D.

      Reviewer 1: Lines 222-223: reword as gain increased not frequency.

      Thank you; the text has been reworded, lines 224-225.

      Reviewer 1: It is unclear if the two subgroups revealed in CGR analysis (line 288) are relevant to the two groups described in VOR responses (line 137-138). Please clarify if these are the same mice or distinct clusters.

      The two subgroups found in the CGR analysis differ from the clusters revealed by the decrease of the aVOR gain; the text has been modified lines 300-301.

      Reviewer 1: Consider adding that irregular afferents + calyces are relevant specifically to type I HCs (lines 411-426).

      The text has been modified to clarify the contacts between the two types of vestibular afferents and hair cells, lines 431-435.

      Reviewer 1: Line 434: clarify which "scheme" given context before and after this phrase

      In order to clarify this part of the discussion, the text has been modified and this term no longer appears.

      Reviewer 1: Please indicate the time gap from surgery to treatment.

      The time gap from surgery to treatment (at least 72 h) has been added to the methods, line 575.

      Reviewer 1: Line 619-620: It is unclear if VOR and OKR sessions were randomized in order or if the authors have considered training or adaptive effects from the initial test session.

      VOR and OKR sessions were performed on different days to limit cross effects, lines 639-640.

      Reviewer 1: Line 688: typo-change ifG to IgG.

      Modified, line 744.

      Reviewer 1: Line 692-693: were hair cells counted for the whole organ? what was the control for epithelial size differences?

      The effect of the treatment on hair cells was estimated by counting the number of cells within square areas of the central and peripheral parts of the sensory epithelia. The text has been modified to better explain the method, lines 748-751.

      Reviewer 1: Change decimal indicator to be consistent (commas used in lines 732, 759, 776, Figure 6C).

      Thank you; modified as suggested.

      Reviewer 1: Line 763: "stimulation at 0.5Hz &10{degree sign}/s" is unclear.

      The text has been modified, line 817.

      Reviewer 1: Line 765: bold (E)

      The font formatting has been updated, line 820.

      Reviewer 1: Line 770-771: A) insert OKR to be "mean delta aVOR and delta OKR gain", B) plot is OKR as a function of VOR.

      Thank you, done as suggested. The text has been modified, line 824.

      Reviewer 1: Describe Figure 6 delta at initial mention (line 784 instead of 786).

      Thank you, done as suggested, line 839.

      Reviewer 1: It is unclear why the tables are included if never mentioned in the text.

      The tables are now mentioned, lines 90 and 218.

      Reviewer 1: Figure 1: is the observed gain for Sham group expected value rather than closer to 1?

      Yes. The value reported in Figure 1 is the mean of the values obtained during the aVOR test in the dark at frequencies in the range 0.2-1 Hz (see also Figure 1 Supplement).

      Reviewer 1: Figure 2: A) give enough space to see error bars at W2. Consider making test data more easily distinguishable. B) is OCR mean or specific stimulation? C/D) move 1Hz label from title to x-axis label as it does not describe OCR test. Figure 5: C) consider making color specific to frequency for better distinction on C+D as symbols previously indicated individual data. D) 1Hz specific to OKR? move to axis label instead of title

      The Figures 2 and 5 have been modified according to reviewer 1 suggestions.

      Reviewer 1: Figure 6: A/B) what time point are these, W12?

      Those points correspond to W6 and W12; the text has been updated to specify the time points, lines 834 and 835.

      Reviewer #2 (Recommendations For The Authors):

      The authors should perform additional analyses that will help strengthen their results.

      We are unsure about the exact implementation of this recommendation. However, we have strengthened our results by following all reviewers’ specific recommendations.

    1. Author Response

      Reviewer #1 (Public Review):

      Assessment:

      The manuscript titled 'Rab7 dependent regulation of goblet cell protein CLCA1 modulates gastrointestinal homeostasis' by Gaur et al discusses the role of Rab7 in the development of ulcerative colitis by regulating the lysosomal degradation of Clca1, a mucin protease. The manuscript presents interesting data and provides a potential molecular mechanism for the pathological alterations observed in ulcerative colitis. Gaur et al demonstrate that Rab7 levels are lowered in UC and CD. However, a similar analysis of Rab7 levels in ulcerative colitis (UC) and Crohn's disease (CD) patient samples was conducted recently (Du et al, Dev Cell, 2020), which showed that Rab7 levels are elevated under these conditions. While Gaur et al briefly mention Du et al's paper in passing in the discussion, they need to discuss these contradictory results and clarify the differences. Additionally, Du et al is not included in the list of references.

      Strengths:

      The manuscript used a multi-pronged approach and compares patient samples, mouse models of DSS, and protocols that allow differentiation of goblet cells. They also use a nanogel-based delivery system for siRNAs, which is ideal for the knockdown of specific genes in the gut.

      Weaknesses:

      Du et al, Dev Cell 2020 (https://doi.org/10.1016/j.devcel.2020.03.002) have previously shown that Rab7 levels are elevated in a similar set of colonic samples (age group, number etc) from UC and CD patients. Gaur et al have not discussed this paper or its findings in detail, which directly contradicts their results. Clarification regarding this should be provided.

      We thank the reviewer for raising this point.

      The results shown by Du et al, Dev Cell, 2020 depict elevated expression of Rab7 in UC and CD patients compared to controls. At first glance these results appear contradictory, but there are a few possible explanations.

      Firstly, Rab7 expression levels may fluctuate in the tissue depending on the degree of gut inflammation. This can be concluded from our observations in the DSS-dynamics mouse model and in human patient samples with mild and moderate UC. Furthermore, Du et al provide no information on the severity of the condition in the patients included in their study. Our aim in the current work was to emphasise this aspect. This point was mentioned in the discussion section of the manuscript; however, in view of the reviewer's concern, we now intend to add a detailed comment on it in the main text of the revised manuscript.

      Secondly, the control biopsies in our investigation were obtained from non-IBD patients, unlike in Du et al., who used biopsies from the normal para-carcinoma region of colorectal cancer patients. One cannot overlook the fact that physiological and molecular changes are apparent even in non-inflamed regions of the gut of an IBD or CRC patient. The observed discrepancy may therefore arise from differences in the sample types used for comparing Rab7 expression.

      Finally, the main cell population showing a decrease in Rab7 expression in UC samples appeared to be the goblet cells, which were not examined by Du et al.

      Keeping these points in mind, we do not think that our findings contradict those of Du et al., 2020. Some of these explanations will be incorporated in the revised submission.

      Include Du et al in the reference list and add the comment in main text.

      This was an oversight on our part. We did mention Du et al., 2020 in the discussion (line 338), but the reference was missing from the main list. We will ensure that the reference is included in the revised version and that their findings are addressed both in the main text and in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors report a role for the well-studied GTPase Rab7 in gut homeostasis. The study combines cell culture experiments with mouse models and human ulcerative colitis patient tissues to propose a model in which Rab7, by delivering a key mucus component, CLCA1, to lysosomes, regulates its secretion in goblet cells. This is important for the maintenance of mucus permeability and gut microbiota composition. In the absence of Rab7, CLCA1 protein levels are higher in tissues as well as in the mucus layer, consistent with the anticorrelation of Rab7 (reduced) and CLCA1 (increased) in ulcerative colitis patients. The authors conclude that Rab7 maintains CLCA1 levels by controlling its lysosomal degradation, thereby playing a vital role in mucus composition, colon integrity, and gut homeostasis.

      Strengths:

      The biggest strength of this manuscript is the combination of cell culture, mouse model, and human tissues. The experiments are largely well done and in most cases, the results support their conclusions. The authors go to substantial lengths to find a link, such as alteration in microbiota, or mucus proteomics.

      Weaknesses:

      There are also some weaknesses that need to be addressed. The association of Rab7 with UC in both mice and humans is clear; however, claims on the underlying mechanisms are less clear. Does Rab7 specifically regulate CLCA1 delivery to lysosomes, or is this an outcome of a generic trafficking defect? CLCA1 is a secretory protein; how does it get routed to lysosomes, i.e. through Golgi-derived vesicles, or by endocytosis of mucus components? Mechanistic details on how CLCA1 is routed to lysosomes will add substantial value.

      We thank the reviewer for the insightful comment. We offer the following explanation for each of these concerns:

      (a) Our immunofluorescence imaging experiments revealed co-localization of Rab7 with CLCA1 and lysosomes (Fig 7I). In addition, the absence of Rab7 affects the transport of CLCA1 to lysosomes (Fig 7J). This suggests that Rab7 may selectively regulate the transport of CLCA1 (presumably along with other cargo) to lysosomes. However, we recognise that the reviewer's point about a possible generic trafficking defect is valid. (b) As mentioned in the manuscript, the trafficking of CLCA1 or CLCA1-containing vesicles within goblet cells is unknown, and there is no information on the proteins involved in their mobility. How CLCA1-containing vesicles are switched from the secretory route to lysosomes needs extensive investigation of the overall trafficking of the protein. Taken together, fully answering both of these important questions will require a series of experiments, which may be interesting avenues for future research.

      (a) Why does the level of Rab7 fluctuate during DSS treatment (Fig 1B)? (b) Does the reduction seen in Rab7 levels (by WB) also reflect in reduced Rab7 endosome numbers?

      This is a very thoughtful point from the reviewer. (a) We detected a distinct pattern of fluctuation in Rab7 expression in intestinal epithelial cells after DSS-dynamics treatment in mice. These changes may result from complex cellular signalling in response to DSS treatment. Rab7, being a fundamental protein in the protein sorting pathway, is expected to be altered according to cellular requirements. At present, there are no reports on the regulatory mechanisms that govern Rab7 levels in the gut. (b) We observed a reduction in Rab7 expression at both the RNA and protein levels. Confirming whether this alteration leads to fewer Rab7-positive endosomes will require further investigation.

      Are other late endosomal (and lysosomal) populations also reduced upon DSS treatment and UC? Is there a general defect in lysosomal function?

      There is no direct evidence of a reduction in the late endosomal and lysosomal populations during gut inflammation, but a few studies link lysosomal dysfunction with the risk of colitis (doi: 10.1016/j.immuni.2016.05.007).

      The evidence for lysosomal delivery of CLCA1 (Fig 7 I, J) is weak. Although used sometimes in combination with antibodies, lysotracker red is not well compatible with permeabilization and immunofluorescence staining. The authors can substantiate this result further using lysosomal antibodies such as Lamp1 and Lamp2. For Fig 7J, it will be good to see a reduction in Rab7 levels upon KD in the same cell.

      We used Lysotracker red in live cells followed by fixation, so permeabilization issues were avoided. Lamp1, as suggested by the reviewer, is certainly a better marker for lysosomes in immunofluorescence studies, but it has also been shown to mark late endosomes (doi: 10.1083/jcb.132.4.565). As Rab7 also marks late endosomes, using Lamp1 may leave ambiguity as to whether CLCA1 is in Rab7-positive late endosomes or in lysosomes. Nevertheless, we will carry out this experiment, and the data will be shared in the revised version of the work.

      In this connection, Fig S3D is somewhat confusing. While it is clear that the pattern of Muc2 in WT and Rab7-/- cells are different, how this corroborates with the in vivo data on alterations in mucus layer permeability -- as claimed -- is not clear.

      The data in Fig. S3D suggest the involvement of Rab7 in the packaging of Muc2. The purpose of this experiment was to support our observation in the Rab7KD mouse model, in which the mucus layer was looser and more permeable in Rab7-deficient mice.

      Overall, the work shows a role for a well-studied GTPase, Rab7, in gut homeostasis. This is an important finding and could provide scope and testable hypotheses for future studies aimed at understanding in detail the mechanisms involved.

      We thank the reviewer for this comment.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Muthana et al. describes the effect of injection of an antibody specific for human CTLA4 conjugated to a cytotoxic molecule (Ipi-DM1) in knock-in mice expressing human CTLA4. The authors show that Ipi-DM1 administration causes a partial decrease (about 50% in absolute number) of mature B cells in blood and bone marrow 9-14 days after the beginning of treatment. Ipi-DM1 also results in a partial decrease in Foxp3+ Tregs (about 40% in absolute number) and a slight increase in activation of conventional T cells (Tconvs) in the blood at D9. Tconv depletion, CTLA4-Ig or anti-TNF mAb partially prevents the effect of ipi-DM1 on B cells. This work is interesting but has the following major limitations:

      1) This work could have been of more interest if the Ipi-DM1 molecule were used in the clinic. As this is not the case, the detailed mechanism of the effect of this molecule in mice is of reduced interest.

      The goal of the current study is to use the Ipi-DM1 ADC as a probe to study the mechanism of B cell loss observed in Treg-deficient hosts.

      2) The fact that a partial deletion of Tregs is associated with activation of Tconvs and a decrease in B cells has been published several times and is therefore not new. According to the authors, their work would be the first to show that activation of Tconvs would lead to B cell depletion. However, this is shown in an indirect way and the mechanisms are not really elucidated. Indeed, this work shows a correlation between an increase in Tconv activation and a decrease in the number of B cells in the blood. The experiments to try to show a causal link are of 2 types: deletion of T cells (Fig 4) and blocking T cell activation with CTLA4-Ig (Fig 5) (neutralization of TNF addresses another question). Neither of these 2 experiments is totally convincing. Indeed, the absence of B cell depletion when T cells are deleted can be explained by other mechanisms than the preservation of B cell destruction by activated T cells. The phenomenon could be explained by B cell recirculation to lymphoid tissues or an effect of massive T cell death for example. The experiment shown in Fig. 5 with Belatacept is more convincing because this time the effect is targeted to activated T cells only. However, the prevention of B cell ablation is only partial. Again, since only blood is analyzed, other mechanisms could explain the B cell loss, such as their recirculation in lymphoid tissues.

      While the concept that Treg depletion leads to activation of Tconv cells and reduced B cells has been published previously, the B cell loss was explained on the basis of defective B cell lymphopoiesis due to low production of stromal cell-derived IL-7 or destruction of stromal cells by effector T cells. Our new data establish that the loss of B cells in the context of Treg depletion is not due to defects in the number of pre-/pro-B cells. Rather, it reflects the death of mature B cells in the bone marrow.

      To address the reviewer's concern that the B cell loss was merely caused by a change in circulation pattern, we performed a new study on the effect of the ADC on B cells in the bone marrow. Our new data reveal a loss of mature bone marrow B cells, and this loss is associated with increased apoptosis of mature B cells. Therefore, the loss of B cells in the peripheral blood is not due to changed circulation. Furthermore, our data show that B cell progenitors (pre-B cells) are not changed. Therefore, altered B cell lymphopoiesis is not the reason for B cell loss in our model system.

      3) It is disappointing that only the blood (and sometimes the bone marrow) was studied in this work. The interest of doing experiments in mice is to have access to many tissues such as the spleen, lymph nodes, colon, lung, and liver. To conclude that there is B cell deletion without showing lymphoid organs (where the majority of B cells reside) is insufficient. As discussed above, the drop in B cells in the blood could be due to their recirculation in lymphoid organs. In addition, there is no measurement of functional B cells activity. Do mice treated with Ipi-DM1 have a decreased ability to develop an antibody response following immunization?

      We have analyzed lymph nodes and spleen at the same time points. Unfortunately, Treg depletion was no longer observed at these time points and, as expected, we did not see a clear depletion of B cells (Figure 1-figure supplement 6). Regarding functional B cell activity, we observed an increase in plasma immunoglobulins, especially IgE, which is now shown in Figure 3-figure supplement 1.

      4) Although it is difficult to study in vivo, there is not a single evidence of increased B cell death after injection of Ipi-DM1.

      Figure 2 and Figure 2-supplement 1 provide B cell death comparisons between the Ipi-DM1 and hIgGFc groups for bone marrow, blood, spleen, and lymph nodes. A statistically significant increase in cell death is observed in mature B cells in the bone marrow.

      5) In most of the experiments, B cells are quantified with the B220 marker alone, but this marker, in some cases, can be expressed by other cells. It would have been preferable to use a marker more specific to B cells such as CD19 for example.

      We have added data to support the death of mature B cells using other markers.

      Minor points.

      1) It should be indicated whether human CTLA4 binds normally to mouse CD80 CD86. We do not know if knock-in mice with human CTLA4 have a fully functional immune system.

      We have indicated this point as suggested and cited our previous work, lines 226-227 (refs 23 & 24).

      2) The manuscript is too long. Some of the data in the figures should be moved to supplemental figures. This is the case, for example, for some trivial stainings (Fig 1F, Fig 4B, 4F, Fig 5A, D, F, G). The figure legends and the Materials and Methods section are far too long. On the other hand, Fig 5-Fig Sup 1 could go into the main figures.

      The figure legends and the materials and methods may be long, but our intention is to provide as much information as possible for others who may be interested in our model system.

      3) The anti-CTLA4 ADC reagent should be better explained and defined in the text.

      The synthesis of the anti-CTLA-4 ADC reagent is described in the materials and methods under "Antibody-drug conjugate preparation."

      Reviewer #2 (Public Review):

      Despite the fact that CTLA-4 is a critical molecule for inhibiting the immune response, surprisingly individuals with heterozygous CTLA-4 mutations exhibit immunodeficiency, presenting with antibody deficiency secondary to B cell loss. Why the loss of a molecule that regulates T cell activation should lead to B cell loss has remained unclear. In this study, Muthana and colleagues use an anti-CTLA-4 antibody drug conjugate (aCTLA-4 ADC) to delete cells expressing high levels of CTLA-4, and show that this leads to a reduction in B cells. The aCTLA-4 ADC is found to delete a subset of Tregs, leading to hyperactivation of T cells that is associated with B cell depletion. Using blocking antibodies, the authors implicate TNFa in the observed B cell loss.

      The reciprocal regulation of T and B cell homeostasis is an important research area. While it has been shown that Treg defects are associated with B cell loss, the mechanisms at play are incompletely understood. CTLA-4 is not normally expressed in B cells so an indirect mechanism of action is assumed. The authors show that the decrease in Treg following aCTLA-4 ADC treatment is associated with activation of T cells, and that B cell loss is blunted if T cells are depleted. A role for both CD4 and CD8 T cells is identified by selective CD4/CD8 depletion. T cells appear to require CD28 costimulation in order to mediate B cell loss, since the response is partially inhibited in the presence of the costimulation blockade drug belatacept (CTLA-4-Ig). Finally, experiments using the anti-TNFa antibody adalimumab suggest a potential role for TNFa in the depletion of B cells.

      While the manuscript makes a useful contribution, a number of questions remain. Perhaps most important is the extent to which this model mimics the natural situation in individuals with CTLA-4 mutations (or following CTLA-4-based clinical interventions). aCTLA-4 ADC treatment permits acute deletion of Treg expressing high levels of CTLA-4, whereas in patients the Treg population remains but is specifically impaired in CTLA-4 function. Secondly, although the requirement for T cells to mediate B cell loss is convincingly demonstrated, the incomplete reversal by TNFa blockade suggests that additional unidentified factors contribute to this effect. Finally, although the manuscript favours peripheral killing of mature B cells over alterations to B cell lymphopoiesis, one concern is that this may simply reflect the model employed: the short-term (6 day) treatment used here may be too acute to alter B cell development, but this may nevertheless be a feature of prolonged immune dysregulation in humans.

      We appreciate the reviewer's comments, and the difference between short-term depletion and permanent inactivation of Treg by genetic mutation is now discussed. We would note that, apart from mutation, dynamic Treg perturbation does occur under autoimmune conditions. Therefore, our data have significant implications for T-B cell interactions.

      TNF-alpha is implicated in B cell loss as evidenced by the partial rescue with Anti-TNF treatment. We did not try to exclude the possibility that other mechanisms are involved.

      Our data show a loss of circulating B cells in peripheral blood and of mature bone marrow B cells. B cell progenitors (pre-B cells) are not changed by Ipi-DM1-induced Treg impairment; therefore, altered B cell lymphopoiesis is not the reason for B cell loss in our model system. Evidence of increased cell death is observed only in mature B cells (Figure 2).

      1) Following aCTLA-4 ADC treatment, it is surprising how subtle the deletion of Treg is (from ~8% to ~7%, Fig 1G), compared to the marked deletion of CTLA-4-expressing CHO cells. Is this a feature of in vivo versus in vitro treatment? If Treg are treated in vitro is deletion more efficient? How does the expression level of CTLA-4 in the CHO cells compare with the Treg in these assays?

      We appreciate the reviewer's comments. The anti-CTLA-4 ADC targets CTLA-4 on the cell surface. On average, about 5% of Tregs express surface CTLA-4 at any given moment, whereas > 90% of cells in the human CTLA-4-expressing CHO cell line stain positive. Nevertheless, Treg cell numbers in peripheral blood are reduced by > 40%. Additionally, we have included bone marrow data, which show a greater percentage of Treg depletion (Figure 1J).

      2) The decrease in CTLA-4 seen after ipi-DM1 is complicated by the fact that the control DM1 conjugate (IgG1-DM1) appears to significantly increase CTLA-4 expression (Fig 1 supplement 2). It would be useful to clarify when hIgGFc is used versus hIgGFc-DM1 given the additional complexity introduced here (comparisons lacking a payload differ in more than one variable, while the hIgGFc-DM1 is clearly not inert).

      We appreciate the reviewer's comments. We agree that the hIgGFc-DM1 control slightly increased CTLA-4 levels; nevertheless, it did not alter B cells, T cells, or their proliferative capacity when compared to hIgGFc. Our point here is that B cell depletion is not mediated by off-target release of the DM1 payload (new version Figure 1-figure supplement 4, old version Figure 1-figure supplement 2). Regarding when hIgGFc versus hIgGFc-DM1 is used, this is clarified in the figure legend: comparisons are made between hIgGFc vs Ipi-DM1, or between hIgGFc vs hIgGFc-DM1.

      3) T cell-derived IFNg is another potential contender for influencing B cell homeostasis - have you considered testing whether this also contributes in your model?

      We appreciate reviewer’s suggestion. IFN was reported to induce apoptosis and cell arrest in Pre- B cells, however these studies are invitro studies Garvey et.al Immunology. 1994 Mar; 81(3): 381–388; Grawunder et.al Eur. J. Immunol. 23, 544–551. Since we did not observe any effect on Pre-B cells, we have not followed the literature to investigate the role of IFNy in B cell loss in our model.

      Reviewer #3 (Public Review):

      The co-suppressive molecule CTLA-4 has a critical role in the maintenance of peripheral tolerance, primarily by Treg mediated control of the co-stimulatory molecules CD80 and CD86. As stated by the authors, previous studies have found a variety of effects of anti-CTLA-4 antibody treatment or genetic loss of CTLA-4 on B-cells. These include increased B-cell activation and antibody production, autoantibody production, impairment of B-cell production in the bone marrow and loss of peripheral B-cells. In this article Muthana et al use a CTLA-4 humanized mouse model and examine the effects of drug conjugated CTLA-4 on the immune system. They observe a transient loss of B-cells in the blood of the treated mice. They then use a range of immune interventions such as T-cell depletion and blocking antibodies to demonstrate that this effect is dependent on T-cell activation.

      Since anti-CTLA-4 immunotherapy is in active clinical use, exploration of its effects is welcome; this is helped by the use of a humanized CTLA-4 system, which should be considered a strength of the paper. However, currently, the central premise of this paper, that B-cells are depleted, seems underexplored. Direct evidence of T-cell killing of B-cells is never presented; rather, it is inferred from the reduced numbers of B-cells in the blood. The status of B-cells in sites that contain a large proportion of B-cells, such as the spleen and lymph nodes, is not examined. Additionally, no examination of B-cell antibody production is performed.

      We appreciate the reviewer's comments. To address these concerns, we performed additional experiments to evaluate the impact on B cells in other organs, as detailed in our responses to the specific questions.

      1) Examination of B-cell apoptosis/cell death and T-cell mediated cytotoxicity is needed. The authors repeatedly refer to auto destructive T-cells without ever demonstrating their presence or any direct evidence that B-cells are dying. This is particularly important in the context of the blood since an alternative hypothesis would be a change in B cell trafficking and infiltration into tissues.

      We appreciate the reviewer's comments. To address the concern that B cell loss in blood might be caused by a change in B cell trafficking, we performed a new study on the effect of the ADC on B cells in the bone marrow. Our new data reveal a loss of mature bone marrow B cells, and this loss is associated with increased apoptosis of mature B cells (Figure 2). Therefore, the loss of B cells in the peripheral blood is not due to altered B cell trafficking and infiltration into tissues.

      2) The authors demonstrate that B-cells are mostly reduced in blood at around days 10 to 15, I believe it is critical to determine if this is also reflected in the lymphoid organs such as the spleen and lymph nodes.

      We appreciate the reviewer’s comments. We have analyzed lymph nodes and spleen at the same time points. Unfortunately, Treg depletion was no longer observed at these time points. As expected, we did not see a clear depletion of B cells (Figure 1-figure supplement 6).

      3) Related to the above point do the authors see evidence of Splenomegaly or lymphadenopathy?

      We appreciate the reviewer’s comment. Evidence of splenomegaly and lymphadenopathy is presented in Figure 3-figure supplement 2.

      4) Minimal examination of the status of the B-cells or antibody production is performed. Previous reports would suggest that plasma cell induction and antibody responses may be expected. Do serum antibody levels change in this system?

      We appreciate the reviewer’s comment. Increases in plasma immunoglobulins, especially IgE, are now shown in Figure 3-figure supplement 1.

      5) It is unclear how the authors interpret their experiment with anti-TNFa (Figure 6). Are they suggesting that TNFa itself depletes B-cells, or that it is part of the inflammatory milieu that contributes to wider T-cell activation and, in turn, B-cell depletion?

      We have discussed these possibilities in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing and assessing our paper. Reviewer 2 had only positive comments. Reviewer 1 also had positive comments but included a list of suggestions. The revised version includes text edits to address the suggestions.

      Reviewer 1:

      … First, it is unclear whether the experiments and analyses were set up to be able to rule out more specific candidate functions of the ZI.

      The list of possible functions performed by the ZI is broad. Nevertheless, our study considers a rather long list of neural processes related to the behaviors listed below.

      Second, many important details of the experiments and their results are hard to decipher given the current descriptions and presentations of the data.

      The procedures used in the present study have all been used and described in our previous studies (cited). We used the same descriptions and presentations as in the prior studies. We have gone over the Methods and figures to ensure that all details required to understand the experiments are provided, and we have also added further details following the suggestions noted below.

      The paper could be significantly strengthened by including more details from each experiment, stronger justifications for the limited behaviors and experimental analyses performed, and, finally, a broader analysis of how the recorded activity in the ZI relates to behavioral parameters.

      The paper studied several behaviors, including: 1) spontaneous movement of head-fixed mice on a spherical treadmill, 2) tactile (whisker and body parts) and auditory (tones and white noise) stimuli applied to head-fixed mice, 3) spontaneous movement initiation, change, and turns in freely moving mice, 4) auditory tone (frequency and SPL) mapping in freely behaving mice, 5) auditory-evoked orienting head movements (responses) in the context of several behavioral tasks, 6) signaled active avoidance responses and escapes (AA1), 7) unsignaled/signaled passive avoidance responses (AA2ITI/AA3-CS2), 8) sensory discrimination (AA3), 9) CS-US interval timing discrimination (AA4), and 10) US-evoked unsignaled escape responses.

      In freely moving experiments, the behavior is continuously tracked and decomposed into translational and rotational movement components. Discrete responses are also evaluated (e.g., active avoids, escapes, passive avoids, errors, intertrial crossings, latencies, etc.). These behavioral procedures evaluate many neural processes, including decision making (Go/NoGo in AA1-3), response control/inhibition (unsignaled and signaled passive avoidance in AA2/3), and stimulus discrimination (AA3). The applied stimuli, discrete responses, and tracked movement are always related to the recorded ZI activity using a variety of techniques (e.g., cross-correlations, PSTHs, event-triggered time extractions, etc.), which relate the discrete and time-series parameters to the neural activity. We do not think all this qualifies as “limited behaviors”.

      (1) Anatomical specification: The ZI contains many distinct subdivisions, each with its own topographically organized inputs/outputs and putative functions. The current manuscript doesn't reference these known divisions or their behavioral distinctions, and one cannot tell exactly which portion(s) of the ZI was included in the current study. Moreover, the elongated structure of the ZI makes it very difficult to infect virally in a specific or complete way. The data could be better interpreted if the paper included basic information on the locations of recordings, the extent of the AAV spread in the ZI in each viral experiment, and what fraction of infected neurons were inside versus outside ZI.

      Our experiments employed Vgat-Cre mice to target ZI neurons. In this line, GABAergic neurons from the entire ZI express Cre, including the dorsal and ventral subdivisions (see Vong et al., 2011; Hormigo et al., 2020). Consequently, AAV injections in Vgat-Cre mice produce restricted expression in the ZI that can fully delineate the nucleus, as shown in the papers referenced above (including ours). There is nil expression in structures above or below ZI because they do not express Cre in these mice (e.g., thalamus and subthalamic nucleus), which allows for selective targeting of ZI. Our optogenetic manipulations and photometry recordings were not aimed at specific ZI subdivisions. We targeted the area of ZI indicated by the stereotaxic coordinates (see Methods), which are aimed at the center of the structure to maximize success in recording/manipulating neurons within ZI. While all the animals included in the study expressed opsins and GCaMP within ZI, which in many animals fully delineated the nucleus, there was normal variability in the location of optical fibers, but we did not detect any differences in the results related to these variations.

      Fiber photometry and optogenetics experiments are performed with rather large-diameter optical probes, which record/manipulate relatively large areas of the targeted structure. This is useful because our goal was to identify functional roles of the entire ZI, which could then be parsed. In the present study, we did not perform experiments to target specific ZI populations (e.g., retrograde Cre expression from target areas), which may have revealed differences attributed to their projection sites. However, in the last experiment, we selectively excited ZI fibers targeting three different areas (midbrain tegmentum, superior colliculus, and posterior thalamus), which revealed clear differences in movement. Thus, future experiments should explore these different populations (e.g., using retrograde/anterograde expression systems), which may be in different subdivisions.

      We have enhanced the Methods section to clarify these points, including the addition of these references.

      (2) Electrophysiological recording on the treadmill: The authors are commended for this technically very difficult experiment. The authors do not specify, however, how they knew when they were recording in ZI rather than surrounding structures, particularly given that recording site lesions were only performed during the last recording session. A map of the locations of the different classes of units would be valuable data to relate to the literature.

      We have added details about this procedure in the Methods section. These recordings are performed based on coordinates, and categorizing neurons as belonging to ZI is necessarily an estimate based on the final histological verification. Nevertheless, the marking lesions revealed that the electrodes were on target, which likely resulted from the care taken during the surgical procedure to define reference points used later during the recording sessions (see Methods). Regarding a map of the unit locations, we performed several analyses that did not reveal clear differences based on site. For example, we compared depth vs cell class: “There was no difference in recording depth between the four classes of neurons (ANOVA F(3,337)= 1.06 p=0.3676)”. Future experiments that employ additional methods (labelling, opto-tagging, etc.) would be more appropriate to address mapping questions. Finally, as we state in the paper, “However, these recordings do not target GABAergic neurons and may sample some neurons in the tissue surrounding the zona incerta. Therefore, we used calcium imaging fiber photometry to target GABAergic neurons in the zona incerta”.

      (3) The rationale of the analysis of activity with respect to “movement peak”: It is unclear why the authors did not assess how ZI activity correlates with a broad set of movement parameters, but rather grouped heterogeneous behavioral epochs to analyze firing with respect to “movement peaks”.

      The reviewer is referring to movement peaks on the spherical treadmill. On the treadmill, we used the forward locomotor movement of the animal because this is the main activity of the mice on the treadmill. We considered “all peaks” (or movements) and “>4 sec peaks”, which select for movement onsets. Compared to the treadmill, freely moving conditions during various behavioral tasks offer a richer behavioral repertoire, which was analyzed in more detail (i.e., translational and rotational components during spontaneous ongoing movement and movement onsets, and movement related to various behaviors such as orienting, active and passive avoidance, escape, sensory stimulation, discrimination, etc.). Thus, we focused on a broader set of movement parameters in the Cre-defined ZI cells of freely behaving mice.

      (4) The display of mean categorical data in various figures is interesting; however, the reader cannot gather a very detailed view of ZI firing responses or potential heterogeneity with so little information about their distributions.

      The PCA performs the heterogeneity classification in an unbiased manner, which we feel is a thoughtful approach. The firing rates and correlations with movement for each category of neurons are detailed in the Results. Furthermore, the sensory responses for these neurons are also detailed. Together, we think this provides a detailed view of the units we recorded in awake/head-fixed mice. As already stated, further study would benefit from an additional level of cell site verification.

      (5) Somatosensory firing responses in ZI: It is unclear why the authors chose the specific stimuli used in the study. How often did they evoke reflexive motor responses? What was the latency of sensory-evoked responses in ZI activity and the latency of the reflexive movement?

      These are broad questions, and we assume that the reviewer is asking about somatosensory-evoked responses on the spherical treadmill. We used air-puffs applied to the whiskers and to the back (left vs right) because the whiskers represent an important sensory representation for mice, and the back is a part of the body (trunk) that we often use to motivate the animals to move forward on the treadmill. Regarding the latency of the somatosensory-evoked responses, in this case we did not correct them for the time it takes the air-puff to travel to the whiskers or body part, and therefore we did not provide latencies. Moreover, air-puffs are not a very good method to quantify whisker-evoked latencies, which are better measured using other methods (deflections of single/multiple whiskers using piezo-devices or other mechanical devices, as we and others have done in many studies). We are not sure what the reviewer means by “reflexive behavior”; we did not measure any reflexive behavior under these conditions. We have gone over the Methods and Results to ensure that sufficient details are provided about these experiments.

      (6) It would be valuable to see example traces in Figure 3 to get a better sense of the time course and contexts under which Ca signals in ZI track movement. What is the typical latency? What is the typical range of magnitudes of responses? Does the Ca signal track both fast and slow movements? How are the authors sure that there are no movement artifacts contributing to the calcium imaging? It seems there is more information in the dataset that could be valuable.

      As is well known, fiber photometry calcium imaging is a slow population signal. We do not think it would be valuable to get into timing issues beyond what is already detailed in the study (i.e., magnitudes measured as areas or peaks, and timing as time-to-peak). Regarding “movement artifacts”, these signals are absent (flat) in animals that do not express GCaMP. We agree that there must be additional valuable information in our datasets (as in most time-series). However, the current paper is already rather extensive. We will continue to peruse our datasets and report additional findings in new papers.

      (7) Figure 4: The rationale for quantifying the F/Fo responses over a 6-second window, rather than with respect to discrete movement parameters, is not well explained. What types of movement are binned in this approach, and might this broad binning hinder the ability to detect more specific relationships between activity and movement?

      Figure 4 is focused on characterizing the relationship between turns (ipsiversive and contraversive) during movement and ZI activity. We tested different binning windows to find differences, including the 6-sec window in Figure 4 for population measures (-3 to 3 sec around the turns). This binning approach is effective at revealing differences where they exist (e.g., superior colliculus), as shown in our previous studies (e.g., Zhou et al., 2023). Moreover, the turns in the different directions can be considered discrete responses at their peak, and the timing of the related activations (e.g., time to peak), which we evaluated, is rather sensitive and would have revealed differences, but we did not find any.

      (8) Separation of sensory and motor responses in Figure 5: The current data do not adequately differentiate whether the responses are sensory or motor, given the high correlation of the sensory inputs driving motor responses. Because isoflurane can diminish auditory responses early in the auditory pathway, this reviewer is not convinced the isoflurane experiments are interpretable.

      The reviewer is referring to Fig. 5C,D. Indeed, the point of this experiment was to show that it is difficult to differentiate whether neural responses are sensory or motor in awake and freely moving conditions. As we stated in the Results section, “Although arousal and movement were not dissected in the present experiment (this would likely require paralyzing and ventilating the animal), the results indicate that activation of zona incerta neurons by sensory stimulation is primarily associated with states when sensory-evoked movement is also present”. This is followed in the Discussion by, “…as already noted, the suppression of sensory responses may be due to changes in arousal (Castro-Alamancos, 2004; Lee and Dan, 2012) and not caused by the abolishment of the movements per se”.

      (9) Given the broad duration of the mean avoidance response (Fig. 6C, bottom), it would be useful to know to what extent this plot reflects a prolonged behavior or is the result of averaging different animals/trials with different latencies. Given that the shapes of the F/Fo responses in ZI appear similar across avoids and escapes (Fig. 6D), despite their apparently different speeds and movement durations (Fig. 6C), it would be valuable to know how the timing of the F/Fo relates to movement on a trial-by-trial basis.

      The duration of the avoidance response cannot be ascertained from CS onset (panel 6C, bottom), and avoids are not wide but rather sharp. We have now made this clearer when Fig. 6C is first mentioned (“note that since avoids occur at different latencies after CS onset they are best measured from their occurrence as in Fig. 6D”). Like other related conditioned and unconditioned responses, avoids and escapes are similar, varying in the noted parameters. Regarding timing, as already mentioned above, we think that the characteristics of the population calcium signal make it unsuitable for timing considerations beyond those we included, particularly for movements occurring at the fast speeds of avoids and escapes.

      (10) Lesion quantification: One cannot tell what rostral-caudal extent of ZI was lesioned and quantified in this experiment. It would be easier to interpret if the data were also plotted for each animal, so the reader can tell how reliable the method is. The mean ablation would be better shown as a normalized fraction of cells. Although the authors claim the lesions have little impact on behavior, the incompleteness of the lesions could warrant a more conservative interpretation.

      The lesion experiment was a complement to the optogenetic inactivation experiments we performed in our preceding ZI paper and in the present paper. Thus, the finding that the lesions had little impact on behavior is supportive of the optogenetics findings. Regarding cell counts, we did not select any parts of the ZI to quantify the number of neurons in either control or lesion mice; we considered the full rostrocaudal extent in our measurements. We are not sure what “fraction” the reviewer is suggesting, considering that these counts are from two different groups of mice (control vs lesion). Note that the red-marked neurons, as shown in Fig. 8A, reveal healthy non-Vgat-Cre neurons outside ZI that mark the extent of the AAV diffusion, which, as shown, spanned the full extent of the ZI in the coronal plane (and in other planes, as the AAV spreads in all directions).

      (11) Optogenetics: the location of infected neurons is poorly described, including the rostral-caudal extent and the fraction of neurons inside and outside of ZI. Moreover, it is unclear how strongly the optogenetic manipulations in this study are expected to affect neuronal activity in ZI.

      We discussed the first point in (1) above. Regarding how optogenetic manipulations are expected to affect neuronal activity in ZI and its targets, we have conducted extensive electrophysiological recordings in slices and in vivo to detail the effects of our manipulations on GABAergic neurons (e.g., Hormigo et al., 2016; Hormigo et al., 2019; Hormigo et al., 2021a; Hormigo et al., 2021b), including ZI neurons (Hormigo et al., 2020). In fact, we never use an opsin we have not validated ourselves using electrophysiology. Moreover, our experiments employ a spectrum of optogenetic light patterns (including trains/Cont at different powers) that titrate the optogenetic effects within each session/animal. As shown in Figs. 11 and 12, these patterns produce different behavioral effects related to the different levels of neural firing they induce. For ChR2-expressing neurons in ZI, firing is frequency dependent and maximal during Cont blue light (at the same power). For Arch-expressing neurons, only Cont is used, and inhibition is a function of the green light power. When blue light is applied to ZI fibers targeting different areas, this relationship changes: blue light trains (1-ms pulses) at 40-66 Hz become the most effective means of inducing sustained postsynaptic inhibition compared to Cont or low frequencies.

      References

      Castro-Alamancos MA (2004) Dynamics of sensory thalamocortical synaptic networks during information processing states. Progress in Neurobiology 74:213-247.

      Hormigo S, Vega-Flores G, Castro-Alamancos MA (2016) Basal Ganglia Output Controls Active Avoidance Behavior. J Neurosci 36:10274-10284.

      Hormigo S, Vega-Flores G, Rovira V, Castro-Alamancos MA (2019) Circuits That Mediate Expression of Signaled Active Avoidance Converge in the Pedunculopontine Tegmentum. J Neurosci 39:4576-4594.

      Hormigo S, Zhou J, Castro-Alamancos MA (2020) Zona Incerta GABAergic Output Controls a Signaled Locomotor Action in the Midbrain Tegmentum. eNeuro 7.

      Hormigo S, Zhou J, Castro-Alamancos MA (2021a) Bidirectional control of orienting behavior by the substantia nigra pars reticulata: distinct significance of head and whisker movements. eNeuro.

      Hormigo S, Zhou J, Chabbert D, Shanmugasundaram B, Castro-Alamancos MA (2021b) Basal Ganglia Output Has a Permissive Non-Driving Role in a Signaled Locomotor Action Mediated by the Midbrain. J Neurosci 41:1529-1552.

      Lee SH, Dan Y (2012) Neuromodulation of brain states. Neuron 76:209-222.

      Vong L, Ye C, Yang Z, Choi B, Chua S, Jr., Lowell BB (2011) Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron 71:142-154.

      Zhou J, Hormigo S, Busel N, Castro-Alamancos MA (2023) The Orienting Reflex Reveals Behavioral States Set by Demanding Contexts: Role of the Superior Colliculus. J Neurosci 43:1778-1796.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for their very useful and constructive comments. We went through the list and gladly received all their suggestions. The reviewers mostly pointed to minor revisions in the text, and we acted on all of those. The one suggestion that required major work was the one raised in point 13, about the processing pipeline being unconvincingly scattered between different tools (R → Python → Matlab). I agree that this was a major annoyance, and I am happy to say we have solved it by integrating everything in a recent version of the ethoscopy software (available on biorxiv with DOI https://www.biorxiv.org/content/10.1101/2022.11.28.517675v2 and in press with Bioinformatics Advances). End users will now be able to perform coccinella analysis using ethoscopy only, thus relying on nothing but Python as their data analysis tool. This revised version of the manuscript now includes two Jupyter Notebooks as supplementary material with a “pre-cooked” sample recipe of how to do that. This should considerably simplify adoption and provide more details on the pipeline used for phenotyping.

      Please find below a point-by-point description of how we incorporated all the reviewers’ excellent suggestions.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      1) Line 38: "collecting data simultaneously from a large number of individuals with no or limited human intervention" is a bit misleading, as the entire condition the individuals are put in are highly modified by humans and most times "unnatural". I understand the point that once the animals are placed in these environments, then recording takes place without intervention, but it would be nice to rephrase this so that it reflects more accurately what is happening.

      We have now rephrased this into the following (L39):

      Collecting data simultaneously from a large number of individuals, which can remain undisturbed throughout recording.

      2) Line 63: please add a reference to the Ethoscopes so that readers can easily find it.

      Done.

      2b) And also add how much they cost and the time needed to build them, as this will allow readers to better compare the proposed system against other commercially available ones.

      This information is available on the ethoscope manual website (http://lab.gilest.ro/ethoscope). The price of one ethoscope, provided all necessary tools are available, is around £75, and the building time very much depends on the skillset of the builder and whether they are building their first ethoscope or subsequent ones. In our experience, building and adopting ethoscopes for the first time is not any more time-consuming than building a (e.g.) deeplabcut setup for the first time. We have added this information to L81.

      Ethoscopes are open source and can be manufactured by a skilled end-user at a cost of about £75 per machine, mostly building on two off-the-shelf components: a Raspberry Pi microcomputer and a Raspberry Pi NoIR camera overlooking a bespoke 3D printed arena hosting freely moving flies.

      3) Line 88: The authors describe that in the current setting, their system is capable of an acquisition rate of 2.2 frames per second (FPS). Would reducing the resolution of the PiCamera allow for higher FPS? I raise this point because the authors state that max velocity over a ten second window is a good feature for classifying behaviors. However, if animals move much faster than the current acquisition rate, they could, for instance, be in position X, move about and be close to the initial position when the next data point is acquired, leading to a measured low max velocity, when in fact the opposite happened. I think it would be good to add a statement addressing this (either data from the literature showing that the low FPS does not compromise data acquisition, or a test where increasing greatly FPS leads to the same results).

      We have previously performed a comparison of data analysed using videos captured at different FPSs, which is published in Quentin Geissmann’s doctoral thesis (2018, DOI: https://doi.org/10.25560/69514), chapter 2, section 2.8.3, figure 2.9. We have now added this work as one of the references at L95 (reference 19).
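      The undersampling concern raised in point 3 can be illustrated with a short simulation. The sketch below is not taken from the ethoscope or ethoscopy code: it assumes a hypothetical one-dimensional trajectory that completes one out-and-back excursion every 444 ms (the mean online frame interval quoted in the response to point 5) and compares the maximal velocity estimated from frames sampled at about 2.25 fps with the one estimated at 60 fps. When the movement period coincides with the frame interval, every frame catches the animal at the same position, and the estimated maximal velocity collapses to essentially zero.

```python
import math

def trajectory(t, amplitude=5.0, period=0.444):
    """Hypothetical 1-D position (arbitrary units) of an animal that moves out
    and back once every `period` seconds; illustrative only, not tracking data."""
    return amplitude * math.sin(2 * math.pi * t / period)

def sampled_max_velocity(fps, duration=10.0):
    """Maximal speed estimated from consecutive frames sampled at `fps`
    over a `duration`-second window (e.g. the ten-second window in point 3)."""
    dt = 1.0 / fps
    n_frames = int(duration * fps) + 1
    positions = [trajectory(i * dt) for i in range(n_frames)]
    # Speed between consecutive frames; the true path between frames is invisible.
    return max(abs(b - a) / dt for a, b in zip(positions, positions[1:]))

slow = sampled_max_velocity(fps=1 / 0.444)  # about 2.25 fps, one frame per excursion
fast = sampled_max_velocity(fps=60)          # rate quoted for offline 720p acquisition
print(f"max velocity at ~2.25 fps: {slow:.3f}; at 60 fps: {fast:.1f}")
```

      This is a worst case chosen on purpose; for Drosophila, the response above and the cited thesis indicate that the online rate is sufficient, and the example only shows why faster-moving animals may call for the offline mode.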

      4) Still on the low FPS: would a Raspberry Pi 4 help with the sampling rate, given that it is more powerful than the RPi3 used in the paper?

      It would, but the increase would be minor, from 2.2 to probably 3-5 FPS. A significantly higher number of FPSs would be best achieved by lowering the camera’s resolution, as the reviewer suggested, or by operating offline. I think the interesting point being implied by the reviewers is that, for Drosophila, the current limits of resolution are more than sufficient. For other animals, perhaps moving more abruptly, they may not be. The reviewer is right that we should add a line of caveat about this. We now do so in the discussion, lines 215-224.

      Coccinella is a reductionist tool, not meant to replace the behavioural categorization that other tools can offer but to complement it. It relies on Raspberry Pis as main acquisition devices, with associated advantages and limitations. Ethoscopes are inexpensive and versatile but have limitations in terms of computing power and acquisition rates. Their online acquisition speed is fast enough to successfully capture the motor activity of different Drosophila species (ref. 28), but may not be sufficient for other animals moving more swiftly, such as zebrafish larvae. Moreover, coccinella cannot apply labels to behaviour (“courting”, “lounging”, “sipping”, “jumping” etc.) but it can successfully identify large behavioural phenotypes and generate unbiased hypotheses on how behaviour – and a nervous system at large – can be influenced by chemicals, genetics, and artificial manipulations in general.

      5) Along the same line of thought, would using a simple webcam (with similar specs to the PiCamera - ELP has cameras that operate on infrared and are quite affordable too) connected to a more powerful computer lead to higher FPS? - The reason for the question about using a simple webcam is that this would make your system more flexible (especially useful in the current shortage of RPi boards on the market) lowering the barrier for others to use it, increasing the chances for adoption.

      Completely bypassing ethoscopes would require the users to set up their own tracking solution, with a final result that may or may not match what we describe here. If a greater temporal resolution is necessary, the easiest way to achieve more FPSs would be to either decrease camera resolution or use the Pis to take videos offline and then process those videos at a later stage. Combining the two allows acquisition at 60 fps at 720p, which is the maximum the camera can achieve. We have now made this clear at lines 83-92.

      The temporal and spatial resolution of the collected images depends on the working modality the user chooses. When operating in offline mode, ethoscopes are capable of acquiring 720p videos at 60 fps, which is a convenient option with fast-moving animals. In this study, we instead opted for the default ethoscope working settings, providing online tracking and real-time parametric extraction, meaning that images are analysed by each Raspberry Pi at the very moment they are acquired (Figure 1b). This latter modality limits the temporal resolution of information being processed (one frame every 444 ms ± 127 ms, equivalent to 2.2 fps on a Raspberry Pi3 at a resolution of 1280x960 pixels, with each animal being enclosed in an ellipse measuring 25.8 ± 1.4 x 9.85 ± 1.4 pixels - Figure 1a) but provides the most affordable and high-throughput solution, sparing the researcher from organising video storage or asynchronous video processing for animal tracking.

      6) One last point about decreasing use barrier and increasing adoption: Would it be possible to use DeepLabCut (DLC) to simply annotate each animal (instead of each body part) and feed the extracted data into your current analysis with coccinella? This way different labs that already have pipelines in place that use DLC would have a much easier time in testing and eventually switching to coccinella? I understand that extracting simple maximal velocity this way would be an overkill, but the trade-off would again be a lowering of the adoption barrier.

      It would certainly be possible to calculate velocity from the whole-animal pose measurement and then use this with HCTSA or Catch22, thus mimicking the coccinella pipeline, but it would definitely be overkill, as the reviewer correctly points out. Given that we are trying to make an argument about high-throughput data acquisition, I would rather not suggest this option in the manuscript.

      7) Line 96: The authors state that once data is collected, it is put through a computational framework that uses 7700 tests described in the literature so that meaningful discriminative features are found. I think it would be interesting to expand a bit on the explanation of how this framework deals with multiple comparison/multiple testing issues.

      We always use the full set of features in aggregate to train a classifier (e.g., TS_Classify in HCTSA), which means no correction is necessary because the trained classifier only ever makes a single prediction (only one test is performed). As long as this is done correctly (e.g., with proper separation of training and test sets), multiple-hypothesis correction is not appropriate. This has been confirmed with the HCTSA/Catch22 author (Dr Ben Fulcher, personal communication). We have added a clarifying sentence about this to the methods (L315-318).
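      The argument above, that one classifier trained on the full feature set yields a single test, can be sketched in a few lines. This is a minimal pure-Python illustration, not HCTSA's TS_Classify (which runs in MATLAB): the nearest-centroid classifier, the synthetic 50-feature data and the 70/30 split are all assumptions chosen for brevity. Every feature enters one model, and the only quantity evaluated is one held-out accuracy, so there is no per-feature family of tests to correct.

```python
import random

def make_features(n_per_class=50, n_features=50, shift=3.0, seed=0):
    """Hypothetical feature matrix standing in for per-animal time-series
    features (synthetic data, not HCTSA/catch22 output)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("control", 0.0), ("treated", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

def train_nearest_centroid(X, y):
    """Fit ONE classifier on all features at once: one centroid per class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))

def holdout_accuracy(X, y, test_fraction=0.3, seed=0):
    """Single train/test split -> a single accuracy estimate (one test)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]  # disjoint sets: no leakage
    model = train_nearest_centroid([X[i] for i in train], [y[i] for i in train])
    return sum(predict(model, X[i]) == y[i] for i in test) / n_test

X, y = make_features()
accuracy = holdout_accuracy(X, y)
print(f"held-out accuracy: {accuracy:.2f}")
```

      By contrast, testing each feature separately for a treatment effect would produce one hypothesis test per feature, and that analysis would need a multiple-testing correction.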

      8) It would be nice to have a couple of lines explaining the choice of compounds used for testing and also why in some tests, 17 compounds were used, while in others 40, and then 12? I understand how much work it must be in terms of experiment preparation and data collection for these many flies and compounds, but these changes in the compounds used for testing without a more detailed explanation is suboptimal.

      This is another good point. We have now added this information to the methods, in a section renamed “choice, handling and preparation of drugs” L280-285, which now reads like this:

      The initial preliminary analysis was conducted using a group of 12 “proof of principle” compounds and a solvent control. These compounds were initially used to compare both the video method and the ethoscope method. After testing these initial compounds, we found that the ethoscope methodology was more successful, and the compound list was then expanded to 17 (including the control) using only the ethoscope method. As a final test, we included additional compounds for a single concentration, bringing the total to 40 (including control), also using the ethoscope method.

      9) Line 119 states: "A similar drop in accuracy was observed using a smaller panel of 12 treatments (Supplementary Figure 2a)". It is actually Supplementary Figure 1c.

      Thank you for noticing that! Now corrected. The supplementary figures have also been renamed to follow eLife’s expected nomenclature (both are now Figure 1 – figure supplements).

      10) In some places the language seems a little outlandish and should either be removed or appropriately qualified.

      a) Lines 56-59 pose three questions that are either rhetorical or ill-posed. For example, "...minimal amount of information...behavior" implies there is a singular response, but the response depends on many details, such as the degree to which the authors want to "classify behavior".

      Yes, those were indeed meant as rhetorical questions, but we prefer to keep them in, because we hope to prompt this kind of thinking in readers. These concepts may not be obvious to someone who is just looking to apply an existing tool, and may spark some reflection about what kind of data they really want or need to acquire.

      b) Some of the criticisms leveled at the state-of-the-art methods are probably unwarranted because the goals of the different approaches are different. The current method does not yield the type of rich information that DeepLabCut yields. So, depending on the application DeepLabCut may be the method of choice. The authors of the current manuscript should more clearly state that.

      In the introduction and discussion we do try to stress that coccinella is not meant to replace tools like DLC. We have now added more emphasis to this concept, for instance to L212:

      [tools like deeplabcut] are ideal – and irreplaceable – to identify behavioural patterns and study fine motor control but may be undue for many other uses.

      And L215:

      Coccinella is a reductionist tool not meant to replace the behavioural categorization that other tools can offer but to complement it

      11) The application to sleep data appears suddenly in the manuscript. The authors should attempt, through changes to the text, to make the transition from the drug screen to the investigation of sleep smoother.

      I agree with this observation. We have now added a couple of sentences to contextualise this experiment and, we hope, make the connection appear more natural. Ultimately, this is a proof-of-principle example anyway, so hopefully the reader will take it for what it is (L169).

      Finally, to push the system to its limit, we asked coccinella to find qualitative differences not in pharmacologically induced changes in activity, but in a type of spontaneous behaviour mostly characterised by lack of movement: sleep. In particular, we wondered whether coccinella could provide biological insights comparing conditions of sleep rebound observed after different regimes of sleep deprivation. Drosophila melanogaster is known to show a strong, conserved homeostatic regulation of sleep that forces flies to recover at least part of their lost sleep, for instance after a night of forced sleep deprivation.

      11b) Additionally, the beginning section of sleep experiments talks about sleep depth yet the conclusion drawn from sleep rebound says more about the validity of the current 5 min definition of sleep than about sleep depth. If this conclusion was misunderstood, it should be clarified. If it was not, the beginning text of the sleep section should be tailored to better fit the conclusion.

      I am afraid we did not do a good job of explaining a critical aspect here: the data fed to coccinella are the “raw” activity data, in which we make no assumption about the state of the animal. In other words, we do not use the 5-minute threshold at this or any other point to classify sleep and wakefulness. Nevertheless, coccinella picks the 300-second threshold as the critical one for discerning the two groups. This is interesting because it provides a fully agnostic confirmation of the five-minute rule in D. melanogaster. We recognise this was not necessarily obvious from the text and have now added a clarification at L189-201:

      However, analysis of those same animals during rebound after sleep deprivation showed a clear clustering, segregating the samples into two subsets with separation around the 300-second inactivity trigger (Figure 3d). This result is important for two reasons. On one hand, it provides, for the third time, strong evidence that the system is not simply overfitting data of no biological significance, given that it could not perform any better than a random classifier on the baseline control. On the other hand, coccinella could find biologically relevant differences in rebound data after different regimes of sleep deprivation. Interestingly enough, the 300-second threshold that coccinella independently identified has a deep intrinsic significance for the field, as it is considered to be the threshold beyond which flies lose arousal response to external stimuli, defining a “sleep quantum” (i.e.: the minimum amount of time required for transforming inactivity bouts into sleep bouts23,24,28). Coccinella’s analysis ran agnostic of the arbitrary 5-minute threshold and yet identified the same value as the one able to segregate the two clusters, thus providing an independent confirmation of the five-minute rule in D. melanogaster.

      12) Line 227: (standard food) - please add a link to a protocol or a detailed description of what "standard food" is, so that others can precisely replicate what you are using. This is not my field, but I have the impression that food content/composition can substantially change behaviour in these animals?

      Yes, good point. We have now added the actual recipe to the methods L240:

      Fly lines were maintained on a 12-hour light: 12-hour dark (LD) cycle and raised on polenta and yeast-based fly media (agar 96 g, polenta 240 g, fructose 960 g and Brewer’s yeast 1,200 g in 12 litres of water).

      13) Data acquisition and processing: please add links to the code used.

      Both the code and the raw data used to generate all the figures have been uploaded to Zenodo and are available through its repository. Zenodo has a limit of 50GB per uploaded dataset, so we had to split everything into two files with two DOIs, given in the methods (L356, section “code and availability”; DOIs: 10.5281/zenodo.7335575 and 10.5281/zenodo.7393689). We have now also created a landing page for the entire project at http://lab.gilest.ro/coccinella and linked that landing page in the introduction (L64).

      13b) Also, your pipeline seems to use three different programming languages/environments... Is there any chance this could be reduced? Maybe there are R packages that can convert csv to MATLAB-compatible formats, so you can avoid the Python step? (Nothing against using the current pipeline per se; I am just thinking that for usability and adoption by other labs, the fewer languages, the better.)

      This is a very important suggestion that highlights a clear limitation of the pipeline. I am happy to say that we worked on this and solved the problem integrating the Python version of Catch22 into the ethoscopy software. This means the two now integrate, and the entire analysis can be run within the Python ecosystem. HCTSA does not have a Python package unfortunately but we still streamlined the process so that one only has to go from Python to Matlab without passing through R. To be honest, Catch22 is the evolution of HCTSA and performs really well so I think that is what most users will want to use. We provide two supplementary notebooks to guide the reader through the process. One explains how to go from ethoscope data to an HCTSA compatible mat file. The other explains how ethoscope data integrate with Catch22 and provides many more examples than the ones found in the paper figures.

      14) There are two sections named "References" (which are different from each other) on the manuscript I received and also on BioRxiv. Should one of them be a supplementary reference? Please correct it. I spent a bit of time trying to figure out why cited references in the paper had nothing to do with what was being described...

      The second list of references actually applied only to the list of compounds in the supplementary table 1. When generating a collated PDF this appeared at the end of the document and created confusion. We have now amended the heading of that list in the following way, to read more appropriately:

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for reviewing our manuscript. We find the reviews constructive and meaningful, and we have incorporated most suggestions into our revision. We provide point-by-point responses to the reviews below.

      Reviewer #1 (Public Review):

      The evolution of dioecy in angiosperms has significant implications for plant reproductive efficiency, adaptation, evolutionary potential, and resilience to environmental changes. Dioecy allows for the specialization and division of labor between male and female plants, where each sex can focus on specific aspects of reproduction and allocate resources accordingly. This division of labor creates an opportunity for sexual selection to act and can drive the evolution of sexual dimorphism.

      In the present study, the authors investigate sex-biased gene expression patterns in juvenile and mature dioecious flowers to gain insights into the molecular basis of sexual dimorphism. They find that a large proportion of the plant transcriptome is differentially regulated between males and females with the number of sex-biased genes in floral buds being approximately 15 times higher than in mature flowers. The functional analysis of sex-biased genes reveals that chemical defense pathways against herbivores are up-regulated in the female buds along with genes involved in the acquisition of resources such as carbon for fruit and seed production, whereas male buds are enriched in genes related to signaling, inflorescence development and senescence of male flowers. Furthermore, the authors implement sophisticated maximum likelihood methods to understand the forces driving the evolution of sex-biased genes. They highlight the influence of positive and relaxed purifying selection on the evolution of male-biased genes, which show significantly higher rates of nonsynonymous to synonymous substitutions than female or unbiased genes. This is the first report (to my knowledge) highlighting the occurrence of this pattern in plants. Overall, this study provides important insights into the genetic basis of sexual dimorphism and the evolution of reproductive genes in Cucurbitaceae.

      Thank you for your positive comments. Greatly appreciated.

      There are, however, parts of the manuscript that are not clearly described or could be otherwise improved.

      • The number of de novo-assembled unigenes seems large and I would like to know how it compares to the number of genes in other Cucurbitaceae species. The presence of alternatively assembled isoforms or assembly artifacts may still be high in the final assembly and inflate the numbers of identified sex-biased genes.

      The majority of unigenes (63%) were annotated by homologs in species of Cucurbitaceae, including Momordica charantia (16.3%), Cucumis melo (11.9%), Cucurbita pepo (11.9%), Cucurbita moschata (11.5%), Cucurbita maxima (10.1%) and other species of Cucurbitaceae (Fig. S1C). We acknowledge that the number of transcripts in the final assembly may still be overestimated owing to the unavoidable presence of isoforms, although we have tried our best to filter them using several clustering strategies. Additionally, we assessed the transcripts using BUSCO v5.4.5 and the embryophyta_odb10 database of 1,614 plant orthologs. Some 95.0% of these orthologs were covered by the unigenes: 1,447 (89.7%) were “Complete BUSCOs” and 85 (5.3%) were “Fragmented BUSCOs”, while only 82 (5.0%) were “Missing BUSCOs” (Table S2). Overall, our assessment suggests that we have generated a high-quality reference transcriptome in the absence of a reference genome. We have revised the manuscript accordingly (lines 175-181).

      • It is interesting that the majority of sex-biased genes are present in the floral buds but not in the mature flowers. I think this pattern could be explored in more detail, by investigating the expression of male and female sex-biased genes throughout the flower development in the opposite sex. It is also not clear how the expression of the sex-biased genes found in the buds changes when buds and mature flowers are compared within each sex.

      Thank you for this advice on further exploring this interesting pattern. In the near future, we would like to study these issues across more developmental stages of flowers in each sex, probably with the aid of single-cell techniques and a reference genome. We have revised the Results to reflect this, in the section "Tissue-biased/stage-biased gene expression" (lines 202-216).

      • The statistical analysis of evolutionary rates between male-biased, female-biased, and unbiased genes is performed on samples with very different numbers of observations, therefore, a permutation test seems more appropriate here.

      Thank you for your suggestion. However, all comparisons between sex-biased and unbiased genes were tested using the Wilcoxon rank-sum test in R, which is more commonly used. Additionally, we tested some of the datasets, and the results were consistent with the Wilcoxon rank-sum test.
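      For comparison with the Wilcoxon rank-sum test, the sketch below shows what the permutation test suggested by the reviewer could look like for two groups of very different sizes (for example, per-gene ω = dN/dS values for a small set of male-biased genes versus a larger set of unbiased genes). It is a minimal, hypothetical illustration in Python (standard library only), using the difference in medians as the test statistic; the function name and the data are assumptions, and it is not the analysis performed in the manuscript.

```python
import random
from statistics import median


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in medians of samples a and b.

    Group labels are reshuffled with the original group sizes kept fixed,
    so unequal sample sizes are handled by construction.
    """
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(median(pooled[:n_a]) - median(pooled[n_a:])) >= observed:
            extreme += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical omega values: 5 genes in one group vs 40 in the other.
    group_small = [0.9, 1.0, 1.1, 1.2, 0.95]
    group_large = [0.01 * i for i in range(40)]
    print(permutation_test(group_small, group_large))
```

      Because labels are reshuffled with the group sizes held fixed, the null distribution reflects the imbalance between groups directly, which is the property the reviewer's suggestion relies on.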

      • The impact of pleiotropy on the evolutionary rates of male-biased genes is speculative since only two tissue samples (buds and mature flowers) are used. More tissue types need to be included to draw any meaningful conclusions here.

      Thank you for this advice on the impact of pleiotropy. In the near future, we would like to investigate this further using more developmental stages of flowers in each sex and new technologies, to consolidate the conclusion.

      Reviewer #2 (Public Review):

      Summary:

      This study uses transcriptome sequence from a dioecious plant to compare evolutionary rates between genes with male- and female-biased expression and distinguish between relaxed selection and positive selection as causes for more rapid evolution. These questions have been explored in animals and algae, but few studies have investigated this in dioecious angiosperms, and none have so far identified faster rates of evolution in male-biased genes (though see Hough et al. 2014 https://doi.org/10.1073/pnas.1319227111).

      Strengths:

      The methods are appropriate to the questions asked. Both the sample size and the depth of sequencing are sufficient, and the methods used to estimate evolutionary rates and the strength of selection are appropriate. The data presented are consistent with faster evolution of genes with male-biased expression, due to both positive and relaxed selection.

      This is a useful contribution to understanding the effect of sex-biased expression in genetic evolution in plants. It demonstrates the range of variation in evolutionary rates and selective mechanisms, and provides further context to connect these patterns to potential explanatory factors in plant diversity such as the age of sex chromosomes and the developmental trajectories of male and female flowers.

      Weaknesses:

      The presence of sex chromosomes is a potential confounding factor, since there are different evolutionary expectations for X-linked, Y-linked, and autosomal genes. Attempting to distinguish transcripts on the sex chromosomes from autosomal transcripts could provide additional insight into the relative contributions of positive and relaxed selection.

      Thank you for your meaningful suggestions. We agree that identifying the chromosomal origin of transcripts would greatly improve insight into selection, and we will investigate these issues, probably with a reference genome, in the near future.

      Reviewer #3 (Public Review):

      The potential for sexual selection and the extent of sexual dimorphism in gene expression have been studied in great detail in animals, but hardly examined in plants so far. In this context, the study by Zhao, Zhou et al. represents a welcome addition to the literature.

      Relative to the previous studies in Angiosperms, the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers).

      The main limitation of the study is the very low number of samples analyzed, with only three replicate individuals per sex (i.e. the whole study is built on six individuals only). This provides low power to detect differential expression. Along the same line, only three species were used to evaluate the rates of non-synonymous to synonymous substitutions, which also represents a very limited dataset, in particular when trying to fit parameter-rich models such as those implemented here.

      A third limitation relates to the absence of a reference genome for the species, making the use of a de novo transcriptome assembly necessary, which is likely to lead to a large number of incorrectly assembled transcripts. Of course, the production of a reference transcriptome in this non-model species is already a useful resource, but this point should at least be acknowledged somewhere in the manuscript.

      Each of these shortcomings is relatively important, and together they strongly limit the scope of the conclusions that can be made, and they should at least be acknowledged more prominently. The study is valuable in spite of these limitations and the topic remains grossly understudied, so I think the study will be of interest to researchers in the field, and hopefully inspire further, more comprehensive analyses.

      We acknowledge that our sample size is relatively small. We will investigate these issues at the population level, probably with a reference genome, in the near future. We have acknowledged in the revised manuscript that there may be some incorrectly assembled transcripts. We assessed the transcripts using BUSCO v5.4.5 and the latest embryophyta_odb10 database of 1,614 plant orthologs. As mentioned, 95.0% of these orthologs were covered by the unigenes: 1,447 (89.7%) were “Complete BUSCOs” and 85 (5.3%) were “Fragmented BUSCOs”, while only 82 (5.0%) were “Missing BUSCOs” (Table S2). In short, the quality of the transcriptome is high, even in the absence of a reference genome.

      Reviewer #1 (Recommendations For The Authors):

      My main criticism of this manuscript is that it refers to gene names and orthogroups throughout the text; however, the assembled transcripts are not accessible. The reference transcriptome, orthology data, and alignments used for evolutionary analysis should be made available through a public repository to support reproducibility and efficient use of the resources produced in this study.

      We have uploaded these datasets to ResearchGate (https://www.researchgate.net/publication/373194650_Trichosanthes_pilosa_datasets_Positive_selection_and_relaxed_purifying_selection_contribute_to_rapid_evolution_of_male-biased_genes_in_a_dioecious_flowering_plant).

      Comments to the authors:

      1) I have an issue with the tissue-biased gene expression analysis. Looking at Fig.3, it seems to me there are 3,204 male-biased genes that are expressed at the same level in male buds and mature flowers (same for 5,011 female-biased genes in female buds and flowers); however, only a handful of genes show sex bias between mature male and female flowers. Taking the male-biased genes as an example, if the 3,204 M1BGs have the same expression levels in mature male flowers and are no longer male-biased when mature male vs female flowers are compared, why are they not found as female tissue-biased genes (F2TGs)? I may be wrong, but one scenario would be that the M1BGs increase their expression in female flowers and become unbiased. However, that increase in expression (low expression in the female buds → higher expression in the female flowers) should classify them as female tissue-biased genes (F2TGs). Can you please clarify how the M1BGs and F1BGs are expressed in the flowers of the opposite sex?

      As to Fig. 3A, the 3,204 male-biased genes expressed in male floral buds are part of all male-biased genes (3,204 + 286 + 724 = 4,214), as shown in Fig. 2A. However, only 233 male-biased genes (88 + 1 + 144 = 233; Fig. 2B and Fig. 3B) were expressed in male mature flowers, so these genes are not expressed at the same level in male floral buds and mature flowers. Only 288 genes are both sex-biased (M1BGs) and tissue/stage-biased (M1TGs) in male floral buds. M1BGs (4,214 male-biased genes) and F1BGs (5,096 female-biased genes) do not overlap, and the remaining 44,326 genes shown in Fig. 2A are unbiased. That is, F1BGs (5,096 female-biased genes) have low or no expression in M1BGs (4,214 male-biased genes). The expression levels of some genes are shown in Table S14.

      2) Paragraph (407-416) describes the analysis of duplicated genes under relaxed selection but there is no mention of this in the results.

      These results are shown in Table S13, so we did not describe them in detail in the Results.

      3) How did the authors conclude that the identified functions in male flowers make them more adapted to biotic and abiotic environments (line 347-350)? In the paragraph above (line 338-342) the authors describe that female buds are better equipped against herbivores, which are a biotic factor?

      Following your concerns, we have revised the manuscript as follows: For line 338-342, we revised the text as “Indeed, functional enrichment analysis in chemical pathways such as terpenoid backbone and diterpenoid biosynthesis indicated that relative to male floral buds, female floral buds had more expressed genes that were equipped to defend against herbivorous insects and pathogens, except for growth and development (Vaughan et al., 2013; Ren et al., 2022) (Fig. S7A and Table S11).” For line 347-350, we revised text as “We also found that male-biased genes with high evolutionary rates in male buds were associated with functions to abiotic stresses and immune responses (Tables S12 and S13), which suggest that male floral buds through rapidly evolving genes are adapted to mountain climate and the environment in Southwest China compared to female floral buds through high gene expression.”

      4) Line 417-418: decreasing codon usage bias is linked to decreasing synonymous substitution rates, should this be the opposite?

      No. Codon usage bias was positively related to synonymous substitution rates. That is, stronger codon usage bias may be related to higher synonymous substitution rates (Parvathy et al., 2022).

      5) Figures and Tables are not standalone and are missing details in the legends.

      • Fig.2C: which genes are plotted on the heatmap, and what does the color scale correspond to?

      • All Supplementary figures are missing the descriptions of individual panels (A, B, C, etc.) in the legends. In addition, please add the numbers of observations under boxplots.

      • Supplementary Fig.5 and 6: Panel B is not a Venn diagram, I suggest removing it from the figures.

      • Supplementary Fig.7: Should be 'sex-biased genes'. What is the x-axis on the plot?

      • Supplementary Fig.8: Please add the description of the abbreviations in the legend.

      • Supplementary Tables S4, S5, S6: Please add information about the foreground and background branches.

      • Supplementary Table S6, S7, S8, S9, S10: Please add more details about the column headers (what is Model-A, background ω 2a, Unconstrained_1.p, K, which was the foreground branch etc.).

      • Supplementary Table S11: Please add gene IDs for each KEGG category.

      We have revised or fixed these issues following your comments and suggestions.

      Minor comments:

      Line 28: 'algae' in place of 'algas'

      Line 53-56: Please provide more recent references.

      Line65: 'most' instead of 'almost'

      Line 86-87: It is not clear from the sentence if the sex-biased expression was detected in flowers compared to leaves, or were the sex-biased genes detected between male and female leaves? Please clarify.

      Line 107-108: positive selection is referred to as adaptive evolution, please choose one or the other.

      Line 109: 'force' instead of 'forces'

      Line 110: 'algae' instead of 'alga'

      Line 132: '..mainly distributed from Southwest,' the country is missing.

      Line 202: 'protein sequence evolution'?

      Line 232: what does the 'number of evolutionary rates' refers to?

      Line 253: please provide a reference for the RELAX model.

      Line 274: 'relaxed selective male-biased genes' should be 'male-biased genes under relaxed purifying selection'?

      Line 318: Please add a sentence explaining why the Cucurbitaceae family is a great model to study the evolution of sexual systems.

      Line 321: 'genes' instead of 'gene'.

      Line 366: male-biased genes experience 'higher' or 'more rapid' evolutionary rates.

      Line 377: in the present study and in the case of Ectocarpus alga, positive selection plays an important role in male-biased genes evolution, but does not account for the majority of evolutionary change. Therefore, I would not call it a 'primary' force.

      Line 477: missing reference for DESeq2 package.

      Line 480: 'used'.

      Line 498: 'coding sequences'.

      Line516: 'to' instead of 'by'.

      Line 553: 'the' is repeated twice.

      Sorry for the typos and grammatical issues. We have revised them accordingly.

      Reviewer #2 (Recommendations For The Authors):

      There are two areas for improvement, one empirical and one theoretical.

      Empirically, the analyses could be expanded by an attempt to distinguish between genes on the autosomes and the sex chromosomes. Genotypic patterns can be used to provisionally assign transcripts to XY or XX-like behavior when all males are heterozygous and all females are homozygous (fixed X-Y SNPs) and when all females are heterozygous and males are homozygous (lost or silenced Y genes). Comparing such genes to autosomal genes with sex-biased expression would sharpen the results because there are different expectations for the efficacy of selection on sex chromosomes. See this paper (Hough et al. 2014; https://www.pnas.org/doi/abs/10.1073/pnas.1319227111), which should be cited and does in fact identify faster substitution rates in Y-linked genes (and note that pollen-expressed genes, at least, are concentrated on the sex chromosome in this system: https://academic.oup.com/evlett/article/2/4/368/6697528, https://royalsocietypublishing.org/doi/10.1098/rstb.2021.0226).

      We have cited Hough et al. 2014 and noticed that several species have been observed to exhibit rapid evolutionary rates of sequences on sex chromosomes compared to autosomes, which has been related to the evolutionary theories of fast-X or fast-Z (lines 482-484).

      On the theoretical side, this study is making a very specific intervention, namely identifying more rapid evolutionary rates in genes with male-biased than female-biased expression in a dioecious plant. The writing in the introduction and the discussion needs to be improved to differentiate between this comparison and similar comparisons, e.g. sex-biased expression in other dioecious plants (76-81), between X-linked and Y-linked genes (Hough et al. 2014), sex chromosomes and autosomes (several studies already cited), gametophytic and sporophytic tissue, and male and female reproductive tissue in hermaphroditic plants. Setting out this distinction early in the introduction will make the specific goals and novelty of this work clearer.

      Thank you for your constructive suggestions. We have revised the relevant part of the Introduction accordingly (lines 74-107).

      Specific comments by line:

      Sorry for the typos or wording issues. We have revised them.

      26 - driven not driving

      28 - check house style (algae vs algas)

      28-29 - consider clarifying the antecedent of "them" (evolutionary forces, not algas)

      35 - maybe, but don't the signalling genes involved in stress responses function in many capacities, not just stress? Also, there's evidence that reproductive recognition machinery in plants may ultimately derive from immune function (e.g. https://doi.org/10.1111/j.1469-8137.2008.02403.x), so the GO category "biotic stress" may be too vague

      39 - maybe clarify that "for the first time" refers to male rather than female, since there have been other studies in dioecious plants

      66-68 - asserting that something is "essential" after describing how rare it is doesn't quite follow, since dioecious plants - especially with sex chromosomes - are basically an exception. I agree that understanding the evolution of dioecious plants is important, but this isn't the most compelling way to make that case - perhaps try something else.

      137ff - this sentence can be consolidated and streamlined

      142 - "floral tissue" rather than "flowers tissue," here and elsewhere

      144 - divergence (singular)

      235 - "evidence for the contributions of" = "evidences" is unidiomatic

      250 - efficiency or efficacy?

      300 - why is "inositol" capitalized here and elsewhere?

      300ff - are these typical patterns in male tissue in other species?

      308 - is that interesting? It seems like exactly what I'd expect. Perhaps start with the unsurprising but reassuring observation (anther and pollen development genes are indeed expressed in male buds) before moving on to the more surprising findings.

      319 - remove "the"

      321 - genes (plural)

      330 - replace "these differences" with "the differences"

      336 - perhaps recap proportions / percents here?

      340 - unnecessary comma after diterpenoid

      341 - this seems like a big leap from the evidence, especially in the absence of supporting information about the chemical defenses of these species and how they differ by sex. Don't terpenoids have a diverse array of functions, not just defense? Here's a review: https://link.springer.com/chapter/10.1007/10_2014_295

      We have revised the text as “Indeed, functional enrichment analysis in chemical pathways such as terpenoid backbone and diterpenoid biosynthesis indicated that relative to male floral buds, female floral buds had more expressed genes that were equipped to defend against herbivorous insects and pathogens, except for growth and development (Vaughan et al., 2013; Ren et al., 2022) (Fig. S7A and Table S11)” (lines 373-378).

      349 - as mentioned in line 35, this is a big speculative leap. The discussion is the place for speculation, but consider other explanations too. How does the development of flowers work? Are male flowers suppressing or resorbing female primordial organs? Do male flowers in fact senesce faster? perhaps spell out the logic in more detail.

      We have revised the text as “In addition, the enrichment in regulation of autophagy pathways could be associated with gamete development and the senescence of male floral buds (Table S14) (Liu and Bassham, 2012; Li et al., 2020; Zhou et al., 2021). In fact, it was observed that male flowers senesced faster (Wu et al., 2011). We also found that homologous genes of two male-biased genes in floral buds (Table S14) that control the raceme inflorescence development (Teo et al., 2014) were highly expressed compared to female floral buds. Taken together, these results indicate that expression changes in sex-biased genes, rather than sex-specific genes play different roles in sexual dimorphic traits in physiology and morphology (Dawson and Geber, 1999).” (lines 390-402).

      351 - senescence of, not senescence for

      363 - but Hough et al. 2014 did show rapid evolution of Y-linked genes, and those are by definition sex biased ...

      391 - perhaps reiterate here that while some sex-BIASED genes did, sex-SPECIFIC genes did not, to avoid confusion

      We also revised them accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1- lines 56-57 : « have facilitated » : this wording confounds correlation with causation. Consider rephrasing as « is associated with »

      2- lines 58-60 : vague wording : what are these variations ? e.g. which tissues and stages are generally enriched?

      3- line 63 : this sentence is a bit misleading: consider changing it to « Most dioecious plants possess homomorphic sex-chromosomes » [and explain what homomorphic means in this context].

      4- line 68 : a reference is missing here. Also perhaps, allude to the fact that sexual selection in plants has long been considered a contentious issue (e.g. https://doi.org/10.1016/j.cub.2010.12.035)

      5- lines 72-76 : beyond simply describing the pattern, say what evolutionary processes are revealed by these observations.

      6- line 92 : remind the reader what these 5 studies are.

      7- line 94-95 : explain why the comparison of vegetative vs vegetative and vegetative vs reproductive tissues is a problem.

      The published studies compared gene expression only between vegetative tissues, or between vegetative and reproductive tissues, which limits our understanding of sexual selection across floral developmental stages. We have revised the text accordingly (lines 103-104). We are particularly interested in sex-biased gene expression across flower development, and our dataset allows us to investigate sexual selection at two developmental stages (buds + mature flowers).

      8- line 100 « Evolutionary dynamic analyses » : this wording is vague

      9- line 110 : brown algae are NOT plants

      10- line 137-140 or in M&M : needs to describe somewhere how the male flowers differ from the female flowers and vice-versa: are the whole morphological structures related to female (male) reproduction entirely missing, or is their development arrested later on and they are still present but simply not producing gametes? This has consequences for the interpretation of the genes they express.

      We have corrected the typos and wording issues accordingly. However, because the sampled floral buds were 3 mm or less in size, we did not observe much difference in morphological structure between the sexes. By contrast, male and female flowers at anthesis were markedly different in this dioecious plant, as shown in Fig. 1. Additionally, our analysis (not shown in this study) found that dioecy is the ancestral state of Trichosanthes, with subsequent transitions to monoecy (Guo et al., 2020), which suggests that in the early stages of flower development, female floral buds may tend to masculinize, and vice versa (Fig. 2C).

      11- line 152 : it is important to be very transparent on the sample sizes here: « from three females and three males of the dioecious ... »

      12- line 153 : along the same line, explain here why a de novo transcriptome had to be generated here: « In the absence of an assembled reference genome for this nonmodel species, we de novo assembled ... »

      13- line 164-165 : « we have generated high-quality reference transcriptomes » : I am not entirely convinced of the quality of the transcriptome obtained without a reference genome, so I suggest simply removing this subjective sentence.

      Our assessment suggested that we have generated high-quality reference transcriptomes in the absence of a reference genome; generating a reference genome will be the next step of our work.

      14- line 169 : briefly explain the criteria used to call differentially expressed genes. Given the threshold (log-fold change >=1.3 if I read the figure correctly, but the M&M says >=1), explain how it was chosen.

      We apologize for the confusion regarding the axes. The y-axis represents -log10(FDR), and the x-axis represents log2(fold change).
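To make the axis reading concrete, the sketch below maps a gene's fold change and FDR onto the two volcano-plot coordinates and applies a cutoff. It assumes |log2(fold change)| ≥ 1, the value the reviewer cites from the M&M, and an FDR cutoff of 0.05, which is an assumption not stated in this exchange; the function names are hypothetical. Under that assumption, a horizontal threshold line would sit at -log10(0.05) ≈ 1.30 on the y-axis, which may be the 1.3 that was read as a fold-change threshold.

```python
import math


def volcano_coordinates(fold_change: float, fdr: float) -> tuple[float, float]:
    """Return (x, y) = (log2 fold change, -log10 FDR) for one gene."""
    return math.log2(fold_change), -math.log10(fdr)


def is_sex_biased(fold_change: float, fdr: float,
                  lfc_cutoff: float = 1.0, fdr_cutoff: float = 0.05) -> bool:
    """Hypothetical call: |log2 FC| >= lfc_cutoff (x-axis) and FDR < fdr_cutoff (assumed)."""
    x, _ = volcano_coordinates(fold_change, fdr)
    return abs(x) >= lfc_cutoff and fdr < fdr_cutoff


# Where an assumed FDR = 0.05 cutoff would appear on the y-axis.
y_threshold = -math.log10(0.05)  # about 1.301
```

For example, a gene with a 2-fold change (x = 1.0) and FDR = 0.01 (y = 2.0) passes both cutoffs, whereas a 1.5-fold change (x ≈ 0.58) does not, however small its FDR.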

      15- line 174 : Not clear to me how Fig2C is « revealing strong sexual dimorphism », since genes cluster neither by sex nor by tissue. This should be explained more clearly.

      16- line 174-177 : The fact that more sex-biased genes were identified in early buds than in mature flowers is an interesting observation that could be given more prominence in the manuscript, but it is not really explained. Could it reflect the fact that more genes are expressed in early buds because meiotic processes happen early in flower development? Also, the genes involved in male and female organ cell fate determination might also be expected to be expressed early, with mostly organ growth genes being expressed in the mature flower.

      17- line 181 : a wrap-up sentence might be useful here to drive the point home that sex-bias is more prevalent in buds than mature flowers.

      18- line 184 : « tissue-biased » : a more appropriate wording here would be « stage-biased », no ? These are indeed the same tissues but at different developmental stages.

      19- line 183-195 : this section could benefit from setting clear expectations in a hypothesis testing framework laying out the reasons to expect a different bias between stages and between sexes. How does that fit with the level of morphological divergence between sexes (relates to my point 10 above).

      20- line 197-204. A number of essential pieces of information are missing here: how many species, how divergent, say that one other is dioecious, and specify their relative phylogenetic placement (which is important to understand the models used below). Maybe adding a phylogeny of these species in Figure 4 could be useful. Also, briefly explain the « two-ratio » and « free-ratio » models here.

      21- line 196 and following: In these analyses, I could not understand the rationale for keeping buds vs mature flowers as separate analyses throughout. Why not combine both and use the full set of genes showing sex-bias in any tissue? This would increase the power and make the presentation of the results a lot more straightforward.

      As you pointed out earlier (public review, paragraph 2), “the dataset is interesting in that it focuses on reproductive rather than somatic tissues (which makes sense to investigate sexual selection), and includes more than a single developmental stage (buds + mature flowers)”. We fully agree, and we were particularly interested in sex-biased genes at each floral developmental stage.

      22- line 216 : say explicitly that the reason for not detecting a significant difference in spite of a relatively large effect size is probably related to the low number of genes, conferring low statistical power to detect a difference. An important feature also not highlighted here is that the trend (though not significant) is in the opposite direction than in the buds, and that both the 2-ratio and the free-ratio models agree on these inverted trends. This could be the basis for an interesting comparison.

      Thank you for your suggestions.

      23- line 220 : needs to explain more clearly how this « free-ratio » differs from the « two-ratio » model.

      24- line 232-234 : I don't see why this is necessary. Why not combine both? See also my comment 21 above.

      25- line 237 : the «A-model » was not defined before.

      26- line 237 : « male-biased » is missing after 343.

      27- line 253-258 : briefly explain what these other models are based on and how they are not redundant and instead complement the previous analyses and each other.

      28- line 266-268 : the use of a more precise set of codons for male-biased genes than the others (if I understood correctly) could also be interpreted as another sign of stronger selective constraint, no?

      Codon usage bias is influenced by many factors, such as levels of gene expression. Highly expressed genes have a stronger codon usage bias and could be encoded by optimal codons for more efficient translation (Frumkin et al., 2018; Parvathy et al., 2022).

      29- line 269-291 : removing genes on a post-hoc basis seems statistically suspicious to me. I don't think your analysis has enough power to hand-pick specific categories of genes, and it is not clear what this brings here. I suggest simply removing these analyses and paragraphs.

      30- line 325 : say whether or not this pattern parallels those in animals.

      31- line 335 : yes, these biological pieces of information are important and should be given in the introduction already.

      32- the discussion should focus at some point on the observation that more female-biased genes are found in general, but that male-biased genes seem to be under greater selection. How do you reconcile these two apparently contradictory observations?

      We found that male-biased genes with high evolutionary rates in male floral buds were associated with responses to abiotic stress and immune responses (Tables S12 and S13). This suggests that male floral buds are adapted to the mountain climate and environment of Southwest China through rapidly evolving genes, whereas female floral buds are adapted through high gene expression (lines 387-390).

      33- line 355 : not clear how this follows from the previous sentences.

      34- line 356-358 : vague. not clear what the message of this sentence is.

      35- line 378-383 : say that these conclusions rely on the quality of gene annotation in this non-model species, which is probably pretty low (just like any other non-model species).

      36- line 403 : this conclusion seems far-fetched. Explain how exactly you reached this conclusion.

      37- line 406-416: these speculations on the role of paralogs seem unnecessary, in particular since the de novo transcriptome onto which all analyses are based cannot distinguish orthologs from paralogs.

      38- line 417-424. The discussion should not contain new results.

      39- line 510 : why were genes with dN/dS >2 discarded here? This might strongly bias the comparison, no? This needs to be properly justified.

      40- lines 516-523 : references to the models are missing.

      41- line 527: « omega = 1.5 » : why/how was this arbitrary threshold chosen?

      42- Fig 2 : write out « buds » and « mature flowers » on top of the graphs

      43- Fig 4 : add a phylogeny of the species showing the branch being compared. Also, add the number of genes in each category on each plot.

      Thanks, we revised/fixed these issues accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their thoughtful assessment and critiques. As detailed below in the point-by-point replies, we have modified the text and figures to clarify points of ambiguity and to document statistical significance in places where we had inadvertently neglected to do so. The manuscript is clearer and more rigorous as a result of the review process.

      Reviewer #1 (Public Review):

      This study addresses the fundamental question of how the nucleotide, associated with the beta-subunit of the tubulin dimer, dictates the tubulin-tubulin interaction strength in the microtubule polymer. This problem has been a topic of debate in the field for over a decade, and it is essential for understanding microtubule dynamics.

      McCormick and colleagues focus their attention on two hypotheses, which they call the "self-acting" model and the "interface-acting" model. Both models have been previously discussed in the literature and they are related to the specific way, in which the GTP hydrolysis in the beta-tubulin subunit exerts an effect on the microtubule lattice. The authors argue that the two considered models can be discriminated based on a quantitative analysis of the sensitivity of the growth rates at the plus- and minus-ends of microtubules to the concentration of GDP-tubulins in mixed nucleotide (GDP/GMPCPP) experiments. By combining computational simulations and in vitro observations, they conclude that the tubulin-tubulin interaction strength is determined by the interfacial nucleotide.

      The major strength of the paper is a systematic and thorough consideration of GDP as a modulator of microtubule dynamics, which brings novel insights about the structure of the stabilizing cap on the growing microtubule end.

      I think that the study is interesting and valuable for the field, but it could be improved by addressing the following critical points and suggestions. They concern (1) the statistical significance of the main experimental finding about the distinct sensitivity of the plus- and minus-ends of microtubules to the GTP-tubulin concentration in solution, and (2) the validity of the formulation of the "self-acting" model with an emphasis solely on the longitudinal bonds.

      We thank the reviewer for the comment about statistical significance, and we regret our oversight to have not included that analysis in the original manuscript. We have now included an analysis of statistical significance for the main experimental results supporting the interface-acting model (Fig. 2C and the replotting of those data against a different abscissa in Fig. 3C,D), and more broadly we have ensured that all figure legends contain information about the number of measurements and whether error bars indicate SD or SEM.

      The reviewer’s comment about the sole emphasis on longitudinal bonds helped us realize that a change to Fig. 1 (where we illustrate the two models) would improve clarity. We had originally chosen to illustrate Figure 1 using ‘pure’ longitudinal interactions (with no lateral contacts), and this may be what triggered the reviewer’s comment. We have now revised the figure to show ‘corner’ (longitudinal + lateral) interactions. There are two main reasons for this decision. First, the corner interactions are more long-lived and therefore more important for the phenomena under study. Second, illustrating corner interactions provides a better basis for us to discuss what is a subtle aspect of our model – that the ‘GDP penalty’ affecting longitudinal or lateral interactions in a corner site is completely equivalent. Thus, our model is not quite as narrow/exclusive as the reviewer suggested. We appreciate having had the chance to clarify this.

      Reviewer #2 (Public Review):

      McCormick, Cleary et al., explore the question of how the nucleotide state of the tubulin heterodimer affects the interaction between adjacent tubulins.

      (1) The setup of the authors' model, which attributes the dynamic properties of the growing microtubule only to the differences in interface binding affinities, is unrealistic. They excluded the influence of the nucleotide-dependent global conformational changes even in the 'Self-Acting Nucleotide' model (Fig. 1A). As the authors have found earlier, tubulin in its unassembled state may be curved irrespective of the species of the bound nucleotide (Rice et al., 2008, doi: 10.1073/pnas.0801155105), but at the growing end of microtubules, the situation could be different. Considering the recently published papers from other laboratories, it may be more appropriate to include the nucleotide-dependent change in the tubulin conformation in the Self-Acting Nucleotide model.

      We understand the reviewer’s perspective, which may be summarized as: “We know conformational changes are happening and that they affect tubulin:tubulin interactions, so why isn’t your model trying to account for that?” In text added to the revised manuscript, we address this critique in the following ways. First, there is not a consensus in the field about how to parameterize the different conformations of tubulin and how they influence tubulin:tubulin interactions. Second, any attempt to explicitly account for different conformations of tubulin would substantially increase the number of adjustable model parameters, which in turn makes the fitting to growth rates more complicated. Third, compared to traditional ‘dynamics’ assays that use GTP, using mixtures of GMPCPP and GDP simplifies the biochemistry by eliminating GTPase. This results in a more uniform composition of nucleotide state in the body of the microtubule polymer, which diminishes the importance of explicitly modeling nucleotide-influenced changes in conformation. Fourth, it seems likely that different conformations of tubulin will modulate both longitudinal interactions (as tubulin becomes straighter the longitudinal contact area grows larger) and lateral interactions (as tubulin becomes straighter, the lateral contact areas on α- and β-tubulin come into better alignment). Our model treats longitudinal and corner (defined as longitudinal + lateral) interactions as independent, so in principle it could be implicitly capturing some of these conformational effects. By refining the strengths of the longitudinal and corner interactions independently, the model effectively allows the strength of longitudinal contacts to be different for pure longitudinal and corner interactions, which might implicitly capture some variations in longitudinal contacts for different tubulin conformations. 
Our model treats ‘bucket’-type sites (one longitudinal and two lateral interactions) as simply having an additional lateral interaction of equal strength as the first, but because bucket sites have such a high affinity, they rarely dissociate and this small oversimplification is unlikely to have a substantial effect. We have introduced text in several places (bottom of p. 7 and elsewhere) to cover these points.

      (2) The result that the minus end is insensitive to GDP (Fig. 2) was previously published in a paper by Tanaka-Takiguchi et al. (doi: 10.1006/jmbi.1998.1877). The exact experimental condition was different from the one used in Fig. 2, but the essential point of the finding is the same. The authors should cite the preceding work, and discuss the similarities and differences, as compared to their own results.

      Thank you for reminding us of this paper! We agree that it is an ‘on target’ citation, and have cited and discussed it in the revised manuscript (last paragraph of Introduction, third paragraph of Discussion).

      Reviewer #1 (Recommendations For The Authors):

      1) In my opinion, the way in which the authors have depicted their "self-acting" model in Fig. 1 and in Supplementary Figure 1, makes the model look intuitively implausible. The drawings seem to imply that at the plus-end the GTP hydrolysis in the beta-tubulin subunit somehow allosterically affects the alpha-tubulin subunit of the same dimer to weaken its longitudinal bond with adjacent tubulin dimer. Conversely, at the minus end, the same reaction now affects the very same beta-tubulin subunit, and modulates its longitudinal interaction with the next dimer.

      However, a more realistic formulation of the "self-acting" model would be that the exchangeable nucleotide affects the lateral bonds, formed by the same beta-tubulin with its lateral neighbors. Although the experimental data in this regard are controversial, at least some supporting evidence for this idea comes from structural arguments, e.g. [Manka, S.W., Moores, C.A. Nat Struct Mol Biol 25, 607-615 (2018).] This "lateral self-acting", but not the "longitudinal self-acting" hypothesis, seems more natural, and it was the one previously implemented in the seminal paper by [Vanburen et al, 2002 Proceedings of the National Academy of Sciences 99.9 (2002): 6035-6040.] and later by some other models as well.

      This point has been addressed above, in part by modifying the cartoon in Fig. 1.

      2) To better clarify, which exact models are considered in this manuscript, it would be helpful if the authors provided a detailed table with all simulation parameters, including, k_off_loner, k_off_bucket and k_off_corner, for both nucleotide states, in both the self-acting and the interface-acting models.

      Thank you for the suggestion. We have added tables that show all simulation parameters, as well as the corresponding calculated on- and off-rates for each interaction.

      3) I am not sure that using some 'arbitrarily chosen' parameters is very helpful in Chapter 1 of Results. In fact, the results, obtained with an unconstrained set of parameters may be misleading or provide ambiguous answers. In other words, how reliable are the conclusions, based on the arbitrary parameter set? For example, could the dependences of the microtubule growth rate on the GDP-tubulin content be more or less pronounced with a different set of arbitrarily chosen parameters, compared to the graphs in Fig. 1BC?

      This is a fair criticism. In response, we have added three new sets of simulations that each test different choices of the biochemical parameters used in Figure 1. With respect to the original parameters, we tested a weaker and stronger choice for the longitudinal interaction (KDlong, a 100-fold range), the corner interaction (KDcorner, a 25-fold range), and the GDP weakening factor (a 100-fold range). The predicted supersensitivity of plus-end growth rates to GDP in the self-acting vs interface-acting mechanisms is robust across the range of different choices for the above parameters (Figure 1 Supplements 1 and 2). Parameters for these new simulations are shown in Tables 3 and 4.

      4) It took me some time to comprehend why the minus-end growth rate is assumed to be dependent only on the concentration of the GMPCPP-tubulin (in section 2 of Results). It would be great if the authors simply plotted the simulated dependence of the growth rate on the GMPCPP-tubulin concentration in the case when no GDP-tubulin was added. As I understand, that curve should almost exactly match the dependence observed in Fig 1B, correct? Otherwise, it does not seem obvious, why GDP-tubulin does not impede the minus-end growth. Again, is this conclusion model- and parameter-dependent? This question is related to point 3 above.

      The minus-end growth rates decrease in proportion to the concentration of GMPCPP-tubulin. We have added a note on minus-end growth rates in the Figure 1 legend.

      5) I was not quite convinced by the evidence for distinct sensitivities of the plus- and minus-end growth rates to GDP-tubulin concentration (Figure 2C and Fig 3C, D). These are the key experimental measurements in the paper. Therefore, I suggest that the authors try to strengthen this point by additional measurements to increase statistics. Or at least, please, explain the data points, the error bars, and provide some information on the number of independent measurements and the statistical significance between the curves. Maybe, they could be directly compared after normalizing by the "all GMPCPP growth rate"? How was the "1.5-fold" ratio obtained in Fig 2C? Does that number refer only to a certain GDP-tubulin concentration or does that value somehow characterize the whole range of the concentrations measured?

      This has been addressed above.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #2’s recommendations are identical to points (1) and (2) of their public review and were addressed there.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviews

      Reviewer #1:

      We thank this reviewer for their comments on our paper. We have adjusted the methods section to ensure it is clear, including an updated description of the statistics and in some cases updated statistical methods to ensure uniformity in analyses across datasets. The discussion has been modified so that the message regarding our results is set appropriately in the literature.

      Reviewer #2:

      We are grateful to this reviewer for their insight. We have modified the text of the discussion to address the points of this reviewer, including providing a greater focus on the significance of our results without overgeneralizing. We have additionally reframed our argument regarding the detection of pesticides by Bombus terrestris to more carefully consider conflicting results from other papers.

      Response to Recommendations For The Authors

      Response to Reviewer #1

      We thank this reviewer for their thoughtful comments and ideas. We have made several changes to the text of the manuscript to improve the clarity of our writing, and we are grateful to the reviewer for raising several important points that we had not sufficiently discussed in the paper previously. We feel the paper has been improved with the inclusion of a more thorough discussion and clarified methods. Please see below our responses to the points they raised.

      A few general thoughts that I had when reading your manuscript: I assume you have only tested the active pesticide ingredients, not the formulations generally applied in the field. Given that these formulations contain additional compounds besides the active ingredients, might it not be possible that they could be perceived by bees?

      For this study, we were interested specifically in the taste of active pesticide compounds. Although we agree it could be interesting to explore the taste of other formulation compounds, testing these was not within the scope of this paper.

      Is there an alternative to quinine as a negative control? As you state, quinine is generally used in studies and likely often in concentrations which are beyond what can be seen in e.g. floral nectar, which might explain its aversive effect. I would like to see it tested in natural concentrations and ideally in combination with other potentially toxic plant secondary metabolites (PSMs).

      The purpose of including quinine in our study was to provide an in-depth characterization of “bitter” taste responses using the sensilla on bumblebee labial palps and galea (i.e., through the attenuation of GRN firing). This was not to show how bumblebees may interact with plants containing quinine in the field, or other PSMs. It would indeed be interesting to explore other plant secondary metabolites; however, this was beyond the scope of our paper.

      L177-187 AND 233-238 Could you, please, provide a photo or schematic drawing to illustrate your assay? I have a very hard time picturing the actual set-up.

      We have provided a labeled diagram of the bumblebee mouthparts and sensillum types (Fig 1A), as well as an image of the bumblebee feeding from a capillary in the behavioural assay (Fig 1G). Further details about the feeding assay (including a JoVe video) can be found with the Ma 2016 paper that we cite throughout our methods section.

      L219 Why did you choose 5 sec here?

      This feeding bout duration was selected based on the criteria defined in Ma et al 2016. We have added a citation to that sentence.

      L221-224 How precisely was the behavior scored? Just length of bouts or also repeated short contacts? Please, specify.

      We used the first bout duration and the cumulative bout duration in our analyses. A sentence has been added to specify this.

      L231/233 Please, provide some brief details here, as many readers may find it annoying to find and read another study for methods' details.

      We have added three sentences in the methods to further explain the electrophysiological method.

      L238-245 See also my general methods comment: concentrations used for pesticides and quinine differ quite substantially, which may explain their different effects on the bees' perception. Are the concentrations used for quinine realistic? If not that is totally fine for a negative control, but it would be interesting to see a comparison of effects conducted for similar concentrations.

      The quinine concentrations used were selected so that they would elicit a known “bitter response” – these concentrations are not meant to be field-realistic, and our data (and others, e.g., Tiedeken et al 2014) show that lower concentrations of quinine are not detected by bumblebees.

      L277-301 I assume that this is a standard statistical approach to analyze electrophysiological data. However, I am really struggling with fully understanding what you did here. It might be good to add some additional explanation/detail, e.g. on why you constructed firing rate histograms or how you derived slopes (aren't stimulus and bin categorical variables?).

      Firing rate histograms are indeed very commonly used for visualizing neuron spikes over time. We have changed the text somewhat in an effort to make it clearer. Likewise, the “slopes” were derived from the LMEs, and in this case “bin” is a continuous time variable – any time variable will involve some binning depending on the resolution used but should not be considered categorical.
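As an illustration of treating bin as a continuous time variable, the sketch below bins one sensillum's spike times into a firing-rate histogram and fits an ordinary least-squares slope of rate against bin midpoint time. It is a simplified, hypothetical stand-in: it does not reproduce the random-effects structure of the LMEs used in the study, and the function names and the 0.5 s bin width in the usage example are assumptions.

```python
def firing_rate_histogram(spike_times: list[float], duration: float,
                          bin_width: float) -> list[float]:
    """Count spikes in consecutive bins of width bin_width (s) and return rates (Hz)."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        if 0 <= t < duration:
            # Clamp to the last bin to guard against floating-point edge cases.
            idx = min(int(t // bin_width), n_bins - 1)
            counts[idx] += 1
    return [c / bin_width for c in counts]


def rate_slope(rates: list[float], bin_width: float) -> float:
    """Least-squares slope of firing rate vs. bin midpoint time (Hz per s)."""
    if len(rates) < 2:
        raise ValueError("need at least two bins to estimate a slope")
    times = [(i + 0.5) * bin_width for i in range(len(rates))]
    mean_t = sum(times) / len(times)
    mean_r = sum(rates) / len(rates)
    num = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rates))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den
```

With spikes at 0.1, 0.2, 0.3, 0.6, 0.7 and 1.2 s over a 1.5 s window, the rates are 6, 4 and 2 Hz and the slope is -4 Hz/s, i.e. a declining (attenuating) response. Because time enters as a number rather than as a factor, one slope summarizes the whole response instead of one coefficient per bin.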

      L291-295 As you were averaging and normalizing your data, could you, please, provide some information on variation within animals?

      We have now included information on the coefficient of variation for spike rates across sensilla for a given animal/stimulus (CV range, median, and IQR).
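One way such a summary could be computed, sketched under assumptions: the CV is taken as the sample standard deviation divided by the mean of spike rates across sensilla for one animal/stimulus, and the resulting CVs are then summarized by range, median and IQR. The function names, the input layout and the example rates are hypothetical.

```python
import statistics


def coefficient_of_variation(rates: list[float]) -> float:
    """Sample SD / mean of spike rates across sensilla (assumes a positive mean)."""
    return statistics.stdev(rates) / statistics.mean(rates)


def summarize_cvs(cvs: list[float]) -> dict[str, float]:
    """Range, median and interquartile range of a collection of CVs."""
    q1, _, q3 = statistics.quantiles(cvs, n=4)
    return {
        "min": min(cvs),
        "max": max(cvs),
        "median": statistics.median(cvs),
        "iqr": q3 - q1,
    }


# Hypothetical spike rates (Hz) per sensillum, keyed by (animal, stimulus).
rates_by_animal = {
    ("bee1", "QUI 1mM"): [10.0, 20.0, 30.0],
    ("bee2", "QUI 1mM"): [12.0, 14.0, 16.0],
    ("bee3", "QUI 1mM"): [8.0, 8.0, 10.0],
}
cvs = [coefficient_of_variation(r) for r in rates_by_animal.values()]
summary = summarize_cvs(cvs)
```

Reporting range, median and IQR rather than a mean CV keeps the summary robust to a single highly variable animal.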

      L295 I assume t-SNE represents a multivariate approach for ordination, correct? Can you explain why you chose to use this approach? Did you use Euclidean Distance?

      Yes, t-SNE is a multivariate technique for dimensionality reduction. It is particularly well-suited for the visualization of high-dimensional datasets, as it can reveal the underlying structure of the data by embedding it in a lower-dimensional space (e.g., 2D) while preserving the local structure of the data as much as possible. We used t-SNE because it has been shown to be effective in visualizing clusters of similar data points in high-dimensional data. Euclidean distance was used as the distance metric for the t-SNE embedding. Euclidean distance is the default distance metric for most implementations of t-SNE and is appropriate for continuous data like the spike counts in this study. We have adjusted the methods to clarify this.

      L304 Why did you not always use LMEs?

      We have adjusted the text to show that we used LME for all comparisons, and the statistics have been updated accordingly in the results section. None of the outcomes changed with the implementation of LME for all tests.

      L306 Would it not make sense to also include the interaction between stimulus and concentration in your models?

      We have now included a sentence to explain that the interaction term was removed because it was nonsignificant and because models without the interaction term had a better fit (lower AIC and BIC values).

      Results:

      L337, 339 and more: I would prefer to see actual p-values, not just "p > 0.05".

      This has been adjusted on L337 and 339. As far as we are aware, there are no other instances where exact p-values were not given (except when p < 0.0001).

      Discussion:

      L470 This is true, but the bees' behavior changed significantly, indicating that they may already respond to this small change in firing patterns?

      It is true that the bees’ behaviour changed significantly with 0.1mM QUI, but this was not the case with the pesticides. Bees drank less overall of 0.1mM QUI than OSR because of the rapid postingestive effects of this compound. Importantly, the duration of the first bout was not affected (i.e., they didn’t directly avoid it by taste upon first encountering it, as they do with 1mM QUI); they simply drank less of the 0.1mM QUI over 2 minutes. Post-ingestive effects may occur as quickly as 30s after initial consumption. For the pesticides, the small changes in GRN firing were not associated with any effects on consumption or our other measures of feeding behaviour, and we suggest this results from a lack of rapid negative postingestive consequences. We now include further discussion of these important points.

      L481 But they consumed significantly less of the 0.1 mM QUI!?

      This is true, but they did not reject it (i.e., refuse to drink it at all), and there were no changes in the amount of time the bees spent in contact with the 0.1mM QUI solution compared to OSR. We have adjusted the text for clarification.

      L504/505 AND 520/521 AND 533-536 I see your point, but I am wondering whether the bees may need some time but are generally able to learn the taste of pesticides, which may explain why e.g. Arce et al. only saw an effect over time. For example, learning to 'focus' on the pesticide taste may require some internal feedback (bees not feeling well) or larvae feedback. If I understood right, you tested workers only, which might be less sensitive than queens or larvae. I think these aspects should be discussed.

      In our study, we investigated the mechanism of taste detection of pesticides. We agree that bees likely use postingestive mechanisms to learn to associate the location (or another cue) of a food source with positive or negative postingestive cues. ‘Focus’ is a higher-order process that involves increased attention to sensory stimuli but does not affect sensation at the level of the receptor. We show that bees are unable to taste pesticides using the gustatory receptors on their mouthparts, so post-ingestive learning would not be able to associate the pesticides with any taste cue. Indeed, there may be caste-specific differences with foraging queens; however, a discussion of this would be beyond the scope of our paper.

      I also recommend broadening the scope of your discussion. For example, you only focus on nectar, while the story might be different for pollen, which is also contaminated with pesticides but represents a different chemical matrix with potentially different taste properties. Also, unlike nectar, pollen is collected with the tarsi and hence comes into contact with other bee body parts.

      I would also like to see a discussion of your study's implications for other bee species and other potentially toxic compounds (e.g. PSMs).

      We do not include any data in this paper regarding tarsal or antennal taste or other potentially toxic compounds. In this paper we present one mechanism of bitter taste perception (i.e., of quinine) and specifically show that the buff-tailed bumblebee is unable to taste certain pesticides using its mouthparts. To avoid overgeneralizing, we have not included discussions about other species or compounds, which may or may not share similarities with our study.

      Response to Reviewer #2

      We thank this reviewer for their comments. We have adjusted the text to avoid overgeneralizations in our conclusions, and attempted to soften language in the discussion that may have been construed as combative towards the Arce et al (2018) paper. We hope this reviewer finds these adjustments to be in line with their expectations.

      1) In two parts of the manuscript, the authors made broad predictions and conclusions beyond what the evidence in the paper can support and wrote "Future studies will be necessary to confirm this." (Lines 508-509) and "Future studies that survey a greater variety of compounds will be necessary to confirm this." (563-564). Instead of making conclusions based on what experimental data in future studies may support, I would ask the authors instead to make conclusions that their current study can support based on experimental evidence in this paper.

      We have removed these predictions that extend beyond the scope of the paper.

      2) Line 315 "GRNs encode differences in sugar solution composition". The logic of the title is wrong.

      This has been fixed.

      3) Since this study is only performed in one bumblebee species, I would suggest that the title be more specific, e.g., "Mouthparts of the bumblebee Bombus terrestris exhibit poor acuity for the detection of pesticides in nectar".

      We have made this change.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for recognizing the importance of our work and for their insightful suggestions. A point-by-point response to their comments is listed underneath each reviewer’s section.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      1) Have the authors optimized the expression level of dCas9? I cannot find this information in this paper or in their 2021 paper. It is important to avoid the toxicity phenomenon that occurs when using guide RNAs that share specific five base seed sequences (referred to as 'bad seeds').

      Cui L., Vigouroux A., Rousset F., Varet H., Khanna V., Bikard D. A CRISPRi screen in E. coli reveals sequence-specific toxicity of dCas9. Nat. Commun. 2018; 9:1912.

      Rostain W., Grebert T., Vyhovskyi D., Thiel Pizarro P., Tshinsele-Van Bellingen G., Cui L., Bikard D. Cas9 off-target binding to the promoter of bacterial genes leads to silencing and toxicity. Nucleic Acids Res. 2023; gkad170.
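      To make the 'bad seed' concern concrete, a hypothetical pre-screen of guide spacers is sketched below. It assumes that the seed is the five PAM-proximal nucleotides of a spacer written 5'->3'. That is our assumption about how Cui et al. (2018) define it. The blocklist entries are placeholders, not the published bad-seed list, and this is not a check the authors report running.

```python
SEED_LENGTH = 5  # assumption: PAM-proximal nucleotides that make up the "seed"

# Placeholder entries only; substitute the published bad-seed list before real use.
HYPOTHETICAL_BAD_SEEDS = {"GATCC", "TTTGG"}

def seed(spacer):
    """PAM-proximal seed of a spacer written 5'->3' (the PAM lies just 3' of it)."""
    return spacer.upper()[-SEED_LENGTH:]

def flag_bad_seeds(spacers, bad_seeds=HYPOTHETICAL_BAD_SEEDS):
    """Return the spacers whose seed matches an entry in bad_seeds."""
    blocked = {s.upper() for s in bad_seeds}
    return [sp for sp in spacers if seed(sp) in blocked]

# Hypothetical 20-nt spacers.
print(flag_bad_seeds(["GATTACAGATTACAGTTTGG", "CCGGAATTCCGGAATTCCAA"]))
```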

      2) One guide per gene is highly unusual given that different guides block the RNA polymerase with different efficiency. This was even shown by the Machner lab in the Legionella context in Figure 1c of Ellis et al. 2021 for sidM and vipD. Typically, at least three guides per gene are needed to ensure that the gene of interest is fully knocked down, unless this is not possible because the gene is too small and/or it is difficult to find an NGG sequence. The authors have used one guide per effector; how can they be sure that each gene is knocked down? The Machner lab themselves show in Figure 3c of Ellis et al. 2021 that not all genes targeted using multiplex CRISPRi are knocked down equally efficiently. Please justify why only one guide per gene was chosen and add controls to validate the results. The authors themselves state that the strategy of one guide may be problematic. At lines 315-316, the manuscript reads: "A possible explanation was the incomplete knockdown of a seemingly important process."
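      The NGG constraint mentioned in this comment limits where guides can be placed. A minimal sketch of a forward-strand scan for SpCas9 target sites is shown below for illustration. It does not implement the "optimal location" rules from the authors' 2021 paper. A real CRISPRi design would also scan the reverse complement, since the targeted strand affects dCas9 repression. The example sequence is hypothetical.

```python
import re

def find_protospacers(sequence, spacer_len=20):
    """Candidate SpCas9 target sites on the given strand only.

    Returns (start, protospacer, pam) for every NGG PAM that has at least
    spacer_len nucleotides 5' of it. A zero-width lookahead is used so that
    overlapping PAMs (e.g. 'AGGG') are all reported.
    """
    seq = sequence.upper()
    hits = []
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            start = pam_start - spacer_len
            hits.append((start, seq[start:pam_start], seq[pam_start:pam_start + 3]))
    return hits

# Hypothetical sequence fragment.
for hit in find_protospacers("ATGACCGATTACAGATTACAGCATTGGTTAAGG"):
    print(hit)
```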

      3) Given what the Machner lab observed about spacer location in Ellis et al. 2021 would it not make more sense to take one set of redundant effectors and make multiplex randomized CRISPRi with them in different locations? For Figure 1 at least.

      4) Following infection, it seems that the bacteria were not plated onto antibiotic media, so it is not known how well the plasmid harboring guides is kept through infection.

      Specific comments

      A) The first results paragraph describes the set-up of 10-plex synthesized CRISPR arrays, where 10 effector encoding genes of specific gene families are selected. The rationale of the choice of these genes is not given. Please explain.

      B) Please also add some biological information on what these genes code for and what their known or predicted functions are. Tables of lpg numbers are not very informative or exciting without some knowledge of what at least some of these genes code for and why they were selected.

      C) Figure 1A: Why are only some of the MC arrays shown? Please at least include the others in the supplementary material. Again, showing one array in detail would be more informative, demonstrating true knockdown of all genes by qPCR and ideally by western blot.

      D) I am not convinced that the gene silencing efficiency qPCR comparison is done in the correct way. In my opinion, each of the genes to be knocked down should be tested against the expression of a control gene e.g. rpoS and then these results should be compared and not the results of empty plasmid or CRISPR array containing plasmid directly. L. pneumophila are very sensitive to growth conditions and inoculum, thus the two strains might not be completely at the same growth stage when being compared which can impact the results.

      E) Figure 1B: As stated in general comment number 4, the authors do not appear to plate onto antibiotic, so we don't know how well the plasmid harboring the guides is kept through infection. The sustained presence of the guide is particularly important for CRISPRi.

      F) The authors found only a few growth phenotypes and mainly this was due to single genes and not combinations of genes. This might again be due to the fact that only one guide per gene was used. How do the authors know that all genes targeted were indeed knocked down?

      G) Line 119: Alternatively, not all genes may have been fully knocked down, so they escaped the expected knockdown effect. Could the authors take three genes with three guides each and look at the impact, instead of using only one guide?

      H) The authors then developed the randomized multiplexed arrays and chose to test 44 TME-encoding genes. Line 141: Please justify in the text why these effectors were chosen.

      I) Unfortunately, the method is not clearly described, and many parts are complicated and the text needs to be re-read several times to be understood (lines 150 - 166). Please re-write to better explain to the reader. In line 156 the authors point to a supplementary note 1. This information should be in the main text.

      J) What is the copy number of the CRISPR plasmid? Please add in the Material and Method section also the origin of this plasmid.

      Figure 2

      K) The paper (lines 154-160) and the supplementary notes state that the authors attempted to size-select CRISPR arrays. However, this is not apparent in the Figure 2 schematic. Or are the authors stating that plasmids containing only one guide were selected out? However, line 312 would suggest not. Please clarify.

      L) A limiting factor in making multiplex guide CRISPR, as the authors are trying to establish in this study, is cloning of multiple guides. In the pre-determined CRISPR arrays in this study, the guides were synthesized. For the randomized multiplex CRISPR in this study, the authors adapt a Golden Gate cloning method to generate multiple sgRNAs in the Cas9 vector. A similar protocol was established in the below paper. Please add this reference.

      Zuckermann, M.; Hlevnjak, M.; Yazdanparast, H.; Zapatka, M.; Jones, D.T.W.; Lichter, P.; Gronych, J. A novel cloning strategy for one-step assembly of multiplex CRISPR vectors. Sci. Rep. 2018

      M) As the authors note, and as Zuckermann et al. similarly report, a plex of 3 or 4 is most common and a plex above 5 is rare. This thus still appears to be the limiting step of multiplex CRISPR technology. Please discuss.

      Figure 4

      N) The idea of multiplexed CRISPRi seq to address the biological phenomenon of redundancy is an interesting one, however, I am missing the in-depth functional characterization and discussion of at least one of the redundant functions discovered. Please add.

      Figure5/6

      O) As noted above, the strength of the experiments is undermined by how CRISPRi is set up. Having an average multiplex of 2 or three and again only using one guide per gene weakens the study and the results obtained. Furthermore, as stated in general comment number 4, the authors do not appear to plate onto antibiotic so again, we don't know how well the plasmid harboring the guides is kept through infection. The sustained presence of the guide is particularly important for CRISPRi. Please add a validation that the guides are all present.

      Response to Reviewer #1

      We are grateful to the reviewers for their insightful comments and suggestions on how to further improve the manuscript.

      Regarding the issue of ‘bad seed sequences’ (comment #1), we had previously evaluated the expression level of dcas9 (plotted in Figure 1b of the 2021 Communications Biol paper) and tuned our induction conditions accordingly (40 ng/mL as described in the Methods). Since all strains used in this study express dcas9 from the chromosome, not a plasmid, this eliminates the possibility of fluctuations in expression levels due to variabilities in plasmid copy numbers.

      In the rare event that toxicity from any given guide occurs, we would expect that guide to already be underrepresented or missing in the input pool following 24+ hours of CRISPRi induction during axenic growth. Our data, now discussed in the manuscript (Lines 211-216 and Figure S2), revealed that this was not the case and that all guide-encoding spacers were present in roughly equal amounts (median of >5000 occurrences). As with any knockdown study, true chromosomal deletions were created throughout so as to guard against false positives.
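      The representation check described here (a toxic guide should be depleted from the input pool) could be sketched as follows. The spacer IDs, read counts, and the 10%-of-median cutoff for calling a spacer depleted are hypothetical. The response reports only that all spacers were present in roughly equal amounts (median of >5000 occurrences).

```python
import statistics
from collections import Counter

def spacer_representation(observed_spacers, expected_spacers, min_fraction=0.1):
    """Per-spacer read counts, their median, and spacers below min_fraction * median.

    observed_spacers: one spacer ID per sequencing read (already parsed).
    Expected spacers that never appear are counted as zero and therefore flagged.
    """
    counts = Counter(observed_spacers)
    per_spacer = {s: counts.get(s, 0) for s in expected_spacers}
    median = statistics.median(per_spacer.values())
    depleted = sorted(s for s, c in per_spacer.items() if c < min_fraction * median)
    return per_spacer, median, depleted

# Hypothetical parsed reads.
reads = ["g1"] * 100 + ["g2"] * 90 + ["g3"] * 5
print(spacer_representation(reads, ["g1", "g2", "g3"]))
```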

      Regarding comments #2, #3, and specific comments made under point F, G, and O, on the topic of using single vs. multiple guides, we agree that there are circumstances under which using more than one guide per target may be advantageous, for example when attempting to delete a gene from mammalian cells using conventional CRISPR engineering. In the study described here, this is not the case. In fact, we did create a second array library with alternative guides targeting the same group of genes at locations other than the “optimal location” identified in our 2021 paper and found that these “sub-optimal” guides were inefficient for identifying critical effectors as described in Supplemental Note S1 under the heading “Sub-optimal annealing sites” (Lines 919+). These data suggest that adding sub-optimal guides to the arrays of optimal guides might ‘poison’ the arrays and limit rather than enhance their ability to identify gene combinations.

      Regarding comments #2 and #3, and specific comments C, F, and G, on confirming efficient gene knockdown for the identification of critical genes, we remind Reviewer 1 that we did confirm knockdown of 60 of the target genes of the 10-plex screen to be at least 2-fold, with an average fold repression of one order of magnitude or more (Figure 1A). While verifying knockdown of every gene in every 10-plex construct would be an unprecedented ask of any published CRISPR screen, we believe that these 60 genes provide a large enough sample of all guides to establish the range of knockdown to be expected from our CRISPRi platform. As with other knockdown technologies, such as RNAi, complete knockdown is not expected for any given target. Hence, the data in Figure 1A suggest that the failure to identify critical genes using pre-determined 10-plex arrays was not due to a lack of knockdown efficiency, but rather to the difficulty of accurately predicting redundancy within a cohort of uncharacterized genes, underscoring the need for array randomization with MuRCiS.

      On the topic of antibiotic use for plasmid selection (comments #4, E and O), we would like to clarify that the CRISPR plasmids were selected by thymidine prototrophy, not antibiotic resistance, and we apologize for not making this clearer. The laboratory strain Lp02 is a thymidine auxotroph (thyA-) L. pneumophila variant, and plasmid retention is routinely achieved by including the thymidine biosynthesis gene (thyA) on the plasmid backbone. Only with a plasmid bearing the thyA gene can L. pneumophila grow on CYE (thymidine-) plates. Our use of vectors bearing thyA and plating on CYE plates is described in the Methods section. Further, in Figure 7 of our 2021 paper, we show that CRISPR plasmids are efficiently retained by Lp02 for the duration of a 48-hour infection, resulting in efficient multi-gene knockdown even at the end of the intracellular growth experiment.

      Regarding comments A and B, on publishing the biological data used to classify genes in gene families for 10-plex silencing, we do not consider it critical to provide additional information beyond the broad classification (e.g. kinases, phosphatases, etc.) described in Table S1. Structural predictions constantly change due to continuously evolving databases. Our initial analyses were made in 2015 using HHpred Hidden-Markov models and, in all likelihood, those predictions have been refined since then. Moreover, with the recent advent of AlphaFold, anyone interested in learning more about select effectors from our list is advised to simply access the most recent functional predictions directly on the AlphaFold webpage (https://alphafold.ebi.ac.uk/). We clarify how predictions were made in Lines 97-101.

      Regarding specific comment D, on our method for qPCR normalization and comparison, we point Reviewer 1 to the Methods section (Lines 460+) where we describe that data obtained from each CRISPRi strain were in fact normalized to the levels of rpsL prior to comparing them to the normalized data from the strain with the empty control plasmid. This normalization to rpsL, a gene encoding a ribosomal protein, also corrects for growth differences between samples.
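      The normalization described here can be sketched with the widely used 2^-ΔΔCt (Livak) calculation. The response does not name the exact formula, so this is an assumption, as is the ~100% amplification efficiency that the method requires. The Ct values are hypothetical. Under this assumption, the "at least 2-fold" repression mentioned above corresponds to ΔΔCt ≥ 1, and a repression of one order of magnitude corresponds to ΔΔCt ≈ 3.3.

```python
def fold_repression(ct_target_kd, ct_rpsl_kd, ct_target_ctrl, ct_rpsl_ctrl):
    """Fold repression of a target by the 2^-ddCt method, using rpsL as reference.

    Assumes ~100% amplification efficiency for both target and reference.
    """
    dct_kd = ct_target_kd - ct_rpsl_kd        # CRISPRi strain, normalized to rpsL
    dct_ctrl = ct_target_ctrl - ct_rpsl_ctrl  # empty-plasmid strain, normalized to rpsL
    ddct = dct_kd - dct_ctrl
    relative_expression = 2 ** (-ddct)        # knockdown level relative to control
    return 1 / relative_expression

# Hypothetical Ct values: dCt 12.5 (CRISPRi) vs 8.8 (control), so ddCt = 3.7.
print(round(fold_repression(27.5, 15.0, 24.0, 15.2), 1))
```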

      Regarding specific comment H, the justification for studying 44 transmembrane effector-encoding genes was driven by the fact that activities mediated by transmembrane proteins are difficult (though not impossible) to be replaced by cytosolic proteins, for example the transport of metabolites across the LCV membrane. And since transmembrane regions can be predicted with high confidence, we decided to probe this group of TMEs for synthetic lethality with the randomized CRISPRi approach as proof-of-concept. To make this clearer, we have added more detail to the text (Lines 151-155).

      Regarding specific comment I, we have further simplified the description of the cloning technique to increase clarity (Lines 156+). The information listed under Supplemental Note S1, though informative, is not critical for the overall understanding of this highly technical section, and since the reviewer already considered this section to be difficult to follow, we would prefer to not further complicate the text by including these non-essential details.

      Regarding the origin of the CRISPRi plasmid (specific comment J), we point Reviewer 1 to the reference (Hammer BK and Swanson MS (Mol Microbiol 1999)) listed in Table S10: Strains and Plasmids Used in this Study.

      Regarding specific comments K and O, on the clarity of depicting the CRISPR array size selection process, we have updated the Figure 2 schematic. Reviewer 1 is correct that despite our best efforts to exclude short CRISPR arrays, some 1-plex arrays inevitably remained in our input vector pool. Still, the average length of all arrays used in our pilot study exceeded three crRNA-encoding spacers. Further, having a population of 1- or 2-plex arrays in our libraries allowed us to pinpoint the most critical effectors of larger arrays within the same MuRCiS experiment (Table S5 and Table S7), a strength of MuRCiS as described in the discussion (Lines 378+).

      Regarding specific comment L, we appreciate Reviewer 1’s suggestion of an additional reference and we have added it to the manuscript as reference #23 (Line 71). While this reference does use a Golden Gate strategy to build a multiplex array, that array was not randomized but had a predefined order. Hence, our assembly method is unique due to its randomization.
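      A toy simulation illustrates why randomization matters. When spacers are drawn at random into arrays of varying length, many gene pairs end up co-targeted across a library without any pair being chosen in advance. This is not the MuRCiS protocol. The 44-gene pool size comes from the text. The gene labels, the array-length distribution (uniform from 1 to 5), the library size, and the random seed are hypothetical.

```python
import itertools
import random
from collections import Counter

def random_array(pool, rng, max_plex):
    """One randomized array: between 1 and max_plex distinct spacers in random order."""
    return rng.sample(pool, rng.randint(1, max_plex))

def pair_coverage(arrays):
    """How many arrays co-target each unordered gene pair (1-plex arrays add none)."""
    coverage = Counter()
    for array in arrays:
        for pair in itertools.combinations(sorted(set(array)), 2):
            coverage[pair] += 1
    return coverage

# Hypothetical pool and library; real libraries are skewed toward short inserts.
rng = random.Random(0)
pool = [f"tme{i:02d}" for i in range(1, 45)]
library = [random_array(pool, rng, max_plex=5) for _ in range(2000)]
coverage = pair_coverage(library)
all_pairs = 44 * 43 // 2
print(f"{len(coverage)} of {all_pairs} possible pairs co-targeted at least once")
```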

      Regarding specific comment M, on array length cloning limitations, we agree with the conclusion of Zuckermann in Figure 1d of their article that longer inserts are generally harder to get into vector backbones. The challenge of cloning longer inserts is a common phenomenon of general biology and is not unique to cloning CRISPR arrays. We have altered the wording in our manuscript to better describe the intrinsic competition between short and long inserts during cloning (Lines 162-164).

      Regarding specific comment N, we second Reviewer 1’s desire to learn more about the critical effector pairs discovered here. With that said, the goal of this manuscript is to report the development of a novel MuRCiS pipeline to identify these critical pairs. Biochemical and molecular investigations of the encoded protein pairs are on-going and will be the topic of a future manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Specific points

      1) The effector repertoire of L. pneumophila seems to have evolved in response to the plethora of potential protozoan hosts (PMID: 31988381). To further assess evolutionary aspects of the vast L. pneumophila effector arsenal, it would be interesting to test the single and double effector mutant strains (Fig. 5FG, Fig. 6EF) for growth in protozoa other than A. castellanii.

      2) Most CRISPR arrays comprising genes encoding functionally similar proteins or encoding evolutionarily conserved proteins did not substantially affect intracellular growth of L. pneumophila (Fig. 1B). This rather surprising result should be further discussed.

      3) l. 118/119: "Similar results ..., where none of the MC arrays ..." This statement should be phrased more precisely, since some CRISPR arrays did indeed have an effect on intracellular growth of L. pneumophila in U937 macrophages, while none affected intracellular growth in A. castellanii (Fig. 1B).

      4) Typos:

      • l. 852: ... (arbitrarily set to -100).

      • l. 862: ... Legionella-containing vacuole (LCV).

      • l. 895: ... and so we would recommend ...

      Regarding point 1, we thank Reviewer 2 for the suggestion of testing effector mutants in different hosts. While the primary purpose of the current manuscript was to optimize the MuRCiS platform, future studies using this technology to investigate specific biological questions related to Legionella infection would certainly benefit from including more than one amoebaean species.

      Regarding point 2, we agree that the lack of substantial growth defects seems surprising. Yet only two of the seven core effectors (found in all Legionella sp.), lpg2300 and mavN, individually attenuated Legionella intracellular growth when deleted (Burstein et al., 2016 Nat Genetics; Isaac et al., 2015 PNAS). Thus, we hypothesize that the functions many effectors fulfil are of such importance for intracellular survival that redundancy reaches beyond the boundary of conservation or like-function. We have added a statement emphasizing this at the end of the Figure 1 results section (Lines 122-125).

      Regarding points 3 and 4, we thank Reviewer 2 for catching these errors and have corrected where needed in the text.

      -l. 852 (now Line 874): … (arbitrarily set to -100,000) is correct for Figure 6E.

    1. Author Response

      The following is the authors’ response to the previous reviews

      Comments from reviewer 1:

      Comment 1. Regarding SBSMMA, the authors may complement their discussion by mentioning recent work (PMID: 35738428) where SBSMMA was used to exemplify a potential fragment-based design approach for developing allosteric effectors for kinases.

      Thank you for the suggestion. We have added a short summary of the work in which SBSMMA is used as a basis for developing small molecules to target kinases using a fragment-based design approach.

    2. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their generous comments on the manuscript and have made edits to address their concerns. The manuscript has been restructured and the reference (PMID: 35738428) has been added to the review. We addressed the reviewer's comment below.

      Reviewer #1 (Recommendations For The Authors):

      Regarding SBSMMA, the authors may complement their discussion by mentioning recent work (PMID: 35738428) where SBSMMA was used to exemplify a potential fragment-based design approach for developing allosteric effectors for kinases.

      Thank you for the suggestion, we have added a short summary of the work where SBSMMA is used as a basis for developing small molecules to target kinases using fragment-based design approach.

    1. Author Response

      We thank the reviewers and the editorial team for their assessment and valuable feedback on our manuscript. Their supporting comments reinforce the significance of our findings.

      Regarding the specific point raised about the partial effects observed in the TGN46 KO cell line, we acknowledge the importance of addressing this issue in more detail in the revised version of our manuscript. The partial effects observed when using the TGN46 KO cell line are likely caused by several factors:

      1) It is important to consider the phenomenon of cell adaptation/compensation, which is documented to occur in gene knockout cell lines. Cells often respond to genetic perturbations by adapting to compensate for the loss of a specific gene. These compensatory effects could potentially mitigate the full impact of TGN46 depletion and might explain the partial effects observed.

      2) Our data indicate that the absence of TGN46 reduces PAUF secretion but does not completely block its export. These results align with our proposed role for TGN46 in cargo sorting. In its absence, secretory proteins likely exit the TGN via alternative routes/mechanisms, such as "bulk flow" or by entering other transport carriers in an uncontrolled manner. The partial redistribution of the TGN46-∆lum mutant into VSVG carriers (Figure 4D) supports this possibility. Importantly, similar situations are observed when unrelated sorting factors are depleted from the Golgi membranes. For example, when the cofilin/SPCA1/Cab45 sorting pathway is genetically disrupted, the secretion of this pathway's clients is inhibited but not completely halted (e.g., von Blume et al. Dev. Cell 2011; J. Cell Biol. 2012).

      3) As suggested by the reviewers, it remains possible that TGN46 is not the sole player for cargo sorting. The existence of redundant or alternative mechanisms cannot be ruled out.

      In our revised manuscript, we will provide a more in-depth discussion of these factors and their potential contributions to the observed partial effects in TGN46 KO cells. We believe that a comprehensive exploration of these possibilities will improve our understanding of the role(s) of TGN46 in cargo sorting and TGN export.

    1. Author Response

      eLife assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence.

      We truly appreciate the specific comments, which have helped to clarify issues and to improve the study. Provisional point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the resolution used in the current study (i.e., 2-5 Hz frame rate using a 25x objective with 1.7x digital zoom, with pixels on the order of 1 µm²) is within the norm in the field and neither compromises the results nor diminishes the main conceptual advance of the study, namely the existence of a spatial threshold for the astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors and look forward to improving the paper to fully answer both public and private comments with a revised manuscript.

      Reviewer #1 (Public Review):

      Lines et al. provide evidence for a sequence of events in vivo in adult anesthetized mice that begins with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if at least 22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics, i.e., it must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature: it is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons, with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important, but we think their approach needs experimental/analytical refinement and elaboration.
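      The threshold rule summarized in this review can be written as a simple area criterion, as sketched below. The domain names and areas are hypothetical. Only domains that became active before the soma should be passed in. As the reviewers note, the 22.6% value may itself depend on the spatial-temporal resolution of the imaging.

```python
SPATIAL_THRESHOLD = 0.226  # fraction of segmented arbor area (22.6%, as reported)

def active_fraction(domain_areas, active_domains):
    """Fraction of total segmented arbor area covered by active domains."""
    total = sum(domain_areas.values())
    return sum(domain_areas[d] for d in active_domains) / total

def predicts_somatic_surge(domain_areas, active_domains, threshold=SPATIAL_THRESHOLD):
    """True if the active arbor fraction reaches the threshold.

    active_domains should contain only domains that became active before the soma.
    """
    return active_fraction(domain_areas, active_domains) >= threshold

# Hypothetical domain areas (µm²) for one segmented astrocyte.
areas = {"d1": 12.0, "d2": 8.0, "d3": 15.0, "d4": 10.0, "d5": 5.0}
print(predicts_somatic_surge(areas, {"d1"}), predicts_somatic_surge(areas, {"d1", "d3"}))
```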

      We thank the reviewer for her/his positive appraisal and comments, which have helped us to improve the study. In response to their insights, we aim to address the key points raised below:

      1. Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We will provide a more detailed description of the methods and results to clarify the temporal relationships between neural activation, astrocyte calcium dynamics, and astrocyte morphology segmentation.

      2. Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We will expand upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      3. Intrinsic Nature of Spatial Threshold: The reviewer's insightful observation regarding the intrinsic nature of the spatial threshold, which is not set by the level of neuronal stimulus, is noteworthy. We will provide additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      4. Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We will enhance our discussion by elaborating on the implications of these observations.

      The primary issue for us, and one we would encourage the authors to address, relates to the low spatiotemporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatiotemporal resolution (2–5 Hz frame rate using a 25x objective and 1.7x digital zoom, with pixels on the order of 1 µm) does not compromise the proposed concept of a spatial threshold for the expansion of intracellular calcium.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11 s delay for response onset in the "arbor" and 13 s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatiotemporal resolution (Wang et al. Nat Neurosci., 2006), let alone more recent and refined studies in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2 KO mice. Thus, we are inclined to believe that the slowness of the responses reported here indicates experimental/analytical issues. There are several possible explanations for this slowness that the authors may want to consider in improving their approach: (a) The authors apparently use low-zoom imaging to acquire signals from several astrocytes in the FOV: do all of these astrocytes respond homogeneously in terms of delay from the sensory stimulus? Perhaps some respond faster than others, and only this population is directly activated by the stimulus. Others could be slower to activate because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript.
These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response, which likely represent the primary signals triggered by synaptic activity. It would be important to identify such signals in the existing records and establish whether none, few, or many of them propagate to the SR101-positive part of the astrocyte. In other words, is there only a single spatial threshold, the one the authors reported, or are there two or more along the path of signal propagation towards the cell soma that eventually lead to the transformation of the signal into a global astrocyte Ca2+ surge?

      We thank the reviewer for these excellent and important comments. The concern about the mean delays of astrocyte activation arises from averaging together astrocyte responses to a 20-second stimulus. Astrocyte responses are heterogeneous, and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. As in any biological system, variability exists; here, however, we use the averaged responses to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for the astrocyte calcium surge.

      Further, we used a lower stimulus frequency (2 Hz) than Stobart et al. (90 Hz) in order to assess subthreshold activity. We found that stronger stimuli decreased response delays, and we will include this result in the revised manuscript. Interestingly, as shown in Fig. 4F, stronger stimuli did not significantly alter the spatial threshold. In the revised version of the manuscript, we will provide a more detailed analysis and discussion of this point.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges in a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raises an excellent point. Whether the 22.6% spatial threshold is reached by a continuous expansion of the Ca2+ elevation through the segmented astrocyte or by the summation of separate elevations occurring in distinct domains of the arbor is an important question, and we aim to address it with a new analysis that will be provided in the revised version of the manuscript.

      Regarding the comments on SICs, we fully agree with the reviewer. In the revised version of the manuscript, we will add text to the discussion to ensure the correct interpretation of the results: the observed 22.6% spatial threshold for SICs does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, recorded in whole-cell configuration from the neuronal soma, matches the threshold for the expansion of intracellular calcium.

      Additionally, the authors propose the following event sequence: stimulus → synaptic drive to L2/3 → arbor activation → spatial threshold → soma activation → post-soma activation → gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation (from dendrite to soma to axon) and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post-soma activation" are not established terms of art. Similarly, the way the authors use the term "domain" contrasts with how others have used it (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree that there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is the directionality from dendrite to soma to axon; it is unknown whether the calcium signal in astrocytes has a comparable directionality. The term "microdomain" is used in the references above to define distal subcellular domains in contact with synapses; to distinguish our usage from this term, we adopt the term "domain" to refer to any subcellular region of the astrocyte arborization. These points will be discussed and clarified in the revised version of the manuscript.

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom, with pixels on the order of 1 µm) and temporal (2–5 Hz frame rate) resolution is within the range commonly used in the glial field. In any case, the existence of a spatial threshold for the astrocyte calcium surge is not compromised by the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist, however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we aim to further address this issue in the revised version of the manuscript by analyzing the calcium dynamics in individual domains.

      For structure-based selection: genetically encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, the authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals into their spatial threshold concept. Even if they decide to be conservative in their methods and keep the astrocyte segmentation based on the SR101 signal, they should acknowledge that SR101 staining quality can vary considerably between individual astrocytes within a FOV: some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others, and we think that the authors should privilege such astrocytes for analysis. However, in cases such as the representative astrocytes shown in Figure 4A or Figure S1B, segmented domains are localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect comparable timing differences between domains that are distal vs. proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence that IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining, visible only in the soma and proximal domains, might show a biased effect of IP3R2 KO.
While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think the results would be strengthened if only astrocytes with sufficient SR101 staining (i.e., more consistent with previous reports of L2/3 astrocyte area; Lanjakornsiripan et al., 2018) were included. This could be achieved by using maximum or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with these points concerning SR101, and there could indeed be variability in the origins of the astrocyte calcium signal. Territory boundaries can be difficult to discern when neighboring astrocytes both express GCaMP6. Here we took a conservative approach, constraining ROIs to SR101-positive astrocyte territory outlines without invading neighboring cells, in order to reduce error in the estimate of the spatial threshold. The possibility that IP3R2 KO preferentially affects activity near the soma is interesting and in line with our conclusions. We agree that findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and that the suggested additional analysis would further strengthen the results.

      For latency-based selection: the authors record calcium activity within a FOV containing 20 or more astrocytes over a period of 60 s, during which a 2 Hz hindpaw stimulation at 2 mA is applied for 20 s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others respond with longer latency. For the shorter-latency responders (<3 s), it is easier to attribute their calcium increases to "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10 s or later, i.e., only after 20 stimulus events (at 2 Hz), they are likely activated by a more complex and recurrent circuit involving several rounds of neuron-glia crosstalk, which would be mechanistically distinct from astrocytes responding earlier. We suggest that the authors focus more on the shorter-latency astrocytes, as their activity is more likely to correspond to the stimulus itself.

      We agree that differences in the timing of astrocyte calcium increases may be due to different mechanisms outside the astrocyte. We believe the spatial threshold is independent of these external variables; nevertheless, we believe that longer-latency responses are physiological and may carry important information for determining astrocyte calcium responses.
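      The latency-based selection suggested by the reviewer can be illustrated with a minimal sketch. It assumes per-cell ΔF/F traces sampled at a known frame rate, an onset defined as the first post-stimulus frame exceeding the pre-stimulus mean plus k standard deviations, and the <3 s cut-off mentioned above; the onset criterion, the value of k, and the function names are hypothetical choices, not the method used in the study.

```python
# Hypothetical sketch of latency-based selection of "fast-responding" astrocytes.
# Assumes each trace is a dF/F time series and stim_frame is the stimulus-onset frame.
import numpy as np


def onset_latency(dff, fs, stim_frame, k=3.0):
    """Latency (s) from stimulus onset to the first frame above baseline mean + k*SD.

    Returns None if the trace never crosses the threshold after the stimulus.
    """
    dff = np.asarray(dff, dtype=float)
    if stim_frame < 2:
        raise ValueError("need at least two pre-stimulus frames for a baseline")
    baseline = dff[:stim_frame]
    threshold = baseline.mean() + k * baseline.std()
    above = np.nonzero(dff[stim_frame:] > threshold)[0]
    if above.size == 0:
        return None
    return float(above[0]) / fs


def fast_responders(traces, fs, stim_frame, max_latency=3.0, k=3.0):
    """Indices of cells whose onset latency is shorter than max_latency seconds."""
    selected = []
    for i, trace in enumerate(traces):
        latency = onset_latency(trace, fs, stim_frame, k)
        if latency is not None and latency < max_latency:
            selected.append(i)
    return selected
```

      A fixed SD-based criterion is only one option; the same selection could be run with a different onset definition to check that the fast-responder subset is robust to that choice.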

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Ruprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data will be interesting. We will provide the results of the suggested analysis in the revised version of the manuscript.
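      A temporal map of the kind described by the reviewer can be sketched as follows, assuming a registered ΔF/F image stack of shape (frames, height, width), a single amplitude threshold, and timing expressed relative to the first active pixel. The function name and the fixed-threshold criterion are illustrative assumptions, not the published implementations cited above.

```python
# Hypothetical sketch: per-pixel activation-time map relative to the first active pixel.
import numpy as np


def activation_time_map(stack, fs, threshold):
    """Return per-pixel time (s) of the first threshold crossing, relative to the
    earliest active pixel. Pixels that never cross the threshold are NaN.

    stack: array of shape (T, H, W) of dF/F values; fs: frame rate in Hz.
    """
    stack = np.asarray(stack, dtype=float)
    above = stack > threshold                 # boolean (T, H, W)
    ever_active = above.any(axis=0)           # (H, W)
    # argmax on a boolean array returns the index of the first True value
    first_frame = np.argmax(above, axis=0).astype(float)
    first_frame[~ever_active] = np.nan
    if not ever_active.any():
        return first_frame                    # all NaN: no active pixels
    return (first_frame - np.nanmin(first_frame)) / fs
```

      Restricting the map to a single astrocyte's territory (SR101-positive or otherwise) would be done by masking the output before analysis.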

      1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses.

      2) Whether the propagation in the authors' experimental model is centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome). The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our model implies centripetal propagation. Indeed, we found that arborization activity precedes soma activity. However, whether this is intrinsic or reflects the fact that synapses are more likely to be located in the periphery requires further study.

      3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse, heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2 Hz stimulation (which lacks a soma response) underscores this point. It appears to have ≥22.6% activation, sparsely distributed throughout the arborization. If this activity had an alternative spatial distribution, for instance localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point and an analysis of spatial clustering on pre-soma domain activation may be useful to answer it.

      4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      This is another interesting analysis that can be done with a spatial clustering analysis.
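      The spatial clustering analysis proposed in the responses to points 3 and 4 also bears on the earlier question of whether the threshold is reached by a continuous wavefront or by separate domains. A minimal sketch, assuming a hypothetical adjacency graph in which two segmented domains are neighbors if they share a border, groups active domains into connected clusters by breadth-first search. Under this assumption, a single cluster covering ≥22.6% of the arbor before soma activation would be consistent with a contiguous wavefront, whereas several small clusters would point toward summation of separate elevations. The function name and graph construction are illustrative only.

```python
# Hypothetical sketch: group active domains into spatially connected clusters.
from collections import deque


def active_clusters(adjacency, active):
    """Return a list of sets, each a connected cluster of active domains.

    adjacency: dict mapping domain id -> iterable of neighboring domain ids.
    active: iterable of active domain ids (ids must be sortable, e.g. ints).
    """
    active = set(active)
    seen = set()
    clusters = []
    for start in sorted(active):
        if start in seen:
            continue
        cluster = set()
        queue = deque([start])
        seen.add(start)
        while queue:                     # breadth-first search over active neighbors
            domain = queue.popleft()
            cluster.add(domain)
            for neighbor in adjacency.get(domain, ()):
                if neighbor in active and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters
```

      Combining the cluster sizes with domain areas would give the fraction of the arbor covered by the largest cluster, which could then be compared with the total active fraction.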

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus on calcium analysis in the astrocyte or neuronal field, and many groups use either custom software developed in-house or published tools such as GECIquant or AQuA. Because AQuA is event-based calcium detection software, it may not include inactive SR101-positive domains, which could underestimate the spatial threshold for the calcium surge. Our analysis is not based on functional events but on calcium signals within the structural constraints of a single astrocyte. This is crucial to properly determine the ratio of active to inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte goes well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings between a correlation between subdomain activation and somatic activation may be affected.

      To reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside the SR101-positive territory runs the risk of including pixels that belong to a neighboring cell, and we chose to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that the astrocyte arborization is homogeneously distributed in 3D and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al., Science 2017. We plan to include a paragraph in the discussion addressing this limitation of our study.

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we will acknowledge it in the discussion.

      The study uses a Heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Figs. 4E and 5D, showing how the method separates the signal, provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig. 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement; one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the Heaviside step function) were presented.

      We agree with the reviewer that the paper should include a justification for the use of the Heaviside step function, and we plan to add this. We chose the Heaviside step function to represent the on/off behavior we observed in the data. We agree that Fig. 4G is informative and shows that, below 23%, most soma fluorescence values are clustered at baseline, and that a similar graph should be included in Fig. 5. We also agree that additional statistical models describing the data would be more convincing; in addition, we confirmed the spatial threshold using a confidence interval, reported in the text.
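      The Heaviside step model discussed here can be written as y = low for x < θ and y = high for x ≥ θ, where x is the fraction of active arbor domains and y the soma signal. A minimal sketch of a least-squares fit, assuming candidate thresholds are restricted to the observed x values, is shown below; this is an illustrative procedure and not necessarily the fitting method used in the manuscript, and the function name is hypothetical.

```python
# Hypothetical sketch: least-squares fit of a Heaviside step model
#   y = low + (high - low) * H(x - theta), with H(0) = 1,
# where theta is searched over the observed x values.
import numpy as np


def fit_heaviside(x, y):
    """Return (theta, low, high, sse) minimizing the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for theta in np.unique(x):
        below, above = y[x < theta], y[x >= theta]
        if below.size == 0 or above.size == 0:
            continue                     # both plateaus must contain data
        low, high = below.mean(), above.mean()
        sse = ((below - low) ** 2).sum() + ((above - high) ** 2).sum()
        if best is None or sse < best[3]:
            best = (float(theta), float(low), float(high), float(sse))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best
```

      Because thresholds are taken from observed values, the estimate is the smallest x in the upper group; the true transition lies between that value and the largest x in the lower group. Resampling cells and refitting would give an empirical confidence interval for θ.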

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We will make our methods section more thorough, in particular stating that body temperature and respiration were monitored throughout anesthesia.

    1. Author Response

      We are grateful to the reviewers for recognizing the importance of our work on transcription-independent early recovery of proteasome activity. We also thank them for their thoughtful criticisms and suggested improvements, which we will address in the revised version as described below.

      The reviewers and editors asked for data to support the model that early recovery of proteasome activity is due to accelerated proteasome assembly. This model is backed by published data showing that proteasome assembly intermediates increase dramatically in cells treated with proteasome inhibitors (Fig. 6 in Ref. 46 of the revised manuscript). We will expand the discussion of this paper in a paragraph that describes our model. Another key experiment to confirm this model would be to determine what fraction of nascent polypeptides is degraded within minutes after synthesis; this is not trivial, and Ibtisam ran out of time to conduct these experiments because she had to graduate in the spring before her visa expired. This type of experiment usually uses metabolic labeling with a heavy or radioactive amino acid, which always requires prior depletion of the unlabeled amino acid. However, the fundamental flaw of this approach, which is not recognized by the scientific community, is that depletion of an amino acid stresses cells and reduces the rate of protein synthesis, especially if this amino acid is methionine. Thus, this model is not easy to test and should be considered speculative. We will therefore move the description of this model, together with Fig. 4, into a separate "Ideas and Speculation" section and remove this model's description from the abstract.

      Reviewer 1 raised the possibility that a background band detected on the western blot of DDI2 KO cells could be the highly homologous protease DDI1. This is highly unlikely because, according to the Protein Atlas, DDI1 is selectively expressed in the testis and is not expressed in the cell lines we used. Reviewer 1 also suggested that we should base our conclusion on Nrf1 KD, which we de facto did because we confirmed that DDI2 KD blocks Nrf1 activation (Fig. 1d).

      In response to Reviewer 1's critiques regarding the presentation of proteasome subunit stability data in Fig. 4 (Ref. 45 of the revised manuscript), we will remove PSMB8 and replace the chaperones with the subunits of the 26S base. We will change color palettes, symbols, and axis scales to improve clarity.

      We will acknowledge in the discussion that our work did not exclude a role for DDI2 in the recovery of proteasome activity after repeated pulse treatments, as suggested by Reviewer 1.

      We agree with Reviewer 2 that the term "proteasome levels" is inaccurate when describing our activity measurement data. However, in the manuscript, we use "levels" only when discussing data in the literature. We believe measuring activity rather than total levels is more important because not all proteasomes are active, e.g., latent 20S proteasome core particles.

      Reviewer 3 expressed concern that our conclusions were based on data in HAP1 cells, which are haploid and appear not very sensitive to proteasome inhibitors. This is why we used DDI2 KD in MDA-MB-231 and SUM149 cells, which are highly sensitive to proteasome inhibitors (Weyburne et al., Ref. 11). In our experience, the full extent of proteasome inhibitor cytotoxicity is not revealed until 48hr after treatment, and viability determined at 12hr and 24hr, as in Fig. 1c, should not be used to determine sensitivity (it was used for activity assay normalization). We will add a new supplementary figure showing that HAP1 cells are as sensitive to proteasome inhibitors as MDA-MB-231 cells when cell viability is assayed 48hr after treatment (new Fig. S2). Another panel of this new figure will demonstrate that baseline proteasome activity is very similar in HAP1, MDA-MB-231 and SUM149 cells. We will also add data demonstrating that inactivation of DDI2 by mutation does not change the recovery of proteasome activity in HCT-116 cells (new Fig. 1g). Recovery in MDA-MB-231, SUM149, and HCT-116 cells was measured at 18hr, which is still within the 12–24hr window in which other investigators observed partially DDI2-dependent recovery.

      We have conducted an experiment in which we followed activity recovery for up to 72hr. We found that activity plateaued at 24hr and decided against repeating the experiment because there were no further changes. We feel that the manuscript should not include data from a single biological replicate. The fact that recovery is incomplete and that cells seem to survive with lower levels of proteasome activity is interesting; however, investigating the molecular basis of this phenomenon is beyond the scope of the current project.

      We were not disputing the conclusions of previous studies that DDI2/Nrf1 is responsible for enhanced expression of proteasomal mRNA in cells continuously treated with proteasome inhibitors. In fact, we confirmed that pulse treatment causes a similar increase (Fig. 2b). As for papers that measured activity recovery after pulse treatment, we discuss our results objectively in the context of these papers.

      We will also respond to the reviewers' recommendations and minor points:

      • We will review the revised version carefully to eliminate spelling and grammatical errors and typos.

      • We will no longer refer to DDI2 as a novel protease, as suggested by Reviewer 1.

      • We agree with Reviewer 2 that our CHX results do not necessarily mean that recovery involves translation of proteasomal mRNAs, and we will now conclude that proteasome recovery requires protein synthesis.

      • We will revise Fig. 1c, 3a and 4a to improve clarity.

      • We have stated in the caption that data in Fig. 4a comes from Table S4 in Ref. 45.

      • We will adopt Reviewer 3's excellent suggestion to change "recovery" to "early recovery" in the title.

      • Regarding Reviewer 3's request to assay activity recovery at additional time points before 12hr, this was done in the cycloheximide experiment in Fig. 3a.

      • Even if we assume that the differences in activity recovery observed in MDA-MB-231 cells (Fig. 1f) are statistically significant, which may implicate DDI2 in the activity recovery, the percentage difference is still small, suggesting that most activity recovery is DDI2-independent.

      • We will tone down the statement "the present findings suggest that DDI2 desensitizes cells to PI by a different mechanism," replacing "suggest" with "raise a possibility."

      • We will indicate that only Bortezomib is approved for mantle cell lymphoma.

      • We will change the description of clinical dosing as suggested by Reviewer 3. We will add a reference on the PK of subcutaneous bortezomib (Ref. 9), even though the review we cited (Ref. 7) discussed subcutaneous dosing.

    1. Author Response

      Reviewer #3 (Public Review):

      Youssef et al. have used a range of markers to identify cancer stem cells (CSCs) in patients with oral cancers. CSCs were identified under laboratory conditions and were often linked to the invasiveness of cancers. The authors found a combination of markers convincingly linked to known biology and found cells expressing them in the invading cancers.

      The major weakness of the paper is on the technical side. There isn't enough description of how they discriminated between CSCs inside the tumour and those invading its surroundings. Similarly, as the information is presented, it is not clear why artificial intelligence was needed to enhance the accuracy of the method linking CSCs to cancer invasion (and ultimately deadly metastasis to other organs).

      The method for applying the tumour mask is displayed in Figure 2E for cohort 1 and Figure 2 figure supplement 3 for cohort 2. Briefly, in the image analysis pipeline, dense areas of EpCAM+ (cohort 1) or Vimentin+ (cohort 2) cells are merged to specify tumour/stroma regions. Thus, CSCs inside tumours (in the EpCAM-dense tumour region) can be discriminated from CSCs invading the surroundings (in the Vimentin-dense stromal region).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors aimed to understand whether the superficial, retinorecipient layers of the mouse superior colliculus (sSC) participate in figure-ground segregation and object recognition. To address this question, they use a combination of optogenetic perturbations of sSC and recordings. These data are consistent with SC being causally involved in object recognition. This would be useful information for the field and likely to be cited.

      Thank you for your positive evaluation.

      However, I have several concerns regarding their conclusions.

      A significant limitation of this study is methodological. The major novelty is the effect of optogenetic silencing, because the recordings are largely correlative, but the optogenetic silencing approach lacks appropriate controls for the effects of the optogenetic excitation light. The authors acknowledge that the optogenetic light is a potential confound, but attempt to address this by shielding the fiber to eliminate light leak and strobing a blue LED in the arena. The former does not account for the effects of excitation light scattering intracerebrally (during optogenetic experiments, intracerebral scattering causes the eyes to light up), and for the latter, there is no way to compare the intensity or qualia of the externally strobed LED and the intracerebral light. The proper control would be a cohort of mice lacking channelrhodopsin expression in sSC. Regardless, it is essential to acknowledge this potential confound.

      This is a good point. We have added discussion of this in lines 90-95. The proposed experiment was done in Kirchberger et al. (Sci Adv 2021, Suppl Figure 3). In mice without expression of channelrhodopsin trained on the same task as in our study, blue laser light in the cortex did not affect accuracy. Although the exact location of these fibers is different from ours, the distance from the fiber to the eye is very similar. Furthermore, in response to this comment, we have done a new set of experiments with 4 wild type mice, in which we recorded neural activity in the sSC while delivering optogenetic light stimulation. The procedure was similar to that for our previous experimental animals except that these mice did not receive a virus injection. In these mice, we did not see any response in the superior colliculus to the laser light, but we noticed a 5% reduction in the response to the visual stimuli (new Figure 1—figure supplement 3). This small reduction could reflect a reduction in the contrast of the visual stimulus due to the laser light hitting the retina, but given that we did not see any response to the laser alone, it is more likely to come from the known inhibitory effects of light on neural activity (e.g. through heat, see Owen et al. Nat Neurosci 2019). Because our aim was to silence sSC, this particular effect is not a strong confound for our study.

      Relatedly, as the authors note, there are GABAergic projection neurons in sSC that may be driving these effects via gain of function. This is a significant concern that has limited the widespread adoption of this approach in sSC despite its popularity in studies in cortex. Indeed, one recently published study of behavioral functions of deep SC found that activating inhibitory neurons actually caused paradoxical behavioral effects consistent with gain of function in the targeted hemisphere, due to the effects of long-range inhibitory projections on the other SC hemisphere. Given the presence of inhibitory projections in sSC, it would be preferable to use an orthogonal method for silencing and at least to thoroughly acknowledge these concerns and cite these recent studies.

      This is a valid point. When we started our study, we had some experience with inhibitory opsins (archaerhodopsin and halorhodopsin) and were not confident that we could inhibit the sSC widely, reversibly, repeatedly and consistently for an extended period. Other labs have now shown this is feasible with improved inhibitory opsins, so this would now be our preferred option too. The method of silencing sSC by activating GABAergic neurons, however, is still the most common optogenetic method to silence sSC for an extended period (e.g. Hu et al. Neuron 2019, Brenner et al. Neuron 2023).

      We thank the reviewer for pointing us to the recently published paradoxical behavioral effects. These effects, which we found in Essig et al. (Comm. Biol. 2021), are very interesting but are not really a concern for the interpretation of our results, partly because, as the reviewer pointed out, the GABAergic neurons activated there were in the deep and intermediate layers of the SC, below the sSC that we targeted. The paradoxical effects in that manuscript were attributed to direct inhibition of the contralateral superior colliculus. In our case, we activated the inhibitory neurons bilaterally, so this interhemispheric GABAergic connectivity, if it extends to sSC, would only have strengthened the bilateral silencing of the sSC. However, we now discuss the possibility that we also transfected these deeper GABAergic neurons (lines 272-278). The more general point that activating GABAergic neurons in the sSC may also cause inhibition in other structures is indeed a concern. GABAergic neurons in the sSC project to the PBG and the LGN (in particular the vLGN) (Gale & Murphy, 2014; Whyland et al., 2019; Li et al., 2023). Although the primary effect of our manipulation is silencing of the superior colliculus, including the GABAergic neurons (see our answer further below), we cannot exclude the possibility that activating these extracollicular GABAergic projections has an effect. We have edited our discussion of this and updated the references (lines 268-272). However, our measurements in anesthetized (previous submission) and in awake mice (new Figure 1—figure supplement 2) show that, apart from a short period directly after laser onset, almost all putative GABAergic neurons also show reduced responses (see also our answer to the next comment).

      A minor point is that although activation of GABAergic neurons in sSC is expected to cause inhibition of neighboring neurons, I would expect channelrhodopsin-expressing GABAergic cells to show an increase in firing during optogenetic excitation. However, it seems that none of the cells plotted (assuming each point in Supplementary Fig 4D is a cell, which the legend does not specify) had such an increase. Do these extracellular recordings not detect inhibitory neurons well?

      This is indeed an intriguing observation. The data in the original figure (Supp Fig 1D) were spiking data from 15 recording sites, not from sorted units. This was mentioned in panel C, but not in the caption. For the purpose of quantifying the amount of silencing, there was no need to sort single units. Still, it is surprising to see the reduction on almost all channels. The data of Supp Fig 1D came from experiments in anesthetized mice. Prompted by a question from another reviewer, we have now redone these experiments in head-fixed awake mice. The new Figure 1—figure supplement 2 shows these results for single- and multi-unit clusters. In response to a short laser pulse (50 ms), we find that many units significantly increase their firing rate (Figure 1—figure supplement 2A-B). However, almost all activated units then reduce their firing rate, and again we see an overall reduction of responses to visual stimuli. Only one unit fires significantly more when the laser is on during the period of visual stimulation compared to when the laser is off, and the overall firing rate is strongly reduced (Figure 1—figure supplement 2C-E). It appears that optogenetically activating the inhibitory neurons in the sSC for a longer period also reduces the activity of these neurons. The effect that we are seeing might be similar to the paradoxical effects that may occur in visual cortex, where additional excitation of inhibitory neurons also leads to their reduced activity due to network dynamics (see e.g. Sadeh & Clopath, Nat Rev Neurosci 2021). However, the effect may also be due to a few inhibitory neurons having a strong inhibitory effect on other inhibitory neurons. This is an interesting point worthy of more investigation, but it falls outside the scope of this manuscript.

      Finally, the relationship between these stimuli and objects is not entirely clear. The authors acknowledge this but it would be worthwhile to devote more attention to this point. In effect, as the authors note, the gray screen and sinusoidal grating do not have any sharp edges on the screen, whereas each of the behaviorally relevant stimuli will create a sharp, step-like edge on the screen. Whether edge detection is truly object detection or simply a variant of more general visual detection is unclear.

      Indeed, the task can be solved by detecting texture edges, and it is not necessary to integrate the edge components into an object to perform the task successfully. A linear decoder fed with simple cell-like inputs is able to do the orientation task (Luongo et al., 2023). The same network failed to learn the phase task, but the image of a phase-defined figure also contains features that are not present in the background image, so that task could also be solved by learning only local features. Even the texture-defined figures used in Kirchberger et al. (2021) and in earlier monkey studies (Lamme, 1995), which do not contain any sharp stimulus edges, can be detected without integrating the local edges into objects and segregating the figure from the background. Several monkey studies show that late neuronal responses in V1 are enhanced for neurons with receptive fields on what we, humans, perceive as the figure. This effect has also been seen in mouse V1, even when there are no local features distinguishing the figure from the background (Fig. 7 in Kirchberger et al. 2021). Interfering with activity in V1 in this late phase reduces the ability to detect the figure in humans (by TMS) and mice (by optogenetics). This suggests that this figure-ground modulation is used in solving the task, but it is not proof. To understand whether mice solve the tasks by detecting a figure or by detecting specific features, we can look at generalization. Mice were previously shown to generalize to some degree across size, position and spatial phase of the figure grating patch (Schnabel et al., 2018), suggesting that the mice did not learn to detect specific features at specific locations.

      Rats trained on a similar task had difficulty generalizing from a luminance-defined object to an orientation-defined object (De Keyser et al., 2015), as do mice (Khastkhodaei et al., 2016), but once the rats were acquainted with one set of oriented figures, they immediately generalized to other texture orientations above chance. On a slightly different figure-detection task, mice also showed generalization across orientations once the initial task was learned (Luongo et al. 2023). This suggests that at least some generalization to object detection occurs in this task. We have added these observations to the discussion (lines 301-305).

      Reviewer #2 (Public Review):

      The goal of this study is to show that the superficial superior colliculus (sSC) of mouse signals figure-ground differences defined by contrast, orientation, and phase, and that these signals are necessary for the animal to detect such figure-ground differences. By inhibiting sSC while the animals perform a figure-ground detection task, the study shows that detection performance decreases when sSC activity is suppressed during the onset of the visual stimulus. The study then intends to show that sSC neurons exhibit surround suppression based on orientation differences, and that surround suppression is stronger when the animal detects the correct location of the figure on the background.

      The major strength of this study is the use of a behavioural paradigm to test detection performance of figure-ground stimuli while manipulating neural activity in the sSC during different times after stimulus onset. This paradigm would show whether activity in the sSC is relevant for performing the task. Secondly, the study collected data to confirm previous findings: sSC neurons exhibit orientation specific surround suppression. Additionally, it is impressive that the authors were able to train mice to generalize their task performance across different stimulus categories (figure-ground differences in orientation and phase). This should be highlighted as it may inform future studies.

      Thank you for your positive evaluation. We have extended our discussion on the generalization in object detection tasks in mice.

      The study has, however, methodological and analytical weaknesses so that the stated conclusions are not supported by the presented results.

      1) Optogenetic inhibition is not limited to sSC (even expression may not be limited). About 30% of inhibitory neurons in the sSC project to other areas, e.g. ventral LGN, parabigeminal nucleus and pretectum (Whyland et al, 2019, see ref in manuscript). This means that these areas receive direct inhibition when inhibitory sSC neurons are optogenetically stimulated. This fact is mentioned in the discussion but the consequences and implications for the results are ignored. This is a major flaw of the optogenetic experiments of this study. Additionally, no evidence is given that opsin expression was limited to the superficial layers (except for one histological slice), which the authors acknowledge in line 285. Deeper layers may have other inhibitory neurons with long-range projections.

      The finding that sSC neurons show no figure-ground modulation for phase while the optogenetic manipulation has behavioural effects may be an indication for other areas being affected by the optogenetic manipulation.

      This is a valid point, also raised by reviewer 1. Although the primary effect of activating the GABAergic neurons in the sSC is a strong reduction of activity in the sSC (see also new figure S1), we cannot rule out that we also activate GABAergic neurons below the sSC and that there are some effects of activating GABAergic connections to the LGN and PBG. We have extended our discussion of this point in lines 269-277. However, as shown in new Figure 1—figure supplement 2, the effect of optogenetically activating Gad2-positive neurons appears to lead to a counter-intuitive reduction of their activity. This effect has previously been observed in cortex.

      2) Could other behavioural variables explain the results?

      a) Are there any task events other than the visual stimuli that the mice could use to make their decisions? The authors state the use of a custom made lick spout but it is not clear how this spout works, i.e. how do mechanics of the spout deliver water to the right versus the left output and could the mouse perceive these mechanics?

      We believe there were no task events besides the visual stimuli that the mice could use to make their decisions. The lick spout was Y-shaped (see Figure 1B) to facilitate the two-alternative forced choice task. Each side of the lick spout was connected to a separate water tube. The water flow in each tube was controlled using a valve. Also, each side of the lick spout was connected to its own lick detector wire. The two valves and the two detector wires were connected to an Arduino which was controlled by our MATLAB task script. The task script was coded such that, when the lick of the mouse had been on the correct side, the valve controlling the water flow on the correct side would briefly open to deliver the water reward. To summarize, the water would only flow after the mouse had licked and if the first lick had been on the correct side. Hence, the water reward did not produce additional cues. We have edited the description of the lick spout in the Methods section to make the functioning of the lick spout more clear (lines 511-513).
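      The reward-gating rule described above can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' MATLAB/Arduino task script: the function name `valve_event`, the representation of licks as (time in ms, side) pairs, and the omission of any early-lick exclusion window are all assumptions made for illustration. It shows the property the argument relies on: a valve can only open in response to a lick, so water delivery cannot act as a cue before the mouse has committed to a side.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a lick is (time in ms after stimulus onset, side),
# with side being "left" or "right".
Lick = Tuple[float, str]


def valve_event(licks: List[Lick], correct_side: str) -> Optional[Tuple[float, str]]:
    """Return (time_ms, side) at which a reward valve opens, or None if none opens.

    Sketch of the gating rule: no valve opens without a lick,
    and only the first lick is evaluated.
    """
    if not licks:
        return None  # no lick: neither valve ever opens on its own
    first_time, first_side = min(licks)  # earliest lick (tuples sort by time first)
    if first_side != correct_side:
        return None  # wrong first lick: later licks on the correct side earn nothing
    return (first_time, correct_side)  # valve opens only in response to this lick


# Usage: the first lick is on the correct side, so the left valve opens at 350 ms.
print(valve_event([(350.0, "left"), (420.0, "left")], "left"))
```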

      b) Could the different neural responses to figure versus ground shown in Fig 2I-J and Fig 3B be explained by behaviours varying between the trial types, e.g. by early lick movements (which are conceivable even if the spout is not present), eye movements or changes in pupil-linked arousal? A behavioural difference seems even more likely to occur between hit and error/miss trials (Fig 4). If these behaviours were not measured, the possibility of behavioural modulation should be discussed.

      In the awake behaving electrophysiology experiments, the lick spout was not present until 500 ms after stimulus onset, so the mouse could not lick the spout. We did not record whisking or other face and jaw movements, hence we cannot say for sure whether the mice performed early ‘licks’ in the absence of the lick spout. We did, however, add a supplementary figure showing the licking behavior of the mice in the optogenetic interference experiments (see Figure 1—figure supplement 5). In this experiment, the lick spout was present at all times so all early licks would be recorded. Any licks before 200 ms after stimulus onset were disregarded as this would be too early for the decision to include knowledge about the stimulus. Figure 1—figure supplement 5B shows that the mice indeed only performed very few early licks as they probably knew this would not yield reward. The mice that performed the awake electrophysiology experiments were trained on the same task as these mice before introducing the lick spout delay of 500 ms. So although we cannot rule out early licks during electrophysiology, we think early licks would be an unlikely explanation for the neural response differences.

      We have added a new supplementary figure (Figure 2—figure supplement 2) showing data for eye movements and pupil dilation during the tasks. We had excluded all trials where the mice performed eye movements between 0-450 ms after stimulus onset, and indeed we saw no eye movements during the peak of the visual response (0-250 ms). Furthermore, the pupil dilation of the mice also did not change in this period.

      All in all, we view it as unlikely that the differences in neural activity in sSC were caused by either licking, eye movements or pupil-linked arousal.

      3) What is the behavioural strategy of the animals? Only licks beyond 200 ms after stimulus onset determine the choice of the animal because "mice made early random licks" from 0 to 200 ms. To better understand the behavioural strategies of the animals we need to see their behavioural data, i.e. left and right licks aligned to stimulus onset. It would be particularly interesting to see how number and latency of licks changes during optogenetic manipulation.

      Based on these suggestions, we investigated the licking behavior of the mice during the optogenetic experiments in more detail. Our new Figure 1—figure supplement 5 taught us several things:

      1) The fully trained mice hardly perform any early licks; they seem to understand that early licks cannot yield reward.

      2) The mice typically only lick one side of the lick spout during one trial. In correct trials the fluid reward is given directly after a correct lick, which causes the mouse to lick the correct side of the spout even more. However, even if the first lick is incorrect (bottom rows), the mouse generally does not lick the other (correct) side afterward. They seem to know that correct licks after an incorrect lick do not yield reward.

      3) The maximum licking rates were not significantly affected by laser onset.

      4) The latency of the first lick (reaction time) was not significantly affected by laser onset. (Please also see our response to question 2b).

      4) Data relating to misses should be included in analyses to provide a complete picture of behaviour and neural responses

      a) In the optogenetic manipulations, an increase in misses seems to dominate the decreased accuracy (please, explain when a response was counted as a miss). A separate analysis of miss trials may be more robust than of error trials and also offers a different interpretation of the data, namely that the mouse did not see the stimulus rather than perceiving the figure on the opposite side. However, if the mice reduced their lick rate in general during optogenetic stimulation, this begs the question whether their motor performance was affected by optogenetic manipulation. Can this possibility be excluded?

      Trials were counted as follows: A trial was counted as a hit when the first lick after 200 ms after stimulus onset was on the correct side. A trial was counted as an error when the first lick after 200 ms after stimulus onset was on the incorrect side. A trial was counted as a miss when the mouse did not lick in the window between 200 and 2000 ms after stimulus onset. We have clarified this in the methods section (lines 517-526).
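      The hit/error/miss rule above can be written as a small classifier. This is a minimal Python sketch, not the authors' analysis code: the names are hypothetical, licks are assumed to be (time in ms after stimulus onset, side) pairs, and treating both window boundaries (200 and 2000 ms) as inclusive is an assumption, since the text does not specify it.

```python
from typing import List, Tuple

# Hypothetical representation: a lick is (time in ms after stimulus onset, side),
# with side being "left" or "right".
Lick = Tuple[float, str]

RESPONSE_START_MS = 200.0   # earlier licks are ignored (too early to reflect the stimulus)
RESPONSE_END_MS = 2000.0    # no lick inside the window means the trial is a miss


def classify_trial(licks: List[Lick], correct_side: str) -> str:
    """Classify one trial as "hit", "error" or "miss" from its licks."""
    # Keep only licks inside the response window (boundaries assumed inclusive).
    in_window = sorted(lick for lick in licks
                       if RESPONSE_START_MS <= lick[0] <= RESPONSE_END_MS)
    if not in_window:
        return "miss"
    _, first_side = in_window[0]  # only the first valid lick decides the choice
    return "hit" if first_side == correct_side else "error"


# Usage: an early random lick at 120 ms is ignored; the first valid lick is correct.
print(classify_trial([(120.0, "right"), (480.0, "left")], "left"))  # hit
```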

      Our previous text may not have been sufficiently clear but the decrease in accuracy during optogenetic trials is not dominated by an increase in missed trials. As we have now indicated explicitly in its caption, in figure 1, missed trials are excluded from the analysis. Hence, the significant effects shown in figure 1 are not driven by an increase in missed trials but rather by an increase in erroneous licks. When comparing figure 1 vs figure S3, where the missed trials are added to the analysis as if they were error trials, we can see an overall downward shift of the performances. Indeed, mice miss more trials when the laser is on. The increase in number of missed trials is lower than the increase in number of wrong choices. Furthermore, the range between the performances at early laser onset and late laser onset is still very similar. This indicates that the mice on average do not have higher miss rates when laser onset is early.
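      The two ways of scoring accuracy contrasted above (misses excluded, as in figure 1, versus misses counted as errors, as in figure S3) can be sketched as follows. The function name and the toy outcome counts in the usage lines are hypothetical and are not data from this study; the sketch only shows why counting misses as errors shifts accuracy downward whenever any misses occur.

```python
from typing import Iterable


def accuracy(outcomes: Iterable[str], misses_as_errors: bool = False) -> float:
    """Fraction of correct trials among scored trials.

    outcomes: per-trial labels "hit", "error" or "miss".
    misses_as_errors=False drops misses from the denominator;
    misses_as_errors=True counts them as wrong choices.
    """
    labels = list(outcomes)
    hits = labels.count("hit")
    errors = labels.count("error")
    misses = labels.count("miss")
    denominator = hits + errors + (misses if misses_as_errors else 0)
    return hits / denominator if denominator else float("nan")


# Toy example (not study data): 6 hits, 2 errors, 2 misses.
toy = ["hit"] * 6 + ["error"] * 2 + ["miss"] * 2
print(accuracy(toy))                          # 0.75
print(accuracy(toy, misses_as_errors=True))   # 0.6
```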

      Finally, neither the maximum licking rate nor the reaction time is affected by laser onset (see the new figure S2).

      Related to Fig 4, it would be equally interesting to see how FGM changes during misses. Do the changes support the observations for error trials?

      We are not convinced that the neural data from missed trials can be interpreted in a simple way. Mice may have various reasons to miss a trial: they may be tired or not paying attention, they may not have seen the stimulus well, they may not feel thirsty enough, they might be distracted by some sensory input that humans might not be aware of, etc. This is why we specifically chose a two-alternative forced choice task rather than a go/no-go task.

      5) Statistical tests do not support the conclusions, are missing or inadequate

      a) In Fig 1E, accuracy is significantly affected at only 1-2 time points in each task, specifically either the 1st and 3rd or the 2nd time point. How do the authors interpret these results? If inhibition starting at the 2nd time point has no significant effects, why would it be significant when inhibition starts later (at the 3rd time)? Furthermore, given that all other starting points of laser stimulation have no significant effects, there is no reason to trust the latency of inhibition effects based on mostly insignificant data points. This analysis in its current form should be removed, including a comparison of latencies between tasks, which was not tested for significance. It may be more meaningful to analyse accuracy for each animal separately. This may reduce variability.

      We can understand that the reviewer may have concerns regarding the post-hoc analysis of Fig 1E, but we feel these concerns stem from a misinterpretation of our goal with this analysis. In Figure 1E, we use a 1-way repeated-measures ANOVA. By using this test, we ask whether the performance of the animals is affected by the laser onset. More specifically “does the performance increase or decrease with increasing laser onset?” The test is significant, so indeed the performance goes up as laser onset goes up. This indicates that the performance of the mice is affected by the inhibition of sSC. For the sake of completeness we had included the post-hoc tests for each latency in the statistics table. Indeed, some individual latencies are not significantly different to the no-laser condition. However, this does not invalidate the conclusion of the main test: a repeated measures ANOVA can only be performed on data with 3 or more groups, so the conclusion of the repeated measures ANOVA could not have been drawn from simply those laser onset(s) that is/are significantly different from the no-laser condition. The main effect of higher performance with higher latencies is significant, even if some individual comparisons are non-significant. The difference in significance of the post-hoc tests does not indicate a significant difference between the groups, but insufficient power to do six individual tests.
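      The omnibus test described above pools all laser-onset conditions into a single F statistic, with each mouse serving as its own control. A minimal plain-Python sketch of that statistic follows; it is not the authors' analysis code, it assumes rows are mice and columns are conditions, it applies no sphericity correction, and its toy matrix is not study data. A p-value would follow from the F distribution, e.g. with `scipy.stats.f.sf(F, df_cond, df_err)`.

```python
from typing import List, Tuple


def rm_anova_oneway(data: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA on a subjects x conditions matrix.

    Returns (F, df_conditions, df_error). Between-subject variability is
    removed from the error term, so each subject acts as its own control.
    """
    n = len(data)        # subjects (e.g. mice)
    k = len(data[0])     # conditions (e.g. laser onsets)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_err = ss_total - ss_cond - ss_subj  # condition x subject residual; must be > 0

    df_cond = k - 1
    df_err = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_err / df_err)
    return f_stat, df_cond, df_err


# Toy 3 subjects x 3 conditions matrix (not study data):
# F is approximately 7.0 with df = (2, 4).
print(rm_anova_oneway([[1, 2, 3], [2, 3, 4], [3, 5, 4]]))
```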

      We have changed the wording in the reporting of the statistics of Figure 1E to hopefully more precisely indicate the conclusions we drew from the statistics. We do not draw conclusions from the post hoc tests. We have considered removing them from the statistics table 1, but believe that some readers might be interested. We can remove them if the reviewer believes that would be better.

      b) Analyses regarding the difference in neural response to figure and ground (Fig 2I-J, Fig 3B, Fig 4B, Fig 5C) would be more convincing and informative if the differences were analysed on the level of single neurons in response to the same orientation within their RF (or at the location where the figure is presented, for edge-RF neurons). A histogram of these differences would show how many neurons are affected and how large the effect is in single neurons.

      We fully appreciate this idea, but the way we set up the behavioural task does not quite allow for this type of statistical analysis. We tested all three tasks (contrast, orientation and phase) within single sessions, and on top of that we varied the orientation of the stimuli (0 or 90 deg) as well as the phase of the gratings (60 different phases). This was done to prevent the mice from memorizing individual stimuli. As a consequence, only very few trials per session contained exactly the same stimulus type, figure-ground condition, orientation and phase. For example, suppose a mouse performed around 120 trials in a session: 25% of those were contrast trials, 37.5% orientation trials and 37.5% phase trials. Of the 120 × 0.375 = 45 orientation trials, half were figure trials and half were ground trials, i.e. about 22 trials each. If we split these trials by their individual orientations, we are left with only about 11 trials per condition for the figure-ground analysis, each of which would probably have a different grating phase. Given the variability in firing rate that individual neurons show in awake mice, this number of trials would not provide enough statistical power to test the significance of modulation in single neurons.
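
      The trial arithmetic above can be written out as a short sketch. It assumes an even split between figure and ground trials and between the 0 and 90 deg orientations; the function and parameter names are hypothetical.

```python
# Share of trials per task within one session, as stated above.
TASK_SHARES = {"contrast": 0.25, "orientation": 0.375, "phase": 0.375}


def trials_per_cell(total_trials=120, task="orientation",
                    n_figure_ground=2, n_orientations=2):
    """Expected trials per (figure/ground x orientation) cell for one task."""
    task_trials = total_trials * TASK_SHARES[task]   # 45 orientation trials
    per_fg = task_trials / n_figure_ground           # figure vs. ground: 22.5
    return per_fg / n_orientations                   # 0 vs. 90 deg: 11.25


print(trials_per_cell())  # 11.25, before any further split by grating phase
```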

      Although we feel the study design does not allow analysis of individual neurons in response to the same orientation within their RF, we did perform an aggregated analysis of orientation selectivity. For this analysis, we included all trials in which the RF of the recorded neuron was on the background half of the screen. We then computed the responses of each neuron to the trials in which the background orientation was 0 and 90 deg, respectively. Most neurons had no preference for either of the two tested orientations over the other: only 4 out of 64 (6%) neurons showed a significant preference. We therefore believe that splitting the data by orientation preference would not be very informative.

      c) All statistical tests performed across neurons should account for dependencies due to simultaneous recordings (dependency on session) and due to recordings in the same animal (dependency on animal). This can be done in most cases by using linear mixed-effects models.

      We agree with the reviewer and have changed the analysis for Figures 2I, 3B and 3E to an LME analysis (see also Table 1).

      d) There was no significant difference between model weights (Fig 3D), so the statement in line 210 (RF-edge neurons had higher weights) should be removed.

      In answer to the previous question, we changed the analysis for what is now Figure 3E to an LME. This shows that relative weights were significantly higher for the orientation task than for the phase task. We have adapted our conclusion accordingly (lines 214-218).

      e) Fig 4B compares FGM during correct and error trials. This comparison has to be performed with the same set of neurons in correct and error trials (not the case for orientation). Again, the most compelling and informative comparison would be on the level of single neurons: response difference between figure and ground (same visual features at figure position) during hits versus errors.

      As described above, we feel the study design does not allow analysis at the level of individual neurons. The analysis in Fig 4B was in fact performed using the same set of neurons; we have corrected the typo.

      f) There is no evidence that FGM for phase was different between hit and error trials as stated in line 234.

      Indeed, we had phrased this incorrectly. Since we recorded all tasks during single recording sessions, we have data for each task for most neurons. We were therefore able to pool the results from the different tasks, and the main effect of hit vs. error on d-prime was significant. Post-hoc tests showed that this effect is mainly driven by the difference in the orientation task. We have edited the wording to be more accurate (lines 239-242).
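
      For readers unfamiliar with the measure, a common definition of d-prime for a neuron's figure versus ground responses is the difference of the means divided by the pooled standard deviation. The sketch below assumes that definition and hypothetical inputs (per-trial spike rates); the exact formula used in the manuscript may differ. Computing it separately on hit and on error trials would give the kind of per-neuron values that are compared across trial outcomes.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(figure_rates, ground_rates):
    """Pooled-variance d' of figure vs. ground responses (assumed definition).

    Positive values mean stronger responses on figure trials. Each input
    needs at least two trials for the sample variance to be defined.
    """
    pooled_sd = sqrt((variance(figure_rates) + variance(ground_rates)) / 2)
    return (mean(figure_rates) - mean(ground_rates)) / pooled_sd


# Hypothetical per-trial spike rates (spikes/s) for one neuron.
print(d_prime([2, 4, 6], [0, 2, 4]))  # 1.0
```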

      g) It is not clear why and how the mixed linear effects model was used pooling data across tasks (Fig 4C and Fig 5D). Different neurons were recorded for each task, so the sample points (neurons) are not affected by both task effects (orientation and phase). Each task should be analysed separately.

      Since we recorded all three task versions during single behavioral sessions, we have data for multiple tasks from each neuron. This is why the linear mixed-effects model pools the data across tasks. We have added a note in the main text for clarity (lines 238-242).

      h) Bonferroni correction in Fig 1E should correct multiple comparisons across time points, not across tasks (see Table 1).

      The multiple time points all belong to the same one-way repeated-measures ANOVA, so there is no need to correct the post-hoc analysis. We did run the ANOVA for three tasks, which is why we corrected the p-values of each task. We think that this is the best approach, but can also present uncorrected p-values if needed.

      i) What is the reason to perform some tests one-tailed, others two-tailed?

      Following the reviewer comments, we changed some analyses to LME models. The remaining tests that require definition of the tails are all two-tailed.

      6) The results relating to "multisensory neurons" are ambiguous regarding their interpretation (if significant at all) and seem unrelated to the goal of the study. It is particularly likely that behaviours like licking or other movements cause the response differences between figure and ground.

      We agree with the reviewer that finding these neurons was not the aim of the study. We did not include enough types of tests in our paradigm to fully determine the properties of these neurons. Furthermore, we have recorded too few of these neurons to draw strong conclusions. The data shown in the new Figure 2—figure supplement 1H suggest that the responses of these neurons are not as strongly time-locked to the first lick as they are to trial onset. We presented the behavior of these neurons in our manuscript because, whatever their exact behavior, they are clearly distinct from the visually responsive cells that show a short-latency response to the visual stimulus (Figure 2—figure supplement 1). We still feel that it is useful for the reader to know that there are cells in the sSC with such distinct behavior, but we have moved the figure and the accompanying text to a figure supplement to avoid distraction from the main message of the manuscript.

      7) What depth were neurons recorded from (Fig 3 and 4)?

      The depths of the recorded visually responsive neurons are now shown in Figure 2—figure supplement 1E.

      Reviewer #3 (Public Review):

      The authors used optogenetic manipulations and electrophysiological recordings to study the causal role and the coding of the superficial part of the mouse Superior Colliculus (SCs) during figure detection tasks.

      The authors previously reported that figure-ground perception relies on V1 activity (Kirchberger et al. 2021) and pointed out that silencing V1 reduced the accuracy of the mice, although performance was still above chance level. Therefore, the visual information necessary for this task could be processed via alternative pathways. In this study, the authors investigated SCs specifically and used a similar approach and analysis as in Kirchberger et al. 2021. Optogenetic silencing of the activity of visual neurons in SCs impaired accuracy in all 3 versions of the figure detection task: contrast, orientation, and phase. Electrophysiological recordings revealed that SCs neurons are figure-ground modulated, but only by contrast- and orientation-based figures. They show that SCs visually responsive neurons reflect behavioral performance in the orientation-based figure task. The authors conclude that SCs is involved in the figure detection task.

      Overall, this study provides evidence that mouse SCs is involved in a figure detection task, and codes for task-related events. Authors heroically compared results between 3 different versions of the figure-based detection task. The logic of the study flows through the manuscript and authors prepared a detailed description of methods.

      Thank you for your positive comments.

      However, my main concern is with 1) the amount of data used to make the key arguments, and 2) the interpretation of results. The key findings of this study (figure-ground modulations in SCs) could be a result of the visual cortical feedback in SCs during the task, or pupil diameter changes. Unfortunately, the authors did not rule out these possibilities.

      Still, this study can be relevant to a general neuroscience audience, and results could be more convincing if the authors could clarify:

      1) Optogenetic inactivation

      a) The impact of laser stimulation on neural activity is not satisfactory (Supplementary Figure 1). The method seems to be insufficient to fully silence neurons. Electrophysiological control recordings of inactivation were performed in anesthetized mice, which does not give a fair estimate of the effect in the awake state. This therefore raises a major question: how effective is the inactivation during the task?

      We have conducted new control experiments on the impact of laser stimulation on neural activity, now in awake animals (see Figure 1—figure supplement 2). The reviewer was right to ask for these experiments. We had not expected much difference in the effect of silencing between the awake and anesthetized states; to minimize animal discomfort, we had therefore performed these control experiments in terminal experiments under anesthesia. However, this new set of experiments showed that the impact of laser stimulation was much stronger in awake than in anesthetized mice. We see an average spike rate reduction of 90% when the laser is on. Although this is not full silencing, we think this reduction is sufficient to draw some conclusions on the role of sSC in the behavioral tasks.

      b) Could authors provide more details if laser stimulation has an effect only on visual, or all sampled units? How many of units were recorded, and how many show positive and negative laser modulation?

      We defined visually responsive units as units that have an evoked rate of at least 2 spikes/s. In the new figure 1—figure supplement 2D from the new set of control experiments, we plotted, for every unit, the mean rate in laser ON and OFF trials - also including the non-visually responsive units. It is evident that the spiking activity of most units – including those that were not classified as ‘visual’ – is reduced in the laser ON compared to OFF trials. We observed 1 unit that showed strong positive laser modulation over the entire duration (figure 1—figure supplement 1D). Many units were activated by shorter laser pulses directly after laser onset (Figure 1—figure supplement 2A-B), but these also reduced in activity as the stimulation continued.
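
      As an illustration of how per-unit numbers in such a control analysis can be derived, the sketch below flags units as visually responsive with the 2 spikes/s evoked-rate criterion described above and computes each unit's fractional rate reduction between laser OFF and ON trials. The dictionary keys are hypothetical, and expressing suppression as (OFF - ON) / OFF is an assumption about how the average reduction was computed.

```python
def laser_summary(units, min_evoked=2.0):
    """Flag visual units and compute laser-induced rate reduction per unit.

    units: list of dicts with hypothetical keys 'evoked', 'rate_off' and
    'rate_on', all in spikes/s. A reduction of 0.9 means the rate dropped
    by 90% with the laser on; negative values indicate laser activation.
    """
    visual = [u["evoked"] >= min_evoked for u in units]
    reduction = [
        (u["rate_off"] - u["rate_on"]) / u["rate_off"] if u["rate_off"] > 0 else None
        for u in units
    ]
    return visual, reduction


units = [
    {"evoked": 5.0, "rate_off": 10.0, "rate_on": 1.0},  # visual, strongly suppressed
    {"evoked": 1.0, "rate_off": 4.0, "rate_on": 2.0},   # non-visual, also suppressed
]
print(laser_summary(units))  # ([True, False], [0.9, 0.5])
```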

      c) How local the inactivation effect is? Where was the silicon probe placed in relation to AAV expression and optical fiber position?

      The AAV was injected at 0.3 mm anterior and 0.5 mm lateral to the lambda cranial landmark. With this injection location we aimed to focus the expression at low/nasal receptive fields, in front of the mouse, because that is where the visual stimulation would take place. From there, the expression did spread laterally across sSC (see Figure 1C). The silicon probe was placed roughly in the same location as the viral injection. The optical fiber was positioned such that the tip would shine on the surface of the sSC at a slight angle, from a lateral distance of ~200 µm from the silicon probe. We have edited the methods section to make this more clear (line 583-585). This procedure allowed us to record only relatively local effects of the inactivation. Although we did not record neural activity across the entirety of sSC, we did record from multiple electrode penetrations per mouse, each time slightly varying the recording location with up to ~300µm and ~500µm in the anterior and lateral directions, respectively. In these variations of recording location the optogenetic effect was always present (see new Figure 1—figure supplement 2G). Moreover, the suppressive effect of optogenetic stimulation of GAD2+ neurons was observed across the entire depth of the sSC (new Figure 1—figure supplement 2H).

      2) Number of sessions and units

      a) The inactivation effect on behavior (Figure 1E) during phase-task has a significantly larger effect at 66ms after stimulus onset. How can authors explain this? Could this result be biased by one animal/session, or low number of trials for this condition? There is no information about number of trials, or sessions from individual animals. Adding a single example of animal's performance, and sessions for individual mice could clarify results in Figure 1.

      The criterion for each mouse to be included in the analysis for one of the tasks was to have 100 trials in which optogenetics was used (aggregated across latencies). So at minimum, we would have about 100 trials / 6 latencies ≈ 17 trials per latency per mouse. For most mice, though, the number of trials per latency was closer to about 40. We have added more information about this to the methods section (lines 567-570). Despite these inclusion criteria, the 66 ms effect is present in multiple mice (we have now added data visualizations for the individual mice in Figure 1—figure supplement 4). As to why this happens, we can only speculate. It might be random variation. A more speculative explanation would be that this 66 ms laser onset is particularly disruptive to the visual processing and/or decision-making of the mouse, but we feel that we do not have enough evidence to conclude this.

      b) Figure 2H shows an example neuron with an effect in the figure detection task based on phase difference, but Figure 2I/J (population response) shows there is no effect. Overall, the conclusion is that SCs neurons are not modulated by a phase-defined object. It seems that the number of mice, and hence units, is smaller in the phase-detection task than in the two other tasks. How many single units are modulated in each version of the task? How big is the FGM effect on single-neuron responses (could the authors provide values in spikes/s)? One task is dropped from the analysis, even though comparing responses across different versions of the figure detection task in SCs is one of the main points of the paper: Figures 3-5 focus on only two tasks, because there are not enough data for the figure-based contrast task.

      We have updated Figure 2H to show the example single-neuron response in spikes/s. For the population responses, we explicitly normalized the individual neurons because they all have different baseline and peak firing rates. This normalization was important for the decoding, so we decided to plot the data such that the data from Figures 2I and 3B went into the decoding as plotted. If we look at the non-normalized values, the maximum amplitude of the average FGM effect is 22.3, 5.9 and 2.9 spikes/s, respectively, for the three tasks (for neurons with their RF on the stimulus center).
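
      A minimal sketch of a per-neuron normalization of the kind described above, assuming that each neuron's PSTH is rescaled so that its mean pre-stimulus baseline maps to 0 and its peak maps to 1. The exact normalization used for Figures 2I and 3B is not specified here, so both this scheme and the function name are assumptions.

```python
from statistics import mean


def normalize_psth(psth, n_baseline_bins):
    """Rescale one neuron's PSTH: baseline (mean of first bins) -> 0, peak -> 1.

    This removes differences in baseline and peak firing rate between
    neurons so that they can be averaged or fed into a decoder together.
    """
    baseline = mean(psth[:n_baseline_bins])
    peak = max(psth)
    if peak == baseline:  # flat response: avoid division by zero
        return [0.0 for _ in psth]
    return [(r - baseline) / (peak - baseline) for r in psth]


# Hypothetical PSTH in spikes/s: two baseline bins, then a visual response.
print(normalize_psth([2, 2, 10, 6], n_baseline_bins=2))  # [0.0, 0.0, 1.0, 0.5]
```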

      We have furthermore updated the FGM analysis such that the clustered statistic is now based on linear mixed-effects statistics instead of t-test statistics. The results based on this new analysis are largely the same (see statistics Table T1). We checked the significance of individual neurons in the time window in which the grouped LME analysis was significant. For the phase task (n.s. in the grouped analysis), we used the significant window from the orientation task. We stress that the number of trials for each version of the task for each individual neuron is quite limited, because we recorded all three tasks during each recording session. Individually, 7/23 neurons were significant for the contrast task, 1/49 for the orientation task and 0/32 for the phase task (after Bonferroni-Holm correction).
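
      The Bonferroni-Holm step-down procedure referred to above can be sketched as follows, for example applied to one p-value per neuron within a task. This is a generic textbook implementation with a hypothetical function name, not the authors' code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction; returns reject flags in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k-1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are retained as well
    return reject


print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

      Equivalent implementations exist in standard packages, e.g. statsmodels' multipletests with method='holm'.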

      To address the final part of this comment on dropping the contrast task: we indeed have recorded too few data points to draw conclusions on decoding (Fig. 3) and discriminability (Fig. 4) for the contrast task. However, we do not see the contrast detection task as the main point of the paper. As earlier work had already shown involvement of the sSC in visually-evoked behaviours based on objects that are clearly isolated from the background, the main focus in this work is to show involvement of sSC in complex object detection, where the visual contrast and luminance is the same across object and background.

      3) Figure-ground modulation in SCs

      a) How is neural activity correlated with pupil size, movement (eg. whisking, or face), or jaw movement (preparation to lick)? Can activity of FGM neurons in SCs be explained by these behavioral variables?

      We did not record whisking or other facial and jaw movements. We did record the eye of the mice, so we have included a new Figure 2—figure supplement 2, which shows eye position and pupil dilation during the task. For the analysis in the originally submitted paper, trials with substantial eye movement (Z-score of eye speed > 2.5) between 0 and 450 ms had already been removed from the analysis. In this way, we could exclude effects of eye movements (but not of pupil dilation) on the visual responses in sSC. The additional figures and analyses have been done using the same inclusion criteria. Indeed, in the included trials the mice did not move their eyes during the peak of the visual response (0-250 ms), and pupil dilation also did not change in this period.
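
      The eye-movement exclusion criterion described above can be sketched as follows. It assumes that eye speed is z-scored across all samples pooled over trials and that a trial is dropped if any sample in the 0-450 ms window exceeds z = 2.5; how the z-score was actually computed in the manuscript may differ, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def trials_to_keep(eye_speed, threshold=2.5):
    """Return indices of trials without substantial eye movement.

    eye_speed: list of trials, each a list of eye-speed samples from the
    0-450 ms window (hypothetical format). Speeds are z-scored over all
    pooled samples; a trial is excluded if any sample exceeds the threshold.
    """
    samples = [s for trial in eye_speed for s in trial]
    mu, sd = mean(samples), pstdev(samples)
    if sd == 0:  # no variation at all: nothing to exclude
        return list(range(len(eye_speed)))
    return [i for i, trial in enumerate(eye_speed)
            if all((s - mu) / sd <= threshold for s in trial)]


# Nine steady trials and one trial with a saccade-like speed peak.
trials = [[1, 1, 1, 1]] * 9 + [[1, 1, 1, 50]]
print(trials_to_keep(trials))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```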

      b) Could authors describe in more detail how they measure a pupil position and diameter, by showing raw data, pupil size aligned to task events?

      We have added a new Figure 2—figure supplement 2 to show the pupil position and diameter aligned to task onset.

      c) How does pupil diameter change between tasks? Small pupil changes can affect responses of visual neurons, and this could be an explanation of FGM effect in SCs. Can authors rule out this possibility, by for example showing pupil size and changes in position at stimulus onset in different tasks?

      Our new Figure 2—figure supplement 2B shows that pupil dilation changes and differences in pupil dilation between figure/ground trials do occur, but only after ~300 ms, so after the peak of the visual response and after the FGM is present in sSC.

      d) Authors in discussion mentioned that the modulation of V1 could be transferred to SCs through the direct projection. Moreover, animals perform above chance in both inactivation experiments (V1 and SC), which could be also an effect of geniculate projections to HVAs (eg. Sincich et al. 2004). Could authors discuss different possibilities?

      The direct geniculate projection to HVAs is an interesting possibility that we had not yet considered. In the mouse, the dLGN projects (apart from V1) mostly to the medial HVAs (Bienkowski et al. 2018), whereas the lateral extrastriate regions receive only very sparse input from the dLGN. The medial HVAs, however, could be silenced without a drop in performance in a simple visual detection task (Goldback et al., 2020). Therefore, it does not seem likely that these geniculate-to-HVA projections are important in the figure detection task.

      4) Interpretation of multisensory neurons is not clear. In Figure 5B, there is an example of neuron with two peaks of response. Authors speculate about the activity (pre-motor) but there is lack of clear measurement showing "multisensory" response of these neurons. Could these responses be related to the movement of the lick spout towards the mouth of the mouse (500 ms after the presentation of the stimulus)? Moreover, the number of "multisensory" units is very low (5 units, and 8 units).

      We have not done definitive tests to show what exactly these putative multisensory neurons respond to. Because their response occurred after the appearance of the lick spout and was time-locked to the trial start rather than to the licking response, we think it is likely that these neurons responded to the appearance of the spout. There might have been visual, auditory, vibrational or touch cues to which these neurons respond. We believe it is interesting for the reader to know that there is a class of neurons in the sSC that did not respond to the visual stimulus but was time-locked to the trial. This was the reason that we had included this figure in the manuscript. However, given the reviewers’ comments, we have decided to move the figure and accompanying text to a figure supplement (Figure 2—figure supplement 1) in order not to distract from the main message of the manuscript.

    1. Author Response

      Joint Public Review:

      1) For the in vitro work, only one cell line is used in this article: HPAEpiC cells, an immortalized human cell line derived from alveolar epithelial type II cells. This limits the generalizability of the results obtained in this study, as SARS-CoV-2 is known to infect several kinds of cells.

      We appreciate the concerns of the reviewing editor. To test whether our findings were applicable to other cells, we performed similar experiments in human hepatoma cells (Huh-7) and renal tubular cells (HK-2), which are highly susceptible to SARS-CoV-2 (Yeung et al., 2021). We found that infection by SARS-CoV-2 upregulated the protein levels of ACE2, while colchicine treatment significantly inhibited the expression of ACE2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2A-D). In addition, we found that colchicine treatment also reduced the viral load of SARS-CoV-2 in HK-2 cells and Huh-7 cells (Revised Figure 3-figure supplement 2E and F).

      2) From the results of two separate experiments (colchicine leading to reduced ACE2-expression in HPAEpiC cells & colchicine leading to reduced SARS-CoV-2 replication in HPAEpiC cells), the authors infer that inhibition of ACE2 expression by colchicine suppresses SARS-CoV-2 infection. However, their experiments do not explicitly prove this hypothesis and do not give weight to the importance of this reduced ACE2 expression in the colchicine antiviral effect they observed, as other mechanisms may play a (bigger) role in producing this effect.

      It has been well-established that the infection of SARS-CoV-2 and the Spike-RBD binding are dependent on ACE2 expression in different cell lines. ACE2 knockdown dramatically reduces SARS-CoV-2 infection in Caco2 cells (Shen et al., 2022), Spike-RBD binding, and SARS-CoV-2 replication in Calu-3 cells (Samelson et al., 2022). In contrast, overexpression of ACE2 greatly enhances SARS-CoV-2 virus infection in both A549 and H1299 cells (Chen et al., 2021). Meanwhile, two recent studies have demonstrated that androgen receptor positively regulates the expression of ACE2 at a transcriptional level (Qiao et al., 2021; Samuel et al., 2020). Importantly, inhibition of ACE2 expression by reducing the AR signaling attenuates SARS-CoV-2 infectivity (Qiao et al., 2021). A very recent study has demonstrated that ursodeoxycholic acid (UDCA), an inhibitor of the farnesoid X receptor (FXR), reduces ACE2 expression in human lung, intestinal, and liver organoids, thereby inhibiting SARS-CoV-2 infection (Brevini et al., 2022). These results clearly demonstrate that ACE2 expression levels determine the efficiency of SARS-CoV-2 infection to host cells.

      3) The authors refer to colchicine as a drug leading to mortality benefit when used as treatment for COVID-19 (line 101-105). However, whether colchicine is beneficial in COVID-19 is unclear. For instance, the randomized controlled trial by the RECOVERY Collaborative Group (Lancet Respir Med 2021), which included more than 11,000 patients, did not find benefit from colchicine in patients admitted to hospital with COVID-19. The authors refer to the review of Drosos et al to infer benefit of colchicine in COVID-19, however this review ignores the numerous trials contradicting this (as also stated in a letter from Finsterer in response to this review). The meta-analysis by Elshafei to which the authors refer was published before the largest RCT by the RECOVERY Group was published.

      We agree with the assessment made by the reviewing editor. Our goal is to discover a new mechanism regulating ACE2 expression. Using colchicine, we have identified SP1 as a crucial transcription factor that regulates ACE2 expression. In response to the reviewer’s comments, we added the sentences “This study has several limitations. Firstly, although SP1 was identified as a pivotal transcription factor in modulating ACE2 expression via the action of colchicine and MithA, neither of these compounds currently qualify as a candidate for the treatment of COVID-19.…Additionally, the efficacy of colchicine as a treatment for COVID-19 remains inconclusive. While some studies suggest benefits (Chiu et al., 2021; Drosos et al., 2022; Elshafei et al., 2021), others indicate negligible impact on mortality or disease progression (Group, 2021; Mikolajewska et al., 2021).” in the Discussion of the revised manuscript (Lines 329-342).

      4) The authors did not let a pathologist blinded to the infection/treatment state of the animals score the samples obtained in the animal experiments, which could have introduced bias in these results.

      We appreciate the concerns of the reviewing editor. Actually, histological observations were made by one of authors, Dr. Li-Qiong Wang, who is a pathologist, blinded to group identity. In response to the reviewer’s suggestion, we have now added a sentence “Tissue sections were evaluated by a trained pathologist (L.-Q. W.) blinded to group identity” in the section of Material and Methods (Lines 516 and 517).

    1. Author Response

      We appreciate the insightful comments from the three reviewers on our manuscript. These comments help us improve the clarity of this manuscript. We will revise our manuscript comprehensively in a subsequent revision and enclose a detailed response to each of these comments. In this public reply, we focus on (a) clarifying the theoretical motivation and implications of the present study, and (b) discussing the implications of our LLM study. In addition, we provide a brief justification regarding some methodological concerns shared by the reviewers.

      1) Theoretical rationale and implication

      As we stated in the manuscript, the present study tested whether body size serves as a reference for locomotion and object manipulation, or alternatively, plays a pivotal role in shaping the representation of objects as suggested by Protagoras. Behind this question is the long-lasting debate regarding the representation versus direct perception of affordance.

      One outstanding theme shared by many embodied theories of cognition is the replacement hypothesis (e.g., Van Gelder, 1998). This hypothesis challenges the necessity of representation in the sense of computationalist cognitive theories (e.g., Fodor, 1975), which implies discretizing/categorizing inputs and then subjecting them to certain abstraction or symbolization so as to create discrete stand-ins for the input (e.g., representations/states). In this sense, our theoretical motivation can be restated explicitly as testing the ‘representationalization’ of affordance. That is, we tested whether object affordance would simply covary with its continuous constraints such as object size, in line with the representation-free view, or whether affordance would be ‘representationalized’ under the constraint of body size, in line with the representation-based view. Such representationalization would generate categorization between the affordable (the objects) and those beyond affordance (the environment).

      Debates regarding the replacement hypothesis often turn into wrestles on the definition of representation (Shapiro, 2019). The present study tried to avoid this pitfall but examined where the embodied and computational theories make opposite hypotheses: discontinuity. Specifically, we considered two computationalism propositions about representation: (a) representations entail discretization of continuous input, and (b) the product of such discretization (representations) is supramodally accessible (that is, transcending sensorimotor processes). These claims are opposite to the prediction based on the idea of direct perception and other representation-free embodied theories.

      Thus, we tested whether, for continuous action-related physical features (such as object size relative to the agents), affordance perception introduces discontinuity and qualitative dissociation, i.e., to allow the sensorimotor input to be assigned into discrete states/kinds, as representations envisioned by computationalists. Alternatively, does the activity directly mirror the input, free from discretization/categorization/abstraction, as proposed by the replacement hypothesis that organisms do not need to re-present the world as they are always in contact with the world in a continuous way?

      All the experiment settings and analyses in the present study were organized around this motivation, following a progressive logic chain.

      First, we tested the discretization hypothesis, that is, whether affordance leads to discontinuity in perception. Here, the discontinuity in affordance perception would be in line with the representation-based view instead of the representation-free proposals. Second, to ensure that the observed discontinuity can be attributed to the discretization of sensorimotor input involved in human-object interaction rather than amodal sources, such as the discrete abstract concepts of the objects (independent from agent motor capability), we tested the embodied nature of this discontinuity through the body imagination experiment. If there is discontinuity in representing embodied information, this discontinuity should be locked to the motor capacity (constrained by the physical constitution such as body size) of the agent, rather than reflecting independent categorization of the absolute size of the objects. Finally, we probed the supramodality of this embodied discontinuity: whether this discontinuity is accessible beyond the sensorimotor domain. To do this, we leveraged the recent advance in AI and tested whether the discretization observed in affordance perception is supramodally accessible to disembodied agents which lack access to sensorimotor input but only have access to the linguistic materials built upon discretized representations, such as large language models (LLM).

      In this way, the experiments in the present study collectively contributed to the debate on the replacement theme of the embodiment of cognition, which serves as one of the three key themes of embodied theories of cognition (Shapiro, 2019). By addressing this theme, we hope to shed light on the nature of representation in, and resulting from, the vision-for-action processing. Our finding regarding discontinuity suggested that sensorimotor input undergoes discretization implied in the computationalism idea of representation. Further, not contradictory to the claims of the embodied theories, these representations do shape processes out of the sensorimotor domain, but after discretization.

      2) Implication in the development of LLM-based agents

      The finding that affordance was representationalized may have profound implications for the development of LLM-based agents. Traditional robots and non-LLM-based agents require implementation-level action instructions, acting as tools for human beings to achieve desired results. In contrast, LLM-based agents (for a review, see Wang et al., 2023), such as Auto-GPT and BabyAGI, are able to perform tasks autonomously and achieve desired results based on LLMs’ planning ability. In this sense, LLM-based agents show a primary ability to interact with the world on their own. Generative agents, for instance the agents in Smallville (Park et al., 2023), are a particularly applauded recent advance among LLM-based agents and show even greater potential in this respect. Drawing on generative models to simulate human behaviors, these agents can formulate their own memories and goals, generate new environment-dependent behaviors, and interact convincingly with humans, other agents and their environments. This brings new possibilities for resolving a long-standing challenge in artificial general intelligence (AGI) development, that is, to bestow AI with human-level ability in agent-environment interactions. However, it is worth noting that the present investigation of LLM-based agents is still largely confined to virtual environments. This leaves open the question of how to equip these agents with the ability of agent-environment physical interaction. In particular, according to embodied theories of cognition, sensorimotor interactions with the environment provide unique knowledge upon which various cognitive domains are built. From this point of view, building agents with human-level ability in agent-environment physical interactions might provide an irreplaceable missing piece for AGI.

      By probing the representation of action possibilities (affordances) provided by the environment to the agent (or the absence of them), the present study provides a clue toward achieving such ability by illustrating the representationalization of affordance and the supramodality of these representations. For instance, the finding of supramodality may alleviate doubts about whether LLM-based agents can attain physical interaction abilities comparable to those of biological agents. Specifically, LLM-based agents could leverage the affordance representations distilled into language to interact with the physical world. Indeed, by clarifying such representations and aligning them with the physical constitution of LLM-based agents, and even by explicitly constructing an agent-specific object space, we may facilitate the sensorimotor interactions of LLM-based agents so that they achieve animal-level interaction ability with the world. This, in turn, may provide new instances for testing embodied theories.

      3) Clarification on incomplete evidence

      In response to the reviewers' methodological and validity concerns, we will provide a detailed point-by-point response enclosed with the revised manuscript. Here, we reply to the most prominent concerns.

      Reviewers were concerned about the statistical power of both the body imagination experiment and the fMRI experiment. Regarding the number of participants in the imagination study, we would like to clarify that we did not remove 80% of the participants. Rather, a separate sample of participants was recruited for the body imagination experiment. The sample size for the body imagination experiment (100 participants) was indeed smaller than that of the first experiment (528 participants), because the first experiment was exploratory and was deliberately designed to be over-powered.

      Admittedly, the fMRI experiment recruited a small sample (12 participants), which might lead to low power in estimating the affordance effect. In revision, we will acknowledge this issue explicitly. Having said this, note that the null hypothesis of this fMRI study is the lack of two-way interaction between object size and object-action congruency, which was rejected by the significant interaction. That is, the interpretation of the present study did not rely on accepting any null effect. In addition, the fMRI experiment provided convergent evidence for the affordance discontinuity at the neural level. We showed that behind the behavioral discontinuity in action judgement, neural activity was qualitatively different between objects within the affordance boundary and those beyond, which reinforces our statement that objects were discretized along the continuous size axis into two broad categories.

      Reviewers also commented that more objects and actions should be included. We agree, and in the revision we will advocate future studies with more objects and more actions to portray discontinuity comprehensively. The present set of objects was designed to span a relatively large range of object sizes, from 14 cm to 7,618 cm, covering most of the size categories studied by Konkle and Oliva (2011). In addition, the actions were selected with reference to the Kinetics human action video dataset (Kay et al., 2017) to cover daily interactions between humans and objects or environments, ranging from single-point movements (e.g., hand, foot) to whole-body movements (e.g., lying, standing). Thus, this set of objects and actions is sufficient to test the discontinuity.

      References

      Fodor, J. A. (1975). The Language of Thought (Vol. 5). Harvard University Press.

      Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.

      Shapiro, L. (2019). Embodied Cognition. Routledge.

      Van Gelder, T. (1998). The dynamical hypothesis in cognitive science. Behavioral and Brain Sciences, 21(5), 615-628.

      Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., ... & Wen, J. R. (2023). A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432.

    1. Author Response

      First of all, we would like to thank you for the opportunity to receive three valuable sets of comments on our work from the reviewers, as well as the important summary from the Chief Editor. If we understand correctly, at this stage we are expected to check for any factual errors, and our response may affect which reviewers' comments are published as part of the Reviewed Preprint. If so, we would like to comment on some of the reviewers' points (Part A). These are not factual errors but rather misunderstandings that need to be corrected; whether Part A is included in the published response is at your discretion. In Part B, we address the reviewers' comments.

      Part A:

      1) Reviewers #1 and #3 missed that we had already reported the dynamics of PNAs based on live-cell imaging (Reviewer #3 in particular stressed that the dynamics we present are extrapolated from fixed imaging). We previously published the detailed dynamics of PNAs as detected by live-cell imaging (Imrichova, Aging 2019, doi: 10.18632/aging.102248. Epub 2019 Sep 7). It seems that we did not sufficiently highlight this important aspect in the present eLife manuscript: although the Introduction describes the dynamic transitions between the individual PNA types/stages, it does not explicitly state that these dynamic insights were derived from our live-cell imaging experiments.

      2) Reviewer #2 asked us to reconcile the different phenotypes after RNAi of TOP2A (knockdown induces PNAs) and TOP2B (knockdown does not induce PNAs) vis-à-vis the fact that the TOP2B-targeting drug doxorubicin is a strong inducer of PNA formation. We would like to stress that doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (at low concentration) or inhibit (at high concentration) all subtypes of topoisomerase 2. In other words, doxorubicin targets a wider spectrum of type 2 topoisomerases and can therefore eliminate compensation through potentially redundant roles of the individual subtypes, which, on the other hand, can manifest when only one specific member is depleted genetically. We discuss this interesting issue further in the Discussion of our manuscript, and we believe there is no discrepancy, given the wider impact of doxorubicin and the apparently more dominant role of TOP2A than TOP2B in preventing PNAs.

      3) We are aware that the biological significance of the interaction of PML with the nucleolus has not been fully resolved yet. At this moment, we can conclude that PNAs recognize and sequester damaged/aberrant rDNA from the active nucleolus. This novel sorting mechanism might be necessary for maintaining the integrity of the repetitive rDNA loci, which might otherwise be altered or lost during complex recombinational rDNA repair. Importantly, we also identified substances (mostly chemotherapeutics) that cause rDNA damage. Given that PML is a multifaceted protein involved in diverse processes, PML depletion might affect several stress-related processes. Analysis of rDNA quality/quantity is also highly challenging because of the high number of rDNA copies (200-400). As preparing such experimental models is difficult and time-consuming, addressing this issue in more detail will be part of our follow-up work. Nevertheless, we will perform the bulk of the experiments recommended by the reviewers to strengthen the conclusions of this manuscript, as follows: A) We will explore whether PNA formation is linked to a specific cell cycle phase; B) To strengthen the experiments with inhibition of NHEJ (DNA-PKi) and HR (B02i), we will use RNA interference or other inhibitors that operate through a distinct mechanism yet target the same repair process; C) We will analyze recovery from I-PpoI treatment and assess cell proliferation, the ability to form colonies, and the presence of senescent cells.

      Part B:

      Reviewer #1 (Public Review):

      Summary:

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways.

      Strengths:

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study reveals a unique regulation of DSB repair at rDNA repeats associated with this membrane-less subnuclear structure.

      Weaknesses:

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      At this moment, we cannot fully elucidate the mechanistic basis or biological significance of this peculiar process. However, our data show that the dynamic interaction of PML with the nucleolus can sequester damaged rDNA from the reactivating nucleolus. We propose that, in this way, the actively transcribed intact rDNA is protected from potentially detrimental interactions with the defective, PNA-sequestered rDNA, most likely to avoid harmful intra- and inter-chromosomal recombination events that would otherwise likely occur during recombinational repair of the damaged rDNA, given that the highly repetitive rDNA arrays are distributed across five chromosomes. Thus, this novel sorting mechanism might help sustain the integrity of the repetitive rDNA loci.

      Reviewer #2 (Public Review):

      In this manuscript, the authors aim to study the PML-nucleoli association (PNAs) by different genotoxic stress and to determine the underlying molecular mechanisms.

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors found that inhibition of topoisomerases and RNA Polymerase I produces the highest level of PNA formation, associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that the Rad51-mediated HR pathway, but not the NHEJ pathway, is associated with PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, the endonuclease I-PpoI induced DSBs at a defined location in rDNA and led to PNAs.

      Most claims by the authors are supported by the data provided. However, the weaknesses/concerns below may need to be addressed to improve the quality of the study.

      1) The Top2B poison doxorubicin produced the greatest increase in PNAs; however, Top2B knockdown had almost no noticeable effect on PNAs. How can the different phenotypes of targeting Top2B be reconciled?

      We thank the reviewer for this comment and explain below why there is no discrepancy between the observed phenotypes. Doxorubicin is not a specific poison of TOP2B (e.g., Atwal 2019; DOI: 10.1124/mol.119.117259). It can poison (stabilize the ternary complex, at low concentration) or inhibit (e.g., cause decatenation defects, at high concentration) all subtypes of topoisomerase 2. It also intercalates into DNA (altering DNA torsion and causing histone eviction) and elevates oxidative stress. Therefore, the observed effect of doxorubicin reflects its broader impact, which extends beyond inhibition of TOP2B. Because doxorubicin targets a wider spectrum of type 2 topoisomerases, it can eliminate compensation through potentially redundant roles of the individual subtypes (redundancy that, by contrast, can manifest when only one specific member is depleted genetically), thereby causing a robust induction of PNAs. We discuss this issue further in the Discussion section of our manuscript. In summary, we believe there is no discrepancy in the observed phenotypes, given the wider impact of doxorubicin and the apparently more dominant role of TOP2A than TOP2B (both of which are affected to some extent by doxorubicin) in preventing PNAs.

      2) To test the role of Rad51 and DNA-PKcs in PNA formation, the Rad51 inhibitor B02 and the DNA-PKcs inhibitor NU-7441 were used in the study. To further exclude possible off-target effects of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complement to the pharmacological inhibitor approach.

      We are grateful for this suggestion and will perform the recommended experiments, the outcome of which will help to exclude possible off-target effects of B02 and NU-7441. We are now collecting/testing the necessary tools and will carry out the analyses proposed by the reviewer.

      3) Several previous studies have shown activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We are aware of the relevant literature on ATM and appreciate this question from the reviewer. During the revision of this manuscript, we will therefore address the role of ATM signaling in the phenomena that we report here. As ATM signaling is essential for the repression of pre-rRNA synthesis and the compaction of rDNA into nucleolar caps in response to rDNA damage, we will build on this knowledge by testing to what extent ATM inhibition might affect the induction of PNAs/PML-NDS in our model and experimental settings.

      Reviewer #3 (Public Review):

      Summary:

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs), termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase I (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNAs. A specific RAD51 inhibitor (B02) affected the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited.

      Strengths:

      PNAs are morphologically distinct and readily visualized. The imaging data are of high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks underlie PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that inhibition of RNAPI and/or TOP1 or TOP2A promotes PNA structures.

      Weaknesses:

      The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics.

      We believe this comment reflects a misunderstanding, for the following reason. We fully agree with the reviewer that live-cell imaging is critical to properly capture the dynamics of PNA formation and evolution, and we apologize for not sufficiently highlighting that such data were already presented in our previous study, in which we described the existence and dynamics of PNAs over time based on the live-cell imaging that the reviewer rightly regards as important. In Imrichova et al. (doi: 10.18632/aging.102248. Epub 2019 Sep 7), we used live-cell imaging to describe the dynamics of PNA formation and the transitions between individual types, and we referred to this work in the Introduction of our present manuscript. Those experiments, including the live-cell imaging, showed that after the recovery of RNAPI transcription, which usually follows the washout (removal) of the DNA-damaging agents, the funnel-like PNAs are transformed into PML-NDS. These newly emerging PNAs (PML-NDS) are positioned next to the reactivated nucleolus. To document this, we reproduce below the relevant part of the Introduction of our submitted manuscript. Nevertheless, we did not emphasize that the transitions between individual types of PNAs were observed by live-cell imaging of cells ectopically expressing PML-EGFP and B23-RFP. In the revised manuscript, we will include this critical information and complement it with a scheme explaining the dynamics of PNA transitions.

      Copied text from our manuscript, relevant to this issue: Doxorubicin, a topoisomerase inhibitor and one of the PNAs inducers, provokes a dynamic interaction of PML with the nucleolus, where the different phases linked to RNAPI inhibition can be discriminated into four basic structural subtypes of PNAs termed according to the 3D structures obtained by super-resolution microscopy as PML 'bowls', PML 'funnels', PML 'balloons' and PML nucleolus-derived structures (PML-NDS; (36)). The doxorubicin-induced inhibition of RNAPI leads to a nucleolar cap formation around which diffuse PML accumulates to form the PML bowl. Note that this event is rare as a minority of nucleolar caps are enveloped by PML (36). As the RNAPI inhibition continues, PML bowls protrude into PML funnels or transform into PML balloons wrapping the whole nucleolus. When the stress is relieved and RNAPI resumes activity, a PML funnel transforms into distinct compartments placed next to the non-segregated (i.e., reactivated) nucleoli, PML nucleolus-derived structures (PML-NDS). PML-NDSs contain nucleolar material, rDNA, and markers of DNA DSBs (36,37).

      Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division.

      We agree that this is an important point. In a complementary setting, we previously reported (Imrichova et al., doi: 10.18632/aging.102248. Epub 2019 Sep 7) that exposure of RPE-1 hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. Thus, most such cells will not re-enter the cell cycle. Regarding the I-PpoI-based model, we indeed did not show in the present manuscript how I-PpoI activation (rDNA damage) affects the cell cycle. In preliminary experiments addressing this issue, we observed that only about 1–3% of cells recover from the stress and form colonies in a colony-forming assay. We will repeat and corroborate these preliminary data and include the results in the revised manuscript, together with β-galactosidase staining to demonstrate the presence of senescent cells.

      Furthermore, as suggested by this reviewer, we will assess the cell cycle phase/position of the cells in our experiments, to find out whether the cell cycle phase affects/correlates with the PNAs formation.

      The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.

      This is partly explained by our response above regarding our previous live-cell imaging analyses. The 'bowl' emerges first and can be transformed into a 'funnel' or 'balloon'. All these PML structures are in contact with the nucleolar cap (RNAPI is inhibited). Upon reactivation of RNAPI, the funnel can transform into PML-NDS. At this moment, we cannot conclude to which precise process each individual structure is linked. However, we already know (Hornofova et al., DOI: 10.1016/j.dnarep.2022.103319) that funnels colocalize with the highest proportion of rDNA, which may reflect a process of concentration/clustering of rDNA. This observation is supported by results presented in this manuscript showing that individual acrocentric chromosomes (NORs) also accumulate in one funnel. To summarize, the formation of the bowl reflects an aberration in rDNA. The funnel can accumulate rDNA and NORs at one site. The transition from the funnel to PML-NDS mirrors the changes after reactivation of RNAPI and facilitates the sequestration of damaged rDNA/NORs outside the active nucleolus. As the processes linked to the individual PNA types are not yet resolved, we will at least address this issue in the Discussion.

      An I-PpoI-targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would aid comprehension.

      This is a very inspiring idea, although demanding and somewhat outside the scope of the present study. Our follow-up work will focus on the localization of individual NORs using immuno-FISH after introducing rDNA damage with I-PpoI. In the context of those studies, we also plan to analyze rDNA 3D architecture.

      Following I-PpoI induction, it is possible that cells arrest in G1. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with this possibility, and in response we will perform a series of cell cycle analysis experiments to address this issue during the revision of this manuscript. We will analyze whether I-PpoI-induced PNAs are linked to specific cell cycle phase(s).

      Conclusions: PNAs are a phenomenon of biological significance, and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (ALT-associated PML-NBs), which are similar structures in which telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.

      We will follow this recommendation. In ALT, PML is essential for clustering several (damaged) telomeres into APBs. PML-deficient cells not only fail to form APBs but also lack ALT telomeric DNA synthesis in G2. As we already mentioned, funnel-like PNAs can accumulate several NORs, which might facilitate recombination between NORs. Thank you; we will highlight this link and its relevance for cancer in our revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their insightful comments, suggestions, and criticism. In the updated version of the manuscript, all these will be properly reflected. Here we briefly address the main points raised:

      Reviewer #1:

      1.1) Patient selection and tumor area selection are crucial for this study but not very carefully defined. Why are some regions core and others not? Figure referencing is also an issue here: the legend of Fig. 4 points to Supp. Fig. 6 for all core and non-core samples, but this is likely Supp. Fig. 7, which in turn appears to be a complete copy of Figure 4. The Methods state that core samples are defined by limited contamination with additional morphotypes (<20%), but Fig. 4 suggests that all listed tumours have multiple morphotypes.

      The tissue samples were obtained from a hospital cohort of patients with stage II-IV colorectal cancer (at diagnostic time), with no particular selection criteria imposed, as this was an exploratory study.

      Tumor regions were marked for macro-dissection by an experienced pathologist, following standard practice for whole-tumor transcriptomics studies. The subregions (morphological regions) were marked by the same pathologist for macro-dissection (in an adjacent section) and later reassessed with respect to their “morphological purity”. It is impossible to macro-dissect regions containing a single morphological pattern. Hence, regions that contained a significant amount (≥20%) of other morphologies were considered “non-core”, while the rest were called “core” regions. This distinction applies solely to morphological regions and not to whole-tumor samples. Indeed, the reference in the caption of Figure 4 should point to Supp. Fig. 7; this has been corrected.

      1.2) CMS subtyping should be performed with a single-sample predictor rather than CMScaller.

      We agree that a single-sample predictor for CMS is needed; however, CMScaller is the de facto classifier for CMS (>130 citations), so we used it to illustrate the practical implications.

      1.3) A couple of surprising observations need clarification. MUC2 is a strong CMS3 reporter gene, yet mucinous tumours appear to end up in CMS4 rather than CMS3. Can the authors show that stromal cells are indeed abundant in these samples?

      We do not have a direct estimate of the amount of stromal cells, but the high scores of the various fibroblast-related signatures in mucinous regions (Fig. 2B, D) indicate that there is indeed an enrichment in stroma. In a follow-up study, we plan to perform specific staining as well as spatial transcriptomics of these regions to further investigate our findings.

      1.4) SE, PP and CT are assigned to CMS2, but in Figure 4 this appears much more variable than the authors would have the reader believe. The full data are not completely clear (see point 1).

      In the paper, we transparently state that PP, SE, and CT were assigned to CMS2 in 62.5%, 41.7% and 41.9% of cases, respectively. These proportions refer to all samples for which CMScaller made a prediction. In Fig. 4, we also show the proportion of cases for which CMScaller did not predict any subtype.

      1.5) The tumor response rates are rather odd, as these likely depend on the complete tumour rather than on the subareas. It is not well described what this analysis shows.

      We did not compute any response rates, only simple prognostic scores, calculated as the (weighted, if weights were provided) means of the genes in the specific signatures (see Methods). The question addressed was whether these scores were comparable between the whole tumor and the corresponding tumor regions (within the same tumor). Given the observed (relative) variability, the more important follow-up question, which we cannot answer with our limited survival data, is whether a higher score in a region compared with the whole tumor is indeed indicative of a higher risk of relapse.
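      To make the scoring step concrete, here is a minimal sketch of a weighted-mean signature score as described above. It is illustrative only and assumes that each sample's expression values are keyed by gene symbol, that weights are non-negative, that a weighted mean divides by the sum of the weights, and that signature genes absent from a sample are skipped. The function name `signature_score`, the gene names and the values are hypothetical; the exact procedure the authors used is the one given in their Methods.

```python
from typing import Mapping, Optional, Sequence


def signature_score(
    expression: Mapping[str, float],
    genes: Sequence[str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Score one sample on one gene signature (hypothetical helper).

    Without weights: the plain mean expression of the signature genes.
    With weights: the weighted mean sum(w_g * x_g) / sum(w_g),
    assuming non-negative weights. Signature genes missing from the
    sample are skipped (an assumption made for this sketch).
    """
    present = [g for g in genes if g in expression]
    if not present:
        raise ValueError("no signature genes found in the sample")
    if weights is None:
        return sum(expression[g] for g in present) / len(present)
    total_weight = sum(weights[g] for g in present)
    return sum(weights[g] * expression[g] for g in present) / total_weight


if __name__ == "__main__":
    # Hypothetical whole-tumor and region profiles for the same tumor.
    whole_tumor = {"GENE_A": 5.0, "GENE_B": 7.0}
    region = {"GENE_A": 6.0, "GENE_B": 9.0}
    signature = ["GENE_A", "GENE_B"]
    w = {"GENE_A": 1.0, "GENE_B": 3.0}

    print(signature_score(whole_tumor, signature))     # 6.0 (unweighted)
    print(signature_score(whole_tumor, signature, w))  # 6.5
    print(signature_score(region, signature, w))       # 8.25
```

      Comparing the region score (8.25) with the whole-tumor score (6.5) for the same signature is the kind of within-tumor comparison described above; whether such a difference carries prognostic meaning is exactly the open question the authors raise.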

      1.6) Serrated adenomas have previously been aligned with CMS4. Is this different from serrated areas in cancers?

      We do not have data from adenomas to compare with the serrated carcinoma regions. However, a comparison of (regions of) both traditional serrated and sessile serrated adenomas with serrated carcinoma would be interesting.

      1.7) The fact that iCMS2 and iCMS3 align rather well with the current analysis of the distinct regions suggests that the analysis reported last year is the proper way to view tumor-intrinsic signatures. The authors now reach a rather similar conclusion on this issue, which takes away much of the novelty of the findings of this study.

      In the manuscript, it is clearly stated that our goal was to describe the molecular characteristics associated with several morphological patterns, not to propose another stratification paradigm for colorectal cancer. As such, our analyses were not limited to molecular subtypes, and the respective observations were only a small part of our findings. Indeed, the intrinsic subtypes (iCMS2/3) were stable and robust, as they are based on genes expressed in epithelial cells, and they might well prove to be of clinical importance too. However, they do not cover all aspects (e.g., fibroblast subtypes) and, as stated in Joanito et al. Nat Gen 54, pages 963–975 (2022), “iCMS, MSI status and CMS jointly inform the molecular classification of CRC”. Finally, in our opinion, the molecular classification of CRC, while a useful perspective in tumour classification, does not cover all the necessary perspectives on tumour heterogeneity.

      Reviewer #2:

      2.1) Overall, the manuscript provides an interesting histological/morphological framework through which we can consider heterogeneity in colorectal carcinoma and an approach by which we might improve the performance of gene expression-based classifiers in predicting clinical behaviour and/or responses to therapy. Exploration of CRC morphotypes and their differences was quite interesting. However, more work is needed to support the claims made by the authors. While I appreciate that the authors themselves identify limitations of their study within the manuscript, I believe awareness of these limitations is not reflected in some of the claims made in the abstract and at points in the main text when discussing the use of expression-based classifiers.

      The manuscript was improved to clarify several aspects that Reviewer 2 rightly pointed out:

      1. We clarify that for a patient (tumor) there might be one or several corresponding transcriptomics profiles (see Methods).

      2. The resulting “molecular portraits” were not derived with the goal to deconvolve the bulk tumor expression profiles and to estimate the proportions of morphotypes. Whether this is possible at all, is an open question and we mention this aspect in “Ideas and Speculation” section.

      3. We improved the figure captions to be more descriptive.

      4. We included the reference for the “Isella signature” at its first appearance.

    1. Author Response

      eLife assessment

      This useful study addresses epilepsy caused by the loss of a molecule called Pten, resulting in hyperactivity of the mTOR pathway. The findings suggest that inhibiting two molecules called mTORC1 and mTORC2 can reduce epilepsy symptoms but there is much less effect when inhibited separately. The evidence supporting the conclusions is currently incomplete, but could be strengthened after additional experiments.

      We thank the editors for this assessment and the reviewers for their comments. We will consider each of the recommendations we received and revise the manuscript accordingly.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures, while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic loss of function of Pten (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study; however, for several reasons we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that, to detect a significant difference in the frequency of generalized seizures (power = 0.8, α = 0.05, Mann-Whitney test), we would need 89 animals per group for Pten-Raptor versus Pten alone and 31 animals per group for Pten-Rictor versus Pten alone. It is simply not feasible to perform EEG monitoring on this many animals. Second, even if we did enough experiments to detect a reduction in seizure frequency, it is clear that neither Raptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Raptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which show no increase in SWDs above control levels and a normal interictal EEG.
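
      To illustrate how a per-group sample size of this kind is typically estimated, the following is a minimal Python sketch. It is not the calculation performed here: the exact procedure and the effect sizes estimated from the EEG data are not given in this response. The sketch assumes a standardized effect size d, uses the normal-approximation sample size for a two-sample t-test, and inflates it by the asymptotic relative efficiency of the Mann-Whitney test relative to the t-test (3/π under normally distributed data), a common textbook approximation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided Mann-Whitney test (sketch).

    Assumption: normal-approximation n for a two-sample t-test, divided by
    the asymptotic relative efficiency (ARE = 3/pi) of the Mann-Whitney
    test versus the t-test when the data are normally distributed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n_t_test = 2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2
    are = 3 / math.pi                   # ARE under a normal parent distribution
    return math.ceil(n_t_test / are)


# Hypothetical effect sizes, not values estimated from this study.
for d in (1.0, 0.5):
    print(d, mann_whitney_n_per_group(d))
```

      Because the required n scales with 1/d², halving the effect size roughly quadruples the number of animals needed, which is why modest reductions in seizure frequency quickly become impractical to detect.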

      2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. We had also performed a preliminary analysis of the hippocampal Cre expression, which suggested that Cre expression in the hippocampus did not affect generalized seizure occurrence. We plan to include data on Cre expression in the hippocampus in the revised version of the manuscript.

      3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We manually reviewed at least 48 hours of data from each animal. All seizures identified during manual review were also identified by the automated detection program. It is possible, but unlikely, that seizures were missed in the remaining data.
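
      The epoch-flagging criterion described in the comment above (10-second epochs in which the signal exceeds 400 μV at two time points at least 1 second apart) can be sketched in Python as follows. This is an illustration only, not the automated detection program used in the study. It assumes a single-channel signal in μV sampled at a fixed rate, treats "exceeded" as absolute amplitude, and drops any incomplete trailing epoch; the function name is hypothetical.

```python
def flag_epochs(signal_uv, fs, epoch_s=10.0, threshold_uv=400.0, min_gap_s=1.0):
    """Return indices of epochs that would be flagged for manual review (sketch).

    An epoch is flagged if at least two samples exceed the threshold in
    absolute value and lie at least min_gap_s seconds apart.
    """
    epoch_len = int(round(epoch_s * fs))
    min_gap_samples = min_gap_s * fs
    flagged = []
    # Step through complete, non-overlapping epochs only.
    for start in range(0, len(signal_uv) - epoch_len + 1, epoch_len):
        epoch = signal_uv[start:start + epoch_len]
        above = [i for i, v in enumerate(epoch) if abs(v) > threshold_uv]
        # The first and last supra-threshold samples are the farthest apart,
        # so checking that single pair decides whether any qualifying pair exists.
        if above and above[-1] - above[0] >= min_gap_samples:
            flagged.append(start // epoch_len)
    return flagged
```

      For example, at 100 Hz a 10-second epoch containing threshold crossings 0.4 s apart would not be flagged, whereas one with crossings 1.9 s apart would be.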

      4) Additionally, stating how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers interpret the data.

      Thank you for this recommendation. We plan to include a table with more information about the EEG recordings in the revised version of the manuscript. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      Pten inactivation increased soma size by 120%, and mTORC2 inactivation partially normalized this to a 60% increase relative to Controls (Fig. 2C). We and others have previously shown that mTORC2 inactivation in neurons reduces both soma size and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). Thus, we do not find it completely surprising that mTORC2 inactivation reduces the increase in cortical thickness caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls. We will elaborate on this in our revised submission.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2, or both, to Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular in Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against a role for mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes to PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten alone, Pten plus Raptor or Rictor, and Pten plus Raptor and Rictor. Raptor and Rictor are essentially binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures or reduce the rate of SWD events, and it increased seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest that both increased mTORC1 and increased mTORC2 activity contribute to the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, and 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study nevertheless has a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 activity is sufficient to cause epilepsy. However, the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient because there is no epileptic activity when both are inactivated; this interpretation assumes that no third, mTOR-independent mechanism contributes to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss normalize activity through mTORC1 and mTORC2, respectively, suggesting that hyperactivity of one complex in the absence of the other is sufficient. However, we agree that other mTOR-independent pathways downstream of Pten loss could contribute to epilepsy. We will revise the manuscript to reflect this.

      2) the data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      3) it would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1, or especially mTORC2, in vivo without affecting the other complex or other potentially interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations to selectively activate mTORC1, or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not also affect mTORC2 activity, that the latter activates mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work required to perform them is more than we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and mTORC2 in a mouse model of developmental epilepsy that simulates epilepsy in cortical malformations. Given that loss of genes such as PTEN activates mTORC1, and that this activation is considered excessive in cortical malformations, the authors asked whether inactivating mTORC1 and mTORC2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used in which Raptor and Rictor, which regulate mTORC1 and mTORC2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both mTORC1 and mTORC2 were inactivated but not when only one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      We plan to include data further describing the location of knockout in each animal (in both the hippocampus and cortex) in the revised version of the paper. Initial analyses indicated that the affected area did not differ between groups.

      Also, it is not clear which cortical cells were measured for soma size.

      In the Methods it says “Soma size was measured by dividing Nissl stain images into a 10 mm² grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced.” Earlier, under “Histology and imaging”, it says “Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used.”

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization, however, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

    1. Author Response

      Reviewer #1 (Public Review):

      Erbacher and colleagues provide further evidence for the function of epithelial cells as major contributors to the transduction of sensory stimuli. This technically advanced imaging study of human skin advances support for the anatomical and functional association of nerve fibers and skin keratinocytes. With combined high-resolution imaging and immunolabeling, the authors also advance the idea that gap junctions are at least one means by which direct neurochemical (e.g., ATP) communication from stimulated keratinocytes to nerve fibers can be achieved.

      A major strength of the study is the combined use of super-resolution array tomography (srAT), expansion microscopy, structured illumination microscopy and immunolabeling to analyze human skin in situ as well as co-cultures of human neurons and keratinocytes. High resolution static and video imaging of skin clearly supports the ensheathment by keratinocytes of nerve fiber projections as they traverse layers of the epidermis. Another strength of this study is the srAT imaging combined with connexin Cx43 immunolabeling that focus on sites of nerve fiber-keratinocyte contact zones. Imaging of Cx43+ plaques support these sites as regions of direct epithelial-neural contact and as such, of communication.

      Although imaging data support Cx43+/connexin plaques and neural ensheathment as regions of direct epithelial-neural communication, e.g., via keratinocyte release of ATP, this relationship remains correlative and lacking in quantification.

      The conclusion of this paper regarding the anatomical relationship between nerves and keratinocytes is well supported. Data also support the proposal of connexin plaques as sites of communication, although analyses that validate this relationship, using experimental models and in human samples, remain for future studies.

      Please note that page numbers cited below refer to the tracked-changes Word version of the revised manuscript.

      Reviewer #2 (Public Review):

      Erbacher et al. have used new techniques to explore the neuro-cutaneous structures of human epidermis, which is a valuable goal given the lack of in-depth studies in human skin. Human skin is less studied than rodent skin because it presents challenges in obtaining samples and finding excellent immunohistological labels. They have employed expansion microscopy and super-resolution array tomography for histological studies and have developed a co-culture of human keratinocytes and human iPSC-derived sensory neurons. The authors have used these techniques to investigate the relation of intraepidermal nerve fibers (IENF) and keratinocytes, as well as to probe the localization of connexin 43. The data offer some anatomical insights but, as it stands, the study does not add to our understanding of keratinocyte-neuron coupling.

      Strengths:

      This paper applies newer techniques to probe structure in human skin and establishes some useful immunohistochemical labels to do this, which sets up a foundation that will be valuable for future studies. The observation that IENF sometimes tunnel through keratinocytes is interesting, and the manuscript does show that Cx43 hemichannels are localized near IENF. Their data definitely represent a technical achievement, as these studies are challenging.

      Weaknesses:

      Throughout the paper, the authors imply that they make discoveries that shed light on neuro-cutaneous interactions, but the data in this manuscript do not offer any functional insight into connections between IENF and keratinocytes. For example, the final figure legend indicates they have found evidence of "electrical and chemical synapse-like contacts to nerve fibers" (Figure 9), but no such evidence was shown. Only a single neuron vesicular marker (synaptophysin) was shown to localize to neurons in culture, as expected. They also "...propose a crucial role of nerve fiber ensheathment and Cx43-based keratinocyte-fiber contacts in neuropathic pain and small fiber pathology." but do not show any data regarding the contribution of their anatomical findings to sensory function.

      We recognize that our anatomical findings do not provide a complete picture of neuro-cutaneous interactions. Related findings at the functional level, namely activation of nerve fibers after keratinocyte stimulation, were previously reported (Klusch et al., 2013; Mandadi et al., 2009; Sondersorg et al., 2014). However, these studies lacked morphological and molecular grounding and did not use human biomaterial or cells, which is what we aimed to address in our study. We agree that functional and anatomical findings need to be connected in the future. We have rephrased and attenuated our conclusions on Cx43 contacts in the context of IENF-keratinocyte interaction.

      Their data do show that IENF are anatomically closely apposed to keratinocytes, but this is inevitable given their location in the epidermis. The expression of Cx43 in human epidermis is also known (PMID: 7518858), and localizing Cx43 plaques near IENF does not add to current knowledge, as wide expression in keratinocytes naturally positions them near the embedded IENF. There is no indication of whether IENF also express Cx43 to form gap junctions. Moreover, due to the lack of quantification, it is not clear whether Cx43 labeling is enriched at IENF sites as compared to other areas on the keratinocytes.

      We appreciate previous work on Cx43 and have integrated the relevant findings in the revised Introduction of our manuscript (see pages 3-4):

      “Connexin 43 (Cx43) pores are well established as a major signaling route for keratinocyte-keratinocyte communication (Tsutsumi et al., 2009) and potentially transduce external stimuli likewise towards afferents.”

      As the Reviewer highlighted, Cx43 is widely clustered between keratinocytes and serves as an intercellular signaling route. Similar to keratinocyte-keratinocyte contacts, gap junctions (homomeric/heteromeric) or hemichannels towards IENF are possible. We aimed to quantify Cx43 contacts in skin sections from healthy controls and patients with small fiber neuropathy, since alterations in these contacts would affirm their biological relevance. We have generated pilot data for relative quantification of Cx43 contacts in skin samples of healthy controls (n = 5) and patients with small fiber neuropathy (n = 4). We have added the corresponding passages to the Methods (see pages 16-18), Results (see pages 31-33), and Discussion (see page 41) sections of our revised manuscript. Please also see Figure 5.

      The authors' implication that their anatomical data offers insight into neuro-cutaneous functional coupling is a leap that is evident throughout the manuscript.

      We have attenuated our tone throughout the manuscript, e.g., in:

      Abstract (page 2):

      “Unraveling human intraepidermal nerve fiber ensheathment and potential interaction sites advances research at the neuro-cutaneous unit.”

      Discussion (page 42):

      “Our observation of Cx43 plaques along the course of IENF in native skin and a human co-culture model substantiates a morphological basis and suggests keratinocyte hemichannels or gap junctions as one potential signaling pathway towards IENF.”

      Conclusion (page 44):

      “Epidermal keratinocytes show an astonishing set of interactions with sensory IENF including ensheathment and potential electrical and chemical synapse-like contacts to nerve fibers which may have substantial implications for the pathophysiological understanding of neuropathic pain and neuropathies.”

      References

      Jiang, N., Rasmussen, J.P., Clanton, J.A., Rosenberg, M.F., Luedke, K.P., Cronan, M.R., Parker, E.D., Kim, H.-J., Vaughan, J.C., Sagasti, A., 2019. A conserved morphogenetic mechanism for epidermal ensheathment of nociceptive sensory neurites. eLife 8, e42455.

      Klein, T., Gruener, J., Breyer, M., Schlegel, J., Schottmann, N.M., Hofmann, L., Gauss, K., Mease, R., Erbacher, C., Finke, L., 2023. Small fibre neuropathy in Fabry disease: a human-derived neuronal in vitro disease model. bioRxiv, 2023.08.09.552621.

      Klusch, A., Ponce, L., Gorzelanny, C., Schafer, I., Schneider, S.W., Ringkamp, M., Holloschi, A., Schmelz, M., Hafner, M., Petersen, M., 2013. Coculture model of sensory neurites and keratinocytes to investigate functional interaction: chemical stimulation and atomic force microscope-transmitted mechanical stimulation combined with live-cell imaging. J. Invest. Dermatol. 133, 1387-1390.

      Kruger, L., Perl, E., Sedivec, M., 1981. Fine structure of myelinated mechanical nociceptor endings in cat hairy skin. J. Comp. Neurol. 198, 137-154.

      Mandadi, S., Sokabe, T., Shibasaki, K., Katanosaka, K., Mizuno, A., Moqrich, A., Patapoutian, A., Fukumi-Tominaga, T., Mizumura, K., Tominaga, M., 2009. TRPV3 in keratinocytes transmits temperature information to sensory neurons via ATP. Pflugers Arch. 458, 1093-1102.

      Sondersorg, A.C., Busse, D., Kyereme, J., Rothermel, M., Neufang, G., Gisselmann, G., Hatt, H., Conrad, H., 2014. Chemosensory information processing between keratinocytes and trigeminal neurons. J. Biol. Chem. 289, 17529-17540.

      Talagas, M., Lebonvallet, N., Leschiera, R., Sinquin, G., Elies, P., Haftek, M., Pennec, J.P., Ressnikoff, D., La Padula, V., Le Garrec, R., 2020. Keratinocytes Communicate with Sensory Neurons via Synaptic‐like Contacts. Ann. Neurol. 88, 1205-1219.

      Tavares-Ferreira, D., Shiers, S., Ray, P.R., Wangzhou, A., Jeevakumar, V., Sankaranarayanan, I., Cervantes, A.M., Reese, J.C., Chamessian, A., Copits, B.A., Dougherty, P.M., Gereau, R.W., 4th, Burton, M.D., Dussor, G., Price, T.J., 2022. Spatial transcriptomics of dorsal root ganglia identifies molecular signatures of human nociceptors. Sci. Transl. Med. 14, eabj8186.

      Tenenbaum, C.M., Misra, M., Alizzi, R.A., Gavis, E.R., 2017. Enclosure of Dendrites by Epidermal Cells Restricts Branching and Permits Coordinated Development of Spatially Overlapping Sensory Neurons. Cell Rep. 20, 3043-3056.

      Tobin, D.J., 2006. Biochemistry of human skin--our brain on the outside. Chem. Soc. Rev. 35, 52-67.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors provide compelling evidence that the activation of distinct populations of NTS neurons provides stronger decreases in eating/body weight when co-activated. Avoidance is not necessarily linked to the extent of the effects but seems to depend on specific neurons which when activated, not only reduce eating but also induce avoidance reactions. The results of this study provide strong data promoting multi-targeted approaches to reduce eating and body weight in obesity. Interestingly, none of the pathways identified is necessary for the weight-reducing effect of vertical sleeve gastrectomy. Future studies will hopefully shed light on the type of neurotransmitters released by these distinct populations of NTS neurons.

      We thank the reviewer for these helpful and supportive comments.

      Reviewer #2 (Public Review):

      Prior results established that Lepr, Calcr, and Cck neurons are non-overlapping neuronal populations in the NTS that individually suppress food intake when activated. This paper examines the consequences of activating or inhibiting two or three of these populations simultaneously. Activating two or three populations reduces food intake and body weight more than activating each individually. Activation of Lepr and/or Calcr neurons is not aversive based on the conditioned taste aversion test, whereas activating all three is aversive by this test, indicating that aversion due to Cck neuron activation is dominant. Vertical sleeve gastrectomy (VSG) causes weight loss, but inhibiting each of these neuronal populations individually, or all three of them together, does not prevent weight loss. Overall, this paper provides a solid set of results but does not provide mechanistic insight into any of the phenomena examined.

      We have now added data demonstrating differences in the activation of FOS-IR in the downstream targets of our NTS neuron types, alone or in combination (new Figure 6). Our findings reveal that each population (NTSLepr, NTSCalcr, and NTSCck) activates an at least partially distinct set of neurons and that only NTSCck cells activate the known aversive PBN CGRP cells. These data suggest that the cumulative effects mediated by each of these NTS populations stem in part from their ability to activate at least partly distinct populations of downstream neurons.

      Unfortunately, it is outside of the scope of this manuscript (and the realm of the currently possible) to define the neurons that mediate the response to VSG, and we have now reorganized the manuscript to clarify that our VSG data (along with the feeding-induced FOS-IR data) serve to reveal that additional populations of neurons (other than NTSLCK cells) must contribute to the restraint of feeding.

    1. Author Response

      Reviewer #1 (Public Review):

      I believe it is important for the authors to clarify how the time frames to test for group differences of ERP components were defined. Were the components defined based on a grand average across lesions and controls, or based on the maximum range for both groups? As the paper is currently written, this is unclear to me. It is also unclear why the group comparisons between controls and the lateral PFC group were based only on the control group. To ensure that no inadvertent biases towards the larger control group were introduced and that the study's findings are reliable, it would be appreciated if the authors could clarify this.

      We thank the reviewer for the helpful comment. We recognize the need for a clearer definition of time frames for testing group differences in the ERP components and apologize for any ambiguity in the previous version of the manuscript.

      Regarding the time frames to test for group differences of ERP components for the OFC and control groups, they were determined based on the combined maximum range for both groups. The time range for each group and each ERP component was derived from the statistical analysis of the condition contrasts run for each group. For instance, for the Local Deviance MMN, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed an MMN component from 67 to 128 ms, while the same condition contrast for the OFC group revealed an MMN from 73 to 131 ms. The time frame used for the group comparison in the MMN time window was 50 to 150 ms, to capture component activity for both groups. In the same way, for the Local Deviance P3a, the condition contrast (i.e., Control condition versus Local Deviance condition) for the CTR group revealed a P3a component ranging from 141 to 313 ms, while the same condition contrast for the OFC group revealed a P3a from 145 to 344 ms. The time frame used for the group comparison in the P3a time window was 140 to 350 ms, to capture component activity for both groups.
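
      Taking the combined maximum range across groups amounts to taking the earliest onset and the latest offset of the per-group windows. The minimal Python sketch below illustrates this with the MMN and P3a windows reported above; the helper names are hypothetical and this is not the analysis code used in the study. The group-comparison windows actually used (50 to 150 ms and 140 to 350 ms) extend somewhat beyond this union, and because the rounding rule is not stated, the sketch only checks that they cover it.

```python
def combined_window(windows_ms):
    """Union of per-group windows: (earliest onset, latest offset) in ms."""
    onset = min(start for start, _ in windows_ms)
    offset = max(end for _, end in windows_ms)
    return (onset, offset)


def covers(outer, inner):
    """True if the outer window fully contains the inner window."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Per-group windows (CTR, OFC) from the condition contrasts reported above.
mmn = combined_window([(67, 128), (73, 131)])
p3a = combined_window([(141, 313), (145, 344)])
print(mmn, covers((50, 150), mmn))
print(p3a, covers((140, 350), p3a))
```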

      In the “Results” section of the main manuscript, together with the results from the cluster-based permutation independent samples t-tests, we provide the time frames in which the latter were computed for each ERP component. These segments have been highlighted in yellow in the revised manuscript. Moreover, in the section “Materials and methods - Statistical analysis of event-related potentials” of the main manuscript [page 37, paragraph 2], we provide a revised description of how the time frames for group differences of ERPs were defined. The revised description states: “In a second step, to check for differences in the ERPs between the two main study groups, we ran the same cluster-based permutation approach contrasting each of the four conditions of interest between the two groups using independent samples t-tests. The cluster-based permutation independent samples t-tests were computed in the latency range of each component, which was determined based on the maximum range for both groups combined. The latency range for each group and component was based on the time frames derived from the statistical analysis of task condition contrasts.”

      Regarding the comparisons between the lateral PFC and control groups, they were not based solely on the control group condition contrast; this was stated incorrectly. The approach used to define time frames to test for ERP differences between the CTR and lateral PFC groups was the same as the one used to test differences between the CTR and OFC groups. We apologize for any confusion this may have caused. We have revised the erroneous statements in Supplementary File 1 [highlighted text, pages 9-10].

      An additional potential weakness of the paper, and one that if addressed would increase our confidence that neural differences arise because of the specific lesion effect, is the lack of evidence that the lesion and control groups do not differ on measures that could inadvertently bias the neural data. For example, while the groups did not differ on demographics and a range of broad cognitive functions, did the two groups differ in the number or distribution of bad/noisy channels per subject? Were there differences between the two groups in the number of blinks/saccades, or in their distribution across conditions, in each subject?

      We thank the reviewer for this suggestion. We have completed a number of measurements and tests to ensure that the OFC lesion group and the control group did not differ on measures that could affect the neural data. First, we computed the number of bad/noisy channels for each subject and group, and found that the two groups did not differ significantly. Second, we computed the number of trials remaining after removing the noisy segments across conditions for each subject and group, and found no significant differences between the groups. Third, the number of blinks/saccades across conditions for each subject and group showed no significant group differences. Altogether, the results indicate that the neural differences observed in our study arose because of the specific lesion effect.

      These additional EEG measures and the statistical test results are included in Supplementary File 1 [pages 15-16] and Supplementary File 1g. We have also added text in the section “Materials and methods - EEG acquisition and pre-processing” of the main manuscript [page 35, paragraph 3], which states: “To ensure the validity of the neural data analysis, potential sources of bias were assessed between the healthy control participants and the OFC lesion patients. Specifically, no significant differences were observed between the two groups in terms of the number of noisy channels, the number of noisy trials, or the number of blinks across the task blocks and the experimental conditions.”

      On a similar note, while I appreciate this is a well-established task, could the authors clarify whether task difficulty is balanced across the different conditions? The authors appear to have used the counting task to ensure equal attention is paid across conditions, although presumably the blocks differ in the number of deviant tones and therefore in task difficulty. Typically, tasks to maintain attention are orthogonal to the main task and equally challenging across the different blocks. Is there a way to reassure readers that this has not affected the neural results?

      Thank you for pointing this out. Indeed, the experimental blocks differ in the number of deviant tones and therefore in task difficulty, so it is worth testing for differences in behavioral performance across blocks. In the present set of analyses, two block types were used: Regular (xX) and Irregular (xY). In regular blocks, where the repeated sequence is xxxxx, participants were required to count the rare sequences xxxxy and xxxxo. In irregular blocks, where the repeated sequence is xxxxy, participants were required to count the rare sequences xxxxx and xxxxo. We have now updated the behavioral analysis by (1) excluding counting performance in the omission blocks and (2) calculating counting performance separately for the two block types. The new analysis showed that participants in both groups tended to perform better in the irregular block than in the regular block, although this effect did not reach statistical significance (P = 0.07). There was no statistically significant difference between the counting performance of the two groups.

      The new results are reported on page 5 of the main manuscript, section “Results - Behavioral performance”, paragraph 1: “Participants from both groups performed the task properly with an average error rate of 9.54% (SD 8.97) for the healthy control participants (CTR) and 10.55% (SD 6.18) for the OFC lesion patients (OFC). There was no statistically significant difference between the counting performance of the two groups [F(24) = 0.11, P = 0.75]. Participants from both groups performed better in the irregular block (CTR: 8.39 ± 8.24%; OFC: 7.50 ± 7.34%) compared to the regular block (CTR: 10.69 ± 11.36%; OFC: 13.60 ± 10.97%) [F(24) = 3.55, P = 0.07]. There was no block X group interaction effect [F(24) = 0.73, P = 0.40].”
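
      To make the block logic above concrete, the following minimal Python sketch labels each trial type under the standard definitions of the local-global paradigm (Bekinschtein et al., 2009), which we assume here: a trial is locally deviant when its fifth element differs from the preceding tone, and globally deviant when the whole sequence differs from the sequence repeated in the block. The function name `classify_trial` and the encoding ('x' and 'y' for the two tones, 'o' for an omitted fifth tone) are hypothetical and chosen for illustration only; this is not the analysis code used in the study.

```python
def classify_trial(sequence: str, block_standard: str) -> dict[str, bool]:
    """Label a five-element trial in a local-global block (illustrative only).

    'x' and 'y' denote the two tones and 'o' an omitted fifth tone
    (hypothetical encoding). block_standard is the sequence repeated in the block.
    """
    if len(sequence) != 5 or len(block_standard) != 5:
        raise ValueError("each trial is assumed to contain five elements")
    # Local deviance: the fifth element breaks the repetition of the first four.
    local_deviant = sequence[4] != sequence[3]
    # Global deviance: the sequence is not the one repeated in this block.
    # These rare sequences are also the ones participants were asked to count.
    global_deviant = sequence != block_standard
    return {"local": local_deviant, "global": global_deviant}


if __name__ == "__main__":
    blocks = {"Regular (xX)": "xxxxx", "Irregular (xY)": "xxxxy"}
    for name, standard in blocks.items():
        for trial in ("xxxxx", "xxxxy", "xxxxo"):
            print(name, trial, classify_trial(trial, standard))
```

      Under these assumed definitions, xxxxy in the regular block is both a local and a global deviant, whereas in the irregular block the same sequence is a local deviant but a global standard, and xxxxx becomes a purely global deviant; the omission trial xxxxo is rare in both block types.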

      As with many patient lesion studies, while the direct comparison against healthy age-matched controls is critical, it would have strengthened the authors' claims if they could show differences relative to the brain-damaged control group. Given the previous literature that also links the lateral PFC with prediction error detection, I understand that this region is potentially not the clearest brain-damaged control, and therefore another lesion group might have strengthened claims of specificity. Furthermore, the authors do not offer an explanation for why no differences between the lateral PFC and control groups were found when others have previously reported them. Identifying those differences would strengthen our understanding of the involvement of different structures in this task/function.

      We thank the reviewer for raising this crucial issue. We recognize the importance of addressing the lack of neurophysiological differences between the lateral PFC lesion group and the control group. First, it is important to clarify that the lateral PFC lesion group was initially included not as a control for specific lateral PFC lesions but rather as a broader control group to account for potentially general effects of frontal brain damage. However, considering that previous studies have implicated specific areas of the lateral PFC (e.g., inferior frontal gyrus; IFG) in predictive processing, we agree that a more thorough justification of these null findings is needed.

      Intracranial EEG studies examining local- and global-level prediction error detection have pointed to the inferior frontal gyrus (IFG) as a frontal source supporting top-down predictions in MMN generation (Dürschmid et al., 2016; Nourski et al., 2018; Phillips et al., 2016; Rosburg et al., 2005). However, other intracranial studies reported unclear (Bekinschtein et al., 2009) or weak (Dürschmid et al., 2016) frontal MMN effects. El Karoui et al. (2015) observed late ERP responses to global deviants in the lateral PFC but no MMN to local deviants; the location of these responses within the PFC was unclear, and none were reported in the IFG. Additionally, studies employing dynamic causal modeling of the MMN consistently modeled frontal sources in the IFG (Garrido et al., 2008; Garrido et al., 2009; Phillips et al., 2015). A review by Deouell (2007) highlighted the potential contributions of both the IFG and the middle frontal gyrus (MFG) to MMN generation, suggesting that the specific source might vary depending on characteristics of the deviant stimuli, such as pitch or duration.

      In the lesion study by Alho et al. (1994), diminished MMN to local-level deviants was found after lateral PFC lesions; that cohort had a left:right hemisphere ratio of 7:3, which differs from the 4:6 ratio in our cohort. Furthermore, all individuals in that study had infarcts in the territory of the middle cerebral artery, resulting in more uniform lesion locations than in our cohort. Notably, the lesions in our lateral PFC group appeared to be situated in more superior regions, toward the MFG, rather than in the IFG, which has been predominantly implicated in previous studies. Another factor that might contribute to the lack of significant effects is the heterogeneity of the lesions in our lateral PFC group (see Supplementary Figures 2, 3 and 4). In particular, the individual lesions in the left-hemisphere cohort did not share a consistent anatomical location. The right-hemisphere cohort showed greater lesion overlap, but overall the lesions were not centered on the IFG, with the highest overlap in the MFG. This difference in lesion location might contribute to the absence of effects in our study.

      Regarding the global effect, often reflected in the P300 component, it appears that the neural sources responsible for processing global deviance exhibit a more distributed pattern. This means that the brain regions involved in detecting and processing global deviations may not be as localized or concentrated as those implicated in local deviance processing. Given that the neural mechanisms underlying global deviance detection and processing are likely to involve a wider network of brain regions, they may be less susceptible to disruptions caused by focal lesions in the lateral PFC.

      In response to your comment, we have expanded the “Discussion” to address this point by adding a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present some of the findings implicating specific areas of the lateral PFC in the generation of MMN and in predictive processing, and then offer an account of the potential reasons behind the lack of neurophysiological differences between the lateral PFC and control groups.

      Finally, while the authors have already cited widely across multiple fields, again speaking to the likely large impact the study will make, there does appear to be an unexplored conceptual link between the conclusion here that the OFC supports "the formation of predictions that define the current task by using context and temporal structure to allow old rules to be disregarded so that new ones can be rapidly acquired" and the finding that lesions of the lateral portions of the OFC disrupt the assignment of credit or value to a stimulus that occurred temporally close to the outcome (Walton et al 2010, Noonan et al 2010, PNAS, Rudebeck et al 2017 Neuron, Noonan et al 2017, JON, Wittmann et al 2023 PlosB, note the wider imaging literature in line with this work Jocham et al 2014 Neuron and Wang et al bioRxiv). Without the OFC, monkeys and humans appear to rely on an alternative, global learning mechanism that spreads the reinforcing properties of the outcome to stimuli that occurred further back in time. Could the authors speculate on how these two strands of evidence might converge? For example, does the OFC only assign credit in the event of a prediction error, or does one mechanism subsume the other?

      We thank the reviewer for this comment regarding the unexplored conceptual link between our study’s conclusion, which suggests that the OFC facilitates the detection of prediction errors, and the findings of other research on the OFC’s role in the assignment of credit to stimuli. We find this comment very interesting and appreciate the opportunity to speculate on the potential functional convergence of these two processes within the OFC.

      The OFC is a critical neural hub implicated in learning, decision-making, and adaptive behavior. The detection of prediction errors and the assignment of credit to stimuli are mechanisms linked with the OFC, which play an important role in all these functions (Noonan et al., 2012; Schultz & Dickinson, 2000; Sul et al., 2010; Tobler et al., 2006; Walton et al., 2010; Walton et al., 2011). Prediction errors involve recognizing discrepancies between expected and actual outcomes, which engages the OFC in rapidly updating stimulus valuations to align with newfound information (Holroyd & Coles, 2002; Kakade & Dayan, 2002). Signaling of errors provides a powerful mechanism whereby OFC facilitates adaptive learning and enables the brain to adjust its expectations based on novel experiences (Schultz, 2015; Seymour et al., 2004). Credit assignment, on the other hand, refers to properly identifying the causes of prediction errors. Without proper credit assignment, one might have intact error signaling mechanisms, but lose the ability to learn appropriately. This is especially true when multiple possible antecedents may be related to the error or when past choices have been unpredictable. In such situations, it is important to assign credit to the most recent choice and not get distracted by previous alternatives (Stalnaker et al., 2015).

      These mechanisms within the OFC appear interrelated yet distinct. While prediction errors could trigger credit assignment, the OFC's ability to continually assess stimulus values extends beyond instances of prediction errors. The OFC is involved in continuously evaluating and updating the values of stimuli based on ongoing experiences (Padoa-Schioppa & Assad, 2006; Tremblay & Schultz, 1999). This process enables the brain to learn from both unexpected outcomes and regular, predictable interactions with the environment. In situations where outcomes are not solely determined by prediction errors, the assignment of credit remains important. Complex decision-making involves considering a variety of factors beyond prediction errors, such as contextual information and long-term consequences. Clarifying how these mechanisms converge within the OFC has important implications for understanding learning dynamics and the orchestration of adaptive responses to the environment.

      While we recognize the value of this discussion, we believe it extends beyond the primary focus of our study. Consequently, we have made the decision not to incorporate it into the current manuscript.

      One remaining weakness, which plagues all patient studies, is that of anatomical specificity. The authors have analysed what is, for the field, a large group of patients, and while the lesions appear to be relatively focused on the OFC, the individuals vary in the degree to which different subregions within the OFC are damaged. This is increasingly important as evidence over the last 10 years has identified functional roles of these specific structures (Rushworth et al 2011, Neuron, Rudebeck et al 2017 Neuron). It would be important to ultimately know whether the detection of prediction errors was specific to a particular OFC subregion, a general mechanism across this area of cortex, or whether different subregions were more involved during different types of stimuli/contexts/tasks etc. Some comments on this would be appreciated.

      The reviewer raises an important point here, and it would have been interesting to explore this aspect. However, one challenge with focal lesion studies is establishing large patient cohorts. The size of our patient group, although relatively large compared to other studies of focal PFC lesions, does not allow us to perform exploratory lesion-symptom mapping analyses. A larger patient sample would provide a stronger basis for drawing conclusions about the role of a particular OFC subregion in the detection of prediction errors, and would allow statistical approaches to lesion subclassification and brain-behavior analysis (e.g., voxel-based lesion-symptom mapping (Bates et al., 2003; Lorca-Puls et al., 2018)).

      Considering the average percentage of damaged tissue in our study, the medial OFC (Brodmann area 11) was the most affected (approx. 33%), followed by the anterior-most region of the prefrontal cortex (Brodmann area 10; approx. 25%) and the lateral OFC (Brodmann area 47; approx. 12%). From our analysis, it is difficult to conclude whether the detection of prediction errors in our study was specific to a certain OFC area, or whether different subregions were more involved in processing different types of stimuli or contexts.

      To provide a more balanced interpretation of our findings, we have incorporated a section in the “Discussion”, titled “Limitations and future directions” [pages 24-25], which addresses the limitations of our study, and of lesion studies generally, with respect to anatomical specificity and the challenge of establishing large patient cohorts.

      Reviewer #2 (Public Review):

      The current version of the manuscript is overall very long and verbose; for example, the introduction is 5 pages long and includes 102 references. In my view this is far too much. I suppose the authors wish to be very detailed, but the effect is the opposite: the main message of the introduction and the aims get diluted.

      We thank the reviewer for the feedback on our manuscript's length and content. This prompted us to carefully reconsider the balance between providing necessary context and ensuring the clarity of our main message. Our intention was to establish a strong foundation for our research by presenting relevant literature and setting the stage for our aims. In our revised manuscript, we have condensed the Introduction while retaining the key elements necessary to understand the context and motivations behind our research. Specifically, the current version of the “Introduction” is three pages long and includes 83 references.

      I wonder if the presentation rate used (SOA of 150 ms) is too fast and the stimuli (50 ms) too short. Please provide a rationale for this.

      We appreciate the reviewer's thoughtful consideration of the stimulus duration and presentation rate (SOA) used in our study, and we understand the importance of providing a rationale for these choices to ensure the validity of our experimental design. The decision to use an SOA of 150 ms and a stimulus duration of 50 ms was grounded in established practice and the relevant literature. Similar presentation rates and stimulus durations were employed in previous studies that used this auditory oddball paradigm to investigate rapid cognitive processes with event-related potentials (ERPs). For instance, Bekinschtein et al. (2009), who introduced the task, used an SOA of 150 ms and a stimulus duration of 50 ms and showed that this combination is sensitive to auditory deviations and elicits early and late ERP components. Additionally, Wacongne et al. (2011), Chennu et al. (2013), Uhrig et al. (2014), and El Karoui et al. (2015) employed similar task designs with the same SOA and stimulus duration in combination with scalp EEG, fMRI and intracranial recordings, further supporting the validity of this approach. Other studies employing the same paradigm, such as Chao et al. (2018) and Doricchi et al. (2021), used an SOA of 200 ms but kept the same stimulus duration of 50 ms.

      One of the conditions is 'omissions', but the results are not reported, so the authors should either not mention this condition at all or report these data, which would probably be interesting.

      We thank the reviewer for the nice reminder. The “omissions” condition is indeed an integral part of our study, and we acknowledge its potential significance. However, we have decided to publish the detailed analysis of the 'omissions' condition in a separate paper, because we think that such analysis and discussion would make the current paper quite dense and complicated. We apologize for any confusion that might arise from the absence of the 'omissions' results in this manuscript. On page 33 of the main manuscript, we state the reason for not including the “omissions” condition in the current analysis: “In the present set of analyses, the Omission blocks were not further examined, because such analysis and discussion would make the current paper overly dense and complicated.”

      The Discussion is very long and in some aspects even too speculative. For example, in the conclusions the authors claim that the OFC contributes to a top-down predictive process that modulates the deviance detection system in the primary auditory cortices and may be involved in connecting PEs at lower hierarchical areas with predictions at higher areas. I am not sure the current data support this. This would probably be more appropriate if they could compare results from the OFC and AC etc., so that it is a more dynamic study.

      We thank the reviewer for this observation. We have made revisions to shorten and refine the discussion, with a primary focus on presenting and interpreting the key results in a more concise and straightforward manner (See tracked changes in the revised manuscript).

      However, the overall length of the Discussion has not been reduced significantly because we have introduced two additional sections within the Discussion (i.e., “Lack of findings in the lateral PFC lesion group” and “Limitations and future directions”) in response to the reviewers’ requests to address the lack of findings in the lateral PFC lesion group and certain limitations associated with the lesion method employed.

      We also agree that the claim mentioned by the reviewer is overly speculative and have therefore revised the sentence as follows [page 38, “Conclusion”]: “We suggest that the OFC likely contributes to a top-down predictive process that modulates the deviance detection system in lower sensory areas.”

      At the beginning of the Discussion, the authors mention that overall, these findings provide novel information about the role of the OFC in detecting violation of auditory prediction at two levels of stimuli abstraction/time scale. I think this needs to be detailed more specifically rather than merely stating that they provide novel results.

      We understand the importance of providing readers with precise descriptions of the novelty of our study. Therefore, we have revised the statement to provide more detailed information about the novel contributions of our study. The revised text reads as follows [“Discussion”, page 18]: “These findings indicate that the OFC is causally involved in the detection of local and local + global auditory PEs, thus providing a novel perspective on the role of OFC in predictive processing.”

      I am not sure about having a section titled as a general discussion within the Discussion itself; this heading should probably be reformulated to be more specific to what is discussed.

      As suggested by the reviewer, we renamed the heading to “OFC and hierarchical predictive processing” [pages 22-24] to better capture the content of this section of the “Discussion”. Here, we discuss the functional relevance of our EEG findings within the predictive coding framework and the potential role of the OFC in predictive processes (see tracked changes in the revised manuscript).

      Reviewer #3 (Public Review):

      The central claim of the study is that hierarchical predictive processing is altered in OFC patients. However, OFC patients were able to identify global deviants as well as controls. Thus, hierarchical predictive processing itself seems to be unaltered, even though its neural correlates were different. This raises the question of what exactly the functional meaning of the EEG findings is. From the evidence presented this is difficult to determine for three reasons (see comments below).

      We thank the reviewer for the detailed observations and valuable comments. The reviewer points out that hierarchical predictive processing appears unaltered even though its neural correlates were altered, because OFC patients identified global deviants as accurately as control participants. We respectfully disagree with this interpretation for two reasons. 1) The primary purpose of the behavioral data in this study was not to measure participants’ deviant detection performance, but to confirm that they were paying attention to the global rule of each block. However, we agree that an effect of the lesion on behavioral performance would strengthen the claim of altered high-level predictive processing, and your point highlights the importance of examining our behavioral results more carefully. In a follow-up study, which we are currently running, we explore the behavioral nuances of our task by measuring reaction times of correct deviant detections. 2) Earlier lesion studies reported that patients with focal frontal lesions performed simple oddball tasks at a level that did not differ significantly from that of control participants. However, despite normal task execution and neuropsychological profiles, patients with LPFC and OFC lesions showed distinct neurophysiological evidence of altered novelty processing (Knight, 1984, 1997; Knight & Scabini, 1998; Løvstad et al., 2012; Yamaguchi & Knight, 1991).

      Regarding the central claim of our study being that hierarchical predictive processing is altered in OFC patients, we have tried to avoid strong claims that our results show altered hierarchical predictive processing. For example, the conclusion of the abstract states: “the altered magnitudes and time courses of MMN/P3a responses after lesions to the OFC indicate that the neural correlates of detection of auditory regularity violation is impacted at two hierarchical levels of stimuli abstraction.” Thus, we do not claim that the detection of regularity violations is directly impaired (e.g., OFC patients were able to identify global deviants as well as healthy controls), but that the neural correlates of deviant detection are altered, and therefore impaired.

      Finally, we have addressed, one by one, each of the reasons the reviewer gives for why the functional meaning of our EEG findings is difficult to determine (see comments below). We hope that the revised manuscript has been improved accordingly and provides a more critical view of the extent to which the findings support hierarchical predictive coding.

      It is possible that the shifts in scalp potentials are due to volume conduction differences linked to post-lesion changes in neural tissue and anatomy rather than differences in information processing per se.

      We appreciate your comment regarding the potential influence of volume conduction differences on the observed shifts in scalp potentials in our study. We acknowledge that there are special challenges in interpreting ERP findings in brain lesion populations (Kutas et al., 2012; Rugg, 1995). To reliably interpret changes in the ERPs of lesion patients as reflecting impairments in certain cognitive processes, it is necessary to identify factors that might affect the results and to apply appropriate control measures. As noted by the reviewer, structural pathology and the replacement of neural tissue by cerebrospinal fluid following tumor resection likely cause inhomogeneities in the volume conduction of electrical activity and resulting changes in current flow patterns. Moreover, post-craniotomy skull defects can cause local inhomogeneities in the resistive properties of the skull (Løvstad & Cawley, 2011; Rugg, 1995). Both types of biophysical changes might alter the amplitude and/or topography (by altering the configuration of the generators) of surface-recorded ERPs (e.g., Swick, 2005). Consequently, caution is warranted when comparing ERPs and their scalp distributions between intact and brain-lesioned groups. It is difficult to directly quantify the consequences of brain lesions on tissue conductivity. To conclude that ERP differences between patients and controls reflect functional abnormalities in particular cognitive processes, and not primarily nonspecific effects of structural brain damage, it is helpful to demonstrate that they are specific to certain ERP components/stages of information processing and task conditions. Changes confined to one or a subset of ERP components, which additionally may not manifest across all task conditions, can give some indication of the specificity of ERP changes (Kutas et al., 2012; Swaab, 1998).

      In our study, group differences in ERP amplitudes were limited to specific task conditions rather than present across all data. This condition-dependent pattern suggests that the observed shifts are related to the specific cognitive processes engaged during those task conditions rather than being a global artifact of volume conduction. If volume conduction were the main driver, we would expect these group differences to be more uniformly present across task conditions. Another piece of evidence against volume conduction effects is the latency difference in scalp potentials between the two groups observed for Local + Global deviance detection. Group differences in the latencies of ERPs, such as the MMN and P3a, cannot be attributed to volume conduction alone (Hämäläinen et al., 1993). These differences in the timing of neural responses strongly indicate genuine variations in cognitive processing.

      To provide a more balanced interpretation of our findings, we have incorporated a section in the “Discussion”, titled “Limitations and future directions” [pages 24-25], that addresses the limitations of our study, and of lesion studies generally, with respect to volume conduction and amplitude changes.

      It is unclear from the analyses whether the P3a amplitude differences are true amplitude differences or a byproduct of latency differences. The reason is that the statistical method used (cluster based permutations) might yield significant effects when the latency of a component is shifted, even if peak amplitudes are the same. Complementary analyses on mean or peak amplitudes could resolve this issue.

      We thank the reviewer for raising an important concern about cluster-based permutation tests and their potential to yield significant effects when the latency of a component is shifted. We acknowledge this concern and recognize the need for complementary analyses to address it. To clarify the nature of the observed ERP amplitude differences, we conducted complementary analyses on the mean amplitudes of the MMN and P3a components at the midline sensors for the conditions in which significant group differences were observed. For the MMN elicited by Local Deviance, we found group amplitude differences at electrodes AFz (p = 0.021), Fz (p = 0.008), CPz (p = 0.015), and Pz (p < 0.001). Surprisingly, we also found amplitude differences for the P3a elicited by Local Deviance at electrodes AFz (p < 0.001), Fz (p < 0.001), FCz (p < 0.001), and Cz (p = 0.002), which had not been observed with the cluster-based permutation analysis. For the MMN elicited by Local + Global Deviance, our analysis showed group amplitude differences at electrodes AFz (p = 0.007), Cz (p = 0.004), CPz (p = 0.002), and Pz (p < 0.001), with a marginal effect at FCz (p = 0.051). However, as the reviewer rightly pointed out, the group differences for the P3a elicited by Local + Global Deviance appear to be a byproduct of latency differences, as we did not find amplitude differences at any of the midline electrodes. Overall, this complementary analysis shows that the OFC patients had an attenuated MMN/P3a to local-level prediction violations, and an attenuated and delayed MMN followed by a delayed P3a to combined local- and global-level prediction violations. The new analysis has been added to Supplementary File 1 [pages 5-7] and Supplementary File 1c and 1d.
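
      The concern raised by the reviewer can be illustrated with simulated data. The following minimal Python sketch (using NumPy) builds two hypothetical P3a-like waveforms with identical peak amplitude, one delayed by 40 ms; the Gaussian shape, time axis, width, and window bounds are arbitrary assumptions for illustration, not values from our data or our analysis pipeline. Sample-by-sample differences, which are what a cluster-based permutation test thresholds and clusters, are large, whereas peak amplitude and mean amplitude in a window spanning both peaks are essentially unchanged.

```python
import numpy as np

# Hypothetical time axis (ms) sampled every 2 ms.
t = np.arange(0.0, 600.0, 2.0)


def component(peak_latency_ms: float, amplitude: float = 5.0, width_ms: float = 40.0) -> np.ndarray:
    """Gaussian-shaped ERP component in arbitrary units (illustrative assumption)."""
    return amplitude * np.exp(-0.5 * ((t - peak_latency_ms) / width_ms) ** 2)


control = component(300.0)  # peak at 300 ms
patient = component(340.0)  # same peak amplitude, delayed by 40 ms

# Sample-by-sample difference: the quantity a cluster-based test thresholds.
pointwise_diff = control - patient

# Peak amplitude and peak latency of each waveform.
peak_control, peak_patient = control.max(), patient.max()
latency_control, latency_patient = t[control.argmax()], t[patient.argmax()]

# Mean amplitude in a window wide enough to contain both peaks.
window = (t >= 150.0) & (t < 500.0)
mean_control, mean_patient = control[window].mean(), patient[window].mean()

print(f"max |pointwise difference|: {np.abs(pointwise_diff).max():.2f}")
print(f"peak amplitude: {peak_control:.2f} vs {peak_patient:.2f}")
print(f"peak latency (ms): {latency_control:.0f} vs {latency_patient:.0f}")
print(f"mean amplitude 150-500 ms: {mean_control:.3f} vs {mean_patient:.3f}")
```

      The window choice matters in this sketch: a mean-amplitude window centered narrowly on one group's peak would reintroduce an apparent amplitude difference, so windows for such complementary analyses need to cover the component in both groups.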

      The MMN, P3a and P3b components are difficult to map to the hierarchical PC theory. Traditionally, the MMN is ascribed to lower level processing while P3a and P3b are ascribed to higher level processing. However, the picture is more complicated. For example, the current results show that the MMN is enhanced in local + global surprise while the P3a is elicited by local surprise. Furthermore, the P3a is classically interpreted as reflecting attention reorientation and the P3b as reflecting the conscious detection of task-relevant targets. How attention and conscious awareness fit in hierarchical PC is not entirely clear.

      Indeed, the relationships between MMN, P3a and P3b components and the predictive coding (PC) framework can be intricate. However, numerous studies employed the PC theory to interpret these common electrophysiological signatures as prediction error (PE) signals (Garrido et al., 2007, 2009; Lieder et al., 2013) and dissociations between these ERPs supported that there are successive levels of predictive processing (Chennu et al., 2013; El Karoui et al., 2015; Wacongne et al., 2011).

      In terms of hierarchical PC (Friston, 2005), the temporally constrained MMN has been traditionally linked with first-level predictive processing, known as the local effect of short-term stimulus deviance. PE signals at this level feed forward to a temporally extended, attention-dependent system that extracts longer-term patterns. PE signals at the higher level are usually indexed by the P300, identified as the global effect of longer-term stimulus deviance. The P300 reflects a more attention-driven process, emerging in response to novel or low-probability “target” stimuli that violate broader contextual expectations (Polich, 2007), such as those that form over multiple trials. Because the MMN, P3a and P3b also appear to exhibit varying degrees of sensitivity to preconscious and conscious perceptual predictions (Sculthorpe et al., 2009), they could serve as measures for examining the concept of a predictive neural hierarchy.

      Indeed, the MMN has been viewed as sensitive to local violations and essentially blind to higher-order regularities. However, this is a simplified view. For example, Wacongne et al. (2011) showed that violating a low-level perceptual expectation triggers the MMN, violating contextual expectations triggers the higher-level P3, and when both expectations are simultaneously violated, a larger response is evoked than by either one alone. These findings, which are consistent with the results of our study, show that the local and global effects are not fully independent but interact in an early time window, indexed by enhanced and temporally extended MMN responses. They provide support not just for a hierarchical model, but for a predictive rather than a feedforward one. Moreover, the MMN has been found to be relatively insensitive to attention, because it is elicited in situations in which the subjects’ attention is directed away from the stimuli and there are no task demands (Chennu et al., 2013). Given that the early MMN is a pre-attentive, automatic ERP component (Näätänen et al., 2001; Pegado et al., 2010; Tiitinen et al., 1994), and given that it has been observed in comatose and vegetative state patients (Bekinschtein et al., 2009; Fischer et al., 2004; Naccache et al., 2004), the finding that even the early MMN is impaired in OFC patients indicates that patients may suffer from a deficit in sensory predictive processing that is independent of attention and conscious awareness.

      The picture is more complicated when it comes to the predictive roles of the P3a and P3b components. Following the MMN, a positive polarity P300 complex, sensitive to the detection of unpredicted auditory events, has been reported (Chennu et al., 2013; Doricchi et al., 2021; Kompus et al., 2020; Liaukovich et al., 2022). However, the two types of P300 (P3a and P3b) have not been clearly integrated into hierarchical PC theory. The P3a is considered to be part of the brain's mechanism for detecting PEs (Wessel et al., 2012; Wessel et al., 2014) and may indicate that the brain is reallocating attentional resources to process and learn from these unexpected events. The P3a is typically interpreted as reflecting an involuntary attentional reorienting process (Escera & Corral, 2007; Ungan et al., 2019), which may relate to the operations of the ventral attention network (Corbetta et al., 2008; Corbetta & Shulman, 2002; Nieuwenhuis et al., 2005). Predictive coding emphasizes the role of contextual information in generating predictions, and the P3a is influenced by the context in which an unexpected event occurs (Schomaker et al., 2014). In the hierarchy of predictive processing, the P3a may reflect PEs at different hierarchical levels, depending on the complexity of the prediction and the degree to which it deviates from the sensory input. The P3b, on the other hand, is linked to higher-level cognitive processes that involve updating long-term predictions based on incoming sensory information. It is highly dependent on attention, conscious awareness and active engagement with the task (Bekinschtein et al., 2009; Del Cul et al., 2007; Sergent et al., 2005; Strauss et al., 2015). It is thought to play a role in integrating the unexpected sensory input into the current context, potentially leading to updates of predictions in working memory (Chao et al., 1995; Donchin & Coles, 1988; Polich, 2007).

      Hierarchical PC theory is continually evolving, and the relationship between these ERP components and attention or conscious awareness remains an active area of research. We acknowledge the need for further investigation to better understand how attention and conscious awareness fit within this framework. In light of your comment, we now provide a more comprehensive discussion of the functional meaning of the EEG findings in the section “Discussion - OFC and hierarchical predictive processing” [pages 22-24].

      The fact that lateral PFC patients show unaltered neural responses contradicts prominent views from PC identifying this region as a generator of the MMN and a source of predictions sent to temporal auditory areas.

      We appreciate the reviewer's comment. Another reviewer raised the same concern, and we address it in detail in our response to Reviewer #1, Comment 4. We have also expanded the “Discussion” with a new section titled “Lack of findings in the lateral PFC lesion group” [page 21]. In this section, we first present findings implicating specific areas of the lateral PFC in the generation of the MMN and in predictive processing, and then discuss potential reasons for the absence of neurophysiological differences between the lateral PFC and control groups.

      For these reasons, a more critical view on the extent to which the findings support hierarchical predictive coding is needed.

      By responding to the reviewer’s previous comments (i.e., the reasons why the reviewer thinks it is difficult to determine the functional meaning of the EEG findings), we believe that we have offered a more critical view on this matter.

      References

      Alho, K., Woods, D. L., Algazi, A., Knight, R., & Näätänen, R. (1994). Lesions of frontal cortex diminish the auditory mismatch negativity. Electroencephalography and Clinical Neurophysiology, 91(5), 353-362.

      Bates, E., Wilson, S. M., Saygin, A. P., Dick, F., Sereno, M. I., Knight, R. T., & Dronkers, N. F. (2003). Voxel-based lesion–symptom mapping. Nature Neuroscience, 6(5), 448-450.

      Bekinschtein, T. A., Dehaene, S., Rohaut, B., Tadel, F., Cohen, L., & Naccache, L. (2009). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences, 106(5), 1672-1677.

      Chao, L., Nielsen-Bohlman, L., & Knight, R. (1995). Auditory event-related potentials dissociate early and late memory processes. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 96(2), 157-168.

      Chao, Z. C., Takaura, K., Wang, L., Fujii, N., & Dehaene, S. (2018). Large-scale cortical networks for hierarchical prediction and prediction error in the primate brain. Neuron, 100(5), 1252-1266.e3.

      Chennu, S., Noreika, V., Gueorguiev, D., Blenkmann, A., Kochen, S., Ibánez, A., Owen, A. M., & Bekinschtein, T. A. (2013). Expectation and attention in hierarchical auditory prediction. Journal of Neuroscience, 33(27), 11194-11205.

      Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: from environment to theory of mind. Neuron, 58(3), 306-324.

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201-215.

      Del Cul, A., Baillet, S., & Dehaene, S. (2007). Brain dynamics underlying the nonlinear threshold for access to consciousness. PLoS Biology, 5(10), e260.

      Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of Psychophysiology, 21(3-4), 188-203.

      Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11(3), 357-374.

      Doricchi, F., Pinto, M., Pellegrino, M., Marson, F., Aiello, M., Campana, S., Tomaiuolo, F., & Lasaponara, S. (2021). Deficits of hierarchical predictive coding in left spatial neglect. Brain Communications, 3(2), fcab111.

      Dürschmid, S., Edwards, E., Reichert, C., Dewar, C., Hinrichs, H., Heinze, H.-J., Kirsch, H. E., Dalal, S. S., Deouell, L. Y., & Knight, R. T. (2016). Hierarchy of prediction errors for auditory events in human temporal and frontal cortex. Proceedings of the National Academy of Sciences, 113(24), 6755-6760.

      El Karoui, I., King, J.-R., Sitt, J., Meyniel, F., Van Gaal, S., Hasboun, D., Adam, C., Navarro, V., Baulac, M., & Dehaene, S. (2015). Event-related potential, time-frequency, and functional connectivity facets of local and global auditory novelty processing: an intracranial study in humans. Cerebral Cortex, 25(11), 4203-4212.

      Escera, C., & Corral, M. (2007). Role of mismatch negativity and novelty-P3 in involuntary auditory attention. Journal of Psychophysiology, 21(3-4), 251-264.

      Fischer, C., Luauté, J., Adeleine, P., & Morlet, D. (2004). Predictive value of sensory and cognitive evoked potentials for awakening from coma. Neurology, 63(4), 669-673.

      Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815-836.

      Garrido, M. I., Friston, K. J., Kiebel, S. J., Stephan, K. E., Baldeweg, T., & Kilner, J. M. (2008). The functional anatomy of the MMN: a DCM study of the roving paradigm. NeuroImage, 42(2), 936-944.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2007). Evoked brain responses are generated by feedback loops. Proceedings of the National Academy of Sciences, 104(52), 20961-20966.

      Garrido, M. I., Kilner, J. M., Kiebel, S. J., & Friston, K. J. (2009). Dynamic causal modeling of the response to frequency deviants. Journal of Neurophysiology, 101(5), 2620-2631.

      Holroyd, C. B., & Coles, M. G. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109(4), 679.

      Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65(2), 413.

      Kakade, S., & Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15(4-6), 549-559.

      Knight, R. T. (1984). Decreased response to novel stimuli after prefrontal lesions in man. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 59(1), 9-20.

      Knight, R. T. (1997). Distributed cortical network for visual attention. Journal of Cognitive Neuroscience, 9(1), 75-91.

      Knight, R. T., & Scabini, D. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of Clinical Neurophysiology, 15(1), 3-13.

      Kompus, K., Volehaugen, V., Todd, J., & Westerhausen, R. (2020). Hierarchical modulation of auditory prediction error signaling is independent of attention. Cognitive Neuroscience, 11(3), 132-142.

      Kutas, M., Kiang, M., & Sweeney, K. (2012). Potentials and Paradigms: Event‐Related Brain Potentials and Neuropsychology. The handbook of the neuropsychology of language, 1, 543-564.

      Liaukovich, K., Ukraintseva, Y., & Martynova, O. (2022). Implicit auditory perception of local and global irregularities in passive listening condition. Neuropsychologia, 165, 108129.

      Lieder, F., Daunizeau, J., Garrido, M. I., Friston, K. J., & Stephan, K. E. (2013). Modelling trial-by-trial changes in the mismatch negativity. PLoS Computational Biology, 9(2), e1002911.

      Lorca-Puls, D. L., Gajardo-Vidal, A., White, J., Seghier, M. L., Leff, A. P., Green, D. W., Crinion, J. T., Ludersdorfer, P., Hope, T. M., & Bowman, H. (2018). The impact of sample size on the reproducibility of voxel-based lesion-deficit mappings. Neuropsychologia, 115, 101-111.

      Løvstad, A., & Cawley, P. (2011). The reflection of the fundamental torsional guided wave from multiple circular holes in pipes. NDT & E International, 44(7), 553-562.

      Løvstad, M., Funderud, I., Lindgren, M., Endestad, T., Due-Tønnessen, P., Meling, T., Voytek, B., Knight, R. T., & Solbakk, A.-K. (2012). Contribution of subregions of human frontal cortex to novelty processing. Journal of Cognitive Neuroscience, 24(2), 378-395.

      Naccache, L., Puybasset, L., Gaillard, R., Serve, E., & Willer, J.-C. (2004). Auditory mismatch negativity is a good predictor of awakening in comatose patients: a fast and reliable procedure. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology, 116(4), 988-989.

      Nieuwenhuis, S., Aston-Jones, G., & Cohen, J. D. (2005). Decision making, the P3, and the locus coeruleus-norepinephrine system. Psychological Bulletin, 131(4), 510.

      Noonan, M., Kolling, N., Walton, M., & Rushworth, M. (2012). Re‐evaluating the role of the orbitofrontal cortex in reward and reinforcement. European Journal of Neuroscience, 35(7), 997-1010.

      Nourski, K. V., Steinschneider, M., Rhone, A. E., Kawasaki, H., Howard III, M. A., & Banks, M. I. (2018). Processing of auditory novelty across the cortical hierarchy: An intracranial electrophysiology study. NeuroImage, 183, 412-424.

      Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clinical Neurophysiology, 115(1), 140-144.

      Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in Neurosciences, 24(5), 283-288.

      Padoa-Schioppa, C., & Assad, J. A. (2006). Neurons in the orbitofrontal cortex encode economic value. Nature, 441(7090), 223-226.

      Pegado, F., Bekinschtein, T., Chausson, N., Dehaene, S., Cohen, L., & Naccache, L. (2010). Probing the lifetimes of auditory novelty detection processes. Neuropsychologia, 48(10), 3145-3154.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Bekinschtein, T. A., & Rowe, J. B. (2015). Hierarchical organization of frontotemporal networks for the prediction of stimuli across multiple dimensions. Journal of Neuroscience, 35(25), 9255-9264.

      Phillips, H. N., Blenkmann, A., Hughes, L. E., Kochen, S., Bekinschtein, T. A., & Rowe, J. B. (2016). Convergent evidence for hierarchical prediction networks from human electrocorticography and magnetoencephalography. Cortex, 82, 192-205.

      Polich, J. (2007). Updating P300: an integrative theory of P3a and P3b. Clinical Neurophysiology, 118(10), 2128-2148.

      Rosburg, T., Trautner, P., Dietl, T., Korzyukov, O. A., Boutros, N. N., Schaller, C., Elger, C. E., & Kurthen, M. (2005). Subdural recordings of the mismatch negativity (MMN) in patients with focal epilepsy. Brain, 128(4), 819-828.

      Rugg, M. D. (1995). Event-related potential studies of human memory.

      Schomaker, J., Roos, R., & Meeter, M. (2014). Expecting the unexpected: The effects of deviance on novelty processing. Behavioral Neuroscience, 128(2), 146.

      Schultz, W. (2015). Neuronal reward and decision signals: from theories to data. Physiological Reviews, 95(3), 853-951.

      Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23(1), 473-500.

      Sculthorpe, L. D., Stelmack, R. M., & Campbell, K. B. (2009). Mental ability and the effect of pattern violation discrimination on P300 and mismatch negativity. Intelligence, 37(4), 405-411.

      Sergent, C., Baillet, S., & Dehaene, S. (2005). Timing of the brain events underlying access to consciousness during the attentional blink. Nature Neuroscience, 8(10), 1391-1400.

      Seymour, B., O'Doherty, J. P., Dayan, P., Koltzenburg, M., Jones, A. K., Dolan, R. J., Friston, K. J., & Frackowiak, R. S. (2004). Temporal difference models describe higher-order learning in humans. Nature, 429(6992), 664-667.

      Stalnaker, T. A., Cooch, N. K., & Schoenbaum, G. (2015). What the orbitofrontal cortex does not do. Nature Neuroscience, 18(5), 620-627.

      Strauss, M., Sitt, J. D., King, J.-R., Elbaz, M., Azizi, L., Buiatti, M., Naccache, L., Van Wassenhove, V., & Dehaene, S. (2015). Disruption of hierarchical predictive coding during sleep. Proceedings of the National Academy of Sciences, 112(11), E1353-E1362.

      Sul, J. H., Kim, H., Huh, N., Lee, D., & Jung, M. W. (2010). Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron, 66(3), 449-460.

      Swick, D. (2005). ERPs in neuropsychological populations. Event-related potentials: A methods handbook, 299.

      Swaab, T. Y. (1998). Event-related potentials in cognitive neuropsychology: Methodological considerations and an example from studies of aphasia. Behavior Research Methods, Instruments, & Computers, 30(1), 157-170.

      Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372(6501), 90-92.

      Tobler, P. N., O’Doherty, J. P., Dolan, R. J., & Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. Journal of Neurophysiology, 95(1), 301-310.

      Tremblay, L., & Schultz, W. (1999). Relative reward preference in primate orbitofrontal cortex. Nature, 398(6729), 704-708.

      Uhrig, L., Dehaene, S., & Jarraya, B. (2014). A hierarchy of responses to auditory regularities in the macaque brain. Journal of Neuroscience, 34(4), 1127-1132.

      Ungan, P., Karsilar, H., & Yagcioglu, S. (2019). Pre-attentive mismatch response and involuntary attention switching to a deviance in an earlier-than-usual auditory stimulus: an ERP study. Frontiers in Human Neuroscience, 13, 58.

      Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proceedings of the National Academy of Sciences, 108(51), 20754-20759.

      Walton, M. E., Behrens, T. E., Buckley, M. J., Rudebeck, P. H., & Rushworth, M. F. (2010). Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron, 65(6), 927-939.

      Walton, M. E., Behrens, T. E., Noonan, M. P., & Rushworth, M. F. (2011). Giving credit where credit is due: orbitofrontal cortex and valuation in an uncertain world. Annals of the New York Academy of Sciences, 1239(1), 14-24.

      Wessel, J. R., Danielmeier, C., Morton, J. B., & Ullsperger, M. (2012). Surprise and error: common neuronal architecture for the processing of errors and novelty. Journal of Neuroscience, 32(22), 7528-7537.

      Wessel, J. R., Klein, T. A., Ott, D. V., & Ullsperger, M. (2014). Lesions to the prefrontal performance-monitoring network disrupt neural processing and adaptive behaviors after both errors and novelty. Cortex, 50, 45-54.

      Yamaguchi, S., & Knight, R. (1991). Anterior and posterior association cortex contributions to the somatosensory P300. Journal of Neuroscience, 11(7), 2039-2054.

    1. Author Response

      Reviewer #2 (Public Review):

      Major weaknesses:

      1) The biggest weakness of the manuscript is the lack of appropriate explanation and interpretation of these observed cyclin D1 ubiquitination and degradation by at least five different combinations of Cullin-E3 ligases. Are all the five cullin-E3 combinations exclusive and/or redundant to each other for cyclin D1 ubiquitination? What are the speculations in terms of the underlying mechanism? At least a working model should be included to better interpret the data.

      Cyclin D1 is a recognized oncogene that is upregulated in multiple types of cancer. Different E3 ligases may mediate cyclin D1 protein degradation in different cell types. Even within the same cells, multiple E3 ligases may contribute to cyclin D1 degradation, ensuring that steady-state cyclin D1 protein levels are kept under surveillance and fine-tuned regulation.

      2) Although a phosphorylation-mutant cyclin D1 (i.e., T286) was included in the manuscript, there is no Lysine residue mutant within cyclin D1 identified and characterized for the critical function of cyclin D1 ubiquitination.

      It has been reported that lysine 269 is essential for cyclin D1 ubiquitination (Barbash et al., 2009). We co-transfected WT or mutant (K269R) cyclin D1 expression plasmids with Keap1, DDB2, and AMBRA1 expression plasmids into HEK293 cells. Forty-eight hours after transfection, cyclin D1 protein levels were assessed by Western blot analysis. WT cyclin D1 levels decreased in HEK293 cells co-transfected with Keap1, DDB2, and AMBRA1, whereas K269R mutant cyclin D1 levels showed no significant decrease in the co-transfected cells, suggesting that lysine 269 is essential for cyclin D1 ubiquitination by these E3 ligases.

      3) The significance of these different Cullin 1-7 and associated E3 ligases (Keap1-CUL3, DDB2-CUL4A/4B, WSB2-CUL2/5, and RBX1-CUL1-7) in cyclin D1 ubiquitination is mainly determined by siRNA-mediated knockdown or overexpression of target cullin/E3 proteins. However, it is not clear whether the observed phenotypes of cyclin D1 are due to these cullin-E3 ligases directly or indirectly. In vitro ubiquitination assay with E1, E2, and E3 should be performed to demonstrate whether recombinant cyclin D1 is ubiquitinated.

      We have performed the in vitro ubiquitination assay as the reviewer suggested. The results demonstrate that Keap1, DDB2, and WSB2 can induce cyclin D1 ubiquitination. Notably, Keap1 induced cyclin D1 ubiquitination and produced a ubiquitination ladder similar to the AMBRA1-induced cyclin D1 ubiquitination ladder. In contrast, no clear ubiquitination ladder was observed in the Rbx1 group (Figure S16).

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very exciting manuscript from Meng Wang's lab on lysosomal proteomics. They used several different protein tags to identify the lysosomal proteome. The exciting findings include A) specific lysosomal proteins exist in a tissue-specific manner B) lipl-4 overexpression and daf-2 extend life span using different mechanisms C) identification of novel lysosomal proteins D) demonstration of the function of several lysosomal proteins in regulating lysosome abundance and function.

      We thank the reviewer for finding our manuscript exciting.

      Reviewer #2 (Public Review):

      In this manuscript, Yu and colleagues profile the lysosome content in C. elegans. They implement lysosome immunoprecipitation (Lyso-IP) for C. elegans and they convincingly show that this method successfully isolates lysosomes from whole worms. The authors find that the lysosomes of worms overexpressing the lysosomal lipase lipl-4 are enriched for AMPK subunits and nucleoporins and that these proteins are required for the longevity of lipl-4 overexpressing worms. The authors also show that this is specific to this longevity pathway given that another long-lived worm strain (daf-2) does not exhibit enrichment for nucleoporins nor does it require them for longevity. The authors go on to express the Lyso-IP tag in different tissues of C. elegans (muscle, hypodermis, intestine, neurons) and identify the tissue-specific lysosome proteomes. Finally, the authors use this method to identify lysosome proteins in mature lysosomes and they find new proteins that regulate lysosomal acidification.

      The authors present a powerful tool to unbiasedly identify lysosome-associated proteins in C. elegans, and they provide an in-depth assessment of how this method can be used to understand longevity pathways and identify novel proteins. Understanding lysosomal differences in specific tissues or in response to different longevity conditions is exciting, as it provides new insight into how organelles could control specific homeostasis responses. This tool and the proteomics datasets also represent a great resource for the C. elegans community and should open up new studies on the regulation and role of the lysosome at the organismal level.

      We truly appreciate the reviewer’s positive comments on our work.

      Addressing the following suggestions would help strengthen this already strong manuscript. First, it would be helpful to validate selected candidates from the tissue-specific Lyso-IP to verify that the protocol is still specific with lower sample amounts. Second, it would be helpful to provide more details on the methods, notably for sample preparation and analysis, so that it can serve as a guideline for the community. Third, the manuscript contains a lot of data and conditions, which is great, but they may also feel disconnected in some cases and it could be helpful to focus the study on the main key findings.

      We thank the reviewer for these comments. As suggested, we have generated a CRISPR knock-in line for one hypodermis-specific candidate, Y58A7A.1, which encodes a copper transporter, and validated its hypodermis-specific lysosomal localization (new Supplementary Figure 2E).

      As suggested by the reviewer, we have extended the method section on Lyso-IP to include more details. We believe that the new version should be sufficient for any lab to follow this protocol and conduct their own analyses. We will also take advantage of the eLife “Request a Protocol” feature to share the detailed version of the Lyso-IP method with researchers who are interested.

      We have thoroughly reorganized the manuscript to increase the textual clarity and improve the connection between different analyses and results.

      Reviewer #3 (Public Review):

      The manuscript by Ji et al. dissects the important role of lysosomes in cellular metabolism and signaling and their regulation by various associated proteins. The authors utilized deep proteomic profiling in C. elegans to identify lysosome-associated proteins involved in regulating longevity and discovered the recruitment of AMPK and nucleoporin proteins in response to increased lysosomal lipolysis. Additionally, the authors found lysosomal heterogeneity across different tissues and specific enrichment of the Ragulator complex on Cystinosin-positive lysosomes.

      Strengths of this work include the utilization of deep proteomic profiling to identify novel lysosome-associated proteins involved in longevity regulation, as well as the discovery of lysosomal heterogeneity and specific protein enrichments across different worm tissues. These findings point to a complex interplay between lysosomal protein dynamics, signal transduction, organelle crosstalk, and organism longevity.

      One weakness of this work may be the limited scope of the study, as it focuses primarily on the identification and characterization of lysosome-associated proteins involved in longevity regulation, with limited mechanistic follow-up and some unsubstantiated claims.

      We thank the reviewer for her/his helpful comments and suggestions. The primary goal of this manuscript is to provide new methods and resources to the community. The current study did yield several biological findings; mechanistic follow-up of these findings will be an interesting topic for future work but may be beyond the scope of the current manuscript. In addition, we have provided new experimental results to further support several claims that the reviewer commented on.

    1. Author Response

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”.

      We also thank them for their careful reading and useful comments to improve the manuscript. We will build on this input to provide an improved version of the manuscript, which we hope to submit to eLife soon along with our point-by-point response.

    1. Author Response

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We will work to address each comment and suggestion offered by the Reviewers in a revision.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that while males and females are similar in their initial efficiency of learning a standard color-food association (e.g., color X = food; color Y = no food), when the associations are switched (now color Y = food, X = no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity, which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (that the authors suggest are stable but unpredictable) should exhibit high-risk sensitivity. Given that all grackles, male and female, in all populations, are both living in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question—yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265. Population membership might, in turn, differentially moderate the magnitude of any such sex-effect since an edge population, even though urban, could still pose novel challenges—for example, by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We will make sure to better introduce this important conceptual information in our revision.

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our responses below concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk which is not considered in this paper. Secondarily, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by payoff variance, i.e., the risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence the estimation of each learning parameter; that is, in a sense, both parameters are sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back onto choice behaviour because of their distinct definitions (we note this does not imply that the two cannot influence one another, i.e., co-vary on the latent scale). Glossing over the mathematics, for phi, stronger reward sensitivity (bigger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (bigger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching, i.e., ‘playing it safe’. We hope this information, which we will incorporate into our revision, clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk sensitivity.
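
      To make the distinct roles of phi and lambda concrete, here is a minimal illustrative sketch. It assumes the standard forms of equation 1 (a convex-combination update of an option's attraction, weighted by phi) and equation 2 (a softmax choice rule scaled by lambda); it is not the fitted Bayesian model, and the attraction values used are arbitrary.

```python
import math


def update_attraction(attraction, payoff, phi):
    """Assumed form of equation 1: the new attraction is a convex combination
    of the previous attraction and the latest payoff, weighted by phi."""
    return (1 - phi) * attraction + phi * payoff


def choice_probabilities(attractions, lam):
    """Assumed form of equation 2: a softmax in which lambda scales how
    deterministically attraction differences translate into choice."""
    m = max(attractions)  # subtracting the max avoids overflow; result unchanged
    weights = [math.exp(lam * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# phi: how quickly attraction to a consistently rewarded option builds up
for phi in (0.05, 0.3):
    a = 0.0
    for _ in range(5):
        a = update_attraction(a, payoff=1.0, phi=phi)
    print(f"phi={phi}: attraction after 5 rewarded choices = {a:.3f}")

# lambda: how strongly the same attraction gap drives choice of the better option
for lam in (1.0, 5.0):
    p_best = choice_probabilities([0.6, 0.2], lam)[0]
    print(f"lambda={lam}: P(choose higher-attraction option) = {p_best:.3f}")
```

      In this sketch phi appears only in the update and lambda only in the choice rule, so rewards reach choice through the attractions; a larger lambda makes the bird more likely to stick with the currently more attractive option, i.e., to switch less.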

      In addition, however, the parameter does not measure sensitivity to rewards per se - rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are standard, and they are the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb.2008.0131]; Reinforcement Learning by Sutton & Barto 2018). To provide a “justification for the form of these equations”: equation 1 describes a convex combination of previous values and recent payoffs. Although latent values are updated as a linear combination of both factors, there is no simple linear mapping between payoffs and behaviour of the kind the Reviewer suggests. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here, latent values) into a simplex vector (i.e., a vector summing to 1), which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities, which enter the categorical likelihood distribution. We appreciate that our manuscript did not make this clear, because we did not highlight the standard nature of our analytical approach; we will do better in our revision. As for what our reinforcement learning model measures, and how it relates to cognition and behaviour, please see our previous response.
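
      For reference, the two equations discussed here can be written out as follows. This sketch assumes the canonical notation (A for latent attraction, π for the observed payoff, option i at trial t, with only the chosen option updated), which may differ from the manuscript's exact indexing. The last line shows the step behind the comparison with the logit link: with two options, the softmax reduces to an inverse logit of the lambda-scaled attraction difference.

```latex
% Equation 1 (assumed standard form): update of the chosen option i
A_{i,t+1} = (1 - \phi)\, A_{i,t} + \phi\, \pi_{i,t}, \qquad 0 \le \phi \le 1

% Equation 2 (assumed standard form): softmax over K options, a simplex vector
P(i)_{t+1} = \frac{\exp(\lambda A_{i,t+1})}{\sum_{k=1}^{K} \exp(\lambda A_{k,t+1})},
\qquad \sum_{i=1}^{K} P(i)_{t+1} = 1

% K = 2: divide numerator and denominator by \exp(\lambda A_{1,t+1})
P(1)_{t+1} = \frac{1}{1 + \exp\!\big(-\lambda\,(A_{1,t+1} - A_{2,t+1})\big)}
           = \operatorname{logit}^{-1}\!\big(\lambda\,(A_{1,t+1} - A_{2,t+1})\big)
```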

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job explaining the inferential value of this method; we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to facilitate this check, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about computer ‘birds’’ behavioural outcomes; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure), and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism(s) went overlooked, because in that case computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed on model parameterisation to capture the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).

      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of the target behaviour(s). We will work to make the above points on the insight afforded by agent-based forward simulation explicitly clear in our revision.
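
      A minimal forward-simulation sketch of the procedure described above, under loudly stated assumptions: the bivariate-normal draws on logit/log scales, the hyperparameter values, the starting attractions of 0.1, and the 17-of-20 passing criterion are all hypothetical placeholders, not the study's posterior or protocol. In a real analysis the (phi, lambda) pairs would be drawn from the fitted posterior, and the simulated trials-to-criterion distributions would be compared with those of the real birds.

```python
import math
import random


def draw_agent(rng, mu_logit_phi, mu_log_lam, sd_phi, sd_lam, rho):
    """Draw one agent's (phi, lambda) from a bivariate normal on the latent
    (logit, log) scale with correlation rho, then back-transform."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_phi = mu_logit_phi + sd_phi * z1
    x_lam = mu_log_lam + sd_lam * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return 1.0 / (1.0 + math.exp(-x_phi)), math.exp(x_lam)  # phi in (0,1), lambda > 0


def run_phase(rng, phi, lam, rewarded, attr, max_trials=300, window=20, needed=17):
    """One learning phase in which only option `rewarded` pays off.
    Stops once `needed` of the last `window` choices are correct.
    Returns (trials used, attractions at the end of the phase)."""
    attr = list(attr)
    recent = []
    for t in range(1, max_trials + 1):
        m = max(attr)
        w = [math.exp(lam * (a - m)) for a in attr]  # softmax choice rule
        choice = 0 if rng.random() < w[0] / sum(w) else 1
        payoff = 1.0 if choice == rewarded else 0.0
        attr[choice] = (1 - phi) * attr[choice] + phi * payoff  # update chosen option only
        recent = (recent + [choice == rewarded])[-window:]
        if len(recent) == window and sum(recent) >= needed:
            return t, attr
    return max_trials, attr


def simulate(n_agents=500, seed=1):
    """Forward-simulate initial learning followed by a reversal."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_agents):
        # Placeholder hyperparameters, NOT posterior estimates from the study.
        phi, lam = draw_agent(rng, mu_logit_phi=-2.0, mu_log_lam=1.5,
                              sd_phi=0.5, sd_lam=0.5, rho=-0.3)
        n_init, attr = run_phase(rng, phi, lam, rewarded=0, attr=(0.1, 0.1))
        n_rev, _ = run_phase(rng, phi, lam, rewarded=1, attr=attr)
        out.append((phi, lam, n_init, n_rev))
    return out


sims = simulate()
print("mean trials, initial:", sum(s[2] for s in sims) / len(sims))
print("mean trials, reversal:", sum(s[3] for s in sims) / len(sims))
```

      The agents carry the learned attractions from the initial phase into the reversal, so the reversal cost emerges from the model rather than being imposed; a systematic mismatch with real birds at this step is the kind of signal that would point to a missing mechanism.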

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is that the authors are working with quite low sample sizes when we look at the different populations of the expansion range: edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females). I think that when all populations are pooled together, the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles, but it is insufficient when taking the different populations into account.

      In Bayesian statistics, there is no strict lower limit of required sample size, as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note that all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a similarly modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F) that we anticipated, at that time, would be our final N. This a priori analysis shows that our reinforcement learning model: (i) detects sex differences of >= 0.03 in phi and >= 1 in lambda; and (ii) infers a null effect for differences of < 0.03 in phi and < 1 in lambda, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Taken together, these two points show that our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is right to wonder whether a bigger N might change our population-level results (it might, as might much-needed population replicates; see L270), but our Bayesian models still allow us to learn a lot from our current data.
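
      A textbook illustration of the partial-pooling point, not the study's models: in a normal-normal hierarchical model with known variances, each group's estimate is shrunk toward the population mean, and more strongly so the less data the group has. All numbers are hypothetical; the group sizes 4 and 17 merely echo the smallest and largest groups listed in the review above.

```python
def partial_pooling_estimate(group_mean, n, sigma, pop_mean, tau):
    """Posterior mean of one group's effect in a normal-normal model with
    known within-group sd `sigma` and between-group sd `tau`.

    The result is a precision-weighted compromise between the group's own
    mean (no pooling) and the population mean (complete pooling)."""
    w_data = n / sigma ** 2      # precision contributed by the group's data
    w_prior = 1.0 / tau ** 2     # precision of the population-level distribution
    return (w_data * group_mean + w_prior * pop_mean) / (w_data + w_prior)


# Same observed mean, different group sizes (illustrative numbers only).
for n in (4, 17):
    est = partial_pooling_estimate(group_mean=0.30, n=n, sigma=0.2,
                                   pop_mean=0.10, tau=0.05)
    print(f"n={n:>2}: raw mean 0.30 -> partially pooled estimate {est:.3f}")
```

      The smaller group is pulled further toward the population mean, which is how partial pooling tempers noisy estimates from small samples. Full multilevel models go further by estimating tau from the data rather than fixing it.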

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L53–56 we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We will work towards fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65, may apply to animals inhabiting urban environments more broadly.

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not specifying that the review we cite in L42 by Lee & Thornton (2021) covers additional studies on cognition in both urban invasive species and urban-dwellers versus non-urban counterparts; we will remedy this omission in our revision. We will also revise our labelling of the lizard species. We are aware that only male lizards were tested, but this information is not relevant to our use of this study, which is to highlight that learning can differ between urban-dwelling and non-urban counterparts. Finally, the Reviewer’s general suggestion is a good one, and we will work to add this biological clarity to our revision.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We will take care in our revision to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect meaningful behavioural or mechanistic population-level differences in grackles’ learning. Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and that ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate; we will revise our wording. As for how our evolutionary results relate to the rest of the paper: these results suggest that successful urban living should favour risk-sensitive learning, and our other analyses reveal that male grackles (the dispersing sex in this historically urban-dwelling and currently urban-invading species) show pronounced risk-sensitive learning. It therefore appears that risk-sensitive learning is a winning strategy for urban-invading male grackles, and for urban-invasion leaders more generally (we note, of course, that other factors undoubtedly contribute to grackles’ urban invasion success, as discussed in ‘Ideas and speculation’; see also our first response to R1). We will work to make these links clearer in our revision. Finally, please see our above response on the inferential sufficiency of our sample size.
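
      To make the two environmental axes quoted by the Reviewer concrete, here is a hypothetical sketch of how u and s could enter a simulation: u as the per-step probability that the optimal option switches, and s as the probability that choosing the optimal option fails to pay off. It only scores the lifetime payoff of a fixed learner and omits selection, inheritance and mutation, so it is not the evolutionary model itself; all parameter values are placeholders.

```python
import math
import random


def make_environment(rng, n_steps, u):
    """Hypothetical two-option environment: at each step the optimal option
    switches with probability u (lower u = more stable)."""
    current = rng.randrange(2)
    optimal = []
    for _ in range(n_steps):
        if rng.random() < u:
            current = 1 - current
        optimal.append(current)
    return optimal


def lifetime_payoff(rng, phi, lam, optimal, s):
    """Total payoff of one reinforcement learner (convex-combination update,
    softmax choice). Choosing the optimal option fails to pay off with
    probability s (higher s = more stochastic); the other option never pays."""
    attr = [0.0, 0.0]
    total = 0.0
    for best in optimal:
        m = max(attr)
        w = [math.exp(lam * (a - m)) for a in attr]
        choice = 0 if rng.random() < w[0] / sum(w) else 1
        payoff = 1.0 if (choice == best and rng.random() >= s) else 0.0
        attr[choice] = (1 - phi) * attr[choice] + phi * payoff
        total += payoff
    return total


rng = random.Random(42)
env = make_environment(rng, n_steps=1000, u=0.01)  # low u: stable (placeholder value)
for phi, lam in [(0.1, 1.0), (0.1, 8.0)]:
    # s = 0.3: moderately stochastic payoffs (placeholder value)
    mean = sum(lifetime_payoff(rng, phi, lam, env, s=0.3) for _ in range(50)) / 50
    print(f"phi={phi}, lambda={lam}: mean payoff over 1000 steps = {mean:.1f}")
```

      Mean payoffs compared across (phi, lambda) pairs and across (u, s) settings are the kind of fitness signal an evolutionary model could select on; nothing from this toy run should be read as a result of the study.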

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you, we are pleased to read that the Reviewer found our manuscript to be generally digestible. In our revision, we will work to add further clarity, and to temper our tone.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript tried to answer a long-standing question in an important research topic. I read it with great interest. The quality of the science is high, and the text is clearly written. The conclusion is exciting. However, I feel that the phenotype of the transgenic line may be explained by an alternative idea. At least, the results should be more carefully discussed.

      We thank Reviewer #1 for their comments, which helped to improve the manuscript. We have incorporated changes to reflect the suggestions provided by the reviewer. Here is a point-by-point response to the reviewer's specific and other minor comments.

      Specific comments:

      1) Stability or activity (Fv/Fm) was not affected in PSII with the W14F mutation in D1. If W14F really represents the status of PSII with oxidized D1, what is the reason for the degradation of almost normal D1?

      In this study, we used the W14F mutation to mimic Trp-14 oxidation. The W14F mutation did not affect PSII stability or photosynthetic activity under normal growth conditions. However, the W14F mutant showed increased D1 degradation and reduced Fv/Fm values under high light. These results suggest that, as pointed out by the reviewer, D1 protein stability in the W14F mutant is almost normal under growth light conditions.

      However, it should be noted that the D1 protein in the W14F strain was rapidly degraded under high light. In the Discussion, we mention the possibility that other OPTMs may have additive effects on D1 degradation. Synergistic effects, such as the oxidation of different amino acids, may cause D1 degradation, and among these oxidative modifications, W14 oxidation would be a key signal for D1 degradation by FtsH.

      2) To focus on the PSII in which W14 is oxidized, this research depends on the W14F mutant lines. It is critical how exactly the W-to-F substitution mimics the oxidized W. The authors tried to show it in Figure 5. Because of the technical difficulty, it may be unfair to request more evidence. But the paper would be more convincing with the results directly monitoring the oxidized D1 to be recognized by FtsH.

      We agree that confirming the direct interaction of oxidized D1 protein with FtsH would provide more robust evidence. However, since FtsH progressively degrades the trapped substrate, capturing that moment would be quite challenging. There are also technical limitations to obtaining sufficient substrate by Co-IP to compare its oxidation state. We have added your suggested point to the Discussion. Thank you for your valuable suggestion.

      3) Figure 3. If the F14 mimics the oxidized W14 and is sensed by FtsH, I would expect the degradation of D1 even under the growth light. The actual result suggests that W14F mutation partially modifies the structure of D1 under high light and this structural modification of D1 is sensed by FtsH. Namely, high light may induce another event which is recognized by FtsH. The W14F is just an enhancer.

      Our results indicate that W14 oxidation is one of the keys to D1 degradation. On the other hand, we agree with the possibility that the reviewer points out: factors other than W14 oxidation may act synergistically to promote D1 degradation. High light triggered more D1 degradation in W14F, suggesting that unknown factor(s) may be required for D1 degradation, e.g., oxidative modification at other sites and/or conformational changes of PSII under high light. However, our current data cannot resolve this question. We have incorporated the reviewer's comment into the Discussion.

      Reviewer #2 (Public Review):

      In their manuscript, Kato et al. investigate a key aspect of membrane protein quality control in plant photosynthesis. They study the turnover of plant photosystem II (PSII), a hetero-oligomeric membrane protein complex that undertakes the crucial light-driven water oxidation reaction in photosynthesis. The formidable water oxidation reaction makes PSII prone to photooxidative damage. The PSII repair cycle is a protein repair pathway that replaces the photodamaged reaction center protein D1 with a new copy. The manuscript addresses an important question in the PSII repair cycle: how is the damaged D1 protein recognized and selectively degraded by the membrane-bound ATP-dependent zinc metalloprotease FtsH in a processive manner? The authors show that oxidative post-translational modification (OPTM) of the D1 N-terminus is likely critical for the proper recognition and degradation of the damaged D1 by FtsH. The authors use a wide range of approaches and techniques to test their hypothesis that the singlet oxygen (¹O₂)-mediated oxidation of the tryptophan 14 (W14) residue of D1 to N-formylkynurenine (NFK) facilitates the selective degradation of damaged D1. Overall, the authors propose an interesting new hypothesis for D1 degradation, and their hypothesis is supported by most of the experimental data provided. The study certainly addresses an elusive aspect of PSII turnover, and the data provided go some way in explaining light-induced D1 turnover. However, some of the data are correlative and do not provide mechanistic insight. In my opinion, a rigorous demonstration of OPTM as a marker for D1 degradation is yet to be made. Some strengths and weaknesses of the study are summarized below:

      We thank reviewer #2 for his/her comments that helped to improve the manuscript. We have incorporated changes to reflect the suggestions pointed out as weaknesses by reviewer #2. Other minor comments were also answered in a point-by-point response.

      Strengths:

      1) In support of their hypothesis, the authors find that FtsH mutants of Arabidopsis have increased OPTM, especially the formation of NFK at multiple Trp residues of D1 including the W14; a site-directed mutation of W14 to phenylalanine (W14F), mimicking NFK, results in accelerated D1 degradation in Chlamydomonas; accelerated D1 degradation of W14F mutant is mitigated in an ftsH1 mutant background of Chlamydomonas; and that the W14F mutation augmented the interaction between FtsH and the D1 substrate.

      2) Authors raise an intriguing possibility that the OPTM disrupts the hydrogen bonding between W14 residue of D1 and the serine 25 (S25) of PsbI. According to the authors, this leads to an increased fluctuation of the D1 N-terminal tail, and as a consequence, recognition and binding of the photodamaged D1 by the protease. This is an interesting hypothesis and the authors provide some molecular dynamics simulation data in support of this. If this hypothesis is further supported, it represents a significant advancement.

      3) The interdisciplinary experimental approach is certainly a strength of the study. The authors have successfully combined mass spectrometric analysis with several biochemical assays and molecular dynamics simulation. These, together with the generation of transplastomic algal cell lines, have enabled a clear test of the role of Trp oxidation in selective D1 degradation.

      4) Trp oxidative modification as a degradation signal has precedent in chloroplasts. The authors cite the case of the ¹O₂ sensor protein EXECUTER 1 (EX1), whose degradation by FtsH2, the same protease that degrades D1, requires prior oxidation of a Trp residue. The earlier observation of an attenuated degradation of a truncated D1 protein lacking the N-terminal tail is also consistent with the authors' suggestion of the importance of D1 N-terminus recognition by FtsH. It is also noteworthy that, in light of the current study, D1 phosphorylation is unlikely to be a marker for degradation as posited by earlier studies.

      Weaknesses:

      1) The study lacks some data that would have made the conclusions more rigorous and convincing. It is unclear why the level of Trp oxidation was not analyzed in the Chlamydomonas ftsH1-1 mutant as was done for the var2 mutant. Increased W14 oxidation in Chlamydomonas ftsH1-1 is a key prediction of the hypothesis.

      We thank the reviewer for this valuable comment. We agree with the reviewer that analysis of the oxidized Trp level would reinforce the importance of Trp oxidation in the N-terminus of D1. In our preliminary experiment, we observed a trend toward increased kynurenine at Trp-14 in the Chlamydomonas ftsH1-1 strain. However, the variation was large, and we could not conclude that this trend is significant. A possible reason for the large variation is that the signal intensity of oxidized Trp was insufficient for quantification in a series of Chlamydomonas experiments. In addition, the fact that the amount of D1 in each culture was not stable might also be one reason. We also note a previous result that more fragmentation of D1 protein was observed in the Chlamydomonas ftsH1-1 mutant than in Arabidopsis (Malnoë et al., Plant Cell 2014). This result suggests that an alternative D1 degradation pathway involving other proteases is more active in the Chlamydomonas ftsH1-1 mutant than in the Arabidopsis var2 mutant. Furthermore, the Chlamydomonas ftsH1-1 mutant, caused by an amino acid substitution, still has a significant amount of FtsH1/FtsH2 heterohexamer, and the levels of FtsH1 and FtsH2 proteins increase significantly under high light irradiation. This is a significant difference from the Arabidopsis var2 mutant, which lacks the FtsH2 subunit and shows reduced protein accumulation. These factors may explain the lower detection levels of oxidized Trp in Chlamydomonas. We believe that improved sensitivity for the detection of oxidized Trp peptides and more sophisticated experimental systems could resolve this issue in the future.

      It is also unclear to me what is the rationale for showing D1-FtsH interaction data only for the double mutant but not for the single mutant (W14F).

      We thank the reviewer for the comment. As suggested by the reviewer, analyzing a cross between the ftsH mutant and the W14F single mutant would provide more convincing evidence. Fig. 3 showed that the photosensitivity of both W14F and W14FW317F was caused by enhanced D1 degradation, which was attributable to the W14F mutation. Therefore, we crossed the ftsH mutant with W14FW317F, which has a more severe phenotype, to confirm whether FtsH is involved in this D1 degradation.

      Why is the FtsH pulldown of D2 not statistically significant (p value ≤ 0.1)? Wouldn't one expect FtsH to pull down the RC47 complex containing D1, D2, and CP47? Probing the RC47 level would have been useful in settling this.

      Regarding the D2 immunoblot result and its statistical analysis, please see our response to comment No. 2 in the reviewer's Recommendations For The Authors.

      We agree with the reviewer that further immunoblot analysis of the CP47 protein would help our understanding of the interaction between FtsH and RC47. Indeed, we attempted immunoblot analysis of CP47 after the FtsH Co-IP experiment. However, the detection of CP47 was not sensitive enough, possibly because the CP47 antibody has a lower titer than the D1 and D2 antibodies.

      A key proposition of the authors is that the hydrogen bonding between D1 W14 and S25 of PsbI is disrupted by the oxidative modification of W14. Can this hypothesis be further tested by replacing the S25 of PsbI with Ala, for example?

      Whether an amino acid substitution at PsbI-S25 affects the stability of the D1 N-terminus and its degradation by FtsH is an interesting question, and we would like to test this possibility in the future. We thank the reviewer for this helpful suggestion.

      2) Although most of the work described is in vivo analysis, which is desirable, some in vitro degradation assays would have strengthened the conclusions. An in vitro degradation assay using recombinant FtsH and a synthetic peptide encompassing the D1 N-terminus with and without OPTM would test the enhanced D1 degradation that the authors predict. This would also help determine whether CP43 detachment alone is sufficient for D1 degradation, as suggested for cyanobacteria.

      In vitro experimental systems are interesting. However, FtsH is known to function as a hexamer, which has not yet been successfully reconstituted in vitro. It would therefore not be easy to establish an in vitro assay using a synthetic N-terminal peptide of D1 as the substrate. Thank you for your valuable suggestions.

      3) The rationale for analyzing a single oxidative modification (W14) as a D1 degradation signal is unclear. The D1 N-terminus is modified at multiple sites. Please see Mckenzie and Puthiyaveetil, bioRxiv May 04 2023. Also, why is modification only by ¹O₂ considered, when superoxide and hydroxyl radicals can equally damage D1?

      We agree, as the reviewer points out, that oxidative modifications of other amino acids may also be involved in D1 degradation. We also thank the reviewer for pointing us to the interesting preprint by Mckenzie and Puthiyaveetil showing additional oxidations in the D1 N-terminus, of which we were not aware when we submitted our manuscript. It will be interesting to see in the future how these amino acid oxidations act together with W14 oxidation in D1 degradation. Proteins whose Trp residues are oxidized by ¹O₂ can become FtsH substrates, as in the case of EX1, so we focused our analysis on Trp oxidation. Singlet oxygen is believed to be the main reactive species responsible for Trp oxidation. However, we cannot be certain that the oxidative modifications detected in this study depend on singlet oxygen. We have therefore changed several sentences that mention tryptophan oxidation by singlet oxygen.

      4) The D1 degradation assay does not seem to be reproducible for the W14F mutant. The high light minus CAM results in Fig. 3 show a statistically significant decrease in D1 levels for W14F at multiple time points, but the same assay in Fig. 4a does not produce a statistically significant decrease at 90 min of incubation. Why is this? Accelerated D1 degradation in the Phe mutant under high light is key evidence that the authors cite in support of their hypothesis.

      In Fig. 4a, the p-value comparing the D1 level at 90 min between the control and W14F was 0.1075, slightly larger than 0.1. This value is likely due to one of the control experiments, which showed a decrease in D1 level relative to 0 h. Because the D1 level was unchanged in the remaining three of the four control replicates, we consider this replicate an outlier. We therefore believe that these results do not undermine our hypothesis that D1 degradation occurs earlier in W14F.

      5) The description of results is at times not nuanced enough; e.g., lines 116-117 state "The oxidation levels in Trp-14 and Trp-314 increased 1.8-fold and 1.4-fold in var2 compared to the wild type, respectively (Fig. 1c)" while an inspection of the figure reveals that modification at W314 is significant only for NFK and not for KYN and OIA.

      This sentence described a comparison of oxidized peptide levels calculated from all Trp-oxidized derivatives. However, as the reviewer points out, it did not correctly describe the result in Fig. 1c. We have corrected the sentence following the reviewer's suggestion as follows: “The levels of Trp-oxidized derivatives, OIA, NFK, and KYN in Trp-14 and the level of KYN in Trp-314 were significantly increased in var2 compared to the wild type, respectively (Fig. 1c).”

      Likewise, the authors write that CP43 mutant W353F has no growth phenotype under high light but Figure S6 reveals otherwise. The slow growth of this mutant is in line with the earlier observation made by Anderson et al., 2002.

      As the reviewer points out, W353F seems to grow a little more slowly under HL, and we have changed our description in the Results. However, we still conclude that Trp oxidation in CP43 has little impact on PSII repair, because the growth impairment of W353F under HL is less severe than that of W14F and W14F/W317F.

      In lines 162-163, the authors talk about unchanged electron transport in some site-directed mutants and cite Fig. 2c but this figure only shows chl fluorescence trace and nothing else.

      We agree with the reviewer and have changed the sentence. We did not perform a detailed photosynthetic analysis in this study. Based on phototrophic growth, oxygen-evolving activity, and Chl fluorescence, we concluded that overall photosynthetic activity did not differ significantly in the mutants.

      6) The authors rightly discuss an alternate hypothesis that the simple disassembly of the monomeric core into RC47 and CP43 alone may be sufficient for selective D1 degradation as in cyanobacteria. This hypothesis cannot yet be ruled out completely given the lack of some in vitro degradation data as mentioned in point 2. Oxidative protein modification indeed drives the disassembly of the monomeric core (Mckenzie and Puthiyaveetil, bioRxiv May 04 2023).

      Thank you for this suggestion. We have added a discussion of PSII disassembly driven by ROS-induced oxidation to the Discussion, together with this reference.

      Reviewer #3 (Public Review):

      Light energy drives photosynthesis. However, excessive light can damage (i.e., photo-damage) and thus inactivate the photosynthetic process. A major target site of photo-damage is photosystem II (PSII). In particular, one component of PSII, the reaction center protein D1, is very susceptible to photo-damage; however, this protein is maintained efficiently by an elaborate multi-step PSII-D1 turnover/repair cycle. Two proteases, FtsH and Deg, are known to contribute to this process by degrading photo-damaged D1 protein processively and endoproteolytically, respectively. In this manuscript, Kato et al. propose an additional (early) step in the D1 degradation/repair pathway. They propose that "Tryptophan oxidation" at the N-terminus of D1 may be one of the key oxidations in PSII repair, leading to processive degradation of D1 by FtsH. Both their data and their arguments are very compelling.

      The D1 protein repair/degradation pathway in its simplest form can be defined essentially by five steps: (1) migration of the damaged PSII core complex to the stroma thylakoids, (2) partial disassembly of the PSII core monomer, (3) access of proteases to the damaged D1 and its degradation, (4) concomitant D1 synthesis, and (5) reassembly of PSII in the grana thylakoids. An enormous amount of work has already been done to define and characterize these steps. In this manuscript, Kato et al. propose a very early yet novel critical step in D1 protein turnover in which tryptophan (Trp) oxidation in PSII core proteins influences D1 degradation mediated by FtsH.

      Using a variety of approaches, such as mass spectrometry (Table 1), site-directed mutagenesis (Figures 2-4), D1 degradation assays (Figures 3 and 4), and simulation modeling (Figure 5), Kato et al. provide both strong evidence and reasonable arguments that an N-terminal Trp oxidation is likely to be a 'key' oxidative post-translational modification (OPTM) involved in triggering D1 degradation and thus activating the PSII repair pathway. From their accumulated data, the authors propose a scenario in which the unraveling of the N-terminus of the D1 protein, facilitated by Trp oxidation, plays a critical 'recognition' role in alerting the plant that the D1 protein is photo-damaged and thus kick-starts the processive degradation pathway, possibly initiated by FtsH. Coincidentally, Forsman and Eaton-Rye (Biochemistry 2021, 60, 1, 53-63), working with the thermophilic cyanobacterium Thermosynechococcus vulcanus, showed that when the DE-loop of the D1 protein is photo-damaged, the interaction between D1 and PsbT is disrupted, which may serve as a signal for PSII to undergo repair following photodamage. While the activation of the processive degradation pathways in Chlamydomonas and Thermosynechococcus vulcanus differs significantly in mechanism, it is interesting to note and speculate that the stability of their respective D1 proteins seems to play a critical role in 'signaling' the PSII repair system to be activated and initiate repair. But it is complicated. For instance, significant Trp oxidation also occurs on the lumen side of other PSII subunits, which may also play a significant role in activating the repair processes. Indeed, Kato et al. (Photosynthesis Research 126: 409-416, 2015) proposed a two-step model in which the primary event is disruption of the Mn-cluster of PSII on the lumen side.

      A secondary event is damage to D1 caused by energy absorbed by chlorophyll. But models adapt, change, and get updated. And the data provided by Kato et al. in this manuscript give us a unique glimpse of the importance of the stability of the N-terminus during photo-damage and its role in D1 turnover. For instance, the authors' use of site-directed mutagenesis of Trp residues undergoing OPTM in the D1 protein, coupled with their D1 degradation assays (Figures 3 and 4), provides evidence that Trp oxidation (in particular, the oxidation of Trp14) in coordination with FtsH results in degradation of the D1 protein. Indeed, their D1 degradation assays coupled with the use of an ftsH mutant provide further significant support that Trp14 oxidation and FtsH activity are strongly linked. But for FtsH to degrade the D1 protein, it needs to gain access to photo-damaged D1. FtsH access to D1 is achieved by partial dissociation of CP43 from the PSII complex. Hence, the authors also addressed the possibility that Trp oxidation may play a role in CP43 disassembly from the PSII complex, thereby giving FtsH access to D1. Using site-directed mutagenesis, they showed that Trp oxidation in CP43 appeared to have little impact on PSII repair (Supplemental Figure S6). This result suggests that D1-Trp14 oxidation plays a role in D1 turnover that occurs after CP43 disassembly from the PSII complex. Alternatively, the authors cannot exclude the possibility that D1-Trp14 oxidation in some way facilitates CP43 dissociation. Further investigation is needed on this point. Either way, D1-Trp14 oxidation appears to cause an internal disruption of the D1 protein, possibly at its N-terminus. Consequently, the role of Trp14 oxidation in destabilizing the N-terminal domain of the D1 protein was analyzed computationally.
      Using a molecular dynamics approach (Figure 5), the authors attempted to build a mechanistic model to explain why the N-terminal domain of the D1 protein becomes unraveled when Trp14 undergoes oxidation. Specifically, the authors propose that the interaction between D1 Trp14 and PsbI Ser25 is disrupted upon oxidation of Trp14. From their molecular dynamics simulations, the authors concluded that "the increased fluctuation of the first α-helix of D1 would give a chance to recognize the photo-damaged D1 by FtsH protease". Hence, the experimental and computational approaches employed here develop a compelling early-stage repair model that integrates 1) Trp14 oxidation, 2) FtsH activation, and 3) D1 turnover initiated at its N-terminal domain. However, a word of caution is warranted. This model is just a snapshot of the very early stages of the D1 protein turnover process. The data presented here give us only a small glimpse into the relationship between Trp oxidation of the D1 protein and the significant N-terminal structural changes it may trigger, which both signal and provide an opportunity for FtsH to begin proteolytic digestion of the D1 protein.

      However, the authors go to great lengths in their Discussion not to overstate the role of Trp14 oxidation in the complicated process of D1 turnover. The authors certainly recognize that there are many moving parts involved in D1 turnover. And while Trp14 oxidation is the major focus of this paper, the authors show in Supplemental Fig. S4 the structural positions of various additional oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. Indeed, this figure shows that the majority of oxidized Trps are located on the luminal side of the PSII complex, clustered around the oxygen-evolving complex. So, while oxidized Trp14 may be involved in the early stages of D1 turnover, oxidized Trps on the lumen side are more than likely playing a role in D1 turnover as well. Untangling this complex process will require additional research.

      Nevertheless, identifying and characterizing the role of oxidative modification of tryptophan (Trp) residues, in particular, Trp14, in the PSII core provides another critical step in an already intricate multi-step process of D1 protein turnover during photo-damage.

      We thank reviewer #3 for all the helpful comments and their supportive review of the manuscript.

      We thank the reviewer for raising this interesting study showing that ROS might disrupt the interaction between PsbT and D1 in Thermosynechococcus vulcanus. The stroma-exposed DE-loop of D1 is one of the possible cleavage sites of Deg protease. Because D1 cleavage by Deg facilitates efficient D1 degradation by FtsH under high-light conditions, it will be interesting to further elucidate how Deg and FtsH cooperate in D1 degradation. We have added this discussion to the manuscript. Other minor comments are answered in the point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Other minor points

      4) L227. How do you eliminate the possibility of reduced stability under high light?

      As the reviewer points out, D1 synthesis under HL was not tested in this study. Therefore, we cannot rule out the possibility of a reduced D1 synthesis rate under HL in the mutant. However, the rate of D1 turnover (coordinated degradation and synthesis) is increased under HL. Because the pulse-labeling experiment is affected by D1 synthesis as well as D1 degradation, if the rate of D1 synthesis does differ under HL, we cannot clearly distinguish whether the reduced labeling is caused by the increased D1 degradation seen in the W14F mutant or by delayed D1 synthesis. We thank the reviewer for this valuable comment.

      5) Ls25-26. It would be quite rare that P680 directly absorbs light energy.

      We changed the sentence.

      6) L28. intrinsic antenna? Is this commonly used? core antenna?

      Corrected to “core antenna”

      7) Ls 41-43. Because the process is described as step iii), it is odd to mention it again as another critical step.

      We removed the sentence.

      8) L75. Is it correct? Do you mean damage is caused by inhibition?

      We changed the sentence to “…the disorder of photosynthesis…”

      9) Figure 1c. +4, +16 and +32 should be explained in the legend.

      We added the explanation in the legend.

      10) Supplementary Figures S1 and S2. Title. Is it true that oxidation depends on singlet oxygen? This is a question. If it is not experimentally proved, modify the expression.

      Singlet oxygen (¹O₂) is generally believed to contribute to the in vivo oxidation of Trp. However, as the reviewer suggests, we cannot be certain that the oxidative modifications detected here depend on singlet oxygen. We have therefore changed the titles of Figs. S1 and S2.

      11) Figure 3. Correct errors in + or - in the Figure.

      Corrected

      12) L328. Cyc > Cys.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      1) A few suggestions on typos and style:

      • Lines 2-3, please rephrase the sentence. The meaning is unclear.

      We rephrased the sentence to “Photosynthesis is one of the most …”

      • Lines 28-29, "Despite its orchestrated coordination...". Tautology.

      We changed the sentence.

      • Line 31, "...one, known as the PSII repair...". Please rewrite.

      We followed the reviewer's suggestion and changed the sentence to “…synthesized one in the PSII repair.”

      • Line 49, "Their family proteins...". Rephrase.

      We rephrased the sentence.

      • Lines 64-66, please rewrite. I am not sure what the authors imply here. Are they talking about FtsH turnover or regulation of FtsH at the protein or gene level?

      FtsH itself is also degraded under high-light stress. To compensate for this, ftsH gene expression is upregulated, which helps maintain an appropriate FtsH level in the thylakoid membranes. We rewrote the sentence as follows: “increased turnover of FtsH is crucial for their function under high-light stress. That is compensated by upregulated FtsH gene expression”.

      • Line 68, "...to dislocate their substrates..."

      We changed the sentence to “to pull their substrates and push them into the protease chamber by ATPase activity”

      • Line 86, N-formylkymurenine => N-formylkynurenine

      Corrected

      • Lines 111-112, "Consistent with previous results...". Please specify which studies are being referred to and cite them if relevant.

      We added references.

      • Line 114, "...in extracts Arabidopsis..." => "...in extracts of Arabidopsis...".

      Corrected

      • Line 171, "influences in high-light sensitivity." Please rephrase.

      We rephrased the sentence.

      • Line 192, Fv/Fm. "v" and "m" should be subscripts.

      Corrected

      • Line 210, "...encounters...". Unclear meaning.

      We rephrased the sentence.

      • Line 358, hyphen usage. "fine-tuned". This sentence should be rewritten to make the role of phosphorylation clear. "Fine-tuning" is vague.

      We changed the sentence to “…spatiotemporal regulation of D1 degradation”

      • Fig. 6 legend, luminal => lumenal

      Changed to luminal

      2) The statistical notation used for some results is confusing. In Fig. 6b, "*" stands for p ≤ 0.1, while in Fig. 4 it denotes p ≤ 0.05. If this is not a typo, this usage deviates from the standard one. How is a D2 change in Fig. 6b significant given its p value of ≤ 0.1? The Fig. 6b key for D2 does not correspond with the histogram pattern.

      Thank you for your comments and suggestions. The asterisk in Figure 6b was not a typo, but to avoid confusion we have revised the notation: a single asterisk now denotes p < 0.05, and a section sign “§” denotes p < 0.1. The generally accepted threshold for a statistically significant difference is p < 0.05. We found p = 0.03322 for D1 and p = 0.07418 for D2. As the reviewer noted, and as this difference in p values suggests, the D2 signal fluctuated between replicates, whereas the D1 signal did not. Although the reason for the fluctuating D2 signal intensity is not yet clear, the p value for D2 lay between 0.05 and 0.10. We have also rewritten the corresponding description in the Results.
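      The revised notation can be written as a simple lookup. This is a minimal sketch, assuming strict inequalities at both thresholds as stated in the response; the function name `significance_mark` is hypothetical and not part of the authors' analysis.

```python
def significance_mark(p: float) -> str:
    """Return the mark used in the revised Fig. 6b legend for a p-value.

    Assumption: strict inequalities at both thresholds.
      '*' : p < 0.05          (conventionally significant)
      '§' : 0.05 <= p < 0.10  (a trend, not claimed as significant)
      ''  : p >= 0.10         (no mark)
    """
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "§"
    return ""


# The two values reported for Fig. 6b.
print(significance_mark(0.03322))  # D1
print(significance_mark(0.07418))  # D2
```

      Under this scheme, D1 receives an asterisk and D2 the section sign, so the D2 change is reported as a trend rather than as a significant difference.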

      3) There are no error bars in Fig. 5d while the error bars in Fig. 5e show that there are no significant differences between Cβ distances of W14F and W14ox with WT contrary to the authors' assertion in the text (lines 254-255).

      There are no error bars in Fig. 5d because the fluctuation values in Fig. 5d were calculated from the entire trajectory (i.e., all snapshots) of the MD simulation. In contrast, the Cβ-Cβ distance can be obtained at each individual snapshot of the simulation. Fig. 5e therefore shows the distances averaged over all snapshots, with the standard deviations as error bars. To prevent any confusion, we now explicitly describe this as the “averaged Cβ-Cβ distance” and have added an explanation of the error bars to the caption of Fig. 5e. Note that our focus in the text (lines 254-255) was not on comparing the Cβ-Cβ distance of W14F with that of W14ox, but on comparing the distance of W14F or W14ox with that of WT.
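      The difference between the two quantities can be illustrated with a minimal NumPy sketch. It assumes a trajectory stored as an array of already-superposed coordinates with shape (frames, atoms, 3); the array layout, the function names, and the toy data are hypothetical and are not the authors' analysis code.

```python
import numpy as np


def rmsf(coords: np.ndarray) -> np.ndarray:
    """Root-mean-square fluctuation of each atom over a whole trajectory.

    coords has shape (n_frames, n_atoms, 3) and is assumed to be superposed
    onto a common reference already. All frames are pooled into a single
    number per atom, so no per-frame spread remains to plot as an error bar.
    """
    mean_pos = coords.mean(axis=0)                   # (n_atoms, 3)
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))              # (n_atoms,)


def distance_mean_sd(coords: np.ndarray, i: int, j: int) -> tuple[float, float]:
    """Mean and standard deviation of the i-j distance across frames.

    A distance exists in every snapshot, so averaging over snapshots gives
    both a mean and a standard deviation (usable as an error bar).
    """
    d = np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=1)  # (n_frames,)
    return float(d.mean()), float(d.std())


# Toy two-frame trajectory: atom 0 fixed at the origin, atom 1 at x = 1, then x = 3.
toy = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]])
print(rmsf(toy))                     # one value per atom, no spread
print(distance_mean_sd(toy, 0, 1))   # (mean, SD) over the two frames
```

      In this toy case, the fluctuation of atom 1 is a single number (1.0), whereas the distance between the two atoms has a mean of 2.0 with a standard deviation of 1.0 across frames. This mirrors why Fig. 5d has no error bars while Fig. 5e does.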

      4) Figure 3 legends and figure labels do not correspond. Fig. 3b should be labeled as High light - Chloramphenicol and likewise, fig 3c should read growth light + Chloramphenicol to be consistent with the legend.

      Corrected

      5) How are OPTM levels of D1 Trp residues normalized? Is it against unmodified peptides or total proteins?

      The levels of three oxidative variants of Trp in the Trp14- and Trp317-containing peptides were obtained by label-free MS analysis. Fig. 1 shows the intensity values of the oxidized variants of Trp14 and Trp317. In this analysis, the levels of the unoxidized peptides did not differ significantly between var2 and WT.

      6) Fig. 1a cartoon might need work. It looks like the oxygen atom in OIA is misplaced.

      Corrected

      Reviewer #3 (Recommendations For The Authors):

      In regard to Table 1, the sequence of the mass spectra fragment listed for Trp14 (i.e., ENSSL(W)AR) in Table 1 is different from the sequence of the mass spectra fragment of Trp14 shown in Supplemental Figure S1 (i.e., ESESLWGR). Likewise, the sequence of the mass spectra fragment listed for Trp317 (i.e., VLNT(W)ADIINR) in Table 1 is different from the sequence of the mass spectra fragment of Trp317 shown in Supplemental Figure S2 (i.e., VINTWADIINR). This discrepancy, I think, can be simply explained.

      Table 1 lists the Trp-oxidized peptides newly detected in Chlamydomonas PSII core proteins. In contrast, Figures S1 and S2 show the results of the MS analysis used to determine Trp oxidation levels in the Arabidopsis var2 mutant, as shown in Fig. 1c. To avoid confusion, we have added to the supplemental figure titles that these peptides were detected in Arabidopsis.

      Labeling: In Figure 3, the figure legend states that b, high-light in the absence of CAM; but panel b, shows +CAM conditions. I think this labeling is incorrect and needs to be -CAM. Likewise, the figure legend states that c, growth-light in the presence of CAM. I think this labeling is incorrect and needs to be +CAM.

      Corrected

      This reviewer has a few comments/suggestions on the presentation of the sequence alignments showing the various positions of oxidized Trps within the D1(Figure 1), D2 and CP43 (Supplemental Figure S3) and CP47 (Supplemental Figure S3):

      The authors should consider highlighting in red all the various Trps shown in Table 1 in the corresponding alignments shown in Figure 1 for the D1 protein, in Supplemental Figure S3 (for D2 and CP43), and in Supplemental Figure S3 continued (for CP47). Highlighting the locations of oxidized Trps across various species is very informative, but as presented here the red labeling is somewhat haphazard and confusing, and these figures thus lose some of their impact. For instance, in Supplementary Fig. S4, the reader can visualize the structural positions of oxidized Trp residues in the Thermosynechococcus vulcanus PSII core proteins. When one then looks at the various alignments presented by the authors, one can see that other species have a similar arrangement of oxidized Trp residues as well. Consequently, when one collectively looks at the data presented in Table 1, Figure 1, Supplemental Figure S3, and Supplemental Figure S4, a picture emerges that illustrates how common overall Trp oxidation is and, more specifically, how oxidized Trp14 may play a similar role across species in activating D1 turnover. I think these figures, if presented in a more comprehensive and unified fashion, would really add to the paper.

      Thank you for your suggestion. In this study, we aimed to show the oxidized Trp residues identified by MS/MS analysis, their conservation across sequences, and their positions in the structure. Because this involves a large amount of information, combining it into a single figure is difficult. We hope you understand the reason for this.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We are grateful for the helpful comments of both reviewers and have revised our manuscript with them in mind.

      One of the main issues raised was that readers may by default assume that our models are correct. We in fact made it very clear in our Discussion that the models are merely hypotheses that will need testing by “wet” experiments, and we therefore do not agree that even readers unfamiliar with AF would assume that the models must be correct. It was also suggested that readers could be reassured by including extensive confidence estimates such as PAE plots. As it happens, every single model described in the manuscript had reasonably good PAE scores and, more crucially, the entire collection of output files, including PAE data, is readily accessible on Figshare at https://doi.org/10.6084/m9.figshare.22567318.v2, a fact that the reviewers appear to have overlooked. The Figshare link is mentioned three times in the manuscript. Embedding these data within the manuscript itself would in our view add yet more detail, and we have therefore not included them in our revised manuscript. Likewise, it is rather simple for any reader to work out which part of a PAE matrix corresponds to an interaction observed in the corresponding pdb prediction. Besides which, it is our view that the biological plausibility and explanatory power of models are just as important as AF metrics in judging whether they may be correct, as is indeed also the case for most experimental work.

      Another important point was that the manuscript was too long and not readable. Yes, it is long, and it could well be argued that we could have written a different type of manuscript, focusing entirely on what is possibly the simplest and most important finding, namely that our AF models suggest that in animal cells Wapl appears to form a quaternary complex with SA, Pds5, and Scc1 in a manner suggesting that a key function of Wapl’s conserved CTD is to sequester Scc1’s N-terminal domain after it has dissociated from Smc3. Rightly or wrongly, we decided that this story could not be presented on its own but also required 1) an explanation for how Scc1 is induced to dissociate from Smc3 in the first place and 2) an explanation for why the quaternary complex predicted for animal cells was not initially predicted for fungi such as yeast. The yeast situation was an exception that clearly needed explaining if the theory was to have any generality, and it turned out that delving into the intricate details of the genetics of releasing activity in yeast was eventually required and yielded valuable new insights. We also believe that our work on the recruitment of Eco/Esco acetyl transferases to cohesin and the finding that sororin binds to the Smc3/Scc1 interface provide important insight into how releasing activity is regulated. We acknowledge that the paper is indeed long but do not think that it is badly written. It is above all a long and complex story that in our view reveals numerous novel insights into how cohesin’s association with chromosomes is regulated, and we have endeavoured to eliminate any excessive speculation. We feel it is not our fault that cohesin uses complex mechanisms.

      Notwithstanding these considerations, we have in fact simplified a few sections and removed one or two others but acknowledge that we have not made substantial cuts.

      It was pointed out that a key feature of our modelling, namely the predicted association of Wapl’s C-terminal domain with SA/Scc3’s CES is inconsistent with published biochemical data. The AF predictions for this interface are universally robust in all eukaryotic lineages and crucially fully consistent with published and unimpeachable genetic data. We note that any model that explains all findings is bound to be wrong for the very simple reason that some of these findings will prove to be incorrect. There is therefore an art in Science of judging which data must be explained and accommodated and which should be ignored. In this particular case, we chose to ignore the biochemistry. Time will tell whether our judgement proves correct.

      Last but not least, it was suggested that we might provide some experimental support for our proposed SA/Scc3-Pds5-Scc1-WaplC quaternary complex. We are in fact working on this by introducing cysteine pairs (that can be crosslinked in cells) into the proposed interfaces but decided that such studies should be the topic of a subsequent publication. It would be impossible with the resources available to our labs to follow up all of the potential interactions and we therefore decided to exclude all such experiments.

      We are grateful for the detailed comments provided by both reviewers, many of which were very helpful, and in many but not all cases have amended the manuscript accordingly.

      With regard to the more specific comments:

      Reviewer #1 (Recommendations For The Authors):

      1) One concern is that observed interfaces/complexes arise because AF-multimer will aim to pack exposed, conserved and hydrophobic surfaces or regions that contain charge complementarity. The risk is that pairwise interaction screens can result in false positive & non-physiological interactions. It is therefore important to report the level of model confidence obtained for such AF calculations:

      A) The authors should color the key models according to pLDDT scores obtained as reported by AF. This would allow the reader to judge the estimated accuracy of the backbone and side chain rotamers obtained. At least for the key models and interactions it would be important to know if the pLDDT score is >90 (Correct backbone and most rotamers) or >70 (only backbone is correct).

      B) It would also be important to report the PAE plots to allow estimation of the expected position error for most of the important interactions. pLDDT coloring and PAE plots can be shown side-by-side as in other published data (e.g. https://pubmed.ncbi.nlm.nih.gov/35679397/ (Supplementary data)).

      C) The authors should include a Table showing the confidence of template modeling scores for the predicted protein interfaces as ipTM, ipTM+pTM as reported by AlphaFold-multimer. Ideally, they would also include DockQ scores, but this may not be essential. Addition of such scores would help classification into Incorrect, Acceptable, or High quality. For example, at line 1073 et seq. the authors show a model of a SCC1SA and ESCO1 complex (Fig. 37). Are the modeling scores for these interfaces high? It does not help that the authors show cartoons without side chains. Can the authors provide a close-up view of the two interfaces? Are the amino acids indeed packed in a manner expected for a protein interface? Can we exclude the possibility that the prediction is obtained merely because the sequence segments (e.g. in ESCO1 & ESCO2) are hydrophobic and conserved?

      We do not agree that including this level of detail to the text/figures of the manuscript would be suitable. All the relevant data for those who may be sceptical about the models are readily available at https://doi.org/10.6084/m9.figshare.22567318.v2. In our view, the cartoon versions of the models are easier for a reader to navigate. Anyone interested in the molecular details can look at the models directly.

      Importantly, no amount of statistical analysis can completely validate these models. What is needed is further experimentation, which will be the topic of future work from our laboratory and, I dare say, from others.

      D) When they predict an interaction between the SA2:SCC1 complex and Sororin's FGF motif, they find that only 1/5 models show an interaction and that the interaction is dissimilar to that seen for CTCF. Again, it would be helpful to know the modeling scores. Can they show a close-up view of the Sororin FGF binding interface to see whether a realistic binding mode is obtained? Can they indicate the relevant region on the PAE plot?

      Given that AF greatly favours other interactions of Sororin’s FGF motif over its interaction with SA2-Scc1, we do not agree that dwelling on the latter would serve any purpose.

      2) Line 996: AF predicts with high confidence an interaction between Eco1 & SMC3hd. What are the ipTM (and DockQ, if available) scores? Would the interface score High, Medium or Acceptable?

      As mentioned, see https://doi.org/10.6084/m9.figshare.22567318.v2.
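      For readers weighing the confidence metrics raised in points 1 and 2, the following is a minimal Python sketch of how they are commonly interpreted. It assumes the AlphaFold-Multimer ranking confidence 0.8·ipTM + 0.2·pTM, the standard AlphaFold pLDDT bands, and the DockQ quality cut-offs (0.23, 0.49, 0.80). The function names are hypothetical, and any scores passed to them are illustrative rather than values from the manuscript's models. Note that DockQ compares a model against a reference structure, so it cannot be computed for an interface that has no experimental structure.

```python
# Hypothetical helpers for interpreting AlphaFold(-Multimer) confidence metrics.
# Not the authors' code; thresholds follow commonly published conventions.

def multimer_ranking_confidence(iptm: float, ptm: float) -> float:
    """AlphaFold-Multimer model confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def plddt_band(plddt: float) -> str:
    """Map a per-residue pLDDT score (0-100) to AlphaFold's confidence bands."""
    if plddt > 90:
        return "very high"  # backbone and most side-chain rotamers expected correct
    if plddt > 70:
        return "confident"  # backbone generally expected correct
    if plddt > 50:
        return "low"
    return "very low"


def dockq_class(dockq: float) -> str:
    """CAPRI-style quality class from a DockQ score (requires a reference structure)."""
    if dockq >= 0.80:
        return "High"
    if dockq >= 0.49:
        return "Medium"
    if dockq >= 0.23:
        return "Acceptable"
    return "Incorrect"
```

      For example, a model with ipTM 0.85 and pTM 0.80 would have a ranking confidence of about 0.84 under this formula.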

      3) Line 1034 et seq: Eco1/ESCO1/ESCO2 interaction with PDS5. Interface scores need to be shown to determine that the models shown are indeed likely to occur. If these interactions have low model confidence, Fig. 36 and discussion around potential relevance to PDS5-Eco1 orientation relative to the SMC3 head remains highly speculative and could be expunged.

      See https://doi.org/10.6084/m9.figshare.22567318.v2. It should be clear that the predictions are very similar in fungi and animals. Crucially, we know that Pds5 is essential for acetylation in vivo, so the models appear plausible from a biological point of view.

      4) Considering the relatively large interface between ECO1 and SMC3, would the author consider the possibility that in addition to acetylating SMC3's ATPase domain, ECO1 remains bound to cohesin-DNA complex, as proposed for ESCO1 by Rahman et al (10.1073/pnas.1505323112)?

      This is certainly possible but we would not want to indulge in such speculation.

      5) E.g. line 875, but also throughout the text: as there is no labeling of the N- and C-termini in the figures, it is frequently unclear what the authors are referring to when they mention that AF models orient chains in a certain manner.

      Good point. This has been amended. However, the positions of the N- and C-termini are all available at https://doi.org/10.6084/m9.figshare.22567318.v2.

      6) Fig. 19B, PAE plots: the authors should indicate which chains correspond to A, B and C. Which segment corresponds to the TYxxxR[T/S]L motif? Can they highlight this section on the PAE plot?

      Good point and amended in the revised manuscript.

      Minor comments:

      1) Line 440: the WAPL YSR motif is not shown in Fig. 14A

      2) Line 691: Scc3 spelling error.

      3) Line 931: Sentence ending '... SCC3 (SCC3N).' requires citation.

      4) Line 1008: Figure reference seems wrong. It should read: Fig. 34A left and right. Fig. 34B does not contain SCC1.

      Many thanks for spotting these. Hopefully, all corrected.

      5) Fig. 41 can be removed as it shows the absence of the interaction of Sororin with SMC1:SCC1. Sufficient to mention in the text that Sororin does not appear to interact with SMC1:SCC1.

      This is possible but we decided to leave this as is.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      (1) Are there any predicted models in which one of the two dimer interfaces of the hinge is open when the coiled coils are folded back, as seen in the cryo-EM structure of human cohesin-NIPBL complex in the clamped state?

      No AF runs ever predicted half-opened hinges. It is possible that introducing mutations into one of the two interfaces might reveal a half-opened state, and we ought to try this. However, we believe it would not be appropriate for this manuscript.

      (2) Structures of the SA-Scc1 CES bound to [Y/F]xF motifs from Sgo1 and CTCF have been reported, suggesting that a similar motif could interact with SA/Scc3. Surprisingly, AF did not predict an interaction between Scc3/SA and Wapl FGF motifs, which only bind to the Pds5 WEST region. On the other hand, AF predicted interactions of the Sororin FGF motif with both Pds5 WEST and SA CES. Can the authors comment on this Wapl FGF binding specificity? What will happen if a Wapl fragment lacking the CTD is used in the prediction?

      This seems to be an academic point as the CTD is always present.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study as a concept is well designed, although I see two issues in the methodology (these may just need further explanation or, if my interpretation of what was done is correct, may need reanalysis to take them into account). Both issues relate to the data that was extracted from the published literature on zoonotic malaria prevalence in the study area.

      1) No limit was set on the temporal range

      With no temporal limit on the range of studies, the landscape will in many cases have changed between when the study was conducted and when the spatial data were acquired. This will be particularly marked in areas where there has been clearing since the zoonotic malaria prevalence study. Population changes (through growth, decline or movement) will also have occurred. All research is limited in what it can do with the available data, so I realise that there may not be much the authors can do to correct this. One possible solution would be to look at the land use change at each site between the prevalence study and the remote sensing data. I'm not sure if this is feasible, but if it is, I would recommend the authors attempt it, as it will make their results stronger.

      Thank you for the comments. We agree that matching the date of remote sensing data to samples is particularly important for environmental variables that change rapidly (such as forest loss). To clarify, no limit was set on the date range of the studies identified from the literature to ensure no articles were excluded due to arbitrary date restrictions. We have edited the manuscript to clarify this (line 422). Regarding landscape and environmental features, remote sensing data was extracted annually for every year for the full date range of the data (see Table 1 and S11, annual temporal resolution from 2006 to 2020). Forest was then matched contemporaneously (see lines 467–473) meaning that, insofar as it was possible, forest data was extracted for the same year as the data was collected. Where a date range was given for the primate data, the mean year was used. For human population density, covariate data were extracted for multiple years but were found to be relatively stable over the time period for the sites covered, so median year was used (see Supplementary Information, Appendix E and Table S11). Elevation is stable and typically only one time point is used as reference (in this instance the SRTM 90m Digital Elevation model, 2003).

      2) Most studies only gave a geographic area or descriptive location.

      The spatial analysis was based on a 5 km and 20 km radius around the 'study site' location, but for many of the studies the exact site is not known. Therefore, the 'study site' was artificially generated using a polygon centroid. Considering that the polygon could be an administrative boundary (i.e., district/state/country), this is an extremely large area for which a 5 km radius circle in the middle of the polygon is taken as representative of the 'study site'. This doesn't make sense, as it assumes that the landscape is uniform across the district, which in most cases it will not be (in rural areas it is going to be a mixture of villages, forest, plantation, crops etc., which will vary across the landscape). This might just be a case of misunderstanding what was done (in which case the text needs rewording to make it clearer), or, if I have interpreted it correctly, the selection of the centroid to represent the study area does not make sense. I am not sure how to overcome this, as it is probably not possible to get exact locations for the study sites. One possibility could be to make the remote sensing data the same scale as the prevalence data, i.e., if the study site is only identifiable at the polygon level, then the remote sensing data (fragmentation, cover and population) are used at the polygon level.

      Both these issues could have an impact on the study's findings. I would think that in both cases it might make the relationship between the environmental variables and prevalence even clearer.

      We would like to thank the reviewer for their concerns and provide some clarification on the methods used to extract environmental variables:

      • Using the centroid was initially explored but not pursued, for the same concerns raised by the reviewer: the central point of a large polygon is arbitrary, is unlikely to be representative of habitat across the entire sampling area, and introduces error (Cheng et al., 2021). We have clarified the wording in the manuscript with reference to centroids to avoid confusion on this point (line 491).

      • We demonstrate a method to account for the lack of precise geolocation by taking 10 ‘pseudo-sampling’ points instead of a single random location, with environmental variables extracted at 5, 10 and 20km for each site (lines 487-500). By including 10 environmental realisations, surveys conducted in smaller or more uniform landscapes will have more consistent covariates and this will lend more weight to the model. Conversely, samples taken from large administrative polygons are likely to be highly variable, and these associations will have less representation in the final model. This approach was used to demonstrate an alternative to using a single arbitrary site to represent the area.
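      The pseudo-sampling step described above can be sketched as follows. This is not the authors' code: it assumes planar coordinates and a polygon given as a plain list of vertices, whereas a real analysis would work on projected GIS layers and then extract covariates within 5, 10 and 20 km buffers around each point. Points are drawn uniformly inside the polygon by rejection sampling from its bounding box, and the spread of a covariate across the 10 points indicates how informative a site is. All function names are hypothetical.

```python
import random
import statistics


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray from (x, y) with this edge;
        # the condition guarantees y1 != y2, so the division is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pseudo_sampling_points(polygon, n_points=10, seed=0):
    """Draw n_points uniformly inside the polygon by rejection sampling its bounding box."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n_points:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


def covariate_spread(values):
    """Standard deviation of a covariate across pseudo-sampling points.

    A large spread means the site's location is too uncertain to be informative.
    """
    return statistics.stdev(values)
```

      A small or homogeneous polygon yields similar covariate values at all 10 points (low spread), while a large administrative polygon yields widely varying values, which is the property the sensitivity analyses exploit when filtering out highly variable sites.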

      To further support the validity of this technique:

      • Figures illustrating the variance of the environmental variables across the 10 sampling sites at 5, 10 and 15km for GADM administrative classifications at country level (GID0), state (GID1), district (GID2) and exact coordinates (GPS) are now included in the SI (Figure S12).

      • Sensitivity analyses were conducted, in which final GLMM models were fit again but using only acceptable levels of variance in environmental variables and/or acceptable size of administrative boundary (Table S15 and S16). In sensitivity analyses, forest cover and fragmentation retained a significant effect on prevalence of P. knowlesi in macaques, suggesting this effect is robust to spatial uncertainty.

      We would also like to highlight that the main finding of this research is the novel synthesis of regional prevalence of P. knowlesi in simian reservoirs across Southeast Asia, which was formerly assumed to be uniformly high. This synthesis can now be used to inform regionally specific transmission modelling, better estimate spatial risk and parameterise early warning systems for P. knowlesi malaria in countries approaching elimination of human malarias. The risk factor analysis here is provided to begin to understand what may be driving this geographic heterogeneity in P. knowlesi prevalence at finer scales, and to demonstrate methods that could be used to accommodate spatial uncertainty in secondary data. We appreciate that this may not have been clear and have edited the manuscript accordingly.

      Reviewer #2 (Public Review):

      This is the first comprehensive study aimed at assessing the impact of landscape modification on the prevalence of P. knowlesi malaria in non-human primates in Southeast Asia. This is a very important and timely topic both in terms of developing a better understanding of zoonotic disease spillover and the impact of human modification of landscape on disease prevalence.

      This study uses the meta-analysis approach to incorporate the existing data sources into a new and completely independent study that answers novel research questions linked to geospatial data analysis. The challenge, however, is that neither the sampling design of previous studies nor their geospatial accuracy are intended for spatially-explicit assessments of landscape impact. On the one hand, the data collection scheme in existing studies was intentionally opportunistic and does not represent a full range of landscape conditions that would allow for inferring the linkages between landscape parameters and P. knowlesi prevalence in NHP across the region as a whole. On the other hand, the absolute majority of existing studies did not have locational precision in reporting results and thus sweeping assumptions about the landscape representation had to be made for the modeling experiment. Finally, the landscape characterization was oversimplified in this study, making it difficult to extract meaningful relationships between the NHP/human intersection on the landscape and the consequences for P. knowlesi malaria transmission and prevalence.

      Thank you for the feedback on the manuscript. We agree that the data was not originally intended for spatial assessment of landscape impact nor represents a full range of landscape conditions across the region. However, we would like to highlight the first set of results from the meta-analysis. Here, the synthesis of all available data allows for the detection of regional disparities and geographic heterogeneity of prevalence in host species, which individual small-scale opportunistic studies are not powered to do, and which had not been identified before this investigation.

      In this context, the risk factor analysis is an exploratory analysis to understand what may be driving the observed geographic variation at broad scales, as well as to provide a framework for dealing with spatial uncertainty. Landscape data were extracted at a level deemed appropriate given the limitations of the data. The majority were geolocated to district level, and sensitivity analysis showed reasonable consistency of landscape features at our chosen scales (Table S8, Figure S12A). To address some of these concerns, we conducted further analysis to explore the deviation of environmental covariates in each sampling area and ran sensitivity analyses removing extremely variable datapoints (Table S15 and Table S16). When highly uncertain data and/or country-level data are removed, the effect of canopy cover on non-human primate malaria prevalence is retained, supporting the original findings.

      Despite many study limitations, the authors point to the critical importance of understanding vector dynamics in fragmented forested landscapes as the likely primary driver in enhanced malaria transmission. This is an important conclusion particularly when taken together with the emerging evidence of substantially different mosquito biting behaviors than previously reported across various geographic regions.

      Another important component of this study is its recognition and focus on the value of geospatial analysis and the availability of geospatial data for understanding complex human/environment interactions to enable monitoring and forecasting potential for zoonotic disease spillover into human populations. More multi-disciplinary focus on disease modeling is of crucial importance for current and future goals of eliminating existing and preventing novel disease outbreaks.

      Reviewer #1 (Recommendations For The Authors):

      A couple of minor points

      1) Were human density and forest cover correlated? If so, was this taken into account?

      Human density and forest cover at selected scales were not found to be strongly correlated (Spearman’s rank values -0.38 and -0.45 within 5km and 20km buffer radii for human population density respectively).

      In selecting variables for inclusion in the final model, we examined variance inflation factors (VIF) to detect and minimise multicollinearity in the model. VIF measures the correlation and strength of correlation between independent predictors. VIF of each predictor variable was examined starting with a saturated model and sequentially excluding the variable with the highest VIF score from the model. Stepwise selection continued until the entire subset of explanatory variables in the global model satisfied a conservative threshold of VIF ≤6 (Rogerson, 2001), which ensures that the remaining variables included in the final model have minimal correlation. Spearman’s correlation matrices for all variables at all scales and final selected variables (below VIF threshold) are included in the Supplementary Information (Figure S13 and Figure S14).
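      The VIF-based selection described above can be sketched in Python with NumPy. The VIF of predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; the loop drops the highest-VIF variable until every VIF is at most 6. The function names and the synthetic data in the usage note are illustrative assumptions; the software the authors actually used is not stated here.

```python
import numpy as np


def vif_scores(X):
    """VIF for each column of X: 1 / (1 - R^2) from regressing it on the others (with intercept)."""
    n, p = X.shape
    scores = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        scores.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(scores)


def stepwise_vif(X, names, threshold=6.0):
    """Drop the predictor with the highest VIF until all remaining VIFs are <= threshold."""
    names = list(names)
    X = np.asarray(X, dtype=float)
    while X.shape[1] > 1:
        vifs = vif_scores(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names
```

      With two independent predictors `a` and `b` plus a near-copy `c = a + small noise`, `stepwise_vif` removes one of the collinear pair and keeps `b`, leaving two predictors whose VIFs are close to 1.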

      2) Reference (Speldewinde et al., 2019) is down as Davidson et al. in the reference list

      Thank you for the thoroughness in this review. There are two similar but separate references, both published in 2019 with the same co-authors, and (Speldewinde et al., 2019) was incorrectly referenced. They should be (Davidson et al., 2019a) and (Davidson et al., 2019b), respectively. This has now been corrected in the manuscript.

      Davidson, G., Chua, T.H., Cook, A. et al. Defining the ecological and evolutionary drivers of Plasmodium knowlesi transmission within a multi-scale framework. Malar J 18, 66 (2019). https://doi.org/10.1186/s12936-019-2693-2

      Davidson G, Chua TH, Cook A, Speldewinde P, Weinstein P. The Role of Ecological Linkage Mechanisms in Plasmodium knowlesi Transmission and Spread. Ecohealth. 2019;16(4):594-610. https://doi.org/10.1007/s10393-019-01395-6

      Reviewer #2 (Recommendations For The Authors):

      Line 143: "We hypothesise that higher prevalence of P. knowlesi in primate host species is driven by landscape change..." without specifying here the kind of landscape change (e.g. "forest degradation and fragmentation") it is virtually impossible to confirm or reject this hypothesis.

      We agree that the wording of the hypotheses needed to be more specific. We have edited lines 142 – 145 to specify forest fragmentation as our landscape variable of interest, and to more explicitly include the regional meta-analysis of P. knowlesi prevalence.

      Table 1 vs Table S11 discrepancy regarding spatial resolution of Forest cover and fragmentation variables. The original dataset resolution is 30m but I don't think one can compute a PARA index at 30 m since it really requires a polygon that is larger than the single value pixel. Table S11 indicates a 30 km gridcell with some postprocessing of the original datasets.

      We appreciate this being identified. The resolution refers to the input layer (tree canopy cover, 30m). PARA was calculated from the binary forest cover layer (30m resolution) within each buffer radii 5, 10 and 20km. We have edited both Table 1 and Table S11 to help clarify this.
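      A minimal sketch of the two forest steps mentioned in this exchange: thresholding 30 m tree canopy cover at >50% to obtain a binary forest layer (the definition given later in this response), and computing a perimeter:area ratio within a window. The assumptions are stated loudly. A square array window stands in for the circular 5, 10 or 20 km buffer. Perimeter is counted as forest/non-forest cell edges, with cells outside the window treated as non-forest. The result is a single aggregate ratio for the window, whereas patch-level variants average the ratio over individual forest patches. This is not the authors' implementation.

```python
import numpy as np


def forest_mask(canopy_cover_pct, threshold=50.0):
    """Binary forest layer: True where tree canopy cover (percent) exceeds the threshold."""
    return np.asarray(canopy_cover_pct) > threshold


def perimeter_area_ratio(mask, cell_size_m=30.0):
    """Aggregate perimeter:area ratio (m per m^2) of forest in a binary raster window.

    Perimeter counts every cell edge between forest and non-forest; the window is
    padded with non-forest so forest on the window border also contributes edges.
    """
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    vertical_edges = np.count_nonzero(m[:, 1:] != m[:, :-1])    # left/right neighbours differ
    horizontal_edges = np.count_nonzero(m[1:, :] != m[:-1, :])  # up/down neighbours differ
    n_forest = np.count_nonzero(m)
    if n_forest == 0:
        return float("nan")  # no forest in the window: ratio undefined
    perimeter = (vertical_edges + horizontal_edges) * cell_size_m
    area = n_forest * cell_size_m ** 2
    return perimeter / area
```

      For the same forest area, splitting the forest into separate patches increases the shared forest/non-forest edge length. For example, two isolated 30 m cells score higher than two adjacent cells, which is why this ratio is used as a fragmentation index.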

      It would be very helpful if you provided justification for selecting specific metrics to represent the key landscape variables. How are these particular landscape variables relevant? Why not other land cover/land use components?

      We have now included a paragraph in the Supplementary Information (Appendix D) to explain the choice of environmental covariates. Elevation was chosen as an important proxy for vector distribution (but was not retained in model selection). Human population density was chosen as a measure of proximity to human settlement, rather than relying on qualitative assessment of rural/peri-urban/urban. Tree canopy cover and fragmentation indices are key determinants of primate habitat selection and of vector breeding habitat, and justification for the use of perimeter: area ratio is included in the methods section (section beginning line 462).

      I think the other issues present substantial weaknesses that you cannot address without redoing the study. I will list those below just for reference.

      1) If the forest is so dominant (which I would agree with based on my understanding of macaque ecology), how does it make sense to select completely random points (especially at the country or even state level) to represent landscape covariates? At a minimum, I would suggest getting random points within the forest or better yet forest edge habitat. But even then, I doubt that these points would be at all representative of the conditions of a specific study. The geospatial uncertainty is just too large. The dataset simply doesn't support the analysis that is attempted here.

      On the point of selecting from only within forest: forest is a dominant habitat, but Long-tailed macaques are anthropophilic and not exclusively found in forest (Stark et al., 2019), and a proportion of the more opportunistic and nuisance samples caught were found in areas more associated with human activity (Li et al., 2021). As such, random points only within forested areas is also unlikely to capture the true habitat of the primates sampled and selecting only from forested areas would bias the results.

      Whilst fully georeferenced samples would be the ideal scenario, the idea behind selecting random points from the sampling polygon is that for smaller areas (with higher spatial certainty), habitat would be more consistent between random points and lend more weight to the final model, whereas large polygons with high uncertainty are likely to vary and lend less weight to the final model. In response to these comments, we have further supported this by running regression models only on samples within a reasonable administrative boundary size and on samples within a reasonable threshold of uncertainty (i.e., data points are removed if the deviation of environmental covariates across the 10 random points is so high that the sample is uninformative, or if datapoints can only be geolocated to country level). In these sensitivity analyses, forest cover and species are retained as factors associated with higher malarial prevalence in non-human primates (Tables S15 and S16).

      2) Hansen et al. dataset reflects "tree cover" - which is not the same as "forest cover" since it would also include plantations that are very widely distributed across Southeast Asia. If the animal use of plantations differs from that of natural forests, it will present a large issue for the study.

      In this analysis the feature of interest was habitat configuration (fragmentation) and deforestation (forest loss) rather than specific land class. We have defined forest as >50% canopy cover, which considers canopy density given historical forest loss and has precedent in other work (Fornace et al., 2016). In addition to their importance to macaque ecology, forest (canopy) cover, forest loss and forest edge are noted to be key determinants of vector breeding and vector habitat (Byrne et al., 2021, Chua et al., 2019). For this reason, these are important variables to include in analyses. More specific landscape variables were explored, but the temporal and spatial range of the data precluded fine-scale land classification data. To investigate preliminary links to landscape configuration and habitat fragmentation at broad scales, this is felt to be sufficient. We have also amended the manuscript to be more discerning with the use of ‘forest’ to avoid confusion throughout.

      3) Tree regrowth in the ecosystems of monsoonal Asia is very rapid. Based on the study description, tree regrowth was not accounted for in the study which could potentially lead to a very large underestimation of tree cover if only tree loss since 2000 was monitored. Again unless there is a reason to assume that macaques do not use young successional forests or use it at a highly reduced rate. Both of these points are acknowledged as limitations at the end of the discussion section but in my opinion they have a very strong impact on the study, making the results non-significant.

      This is an interesting suggestion. Macaques do forage in plantations and cultivated landscapes to supplement food, but preferentially roost and range in forest edges and interior forest, though ranging behaviour will be complex and vary across Southeast Asia. In this study the primary interest was in deforestation (forest loss) and fragmentation of old growth forested landscapes, which are key variables both for macaque ecology and for vector breeding sites. Therefore, it was felt that forest loss (transition from >50% canopy cover to <50% canopy cover since 2000) was sufficient to capture this. Ranging behaviour of individual animals and macaque troops would not be captured at this scale, and higher spatial and temporal resolution would be required to characterise relationships with tree regrowth and young plantations which is outside the scope of this study. In all regions, purposeful fine scale follow-up studies would be required to unpick fine scale relationships across a habitat gradient.

      I am not 100% sure I understand the geospatial design fully. The pieces are distributed between different subsections, and it was challenging to string together the processing chain between subsections of the manuscript and the supplemental information. It would help to add a figure (a flowchart, perhaps?) to the supplemental section that walks through the entire geospatial covariate assembly, e.g.:

      • GPS location → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) forest, PARA(?) for each buffer. I still don't understand the 30 m or 30 km spatial resolution reference for forest and PARA in this context.

      This was an error in the table in the Supplementary Information and has been corrected – the forest cover raster has a resolution of 30m, and the perimeter: area ratio is calculated within 5, 10 and 20km buffers.

      • Landscape covariates receive the full weight (1) in the model. This is defensible, even though not ideal.

      This is equivalent to, but we felt more intuitive than, sampling each GPS point 10 times and inputting those replicates with weights equal to those of the areal data.

      • No GPS location → assign to the best identifiable administrative unit (country, state, or district) → generate 10 random points within the administrative unit → create 5, 10, and 20 km buffers → mean elevation, mean population, %(?) forest, PARA(?) for each buffer → landscape covariates from each point receive the proportional weight (0.1) in the model. I do not believe that this approach is representative of macaque habitat or of the characterisation of macaque-human interaction.

      In other examples dealing with spatial uncertainty, the centroid is taken to be representative of an area. This method generates considerable bias and uncertainty, particularly if the uncertainty is not then accounted for by weighting subsequent models (Cheng et al., 2021). In this exploratory analysis, pseudo-sampling from 10 random sites generates a more realistic generalised environmental realisation than taking a centroid or a single random point. It was used as an exploratory analysis to explain broad regional trends in prevalence, which can guide further investigation through fine-scale studies; such studies are required to fully describe disease dynamics in specific macaque habitats.

      Thank you for this useful suggestion. We have taken this advice and added a flowchart of data processing to the Supplementary Information (Appendix D, Figure S8).
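      The weighting step discussed in this exchange (weight 1 for a GPS-located survey, 0.1 for each of the 10 pseudo-sampling points otherwise) can be sketched as a row-expansion step before model fitting. The `Survey` record and the `expand_with_weights` function are hypothetical illustrations, not the authors' code. Each survey's weights sum to 1, so a survey located only to an administrative polygon is not counted ten times over.

```python
from dataclasses import dataclass


@dataclass
class Survey:
    """Hypothetical minimal survey record: an identifier and whether GPS coordinates exist."""
    survey_id: str
    has_gps: bool


def expand_with_weights(surveys, n_pseudo=10):
    """Return model rows as (survey_id, point_index, weight) tuples.

    GPS-located surveys contribute one row with weight 1; surveys geolocated only to an
    administrative polygon contribute n_pseudo rows, each weighted 1 / n_pseudo.
    """
    rows = []
    for s in surveys:
        if s.has_gps:
            rows.append((s.survey_id, 0, 1.0))
        else:
            for k in range(n_pseudo):
                rows.append((s.survey_id, k, 1.0 / n_pseudo))
    return rows
```

      In a weighted regression, the covariates extracted at each pseudo-sampling point would then be attached to these rows, so that inconsistent covariates across a large polygon are averaged down rather than given full weight.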

      Discussion:

      Based on information in Table S4, sampled NHPs were predominantly from human-dominated (peridomestic, agricultural, and urban) landscapes. In forested landscapes, only macaques living in forest edge habitats were likely to be sampled in the first place, simply because of the extreme challenges of reaching macaques in remote, inaccessible areas. There is a very substantial spatial bias in sampling, which will undoubtedly reflect that fragmented habitat is a key landscape component affecting the prevalence of Pk in NHP, especially since, as the authors point out later in the discussion, the critical vectors for transmission are also associated with forest edge habitats. High forest fragmentation is also linked to the presence or increase of migrant human workers (logging or plantation activities), a population also strongly associated with higher malaria prevalence for a variety of P spp. (although I am not aware of studies that are specific to Pk malaria). However, the living conditions of migrant workers have frequently been implicated in higher rates of malaria transmission and thus could, hypothetically, also contribute to Pk infection rates in NHP. Ultimately, the discussion appears to suggest that the biggest gap in our understanding is within vector ecology and the NHP-vector-human dynamics within local landscape settings. It is an interesting finding. However, my overall conclusion would be that the sampling strategy (both for NHP and geospatial covariates) renders this study "exploratory" at most, and that all findings would need to be tested and verified through independent and more rigorously designed studies.

      Thank you to the reviewer for a comprehensive assessment. We would first like to highlight the regional meta-analysis, which was one of the main findings. This is a novel result for the P. knowlesi literature: it is the first demonstration of regional differences in prevalence that correlate with regional hotspots of human incidence, suggesting that the force of infection from NHP may drive hotspots of P. knowlesi in human populations.

      We include a risk factor analysis that suggests a method for dealing with high spatial uncertainty, and an exploratory analysis that finds landscape complexity may be a contributory factor to broad regional heterogeneity. These associations are robust to sensitivity analysis where data with extreme variability in environmental variables is removed (Table S15-S16).

      Habitat descriptions in the original studies are qualitative and likely subjective, and whilst there is likely to be an important sampling bias, there were also evident differences in prevalence between NHP sampled in different environments in the available data, which we have further characterised. Risk factors for human P. knowlesi do include forest loss (reduction in canopy cover) within 5 years and within 2 km, as well as contact with macaques and occupations in plantations (Fornace et al., 2014; Fornace et al., 2016). Reverse spillover from humans to NHP is an interesting suggestion, but outside the scope and scale of the study. Given known links of deforestation (forest loss) with human incidence of P. knowlesi and with increased vector breeding sites (Byrne et al., 2021), this analysis explores whether deforestation is linked to prevalence in reservoir species, thus contributing to the force of infection at broad scales.

    1. Author Response:

      We are sorry that both eLife and the Reviewers feel that our submitted studies are currently insufficient to support our hypothesis that loss of H2-O function affects thymic Treg selection. As this is the first study directly evaluating loss of H2-O in the thymus, we do not feel that we overstated our findings, as suggested by Reviewer 1. We hope that a revised version of the manuscript can satisfy the reviewers’ criticisms.

      -Reviewer 1 is asking us to address the presumed discrepancies between our previous work (Welsh et al 2020, https://doi.org/10.1371/journal.pbio.3000590) and data from Lee et al. 2021 (https://doi.org/10.4049/jimmunol.2100650) in this current manuscript, which does not report on the development of EAE in DO-KO and DO-WT mice. All experiments here are on naïve mice. As such, we wish to justify our lack of discussion of Lee et al (2021) findings.

      Lee et al. (2021) reported the effects of DO on both EAE and SLE development, mainly using H2-Oβ KO mice. As we have never used these CRISPR-generated mice, we cannot make a direct in-house comparison. However, we did note that the reported disease curve for female H2-Oβ KO mice showed a similar trend of increased EAE disease development, matching what we reported in our 2020 paper (Welsh et al., PLoS Biology). In the single experiment that utilized H2-Oβ KO mice for EAE development, Lee et al. found a disease trend different from ours. However, Lee et al. (2021) tested only 4-5 mice per group in that single experiment, and their measurement of disease development relied solely on visual assessment of limb and tail function. Our study verified EAE disease development by multiple approaches. These included analyses of MOG-specific tetramer staining of the CNS CD4 lymphocyte infiltrate and in vivo NIRF whole-body imaging of diseased DO-WT and DO-KO mice using an antibody probe specific to MBP. We repeated our disease development experiments more than 15 times using 5-8 mice per group. Below is an excerpt from the Results section of Welsh et al. (PLoS Biology), clearly explaining how many experiments were performed and the number of mice per group per experiment:

      “From these studies, we found that DO-KO mice had an accelerated onset of disease compared to DO-WT mice (Fig 7A). Disease symptoms (Score 1) appeared around Day 8–10 and quickly progressed to advanced disease (Score 3–4) by Day 14–16 in DO-KO. In contrast, DO-WT mice started showing symptoms around Day 12 and progressed to advanced disease scores by Day 20. Total cell infiltration into the CNS tissue was slightly higher in DO-KO mice, but no change in total brain weight was observed (S5 Fig). To further correlate the state of disease with CD4 infiltration, we performed in vivo NIRF whole-body imaging on diseased DO-WT and DO-KO mice using an antibody (Ab) probe specific to myelin basic protein (MBP). The Ab reacts with MBP only when the myelinated glia cells are damaged during disease development [56]. Thus, by detecting demyelination, whole-body imaging allowed us to fully visualize the co-localization of CD4 T cells at the sites of demyelination occurring in diseased mice. Interestingly, when mice of various disease scores were imaged, we found increased co-localization of infiltrating CD4 T cells with anti-MBP staining in DO-KO mice, but not in DO-WT mice (Fig 7B). These data not only confirmed the flow cytometric findings that diseased DO-KO mice have a greater influx of lymphocytes into their CNS tissue (S5 Fig), it also verified the massive demyelination that occurs during the disease”

      And again in the legend to Figure 7:

      “Representative curves showing the time course of disease development in DO-KO (red) and DO-WT mice (white). N = 5 mice per group, representative of >15 repeat experiments. Score system: 0 = no symptoms, 1 = limp tail, 2 = limp tail + partial hind limb paralysis, 3 = limp tail + total hind limb paralysis, 4 = limp tail + total hind limb paralysis + partial forelimb paralysis. Data represented as mean ± SEM.”

      Despite the clarity of this description of our experiments, Lee et al. have publicly slandered us and grossly misrepresented our work by stating the following:

      “A recent study (11-Welsh et al) found that B6.Oa−/− mice were more susceptible to EAE than control B6J animals. However, that conclusion was based on a single experiment, in which control B6J mice developed very mild EAE disease with an average score of 1, which is far lower than the disease scores published by other groups (30–32) and also observed in our study. Thus, in this inducible model of autoimmunity, H2-O deficiency does not contribute to either disease development or severity.”

      -Another important variable between our studies and Lee et al. (2021) is how disease was induced. They used a commercially available disease induction kit, whereas our immunization solutions followed the established protocols of Nancy Ruddle et al. (J Exp Med. 1997 Oct 20; 186(8): 1233–1240. doi: 10.1084/jem.186.8.1233). Notoriously, EAE disease development can vary widely depending on a) the quantity and purity of the MOG peptide, b) the amount of tuberculosis antigen in the CFA, and c) the quantity of pertussis toxin and the injection strategy, as well as many other uncontrollable factors. A comparison of these two sets of results is not relevant to our current study. Nevertheless, we will be more than happy to compare the results of our previously published work with those of Lee et al. in the Discussion.

      -We want to emphasize that we did follow Hogquist et al.'s gating strategy for distinguishing auditing from deleted thymocytes. We subdivided total thymocytes into “Non-signaled” (TCR-β-, CD5-/inter) and “Signaled” (TCR-β+ CD5+/hi) populations before further gating only on medulla-localized CD4 T cells. The “CCR7+ CD4+” label in Figure 1 was meant to orient the reader without overwhelming the figure with numerous flow plots. To address this concern, the revised manuscript will include (1) updated Supplemental figures showing the complete gating strategy, (2) updated figure legends and text emphasizing that auditing/deletion gating was performed on CD4 T cells that had passed positive selection (i.e. TCR-β+ CD5+/hi), and (3) representative flow plots for all Figure 1 panels.

      -Also, regarding “discrepancies between our data and Liljedahl et al 1998”:

      (A) The H2-O KO mice used by Liljedahl et al. were on a 129/Ola genomic background, whereas the H2-O KO mice used for both of our papers have been completely backcrossed to C57BL/6J. Clearly, non-MHC genes contribute to the impacts of MHC proteins, yet how the 129/Ola genomic background could affect the H2-O genes remains to be discovered. (B) No data were shown supporting their published statement below:

      “The proportions of B cells as well as of CD4+ and CD8+ T cells in the lymph node, spleen, and thymus were similar in H2-Oa–deficient and wild-type mice (data not shown)”. (Liljedahl et al 1998).

      Reviewer 2:

      scRNA-Seq analysis was performed by the Computational Biology Computing Core at Johns Hopkins School of Medicine. We omitted this acknowledgement because our core facility does not request authorship or acknowledgements. The sentence has been edited to use the correct terminology.

      -Regarding the truncated bar graph: the entire paper contains only two bar graphs, and neither is truncated. We are therefore unsure which figure the reviewer is referring to.

      -We would like to remind Reviewer 2 that DO works together with DM and functions differently on peptides of different sequences. For this reason, the reported cumulative effects of DO in vivo have notoriously been rather minor. In particular, since our current study focuses on naïve mice, major changes were not expected.

      -Regarding the gating strategies: in the original version, we did not provide them for all figures. Full FACS gating strategies have now been provided in the new supplemental figures, and representative FACS plots have been added to ALL main figures.

    1. Author Response

      We would like to express our gratitude to the reviewers for their insightful comments and suggestions on our manuscript. We appreciate the time and effort they have devoted to evaluating our work. In response to their valuable feedback, we will undertake a comprehensive revision of our manuscript to address their concerns and enhance the clarity of our findings.

      Reviewer #1 has raised the important point of the need for a more thorough exploration of how ELF3 promotes cell tolerance to DNA damage.

      As the reviewer noted, we fully agree that genomic instability is key to cell transformation. In the original manuscript, we proposed that ELF3 might be an important factor enabling cells to tolerate the lethal genomic instability caused by BRCA1 deficiency. ELF3 would thereby maintain an “appropriate” level of genomic instability, fueling cell transformation. We acknowledge the limitation that the mechanism by which ELF3 promotes cell tolerance to DNA damage requires further exploration. To address this, we plan ELF3 overexpression and knockdown experiments in additional BRCA1 wild-type and BRCA1-deficient breast cell lines. In addition, since ELF3 is a transcription factor, we suspect that its role in promoting cell tolerance to DNA damage is mediated by transcription, and we will also explore additional downstream genes of ELF3.

      Regarding the concerns raised by Reviewer #2, we acknowledge that our manuscript may have contained gaps and that the datasets used have limitations.

      We appreciate the reviewer's feedback regarding the limitations of our cell models and their representativeness of LP cells. We used MCF10A cells for the knockdown experiments, and we understand that these may not be a perfect representation of LP cells. To address this concern, we will add a discussion of the limitations of our cell models and their relevance to LP cells, along with potential experiments in LP cells that may be included in future studies.

      We will also clarify the rationale for focusing on ELF3 and, for completeness, discuss the other genes identified in our analysis. Regarding ELF3 functions in cells other than LPs: in our analysis, ELF3 is highly expressed in LPs compared with other cell populations in the mammary gland, identifying ELF3 as a previously undefined LP gene. We therefore suspect that ELF3 functions may be more significant in LP cells. We are also interested in ELF3 functions in cells other than LP cells and will explore them further.

      We agree that different pathogenic variants of BRCA1 may have diverse effects on its function and on tumorigenesis. We will add detailed information and discussion about the BRCA1 pathogenic variants of the patients included in our single-cell RNA-seq analysis. Also, to enhance the overall clarity of our manuscript, we will revise the figure legends to include critical details that were previously omitted, so that readers can better evaluate the presented data.

    1. Author Response

      We appreciate the feedback from all the reviewers. We will incorporate their comments into the revised manuscript.

      In response to reviewer three's suggestion regarding complementary approaches for identifying rootlet components, we'd like to provide further insight into the strategies we explored.

      We performed mass spectrometry on our purified rootlets. This identified the rootlet components rootletin and CCDC102B and various axonemal components, due to the association between the rootlet and axoneme. However, due to the limitations in quantifying components using mass spectrometry, we were unable to confidently identify novel rootlet constituents present in quantities comparable to rootletin.

      We further attempted cross-linking mass spectrometry on the rootlets to gain deeper insight into the interactions between rootletin molecules. Unfortunately, this effort produced a completely insoluble sample despite extended digestion times. The sample clogged the mass spectrometry column, rendering our results inconclusive.

      We attempted to express rootlet components recombinantly and were able to purify fibres, but they did not contain the characteristic repeat pattern seen in native rootlets. We also considered purifying native rootlets from cultured cells, but realized the yield would be too low for cryo-ET studies.

      We therefore regret that other approaches to validate our model are outside the scope of this current work.

    1. Author Response

      1) The analysis of Shh deletion in mossy cells and its influence on aging-related NSC pool decline is not well connected with the rest of the study, which concerns the expression/requirement of Shh in mossy cells to regulate seizure-induced neurogenesis. To promote cohesion, the authors should examine/discuss what happens to mossy cells during aging. Is it similar to or different from what happens to mossy cell neuronal activity during seizures?

      We believe that the two involve similar mechanisms. Seizure-induced neurogenesis increases NSC proliferation, which increases the demand for Shh to support self-renewal. Similarly, we assume that the increased NSC decline in Shh cKO mice is due to an increased demand for Shh for NSC self-renewal with aging. It has been shown that NSCs in young mice generally do not self-renew and are instead consumed after one or two rounds of cell division. In contrast, NSCs in old mice are known to undergo more rounds of cell division than those in younger mice. This suggests that NSCs in aged mice may be more dependent on signals driving self-renewal. We suggest that Shh from mossy cells helps to minimise the decline of the NSC pool with aging. Loss of Shh from mossy cells therefore results in an accelerated decline of the NSC pool in aged Shh cKO mice. This aligns with our hypothesis that Shh from mossy cells contributes to maintenance of the NSC pool.

      The exact mechanism regulating the age-related shift in NSC proliferative capacity remains unclear and would be an interesting topic for future studies. In addition, it remains to be elucidated whether mossy cell neuronal activity decreases with age, or whether Shh release/expression is compromised in aged animals. Future research should identify the brain region(s) and other factors that regulate mossy cell neuronal activity and thereby control Shh release. It will also be important to determine how these are dysregulated in pathological conditions and in aging.

      2) Only male mice were analyzed in the seizure induction experiments, leaving open the possibility of sex differences since previous reports suggest sex differences in adult neurogenesis.

      Seizure-induced neurogenesis was observed in both male and female mice. We therefore assumed that mossy cell-derived Shh also regulates seizure-induced neurogenesis in female mice. However, we agree with the reviewers' comments. We cannot exclude the possibility that female mice react to KA or seizures differently from male mice, or that Shh from mossy cells has distinct effects in female mice in that paradigm. Another interesting possibility is that female-specific behaviors may affect mossy cell activation and thereby regulate neurogenesis through Shh. Because these are large and unresolved questions, we elected to leave potential sex differences in mossy cell-regulated neurogenesis to future research.

      3) Several control groups are missing:

      -For seizure induction: missing vehicle (instead of no KA treatment).

      -For TAM induction: missing corn oil only to check leakiness and specificity of transgene.

      -For DREADD experiment: missing vehicle (to control for hM3 non-specific effects)

      Regarding the missing vehicle control for KA treatment: we used saline (0.9% NaCl) as the vehicle for kainic acid. Saline is commonly used as a vehicle for water-soluble reagents in adult neurogenesis experiments. In addition, the average volume of KA solution that mice received intraperitoneally for seizure induction was less than 500 µl, which is below the maximum volume recommended by NIH and UCSF. We have not tested whether saline injection makes a difference in our experiments. However, based on previous reports using saline, we believe that saline would not affect our experimental results.

      Regarding tamoxifen injections: Gli1-CreER mice have been widely used for fate-tracing analysis, including in our previous research, where they showed specific recombination in Gli1-expressing NSCs. Our results in this study consistently show that Gli1-CreER;Ai14 mice label NSCs in the dentate gyrus. Given this, we believe that our results using the Gli1-CreER line are not affected by non-specific recombination in the absence of tamoxifen.

      Regarding clozapine (CLZ) injection: because of the possible side effects of CLZ, we decided to administer it to both control and DREADD animals. We agree with the reviewer that our experiment cannot exclude the possibility that expression of hM3Dq affects neurogenesis without CLZ or CNO. However, although our experiments did not include a saline control, we have confirmed that both transgenic and virus-injected DREADD-expressing mice respond to CLZ. In these mice, CLZ increased mossy cell neuronal activity compared with control animals. Therefore, we believe this does not affect our interpretation that mossy cell neuronal activity controls neurogenesis.

      We appreciate the reviewers' carefully considered comments, and we will apply the suggested controls in our future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive feedback and very helpful comments. We agree that this manuscript focuses primarily on functional outcomes and phenotypes. The studies were designed to address an important clinical question, i.e., repurposing dantrolene for the treatment of ventricular tachyarrhythmias and the prevention of sudden cardiac arrest. Thus, the current manuscript emphasizes in vivo studies over in vitro studies.

      However, we also acknowledge the need for additional mechanistic studies. We are in the final stages of submitting a second manuscript in which we dissect the underlying mechanisms through detailed in vitro studies of mitochondrial antioxidant capacity, reactive oxygen species, phosphorylation of ryanodine receptors, autonomic dysfunction, beta-adrenergic signaling, etc. that are beyond the scope of the current manuscript.

      Additionally, a third manuscript in progress focuses on the mechanistic link between ion channels, dispersion of repolarization, and sudden cardiac death. We previously reported the preliminary results in abstract form (Circulation Research. 2019;125:A102). Briefly, current-voltage relationships from patch clamp studies of isolated LV myocytes revealed that pressure-overload stress strongly reduced K currents, including IK1, IKs, and IKr. These changes were driven by downregulation of K channels and their components at the mRNA level. As expected, the reduced K currents destabilized the resting membrane potential, especially in phases II and III of the cardiac action potential, and reduced repolarization reserve. Scavenging mitochondrial ROS stabilized repolarization, suggesting that mROS is the upstream driver of K channel downregulation. However, we have not specifically tested whether dantrolene stabilizes repolarization via the same mechanism. As such, we agree that "lability" or "dispersion" are more precise terms than "reserve" for the phenomenon reported in the present manuscript, and we have made these changes. Thank you for pointing this out. We have also changed the title accordingly.

      The present study investigates the effect of dantrolene in male animals. We agree that we need to evaluate the effect in females, especially because females have historically been underrepresented in studies of sudden cardiac arrest. Based on our preliminary studies, female animals exhibit increased variability in their phenotypic response to pressure-overload stress. Given the importance of this issue, we will examine sex differences in carefully controlled future experiments. These will include the effect of dantrolene in females controlled for hormonal effects (e.g., with and without oophorectomy).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript focused on the roles of a key fatty-acid synthesis enzyme, acetyl-CoA carboxylase 1 (ACC1), in the metabolism, gene regulation and homeostasis of invariant natural killer T (iNKT) cells, and on how ACC1 affects the roles of these T cells during asthma pathogenesis. The authors presented data showing that ACC1 regulates the expression of PPARγ and, in turn, the function of iNKT cells, including the secretion of Th2-type cytokines, thereby affecting asthma pathogenesis. The results are clear-cut and the data were logically presented.

      Thank you for your input into our work. Your comments have been very helpful in enhancing our work.

      Reviewer #2 (Public Review):

      In this study the authors sought to investigate how the metabolic state of iNKT cells affects their potential pathological role in allergic asthma. The authors used two mouse models, OVA- and HDM-induced asthma, and assessed genes in glycolysis, TCA, β-oxidation and FAS. They found that acetyl-CoA carboxylase 1 (ACC1) was highly expressed by lung iNKT cells and that ACC1-deficient mice failed to develop OVA-induced and HDM-induced asthma. Importantly, they also performed bone marrow chimera studies. When mice that lacked iNKT cells were given ACC1-deficient iNKT cells, the mice did not develop asthma, in contrast to mice given wild-type NKT cells. In addition, these effects were specific to NKT cells, not classic CD4 T cells. Mechanistically, iNKT cells that lack ACC1 had decreased expression of fatty acid-binding proteins (FABPs) and peroxisome proliferator-activated receptor (PPAR)γ, but increased glycolytic capacity and increased cell death. Moreover, the authors were able to reverse the phenotype with the addition of a PPARγ agonist. When the authors examined iNKT cells in patient samples, they observed higher levels of ACC1 and PPARG than in healthy donors and non-allergic-asthma patients.

      Thank you for your thorough analysis of our work.

      Reviewer #1 (Recommendations For The Authors):

      1) I suggest that the authors remove one copy of the sentence "It should be noted that CD4-CreAcc1fl/fl mice lack ACC expression in both conventional CD4+ T cells and iNKT cells." in Lines 421-423.

      We have removed the redundant sentence originally shown in LINES 421-423. Thank you for pointing this out.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a very strong study with few concerns.

      1) Are there tissue specific differences in the iNKT cell populations? The authors examined lung iNKT cells in the Figs 1-3, and used liver NKT cells for the mechanistic studies in Fig 4-5. The studies shown in Fig S2 suggest that ACC1 deficient iNKT cells have developmental defects and impaired homeostatic proliferative capacity. Does ACC1 impact lung and liver iNKT cells similarly and is the lack of allergic asthma in ACC1 deficient iNKT cells due to defective iNKT cell trafficking to the lungs or a failure to survive after transfer (Fig 3)?

      2) Similarly, are chemokine receptor expression patterns similar between WT and ACC1 deficient iNKTs (Fig 4)?

      3) The authors data suggest that Tregs are not playing a major role in the regulation of asthma induction in their ACC1 deficient mice, based on FoxP3 expression. Did the authors perform suppressor assays to show that the Tregs function similarly in WT and ACC1 deficient mice?

      In the revised manuscript, the authors addressed my major concerns.

      Thank you for your previous comments. They were very helpful in upgrading our scientific work here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We very much appreciate the comments and suggestions on our manuscript "Cylicins are a structural component of the sperm calyx being indispensable for male fertility in mice and human". In response to these comments, we performed a series of additional experiments, reworded and rewrote several paragraphs, and restructured the manuscript according to the reviewers' comments. We think that the manuscript is now improved, and we look forward to its further evaluation. We provide a point-by-point response to all comments and have prepared a version.

      Recommendations for the authors:

      Editor’s comment:

      1) As pointed out by all three reviewers, it is critical to show the specificity of the antibodies used. The authors should clarify how the specificity of antibodies is tested. Western blot analysis to show the absence of the protein in the knockout is essential.

      As suggested by all reviewers, we additionally performed Western blot analysis on cytoskeletal protein fractions to further verify the specificity of the generated antibodies and the generation of functional knockout alleles. The results confirm those of the IF staining; however, both antibodies detected bands below the predicted molecular weight. In addition, mass spectrometry was performed to search for the presence of peptides in the cytoskeletal protein fractions. The paragraph now reads as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockouts (Fig. 1 G), thereby i) demonstrating specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested. The section reads now as follows:

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings (IHC), showing a specific signal in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      2) Re-structuring/streamlining of the figures is recommended. Please consider the flow suggested by reviewer #2 and shorten the evolutionary analysis which takes up more space than it adds to the value of the story.

      We thank the reviewers and editor for the valuable suggestion. We re-structured the figures as suggested and rewrote the results section accordingly. The evolutionary analysis was significantly shortened.

      3) Provide statistics for the imaging analysis such as TEM as only a single representative image is shown.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – supplement 1). Furthermore, we quantified the manchette length of step 10-13 spermatids to prove the increased elongation of the manchette in Cylc2-/- and Cylc1-/y Cylc2-/- spermatids (Fig. 5 A-B).

      4) Please consider other points raised by the reviewers below to improve the manuscript and provide responses on how the authors have addressed them.

      We thank all reviewers for the detailed review of our manuscript and their valuable suggestions, which helped a lot to improve the manuscript. We considered all points raised by the reviewers to the best of our ability and hope that the reviewers will consider the manuscript ready for publication. A point-by-point discussion of all comments/suggestions follows.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Antibody specificity: Fig 1E - there is some unspecific binding in Cylc2-/- for CYLC2 and in Cylc1/y Cylc2+/- for CYLC1 in the testis (and in elongating spermatids in Figure 1 – Supplement 4). Could the authors elaborate/comment on this? Western blot analysis would also be helpful to further support the antibody specificity.

      The very weak unspecific staining in the testis for CYLC2 (in Cylc2-/-) and CYLC1 (in Cylc1-/y Cylc2+/-) is present only in the lumen of the seminiferous tubules and/or the residual bodies of the testicular sperm cells and can be regarded as background signal. Importantly, the signal is entirely lost in the PT region, proving the specificity of the generated antibodies. We added the following paragraph to the results section:

      Line 124-127: The generated antibodies did not stain testicular tissue and mature sperm of Cylc1- and Cylc2-deficient males, except for a very weak unspecific background staining in the lumen of seminiferous tubules and the residual bodies of testicular sperm (Fig. 1 F).

      Specificity of antibodies was additionally proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical stainings, showing a specific staining in testis sections only, but not in any other organ tested (Figure 1 – supplement 2)

      To further verify the specificity of the generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining. No unspecific bands were detected in the Western Blot, further supporting the notion that the weak unspecific signals in IF represent staining artifacts.

      The paragraph reads now as follows:

      Line 127-132: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockouts (Fig. 1 G), thereby i) demonstrating specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of 66 kDa. Again, both bands were absent in Cylc2-/-.

      (2) Please provide more interpretation of the gene dosage effect of Cylicin 2. It is not common to see a gene dosage effect in the sperm phenotype as transcripts and proteins can be shared between haploids due to syncytium formation during spermatogenesis.

      We agree and we apologize for the misinterpretation. In Cylc2+/- mice expression of Cylc2 was reduced by half but there was no altered phenotype observed. The sentence now reads as follows:

      Line 112: In Cylc2+/- animals expression of Cylc2 was reduced by 50 %.

      (3) Line 194-196 - the authors say that the sperm are smaller, with shorter hooks and increased circularity of the nuclei, and reduced elongation. Are these statistically significant? There seems to be a high variation in the graph in S2D and the statistical analysis is not given.

      We agree, performed statistical analyses, and highlighted significantly altered values for sperm head elongation and circularity in Figure 2 – Supplement 3.

      (4) Line 153-164 It is interesting that the absence of Cylc2 affected many parts of the sperm structure. I think a proportion of sperm always has morphological defects of diverse kinds, so it is hard to confirm the finding with only a single sperm image. I think that it will be important to do some statistical analysis or, at a minimum, show more TEM images.

      We agree that the observed morphological defects require a detailed statistical evaluation. TEM analysis was performed to confirm the results from optical microscopy and representative images with high magnification are shown for a detailed visualization of the defects. For additional quantification, we included statistics for IF stainings against calyx proteins CCIN and CapZa (Fig. 2 I-J). For TEM, we added additional images to the supplement (Figure 3 – Supplement 1).

      (5) Line 236-242 - I believe that the phenotype described applies to the sperm from Cylc2-/- and Cylc1-/y Cylc2-/- animals; however, I think that Cylc1-/y Cylc2+/- sperm have a more subtle phenotype, intermediate between the WT and the genotypes lacking both Cylc2 alleles.

      We agree, and we added a quantification of manchette length at steps 10-13 to visualize the differences between the genotypes. The section now reads as follows: Line 268-272: Manchette length was measured from step 10 until step 13 spermatids and the mean was obtained, showing that the average manchette length was 76-80 nm in wildtype, Cylc1-/Y and Cylc2+/-, while for Cylc2-/- and Cylc1-/Y Cylc2-/- spermatids mean manchette length reached 100 nm (Fig. 5 B). Cylc1-/Y Cylc2+/- spermatids displayed an intermediate phenotype with a mean manchette length of 86 nm.

      (6) Since CYLC1 staining is absent in Fig 5B, does that mean that the mutation resulted in protein degradation/instability? Is RNA present? Additional biochemical studies of Cylicins demonstrating the deleterious nature of the mutations would strengthen the molecular pathogenesis of the human mutations.

      Thank you for raising these important questions. The CYLC1 variant c.1720G>C is predicted to cause the amino acid substitution p.(Glu574Gln). It is, thus, conceivable that the RNA is present but the protein is either degraded or misfolded and, therefore, not detectable by IF. Unfortunately, for personal reasons of the patient, it is currently not possible to obtain additional semen samples, which prevents further analyses of the semen, e.g. analysis of Cylicin transcript levels.

      (7) I strongly suggest shortening the evolutionary analysis (all the corresponding materials are in the supplement while the text is extensive) or even omitting it entirely. It does not add a lot to the current study.

      We agree that the evolutionary analysis was very detailed. However, we think that the results are important to understand the role of Cylicins for male reproduction in general. The results obtained from the mouse model might be transferable to other species, including humans. Further, the results present a possible explanation for the subfertility of Cylc1-deficient mice, in contrast to the infertility of Cylc2-deficient males. We shortened the section, which now reads as follows:

      Line 287-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, for Cylc1, 34% of codon sites were conserved, 51% were under relaxed selective constraint, and 15% were positively selected. For Cylc2, 47% of codon sites were conserved, 44% were under neutral/relaxed constraint, and 9% were positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode lysine. For Cylc2, this pattern is even more pronounced, with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding lysine (Fig. 6).

      Minor comments:

      (1) Line 114, 115, 118 → Figure 1D is already well described in the previous paragraph, so this is redundant. The text references Fig 1D, E, but only Figure 1E shows IF. Perhaps this should be 1E and 1F, or just 1E?

      We apologize for the mix-up with the subfigures. The mentioned paragraph refers to Fig. 1 E-F, which was corrected accordingly.

      Line 117-123: Immunofluorescence staining of wildtype testicular tissue showed presence of both, CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E). The signal was first detectable in the subacrosomal region as a cap-like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3). As the spermatids elongate, CYLC1 and CYLC2 move across the PT towards the caudal part of the cell (Figure 1 – supplement 4). At later steps of spermiogenesis, the localization in the subacrosomal part of the PT faded, while it intensified in the postacrosomal calyx region (Fig. 1 E-F).

      (2) Figure 1F - Arguably, the IF images show expression of both CYLC1 and CYLC2 reaching into the acrosome/hook portion of the sperm head, but the diagram does not reflect that. Why is that?

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      (3) Line 124 - PAS staining, mentioned on line 124, is not explained (Periodic Acid Schiff staining) until line 605.

      We agree and introduced the abbreviation accordingly. The PAS staining was moved to Fig. 4. The paragraph reads now as follows:

      Line 220-222: To study the origin of observed structural sperm defects, spermiogenesis of Cylicin deficient males was analyzed in detail. PNA lectin staining and Periodic Acid Schiff (PAS) staining of testicular tissue sections were performed to investigate acrosome biogenesis.

      (4) Some figures are hard to read due to being very small (S1B, 3F).

      We agree and we increased the figure size. For former Figure 3F (now figure 4A), insets with higher magnification of representative sperm were added. Insets are additionally shown in Figure 4 – Supplement 1 in higher resolution.

      (5) Line 139 Please specify whether the sperm was capacitated or not.

      Analysis of the flagellar beat was performed with non-capacitated sperm. We clarified this in the main text:

      Line 203: The SpermQ software was used to analyze the flagellar beat of non-capacitated Cylc2-/- sperm in detail 22.

      As described in the Materials and Methods section, sperm were only activated in TYH medium prior to analysis:

      Line 732-733: Sperm samples were diluted in TYH buffer shortly before insertion of the suspension into the observation chamber.

      (6) Line 142-145; The sentence is interrupted strangely, perhaps the authors meant to write: "Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high-frequency beating occurs at the flagellar tip"

      We corrected the sentence accordingly.

      Line 206-208: Interestingly, we observed that the flagellar beat of Cylc2-/- sperm cells was similar to wildtype cells, however, with interruptions during which midpiece and initial principal piece appeared stiff whereas high-frequency beating occurs at the flagellar tip (Fig. 3 C, Video 1, Video 2).

      (7) Line 142 - Wrong figure number. Figure S4A is a phylogenetic analysis.

      We regret the mix-up and corrected the figure reference accordingly. Line 204-205: Cylc2-/- sperm showed stiffness in the neck and a reduced amplitude of the initial flagellar beat, as well as reduced average curvature of the flagellum during a single beat (Figure 3 – supplement 2).

      (8) L146-147 Better placed in Discussion.

      We agree, and we omitted this sentence from the results part.

      (9) Line 154-156 - The white arrowheads are present in both WT and KO sperm, thus it appears they denote the basal plate, not necessarily the dislocation/parallel position as the current text seems to suggest. Furthermore, the position of the WT and KO sperm is somewhat different with the tail coiling differently, so it is hard to see whether the two are comparable.

      We agree and removed the white arrowhead from the WT sperm picture; it now depicts only the dislocation of the basal plate in the Cylc2-/- sperm. Due to the morphological anomalies of Cylc2-/- sperm cells, it is difficult to determine the exact angle of the depicted cell. However, we added more TEM pictures of the sperm cells (3 for WT and 6 for Cylc2-/-) in Figure 3 – Supplement 1.

      (10) Line 164: Please describe in detail what mitochondrial damage readers should expect to see in the TEM image.

      We evaluated the observed mitochondrial damage in more detail. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation, and we deleted this section in the manuscript.

      (12) Figure S2A - no WT comparison, difficult to compare without it (mitochondria in Cylc2-/-)

      See (10). We evaluated the observed mitochondrial damage in more detail and in comparison to WT. Unfortunately, the defects described initially seem to be an artifact of apoptotic sperm cells and could not be identified in vital sperm cells in either of the knockout mouse models. We apologize for this misinterpretation and we deleted this section in the manuscript.

      (13) Line 172-173 - Fig 3C denotes quantification of abnormal acrosome only, however, the text mentions sperm coiled tail being quantified within this graph - which is it? Is it both of them? Or only one of them?

      Figure 3 C (now Figure 2G) showed the percentage of abnormal sperm in general comprising acrosomal as well as flagellar defects. We modified the figure and evaluated acrosomal defects and tail defects separately. The results section was changed accordingly and reads now as follows:

      Line 152-159: Loss of Cylc1 alone caused malformations of the acrosome in around 38% of mature sperm, while their flagellum appeared unaltered and properly connected to the head. Cylc2+/- males showed normal sperm tail morphology with around 30% of acrosome malformations. Cylc2-/- mature sperm cells displayed morphological alterations of head and mid-piece (Fig. 2 F-G). 76% of Cylc2-/- sperm cells showed acrosome malformations, bending of the neck region, and/or coiling of the flagellum, occasionally resulting in its wrapping around the sperm head in 80% of sperm (Fig. 2 F). While 70% of Cylc1-/Y Cylc2+/- sperm showed these morphological alterations, around 92% of Cylc1-/Y Cylc2-/- sperm presented with coiled tail and abnormal acrosome (Fig. 2 F-G).

      (14) Fig 3D - CCIN in the text, cylicin in the figure - this should be consistent. Furthermore, since only the head is being shown, is CCIN ever detected in the WT sperm tail?

      We apologize for the inconsistency, and we added the abbreviation “CCIN” to the figure. CCIN is solely detectable in the sperm head of wildtype sperm as published previously. Irregular staining patterns showing signals in the tail region are only observed upon Cylicin deficiency.

      (15) Line 199-200 - To say that the head of Cylc2-deficient sperm appears less concave seems redundant; the observed increase in circularity is likely caused by the sperm head being less concave in this region. If the authors are trying to make an additional point, it needs to be elaborated on.

      We agree and we deleted the sentence from the manuscript.

      (16) Figure legend of Fig S3 is wrong. Only S3A and S3B are present, and in the figure legend S3C corresponds to figure S3B.

      We agree and corrected the Figure legends accordingly. Due to the re-structuring of the manuscript, Figures and Supplementary figures were re-ordered as well.

      (17) Figure 4B - figure legend and/or text should specify that lectin is green and HOOK1 is in red

      We specified the figure legend as well as the main text accordingly: Line 279-281: Co-staining of the spermatids with antibodies against PNA lectin (green) and HOOK1 (red) revealed that abnormal manchette elongation and acrosome anomalies simultaneously occurred in elongating spermatids of Cylc2-/- male mice (Fig. 5 C).

      Line 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (18) Line 261-263 - It is difficult to see what is going on with microtubules in these images, as the resolution is low

      We enlarged the images and improved their quality. Microtubules are now additionally labeled with the letter ‘m’.

      (19) Line 265-266 - It seems that there is a persistence of manchette, rather than elongation. From these images, I cannot see gaps, and I am not sure where to look for them. This needs to be labelled further and higher-resolution images could be included for clarity.

      We agree, although we observed both excessive elongation and persistence of the manchette. The average length of the manchette is now shown in figure 5B.

      The paragraph now reads as follows:

      Line 235-239: Microtubules appeared longer on one side of the nucleus than on the other, displacing the acrosome to the side and creating a gap in the PT (Fig. 4 C). Whereas elongated wildtype spermatids at steps 14-15 had already disassembled their manchette and the PT appeared as a unique structure compactly surrounding the nucleus, in Cylc2-/- spermatids the remaining microtubules failed to disassemble.

      The gaps in the perinuclear theca are better visible in TEM micrographs and the description is now in the paragraph describing TEM.

      (20) Line 269 Please include the information of "White arrowhead".

      We added the information accordingly.

      Line 240-242: In addition, at step 16, the calyx was absent, and an excess of cytoplasm surrounded the nucleus and flagellum (Fig. 4 C, white arrowhead).

      (21) Line 276-280 This should be placed in the Discussion.

      We agree, and we deleted this concluding remark from the results section.

      (22) Are Cylc1 and/or Cylc2 conserved/expressed in species other than rodents and primates?

      Yes, Cylc1 and Cylc2 homologs were identified in C. elegans for example. We added a schematic to the introduction showing the protein structure of human, mouse and C. elegans CYLC1 and CYLC2 (Figure 1 – supplement 1).

      The section reads now as follows:

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1- supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) > pH 10 14, 15. Repeating units of up to 41 amino acids in the central part of the molecules stand out by a predicted tendency to form individual short α-helices 14. Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1-supplement 1).

      (23) The whole chapter "Cylc2 coding sequence is slightly more conserved among species than Cylc1" references only supplemental figures/tables. I find this unusual.

      We agree, and in order to show the results of the evolutionary analysis more clearly, we moved the panel to main Figure 6.

      Line 286-302: To address why Cylc2 deficiency causes more severe phenotypic alterations than Cylc1 deficiency in mice, we performed evolutionary analysis of both genes. Analysis of the selective constraints on Cylc1 and Cylc2 across rodents and primates revealed that both genes’ coding sequences are conserved in general, although conservation is weaker in Cylc1, trending towards a more relaxed constraint (Fig. 6 A). A model allowing for separate calculation of the evolutionary rate for primates and rodents did not detect a significant difference between the clades, neither for Cylc1 nor for Cylc2, indicating that the sequences are equally conserved in both clades.

      To analyze the selective pressure across the coding sequence in more detail, we calculated the evolutionary rates for each codon site across the whole tree. According to the analysis, for Cylc1, 34% of codon sites were conserved, 51% were under relaxed selective constraint, and 15% were positively selected. For Cylc2, 47% of codon sites were conserved, 44% were under neutral/relaxed constraint, and 9% were positively selected. Interestingly, codon sites encoding lysine residues, which are proposed to be of functional importance for Cylicins, are mostly conserved. For Cylc1, 17% of lysine residues are significantly conserved and 35% of significantly conserved codons encode lysine. For Cylc2, this pattern is even more pronounced, with 27.9% of lysine codons being significantly conserved and 24.3% of all conserved sites encoding lysine (Fig. 6 B).

      (24) Line 332 - CYCL2 should be CYLC2

      We corrected the typo accordingly.

      (25) Line 340 The ratio of head defects is different from Table 1 (98% here and 99 % in the table). Please check this information.

      We apologize for the inconsistency. We checked the raw data and corrected the table accordingly.

      (26) Line 344-345 From figure 5C it is difficult to determine whether the sperm are "headless" or whether the heads are attached to the highly coiled tails next to them

      We agree and we quantified the percentage of sperm showing abnormal flagella and a headless phenotype. Furthermore, we added an arrowhead to figure 6C to highlight headless sperm. The paragraph reads now as follows:

      Line 335-339: Bright field microscopy demonstrated that M2270’s sperm flagella coiled in a similar manner to flagella of sperm from Cylicin-deficient mice. Quantification revealed 57% of M2270 sperm displaying abnormal flagella compared to 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      (27) L367-368: I agree with the authors' logic in this sentence. However, it is better to show the co-localization of proteins using multi-channel immunocytochemistry. As you mentioned on L369, this would make your finding more obvious. If available, please include co-localization images of the proteins.

      We performed multi-channel staining against Cylicin1 and Calicin, as well as Cylicin2 and Calicin, on mouse epididymal sperm; the results are shown in Figure 2 – supplement 4. Unfortunately, we did not manage to obtain staining of tissue sections, since the antibodies against Cylicins and Calicin require different sample processing.

      The sentence was added in the section describing calyx integrity:

      Line 168-169: In epididymal sperm, CCIN co-localizes with both CYLC1 and CYLC2 in the calyx (Figure 2 – supplement 4).

      (28) Line 376 Please keep the abbreviation. "Calicin" "CCIN".

      We included the abbreviation accordingly.

      Line 377-378: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins.

      (29) Line 377-378 "Based on ~". The authors did not prove the interaction between CCIN and Cylicins in this article. The mislocalization of CCIN might be resulted in the loss of Cylicins, without any "interaction". To reach this conclusion, a more direct result should be provided.

      We agree that we overinterpreted this, as neither we nor others have proven an interaction between CCIN and Cylicins so far. We therefore weakened this statement and formulated it as a hypothesis.

      Line 377-381: CCIN is shown to be necessary for the IAM-PT-NE complex by establishing bidirectional connections with other PT proteins. Zhang et al. found CYLC1 to be among proteins enriched in PT fraction 7. Based on their speculation that CCIN is the main organizer of the PT, we hypothesize that both CCIN and Cylicins might interact, either directly or in a complex with other proteins, in order to provide the ‘molecular glue’ necessary for the acrosome anchoring.

      (30) Line 499: Please specify the target of the immunostaining in the figure legend (Tubulin → acetylated α-tubulin).

      We specified that α-Tubulin was stained. The figure legend now reads as follows: Line 555-557: Immunofluorescence staining of α-Tubulin to visualize manchette structure in squash testis samples of WT, Cylc1-/y, Cylc2+/-, Cylc2-/-, Cylc1-/y Cylc2+/- and Cylc1-/y Cylc2-/- mice.

      (31) Line 502 Please specify which color indicates which target protein (not only cellular structure).

      Line 560-562: Co-staining of the manchette with HOOK1 (red) and acrosome with PNA-lectin (green) is shown in round, elongating and elongated spermatids of WT (upper panel) and Cylc2-/- mice (lower panel).

      (32) Line 509 Please include scale bar information in the figure legend like Figure 4 (The magnifications of Figure 5 B, C, and D seem different).

      We included the scale bar information accordingly (now Figure 6).

      Line 575-588: Figure 6: Cylicins are required for human male fertility

      (A) Pedigree of patient M2270. His father (M2270_F) is carrier of the heterozygous CYLC2 variant c.551G>A and his mother (M2270_M) carries the X-linked CYLC1 variant c.1720G>C in a heterozygous state. Asterisks (*) indicate the location of the variants in CYLC1 and CYLC2 within the electropherograms.

      (B) Immunofluorescence staining of CYLC1 in spermatozoa from healthy donor and patient M2270. In donor’s sperm cells CYLC1 localizes in the calyx, while patient’s sperm cells are completely missing the signal. Scale bar: 5 µm.

      (C) Bright field images of the spermatozoa from healthy donor and M2270 show visible head and tail anomalies, coiling of the flagellum as well as headless spermatozoa that carry cytoplasmic residues without nuclei. Heads were counterstained with DAPI. Scale bar: 5 µm.

      (D-E) Quantification of flagellum integrity (D) and headless sperm (E) in the semen of patient M2270 and a healthy donor.

      (F-G) Immunofluorescence staining of CCIN (F) and PLCz (G) in sperm cells of patient M2270 and a healthy donor. Nuclei were counterstained with DAPI. Scale bar: 3 µm.

      (33) S2A is not clear. Please describe specifically what the left panel and right panel are about to show with a clear indication of what is PM, mitochondria, etc. On the right, in only one cross-section that shows both mitochondria and the 9+2 axoneme, they look both unaltered whereas on the left, there are unpacked, not aligned mitochondria but the tail boundary is not clear to grasp at first sight.

      We apologize for the bad quality of the TEM pictures showing the axonemes and the missing labeling. We recorded and included new images showing an intact 9+2 microtubular structure in Cylc2-/-. Furthermore, we added an image for the wildtype control.

      (34) S2D: colors of the last three plots of each graph are too close to tell apart

      We agree and changed the color scheme for better visualization.

      Reviewer #2 (Recommendations For The Authors):

      However, I find the manuscript a bit messy, and I will propose to reorganize the figures: following figure 1, showing the reproductive phenotype, I would continue with a figure showing the morphology of sperm in optical microscopy and showing the morphological defect of the nucleus (Fig 3B and 3E), followed with one figure focusing on the flagellum, with images obtained with optical and electronic microscopies, allowing to present the abnormalities of the flagellum and finally the impact on sperm motility and flagellum beating (mix of figure 2FG/3A); next, one figure focusing on acrosome. After that, I would present all data concerning spermiogenesis, starting with figure 2C.

      We thank the reviewer for the valuable suggestion, which helps a lot to improve the structure and comprehensibility of the manuscript. We re-organized the figures and the results section accordingly.

      Major remarks

      1) Line 111. The specificity of the raised antibodies is not clear. Please specify whether the antibodies are specific: does anti-CYLC1 immunodecorate only CYLC1, or both CYLC1 and CYLC2? Same question for anti-CYLC2.

      Both antibodies were raised against specific peptides of the CYLC1 or CYLC2 protein, respectively. The antigen peptides used for immunization are listed in the Materials and Methods section (AESRKSKNDERRKTLKIKFRGK and KDAKKEGKKKGKRESRKKR peptides for CYLC1; KSVGTHKSLASEKTKKEVK and ESGGEKAGSKKEAKDDKKDA for CYLC2). The peptides used for immunization are specific, as they do not resemble the highly conserved and repetitive KKD/KKE motifs present in both Cylc1 and Cylc2.

      The specificity of the raised antibodies was validated by IF staining of wildtype and Cylicin-deficient testis sections. The results clearly show that the CYLC1 signal is absent in Cylc1-deficient spermatids and the CYLC2 signal is absent in Cylc2-deficient spermatids.

      Specificity of antibodies was additionally proven by immunohistochemical stainings, showing a specific staining only in testis sections but not in any other organ tested.

      Line 115-117: Specificity of antibodies was proven by immunohistochemical staining, showing a specific staining only in testis sections but not in any other organ tested (Figure 1 - supplement 2).

      To further verify the specificity of generated antibodies and the generation of functional knockout alleles, we additionally performed Western Blot analysis on cytoskeletal protein fractions, confirming the results of the IF staining.

      The paragraph reads now as follows:

      Line 127-134: Additionally, Western Blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockout (Fig. 1 G), thereby i) demonstrating the specificity of the antibodies and ii) validating the gene knockout. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa; however, both bands were absent in Cylc1-/y. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of the predicted 66 kDa. Again, both bands were absent in Cylc2-/-. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      2) Line 115 and figure 1D. From the images presented in figure 1D, it is not clear where CYLC1 and CYLC2 are localized in the round and in elongated spermatids. Please make double staining using a second Ab to identify the acrosome such as DPY19L2 (best option) or SP56 and the manchette such as acetylated alpha-tubulin.

      We agree, and we added a double staining of CYLC1/CYLC2 and SP56 to the supplement (Figure 1 - supplement 3), showing co-localization of Cylicins with the developing acrosome. Manchette staining was not performed because the available manchette antibodies were raised in the same species as the antibodies against Cylicins (rabbit).

      Line 117-120: Immunofluorescence staining of wildtype testicular tissue showed presence of both CYLC1 and CYLC2 from the round spermatid stage onward (Fig. 1 E, Figure 1 – supplement 3). The signal was first detectable in the subacrosomal region as a cap-like structure, lining the developing acrosome (Fig. 1 E-F, Figure 1 – supplement 3).

      3) Line 118 and figure 1. The drawing showing the localization of Cylicin in mature sperm does not fit with the experimental data. Cylicins are located on the whole ventral face of the sperm.

      We agree and apologize for the inconsistency. The illustration was adjusted according to the experimental data showing localization of Cylicins in the whole ventral part of the sperm.

      4) Figure 1: Change "expression of Cylicin" to "localization of cylicin" (green)

      We changed the legend accordingly.

      5) Line 124 and figure 2C. In the figure provided, the PAS staining seems defective. The acrosomes do not seem stained (in pink as expected for a PAS staining). It may be due to the low quality of the pdf file, nevertheless, it is important to provide in supplementary data, an enlargement of the spermatid region showing the staining of the acrosome.

      We apologize for the bad quality of the PDF file and the low magnification. We restructured the subfigure showing PAS stained spermatids at different steps of spermiogenesis at higher magnification. According to the initial reviewer’s suggestion, the PAS staining was moved to figure 4. The PAS staining in figure 2 was replaced by HE-stained overview testis sections in Figure 3 – supplement 1 showing intact spermatogenesis in all genotypes.

      6) Line 130. Please indicate a reference for the lower limit of 58%. If this lower limit corresponds to human sperm, it should be omitted.

      Indeed, the given reference limit of 58% is only valid for human sperm samples. Therefore, we omitted the reference limit. The paragraph reads now as follows: Line 144-146: Eosin-Nigrosin staining revealed that the viability of epididymal sperm from all genotypes was not severely affected (Fig. 2 D, Figure 2 – supplement 2).

      7) line 152 Sperm morphology. Before showing the ultrastructure of the sperm, it would be important to show sperm morphology observed by optical microscopy. Therefore, I recommend including figure S2 as a principal figure, with a mix of Figures 3B and 3E.

      We thank the reviewer for the suggestion. The results section was re-structured accordingly, first showing results of optical microscopy (Fig. 2), followed by an in-depth ultrastructural investigation of morphological defects and their effects on sperm motility. Brightfield images of epididymal sperm were moved from former Figure S2 to main Figure 2.

      8) Line 164. figure S2A, showing that the 9+2 pattern is normal in KO sperm, is not convincing. Enlargement is required to conclude that the axoneme structure is normal; from the pictures, it rather seems that some doublets are missing.

      We apologize for the bad quality of the TEM pictures showing the axonemes. We recorded and included new images showing an intact 9+2 microtubular structure.

      9) Line 196. I would suggest removing the term "mild globozoospermia". Globozoospermia is rather complete (100% of round sperm heads) or incomplete (<90 % of round sperm heads). The anomalies observed on sperm heads, sperm motility, and the decrease in sperm concentration are rather suggestive of an OAT.

      We agree and we omitted the term “mild globozoospermia”. Instead, we added a concluding remark to the section, summarizing the described defects as OAT syndrome. The section reads now as follows:

      Line 215-217: Taken together, the observed anomalies of sperm heads, impaired sperm motility, and the decrease in epididymal sperm concentration show that Cylc deficiency results in a severe OAT phenotype (Oligo-Astheno-Teratozoospermia syndrome), as described in humans.

      10) Line 248. It is not clear from the data of figure 4B that "the developing acrosome lost its compact adherence to the nuclear envelope". From this figure, only defective morphologies of the acrosome are observed

      We agree and we omitted the sentence. Furthermore, it does not add additional information to the manuscript, since defects in the attachment of the acrosome to the nuclear envelope have been described in detail in Figure 4C.

      11) line 260-264. Manchette defects appear at stages 9-10. At this stage, the HTCA is already attached to the nucleus of the spermatid. see for instance figure 2 from Shang Y, Zhu F, Wang L, Ouyang YC, Dong MZ, Liu C, Zhao H, Cui X, Ma D, Zhang Z, Yang X, Guo Y, Liu F, Yuan L, Gao F, Guo X, Sun QY, Cao Y, Li W. Essential role for SUN5 in anchoring sperm head to the tail. Elife. 2017 Sep 25;6:e28199. doi: 10.7554/eLife.28199 . Therefore, the hypothesis that "abnormal attachment of the developing flagellum to the basal plate and consequently flipping of the head and coiling of the tail in mature spermatozoa" is unlikely and I suggest modifying this paragraph. In the HOOK paper, the manchette defects occurred earlier.

      We have read the suggested literature and agree with the reviewer's comment. The manchette defects we observe appear at later stages and are probably not responsible for the morphological anomalies of mature sperm. We also re-analyzed all manchette staining images and found no defects at earlier stages, so we have deleted the sentence from the manuscript.

      12) Line 344. Please indicate the percentage of headless spermatozoa. "Many sperm" is too vague.

      We agree and have quantified the percentage of sperm showing abnormal flagella and a headless phenotype. The paragraph now reads as follows:

      Line 335-339: Bright-field microscopy demonstrated that M2270's sperm flagella coiled in a manner similar to the flagella of sperm from Cylicin-deficient mice. Quantification revealed abnormal flagella in 57% of M2270 sperm compared with 6% in the healthy donor (Fig. 7 D). Interestingly, DAPI staining revealed that 27% of M2270 flagella carry cytoplasmic bodies without nuclei and could be defined as headless spermatozoa (Fig. 7 C, white arrowhead; Fig. 7 E).

      13) Any data concerning the success of ICSI for this patient?

      Yes, the ICSI outcome has been added to the main text. Line 309-311: The couple underwent one ICSI procedure, which resulted in 17 fertilized oocytes out of 18 retrieved. Three cryopreserved single-embryo transfers were performed in spontaneous cycles, but no pregnancy was achieved.

      14) Finally, it would be interesting to study the localization of PLCzeta in this model, since its localization in the perinuclear theca has been clearly shown (Escoffier et al, 2015 doi:10.1093/molehr/gau098 )

      We thank the reviewer for the valuable suggestion and have performed PLCzeta staining on human sperm, which clearly showed an irregular PT staining pattern in sperm of patient M2270 compared with healthy control sperm. Of note, staining was not possible in the mouse because the antibody reacts only with human samples.

      The section reads as follows:

      Line 343-349: Testis-specific phospholipase C zeta 1 (PLCζ1) is localized in the postacrosomal region of the PT in mammalian sperm (Yoon and Fissore, 2007) and has a role in generating the calcium (Ca²⁺) oscillations that are necessary for oocyte activation (Yoon, 2008). Staining of the healthy donor's spermatozoa showed the previously described localization of PLCζ1 in the calyx, whereas sperm from patient M2270 showed an irregular signal throughout the PT surrounding the sperm head (Fig. 7 G). These results suggest that Cylicin deficiency can also cause severe disruption of the PT in human sperm, resulting in male infertility.

      Reviewer #3 (Recommendations For The Authors):

      1) Why were the Cylc1-/y Cylc2+/- males infertile? It would be helpful to show the homology between the two proteins.

      To elaborate more on the homology of CYLC1 and CYLC2, we added a more detailed section about the protein and domain structure to the introduction.

      Line 73-78: In most species, two Cylicin genes, Cylc1 and Cylc2, have been identified (Figure 1 – supplement 1). They are characterized by repetitive lysine-lysine-aspartic acid (KKD) and lysine-lysine-glutamic acid (KKE) peptide motifs, resulting in an isoelectric point (IEP) above pH 10 (14, 15). Repeating units of up to 41 amino acids in the central part of the molecules are notable for a predicted tendency to form individual short α-helices (Hess et al., 1993). Mammalian Cylicins exhibit similar protein and domain characteristics, but CYLC2 has a much shorter amino-terminal portion than CYLC1 (Figure 1 – supplement 1).

      Speculation about the infertility of Cylc1-/y Cylc2+/- males has been added to the discussion:

      Line 410-413: Interestingly, Cylc1-/Y Cylc2+/- males displayed an “intermediate” phenotype, with slightly less damaged sperm than Cylc2-/- and Cylc1-/Y Cylc2-/- animals. This further supports our notion that loss of the less conserved Cylc1 gene might be at least partially compensated by the remaining Cylc2 allele.

      2) Western blot analysis is important to show the absence of the two proteins in the mouse models.

      To further verify the specificity of the generated antibodies and the generation of functional knockout alleles, we additionally performed Western blot analysis on cytoskeletal protein fractions, which confirmed the results of the IF staining.

      A paragraph was added to the manuscript and reads as follows:

      Line 127-134: Additionally, Western blot analyses confirmed the absence of CYLC1 and CYLC2 in cytoskeletal protein fractions of the respective knockouts (Fig. 1 G), thereby i) demonstrating the specificity of the antibodies and ii) validating the gene knockouts. Of note, the CYLC1 antibody detects a double band at 40-45 kDa. This is smaller than the predicted size of 74 kDa, but both bands were absent in Cylc1-/y samples. Similarly, the CYLC2 antibody detected a double band at 38-40 kDa instead of the predicted 66 kDa; again, both bands were absent in Cylc2-/- samples. Next, mass spectrometry analysis of the cytoskeletal protein fraction of mature spermatozoa was performed, detecting both proteins in WT but not in the respective knockout samples (Figure 1 – supplement 5; Figure 1 – supplement 6).

      3) On Page 7, line 227 and line 243, was the acetylated α-tubulin or α-tubulin antibody used?

      The α-tubulin antibody was used for all stainings, and we have corrected the text accordingly. Line 257-259: We used immunofluorescence staining of α-tubulin on testis squash preparations containing spermatids at different stages of spermiogenesis to investigate whether the altered head shape, calyx structure, and tail-head connection anomalies originate from possible defects of the manchette structure.

      4) Fig. 2S: A cartoon illustrating how nuclear elongation and circularity were evaluated would be helpful. TEM images from the control and Cylc1 KO mice are needed.

      A Cylc1-/Y TEM image has been added to Figure 3A.

      5) The discussion should be rewritten. The current version is to repeat the experiments/findings. The authors should discuss more about the potential mechanisms.

      We discuss the observed defects of Cylc-deficient animals in relation to other published mouse models deficient in calyx components. Furthermore, we speculate about potential interaction partners of Cylicins and the importance of these protein complexes for male fertility. However, at this point we think it would be too far-fetched to speculate about potential mechanisms without any evidence for Cylicin interaction partners or their exact molecular function, which requires further research.

    1. Author Response

      We would first like to thank the reviewers for their time and effort in critically reviewing our manuscript, and we appreciate the opportunity to address their comments. We thank the reviewers for appreciating that our experimental design is well crafted and contributes to the broader understanding of dietary and exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers' recommendations and hope that they find the revised version satisfactory.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added possible mechanisms to the discussion on page 14.

      2) Figures 4 and 7 could be moved to the supplementary material, and the text rearranged accordingly to improve the flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high-protein diet (36% of calories from protein) and a low-protein diet (7% of calories from protein) for this study. The authors should explain whether these mouse diets are practically comparable to a human high-protein diet (2 g/kg BW per day) and low-protein diet (less than 0.8 g/kg BW per day).

      Assuming a 2,000-kcal daily intake, the high-protein diet is comparable to a human diet of 180 g of protein per day ((0.36 × 2,000 kcal)/4 kcal per g = 180 g), which is within the range that some people, particularly bodybuilders and athletes, consume. The low-protein diet is equivalent to 35 g of protein per day ((0.07 × 2,000 kcal)/4 kcal per g = 35 g), and a diet providing just 7% of energy from protein is not recommended for humans, as it falls below the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% of energy from protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
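      The conversion above can be made explicit. The sketch below assumes the 2,000-kcal daily intake and 4 kcal per gram of protein used in the response and reproduces the 180 g and 35 g figures; to express them in the per-kilogram units used in the reviewer's question, it additionally assumes a hypothetical 70-kg adult, a reference value that does not come from the study.

```python
KCAL_PER_G_PROTEIN = 4.0  # energy yield of protein used in the response


def protein_grams_per_day(fraction_energy: float, kcal_per_day: float = 2000.0) -> float:
    """Grams of protein per day when `fraction_energy` of daily kcal come from protein."""
    return fraction_energy * kcal_per_day / KCAL_PER_G_PROTEIN


def grams_per_kg(grams: float, body_weight_kg: float) -> float:
    """Express a daily protein amount per kilogram of body weight."""
    return grams / body_weight_kg


if __name__ == "__main__":
    BODY_WEIGHT_KG = 70.0  # hypothetical reference adult, not a value from the study
    for label, fraction in [("high protein (36%)", 0.36), ("low protein (7%)", 0.07)]:
        grams = protein_grams_per_day(fraction)
        print(f"{label}: {grams:.0f} g/day = "
              f"{grams_per_kg(grams, BODY_WEIGHT_KG):.2f} g/kg/day")
```

      Under that assumed body weight, the high-protein diet corresponds to roughly 2.6 g/kg per day and the low-protein diet to 0.5 g/kg per day; a different body weight shifts both values proportionally.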

      4) The color coding of the error bars and lines does not match the group descriptions in almost every figure. Perhaps the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E, the number of biological samples in the LP+WP group does not appear to be consistent. If any outliers were excluded from the analysis, this should be stated in the methods.

      Outlier detection is described in the statistics section of the methods (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”
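      For reference, the textbook two-sided Grubbs test, which tests the single most extreme value in a sample of size $N$ with sample mean $\bar{x}$ and standard deviation $s$, is given below. We assume, without having checked the calculator's implementation, that it follows this form at a significance level $\alpha$ (commonly 0.05). Here $t_{\alpha/(2N),\,N-2}$ denotes the upper critical value of the $t$ distribution with $N-2$ degrees of freedom at significance level $\alpha/(2N)$, and the most extreme value is flagged as an outlier when $G > G_{\text{crit}}$.

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\text{crit}} = \frac{N-1}{\sqrt{N}}
  \sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}}
```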

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Rai1 encodes the transcription factor retinoic acid-induced 1 (RAI1), which regulates expression of factors involved in neuronal development and synaptic transmission. Rai1 haploinsufficiency leads to the monogenic disorder Smith-Magenis syndrome (SMS), which is associated with excessive feeding, obesity and intellectual disability. Consistent with findings in human subjects, Rai1+/- mice and mice with conditional deletion of Rai1 in Sim1+ neurons, which are abundant in the paraventricular nucleus (PVN), exhibit hyperphagia, obesity and increased adiposity. Furthermore, RAI1-deficient mice exhibit reduced expression of brain-derived neurotrophic factor (BDNF), a satiety factor essential for the central control of energy balance. Notably, overexpression of BDNF in the PVN of RAI1-deficient mice mitigated their obesity, implicating this neurotrophin in the metabolic dysfunction these animals exhibit. In this follow-up study, Javed et al. examined whether RAI1 is required in BDNF+ neurons to promote metabolic health.

      Consistent with previous reports, the authors observed reduced BDNF expression in the hypothalamus of Rai1+/- mice. Moreover, proteomics analysis indicated impairment in neurotrophin signaling in the mutants. Selective deletion of Rai1 in BDNF+ neurons in the brain during development resulted in increased body weight, fat mass and reduced locomotor activity and energy expenditure without changes in food intake. There was also a robust effect on glycemic control, with mutants exhibiting glucose intolerance. Selective depletion of RAI1 in BDNF+ neurons in PVN in adult mice also resulted in increased body weight, reduced locomotor activity, and glucose intolerance without affecting food intake. Blunting RAI1 activity also leads to increases and decreases in the inhibitory tone and intrinsic excitability, respectively, of BDNF+ neurons in the PVN.

      Strengths:

      Overall, the experiments are well designed and multidisciplinary approaches are employed to demonstrate that RAI1 deficits in BDNF+ neurons diminish hypothalamic BDNF signaling and produce metabolic dysfunction. The most significant advance relative to previous reports is the finding from electrophysiological studies that blunting RAI1 activity increases the inhibitory tone onto, and decreases the intrinsic excitability of, BDNF+ neurons in the PVN. A further advance is the finding that intact RAI1 function is required in BDNF+ neurons for the regulation of glucose homeostasis.

      Weaknesses:

      Some of the data need to be reconciled with previous findings by others. For example, the authors report that more than 50% of BDNF+ neurons in PVN also express pTrkB whereas about 20% of pTrkB+ cells contain BDNF, raising the possibility that autocrine mechanisms might be at play. This is in conflict with a previous study by An et al, (2015) showing that these cell populations are largely non-overlapping in the PVN.

      We fully agree with this assessment. Given the difficulty of characterizing the expression of membrane proteins in vivo by immunostaining, and because the specificity of the pTrkB antibody in different tissues remains unknown, the signals we observed are difficult to interpret. We have excluded these data because the histological analysis of p-TRKB and BDNF autocrine/paracrine signalling is not a focus of the present study. A more advanced genetic approach (e.g., the Ntrk2CreER/+; Ai9 mouse line used by An et al., 2015) would be more suitable for investigating the function of Rai1 in TRKB+ neurons in future studies.

      Another issue that deserves more in-depth discussion is that diminished BDNF function appears to play a minor part in driving deficits in energy balance regulation. Accordingly, both global central depletion of Rai1 in BDNF+ neurons during development and deletion of Rai1 in BDNF+ neurons in the adult PVN elicited modest effects on body weight (less than 18% increase) and did not affect food intake. This contrasts with mice with selective Bdnf deletion in the adult PVN, which are hyperphagic and dramatically obese (90% heavier than controls). Therefore, the results suggest that deficits in RAI1 in the PVN or the whole brain only moderately affect BDNF actions influencing energy homeostasis, and that other signaling cascades and neuronal populations play a more prominent role in driving the phenotypes observed in Rai1+/- mice, which are hyperphagic and 95% heavier than controls. The results from the proteomic analysis of hypothalamic tissue of Rai1 mutant mice and controls could be useful in generating alternative hypotheses. Depleting RAI1 in BDNF+ neurons robustly compromised glycemic control. However, as the approach does not necessarily impact BDNF exclusively, there should be a larger discussion of alternative mechanisms.

      We thank the reviewer for these insightful comments. We want to highlight that global deletion of Rai1 from BDNF neurons did increase food intake in male mice (Fig 2 – figure supplement 4K). We have incorporated the following paragraphs into the discussion section.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial for regulating glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remain unclear and warrant future investigation.

      What additional Rai1-dependent hypothalamic cell types residing in brain regions other than the PVH regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Reviewer #2 (Public Review):

      Understanding disease conditions often yields valuable insights into the physiological regulation of biological functions, as well as potential therapeutic approaches. In previous investigations, the authors' research group identified abnormal expression of brain-derived neurotrophic factor (BDNF) in the hypothalamus of a mouse model exhibiting Smith-Magenis syndrome (SMS), which is caused by heterozygous mutations of the Rai1 gene. Human SMS is associated with distinct facial characteristics, sleep disturbances, behavioral issues, and intellectual disabilities, often accompanied by obesity. Conditional knockout (cKO) of the Bdnf gene from the paraventricular hypothalamus (PVH) in mice led to hyperphagic obesity, while overexpression of the Bdnf gene in the PVH of Rai1 heterozygous mice rescued the SMS-like obese phenotype. Based on these preceding findings, the authors of the present study discovered that homozygous Rai1 cKO restricted to Bdnf-expressing cells, or Rai1 gene knockdown solely in Bdnf-positive neurons in the PVH, induced obesity along with intricate alterations in adipose tissue composition, energy expenditure, locomotion, feeding patterns, and glucose tolerance, some of which varied between sexes. Additionally, the authors demonstrated that a brain-penetrating drug capable of activating the TrkB pathway, a downstream signaling pathway of BDNF, partially alleviated the SMS-like obesity phenotype in female mice with Rai1 heterozygous mutations. Although the specific (neural) cell type responsible for this TrkB signaling remains an open question, the present study unequivocally highlights the importance of Rai1 gene function in PVH Bdnf neurons for the obesity phenotype, providing valuable insights into potential therapeutic strategies for managing obesity associated with SMS.

      In the proteomic analysis (Fig. 1), the authors showed that multiple phospho-protein signaling pathways, including the Akt and mTOR pathways, were significantly attenuated in the SMS model mice. Notably, Rai1 haploinsufficiency restricted to BDNF+ cells had a negligible impact on body weight (Fig. 2 – figure supplement 3D), despite the reduction in BDNF levels observed in the heterozygous Rai1 mutant (Fig. 1A). Conversely, homozygous Rai1 cKO in BDNF+ cells produced a prominent obesity phenotype, suggesting substantial differences in gene expression profiles between the Rai1 heterozygous and homozygous conditions within the BDNF+ cell population. It would be advantageous to identify precisely the responsible differentially expressed genes, possibly including Bdnf itself, in the homozygous cKO model. The observed reduction in the excitability of PVH BDNF+ cells (Fig. 3) is presumably attributable to aberrant expression of genes other than Bdnf, which may serve as prospective targets for gene expression analysis. In addition, Rai1 homozygous cKO mice in BDNF+ cells exhibited some sexual dimorphisms in feeding and energy expenditure, as shown in Fig. 2 and related figures. Exploring the potential relevance of these sex differences to human SMS cases and investigating the underlying cellular and molecular mechanisms in the future would provide valuable insights.

      Although the CRISPR-mediated knockdown of the Rai1 gene (Fig. 4) appears to be highly effective, given the broad transduction of AAV serotype 9, it may be helpful to exclude the possibility that brain regions adjacent to the PVH, such as the DMH or VMH, were affected by this viral procedure. If PVH specificity is established, the majority of Rai1 cKO effects in Bdnf+ cells can be primarily attributed to PVH Bdnf+ cells, based on the similarity of the observed phenotypes. With regard to the apparent rescue of the body weight phenotype in Rai1 heterozygous mutants by a selective TrkB activator, the specific biological processes and neurons responsible for this effect remain unclear to this reviewer. Elucidating these aspects would be important when considering potential applications to human SMS cases.

      We appreciate the reviewer's insightful comments. We agree that the logical next step would be to identify the profile of the differentially expressed genes in our homozygous conditional knockout model. We have included the following paragraphs in the discussion.

      Lines 364-384: “Notably, mice lacking one copy of Rai1 in the BDNF-producing cells do not exhibit obesity, whereas SMS patients and SMS mice show pronounced obesity (Burns et al., 2010; Huang et al., 2016; Smith et al., 2005). This indicates that although reduced Bdnf expression and BDNF-producing neurons contribute to regulating body weight, additional molecular changes and other hypothalamic populations also play important roles in regulating body weight homeostasis in SMS. Our RPPA data suggest that mTOR signalling is also misregulated in addition to the reduced activation of the neurotrophin downstream cascades. Hypothalamic mTORC1 is crucial for regulating glucose release from the liver, peripheral lipid metabolism, and insulin sensitivity (Burke et al., 2017; Caron et al., 2016; Smith et al., 2015), while mTORC2 regulates glucose tolerance and fat mass (Kocalis et al., 2014). How the impaired mTOR signalling contributes to energy homeostasis defects in SMS and the therapeutic potential of targeting this pathway to treat SMS-related obesity remain unclear and warrant future investigation.

      What additional Rai1-dependent non-PVH hypothalamic cell types regulate obesity in SMS? Other important cell types such as TRKB neurons within the PVH (An et al., 2020) and several RAI1-expressing hypothalamic nuclei including the arcuate nucleus, ventromedial nucleus of the hypothalamus (VMH), and lateral hypothalamus all play important roles in regulating energy homeostasis. POMC- and AGRP-expressing neurons within the arcuate nucleus are known to regulate food intake and glucose and insulin homeostasis (Quarta et al., 2021; Vohra et al., 2022). Therefore, Rai1 function in these neurons could contribute to obesity in SMS, a topic that awaits future investigation.”

      Lines 409-418: “It is plausible that RAI1 regulates the expression of genes encoding inward rectifier K+ channels, which regulate neuronal activity and potentially energy homeostasis. For instance, KIR6 (a family of ATP-sensitive potassium channels, KATP) is widely expressed in the hypothalamus. Deleting the hypothalamic KIR6.2 subunit impairs KATP channel function and glucose tolerance (Miki et al., 2001; Parton et al., 2007). Moreover, reduced expression of hypothalamic GIRK4 (encoding an inwardly rectifying potassium channel) causes obesity (Perry et al., 2008). GABAergic neurotransmission from arcuate AGRP-expressing neurons to the PVH neurons is important to increase appetite by favouring hyperphagia (Atasoy et al., 2012). Disrupting the composition of these ion channels could contribute to reduced PVHBDNF neuronal firing, which awaits further investigations.”

      Moreover, to facilitate future exploration of the potential relevance of sex differences to human SMS cases, we have incorporated the following explanation into the discussion section.

      Lines 419-426: “Female mice with a conditional knockout of Rai1 from BDNF-producing neurons do not display a noteworthy difference in food intake. Conversely, their male counterparts exhibit a significant increase in food intake. Although SMS individuals of both genders tend to overeat, male patients who are obese show significantly higher food consumption than their female counterparts (Gandhi et al., 2022). This observation raises the possibility that Rai1 regulates eating behaviours through multiple cell types in the hypothalamus and that a male-specific involvement of BDNF-producing neurons in regulating food intake potentially provides a neurobiological basis for the observed pattern in SMS patients (Gandhi et al., 2022).”

      To exclude the possibility that other brain regions adjacent to the PVH (such as the VMH and arcuate nucleus) were affected by our AAV-CRISPR-mediated Rai1 knockout, we analyzed other hypothalamic regions, including the VMH and arcuate nucleus, on the same slides used to confirm PVH viral expression, and confirmed that the AAV was not expressed in these regions. We have added a representative image (Figure 4 – figure supplement 1F) showing limited AAV expression in these nuclei.

      Regarding LM22A-4: it is possible that LM22A-4 acts directly by binding to TRKB, or that it indirectly engages molecules downstream of TRKB by activating other receptors such as GPCRs. LM22A-4 appears to engage the neurotrophin downstream PI3K-AKT pathway, which our RPPA analysis identified as downregulated in the hypothalamus of Rai1-deficient mice. Reduced AKT activity is associated with insulin resistance and obesity in mice. Restoration of AKT activity by LM22A-4 could be the primary mode of action of this drug in the brain. However, since this drug only partially rescued the body weight defect, future research exploring more potent TrkB agonists or a combination therapy targeting both the neurotrophin and mTOR pathways might yield improved responses to pharmacological intervention. We have included the following paragraph in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      Overall, the present study represents a valuable addition to the authors' series of high-quality molecular genetic investigations into the in vivo functions of the Rai1 gene. This reviewer particularly commends their diligent efforts to enhance our comprehension of SMS and contribute to the future development of more effective therapies for this syndrome.

      We thank the reviewer for finding our study valuable in advancing the understanding of RAI1 function.

      Reviewer #3 (Public Review):

      Summary:

      Smith-Magenis syndrome (SMS) is associated with obesity and is caused by deletion or mutations in one copy of the Rai1 gene which encodes a transcriptional regulator. Previous studies have shown that Bdnf gene expression is reduced in the hypothalamus of Rai1 heterozygous mice. This manuscript by Javed et al. further links SMS-associated obesity with reduced Bdnf gene expression in the PVH.

      Strengths:

      The authors show that deletion of the Rai1 gene in all BDNF-expressing cells or just in the PVH BDNF neurons postnatally caused obesity. Interestingly, mutant mice displayed sexual dimorphism in the cause of the obesity phenotype. Overall, the data are well presented and convincing, except for the data from LM22A-4.

      Weaknesses:

      1) The most serious concern is about data from LM22A-4 administration experiments (Figure 5 and associated supplemental figures). A rigorous study has demonstrated that LM22A-4 does not activate TrkB (Boltaev et al., Science Signaling, 2017), which is consistent with unpublished results from many labs in the neurotrophin field. It is tricky to interpret body weight data from pharmacological studies because compounds always have some side effects, which can reduce body weight non-specifically.

      We thank this reviewer for their valuable comments. Indeed, the precise mechanism by which LM22A-4 exerts its effect is not entirely clear, and there has been mixed evidence regarding its identity as a TRKB agonist in vitro. We no longer describe LM22A-4 as a partial agonist of TRKB and instead focus on the potential of this drug to activate neurotrophin downstream signalling by increasing AKT phosphorylation in vivo. We have modified the title to remove TRKB, and the following changes have been made in the discussion:

      Lines 451-461: “We recognize that while several in vivo studies have demonstrated the potential of LM22A-4 in targeting neurotrophin downstream signalling (Kron et al., 2014; Li et al., 2017), an in vitro analysis failed to demonstrate the ability of LM22A-4 to activate TrkB directly (Boltaev et al., 2017). Therefore, the precise mechanism by which LM22A-4 enhances AKT cascades in the mammalian brain remains unclear and awaits further investigations. In the hypothalamus of SMS mice, LM22A-4 could indirectly engage neurotrophin downstream PI3K-AKT pathway through the G protein-coupled receptor-dependent transactivation of the TRKB receptor (Domeniconi & Chao, 2010) or other unknown mechanisms. Moreover, while LM22A-4 may have potential side effects, we found that wild-type mice treated with LM22A-4 did not show a further decrease in body weight, suggesting limited side effects regarding body weight regulation.”

      2) The resolution of all figures is poor, and thus I could not judge the quality of the micrographs.

      We have replaced the figures with higher-resolution images.

      3) Citation of the literature is not precise. The study by An et al. (2015) shows that deletion of the Bdnf gene in the PVH leads to obesity due to increased food intake and reduced energy expenditure (not just hyperphagic obesity; Line 72). Furthermore, the study by Unger et al. (2007) carried out Bdnf deletion in the VMH and DMH using AAV-Cre and did not discuss SF1 neurons at all (Line 354). The two studies by Yang et al. (Mol Endocrinol, 2016) and Kamitakahara et al. (Mol Metab, 2015) did use SF1-Cre to delete the Bdnf gene and did not observe any obesity phenotype.

      We thank the reviewer for bringing this to our attention. We have revised the text to ensure accurate representation of the cited publications. The following changes have been made: Lines 348-350: “Although BDNF is required in the VMH and DMH to regulate body weight (Unger et al., 2007), embryonic deletion of Bdnf from the SF1-lineage populations including the VMH did not result in obesity (Kamitakahara et al., 2016; Yang et al., 2016).”

      4) Animal number is not described in many figure legends.

      We thank the reviewer for pointing it out. We have revised the manuscript to incorporate the missing animal numbers.

      Reviewer #1 (Recommendations For The Authors):

      Additional points:

      1) The data provided indicating increased inhibitory tone onto BDNF neurons in PVN of Rai1 mutant mice are not convincing that inhibitory drive is significantly affected.

      We have modified the sentences as follows and have also deleted these conclusions from the Abstract and Discussion:

      Lines 215-220: “We observed a slight rightward shift of the probability of miniature inhibitory postsynaptic current (mIPSC) frequency in cKO PVHBDNF neurons, although the average frequency (Fig 3K) was not significantly different between groups. The probability of mIPSC amplitude also showed a rightward shift without a significant change (Fig 3L, Figure 3—figure supplement 1D). However, we observed a significantly increased area under the curve (Fig 3M).”

      2) Fig. 3C - Was outlier analysis performed for these data? One of the data points for the control group looks like an outlier that might be skewing the data.

      We performed an outlier analysis and found that one data point was indeed an outlier. After removing this data point, the data remained statistically significant (*p<0.05), and the revised manuscript has been updated accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript would benefit from improved usage and precise descriptions of statistics. The authors often provided only general statements such as "one or two-way ANOVA" without specifying the exact statistical tests used. It is important to differentiate between one-way and two-way ANOVA, particularly when using the latter, by clearly indicating the within-group effects and interaction effects. The representation of p-values associated with ANOVA using asterisks requires clarification, specifying which statistics indicate ANOVA results and which ones correspond to post hoc analysis. It is advisable to assess the normality of the distribution before employing t-tests or consider non-parametric comparisons such as Wilcoxon's rank sum test if normality assumptions are not met. Additionally, it is essential to specify whether the tests are one-sided or two-sided and whether they are paired or unpaired. In some figure panels, such as Fig. 2H and K, the statistical tests used were not indicated at all.

      We have clarified the exact statistical tests in the figure legend for each figure.
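      Reviewer #2 mentions a non-parametric alternative for data that are not normally distributed. The Python function below is a minimal sketch of that test: it computes the Wilcoxon rank-sum (Mann-Whitney U) statistic, assigns average ranks to ties and takes a two-sided p-value from the large-sample normal approximation. It is illustrative only and is not the analysis used in the manuscript. The function name and the example values are hypothetical. The normal approximation is crude for groups as small as those typical here (n = 3 to 5 animals). At those sizes an exact test from a standard statistics package would be preferable.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the
    large-sample normal approximation. Tied values receive average ranks;
    no tie or continuity correction is applied."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical, completely separated samples: U for the first sample is 0.
u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

Swapping the two samples gives U = n1·n2 − U and the same two-sided p-value.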

      2) Rearranging the figures to facilitate a direct comparison of the sexual phenotypes (Fig. 2 and Fig. 2-supple 4) within the same figures would greatly improve reader comprehension.

      We have decided to keep the figure arrangement because of the focus on female mice in the main figures.

      3) To improve the comprehension of the figures and text, the following points should be addressed:

      • Fig. 1D: The definition of the expression level in the color code is not clear.

      An explanation of the color code has been added to the Methods section.
      Lines 652-656: “The vertical axis of the dendrogram represents the dissimilarity (measured as distance) between protein expressions, and the horizontal axis represents the individual test samples. The colour code (ranging from red to yellow to green) specifies the expression levels of different proteins, where red indicates low expression, yellow indicates intermediate expression, and green indicates high expression.”

      • Fig. 1F: One parenthesis is missing from the figure label.

      Fixed

      • Fig. 2C: It is unclear why there are so many dots for just n = 3 animals. It would be better to specify the conditions or use "animals" as a unit of measurement.

      The dots represent the percentage of cells quantified per slice from 3 animals. This has been clarified in the figures.

      • Fig. 2F: There seems to be an unnecessary label "I" in the middle of the panel.

      Fixed

      • It is not completely clear if the data in Fig. 2E-L were all obtained at 26 weeks of age.

      To clarify, the following line has been added to the Methods section:

      Lines 517-518: “After the 25th week, mice were subjected to body composition analysis.”

      • In Fig. 2-Supple 1, the legend should read "G-J." Additionally, please provide a definition for the arrowheads.

      Line 1086: “yellow arrowheads indicate Ai9 marked BDNF cells co-expressing endogenous BDNF.”

      • It is not completely clear if the data in Fig. 3 were all obtained from female mice.

      It is explained in the legend of Fig 3.

      • The description of the number of animals seems to be missing in Fig. 4

      The description for the number of animals has been added in the figure legend. Line 1004: “(Ctrl group: n=5, Exp group: n =5)”

      • On line 280-281, "Fig 4A." should be corrected to "Fig. 5A."

      Corrected.

      • In Fig. 5C-E, it is uncertain if multiple pairwise comparisons for three groups are statistically appropriate. At the very least, multiple comparisons should be corrected.

      We performed a two-way ANOVA in which the mean body weights of age-matched groups were compared with each other (i.e. control saline-injected vs. SMS saline-injected, SMS saline-injected vs. SMS LM22A-4-injected, and control saline-injected vs. SMS LM22A-4-injected). We used Šidák’s multiple comparisons test, with statistical significance indicated as *p < 0.05, **p < 0.01, ***p < 0.001 and ****p < 0.0001. We have clarified this in the Figure 5 legend.
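      The Šidák adjustment named above can be sketched in a few lines of Python. The sketch rests on two assumptions: the family at each age consists of the three pairwise comparisons listed, and the raw p-values are hypothetical. It is not a description of how any particular statistics package implements the test. The adjusted p-value is 1 − (1 − p)^m. Equivalently, each comparison is tested at the per-comparison level 1 − (1 − α)^(1/m).

```python
def sidak_per_comparison_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at
    `alpha` across `m` comparisons (Sidak correction)."""
    return 1 - (1 - alpha) ** (1 / m)


def sidak_adjust(p_values):
    """Sidak-adjusted p-values: p_adj = 1 - (1 - p) ** m, where m is the
    number of comparisons in the family; capped at 1."""
    m = len(p_values)
    return [min(1.0, 1 - (1 - p) ** m) for p in p_values]


# Hypothetical raw p-values for the three pairwise comparisons at one age.
raw = [0.01, 0.02, 0.30]
adjusted = sidak_adjust(raw)                   # approx. [0.0297, 0.0588, 0.657]
threshold = sidak_per_comparison_alpha(0.05, 3)  # approx. 0.01695
```

In this toy example, a raw p of 0.02 would no longer be significant after correction. This is why the correction matters when several pairwise comparisons are drawn from one ANOVA.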

      • The unit of measurement should be standardized across figures, if possible, to facilitate better side-by-side comparisons. For example, most bodyweight figures use "g" (grams), but "mg" (milligrams) is used in Fig. 5.

      All measurements are corrected to be consistent (in grams).

      • It is unclear if nM (not mM) of glucose was actually measured in the glucose tolerance test (Fig. 2L and Fig. 4L).

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      1) The authors can remove the LM22A-4 data without much detrimental effects on the conclusion of the manuscript. Otherwise, the authors have to demonstrate that LM22A-4 activates TrkB, does not have any toxicity, and does not cause aversion.

      We thank the reviewer for the valuable comments, and we acknowledge the valid concern. Indeed, the precise mechanism by which LM22A-4 exerts its effects is not clear, and there have been mixed opinions regarding its function as a TRKB agonist in in vitro assays. To clarify, we have refrained from describing LM22A-4 as a partial agonist of TRKB and have instead focused on highlighting the potential of this drug to activate neurotrophin downstream signalling through increased AKT phosphorylation in vivo.

      We have also modified the title of our article to exclude the word “TRKB Signalling”. The new title is as follows:

      “Smith-Magenis syndrome protein RAI1 regulates body weight homeostasis through hypothalamic BDNF-producing neurons and neurotrophin downstream signalling”

      2) Line 50: "40% > 95th percentile weight, 40% > 85th percentile weight" should be "40% > 95th percentile weight, 80% > 85th percentile weight".

      Corrected.

      3) Abbreviations for brain-derived neurotrophic factor: Bdnf for gene and BDNF for protein.

      Abbreviations have been corrected throughout the manuscript.

      4) Need to specify the animal age when viruses were injected into the PVH to inactivate the Bdnf gene.

      Line 235: Virus was injected at 3 weeks of age. It has been specified in the main text.

      5) Line 832: "3 technical triplicates" can be simplified as "3 technical repeats" because 3 and triplicates are redundant.

      Corrected.

      6) Figure 2B: The "O" in cKO is misplaced.

      Fixed.

      7) Figure 3: The black legends in E and F should include Ctrl.

      Fixed in the Figure 3.

    1. Author Response

      The data we produce are not criticized as such and thus do not require revision; the criticisms concern our interpretation of them. General themes of the reviews are that i) genetic signatures do not matter for defining neuronal types (here sympathetic versus parasympathetic); ii) a cholinergic postganglionic autonomic neuron must be parasympathetic; and iii) some physiology of the pelvic region deserves the label “parasympathetic”. We answered the latter argument in (Espinosa-Medina et al., 2018), to which we refer the interested reader, and we fully disagree with the first two. Of note, part of the last sentence of the eLife assessment is misleading and does not reflect the referees’ comments. Our paper analyses genetic differences between the cranial and sacral outflows and uses them to argue that they cannot both be parasympathetic. The eLife assessment acknowledges the “genetic differences” but concludes that, somehow, they do not detract from a common parasympathetic identity. We take issue with this paradox, of course, but it is coherent with the referees’ comments. On the other hand, the eLife assessment alone pushes the paradox one step further by stating that “functional differences” between the cranial and sacral outflows cannot prevent them from both being parasympathetic either. We would also object to this. However, the only “functional differences” the referees use to dismiss our diagnosis of a sympathetic-like (rather than parasympathetic) character for the sacral outflow are those between noradrenergic and cholinergic, and between sympathetic and parasympathetic (and we also disagree with those, see above and below). They are not differences between cranial and sacral.

      We will thus use the opportunity offered by eLife to keep the paper as it is (with a few minor stylistic changes). We respond below to the referees’ detailed remarks and hope that the publication of the paper, the referees’ comments and our response, as per eLife’s new model, will help move the field forward.

      Public review by Referee #1

      “Consistently, the P3 cluster of neurons is located close to sympathetic neuron clusters on the map, echoing the conventional understanding that the pelvic ganglia are mixed, containing both sympathetic and parasympathetic neurons”.

      The greater closeness of P3 than of P1/2/4 to the sympathetic cluster can be used to judge P1/2/4 less sympathetic than P3 (and more… something else), but not more parasympathetic. There is no echo of the “conventional understanding” here.

      “A closer look at the expression showed that some genes are expressed at higher levels in sympathetic neurons and in P2 cluster neurons ” [We assume that the referee means “in sympathetic neurons and in P3 cluster neurons”] but much weaker in P1, P2, and P4 neurons such as Islet1 and GATA2, and the opposite is true for SST. Another set of genes is expressed weakly across clusters, like HoxC6, HoxD4, GM30648, SHISA9, and TBX20.

      These statements are inaccurate. First, the classification is not based on impressions from visual inspection of the heatmap, but on calculations using thresholds. Admittedly, the thresholds have an arbitrary aspect. Even so, the referee can verify by eye inspection of the heatmap that the genes we calculate as being at “higher levels in sympathetic neurons and in P3 cluster neurons, but much weaker in P1, P2, and P4 neurons”, or vice versa, show a much bigger difference than those the referee cites. These are the noradrenergic and cholinergic genes (groups V and VI, respectively), and they are indeed quasi-absent from the weaker clusters or ganglia. In addition, even by subjective eye inspection:

      Islet1 is equally expressed in P4 and sympathetics.

      SST is equally expressed in P1 and sympathetics.

      Tbx20 is equally expressed in P2 and sympathetics.

      HoxC6, HoxD4, GM30648, SHISA9 are equally expressed in all clusters and all sympathetic ganglia.

      “Since the pelvic ganglia are in a caudal body part, it is not surprising to have genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa (to have genes expressed in sphenopalatine ganglia, but not in pelvic ganglia), according to well recognized rostro-caudal body patterning, such as nested expression of hox genes.”

      We do not simply show “genes expressed in pelvic ganglia, but not in rostral sphenopalatine ganglia, and vice versa”, i.e. a genetic distance between pelvic and sphenopalatine, but many genes expressed in all pelvic cells and sympathetic ones, i.e. a genetic proximity between pelvic and sympathetic. This situation can be deemed “unsurprising”, but it can only be used to question the parasympathetic nature of pelvic cells (as we do), or considered irrelevant (as the referee does, because genes would not define cell types, see our response to an equivalent stance by Referee#2). Concerning Hox genes, we do take them into account, and speculate in the discussion that their nested expression is key to the structure of the autonomic nervous system, including its division into sympathetic and parasympathetic outflows.

      It is much simpler and easier to divide the autonomic nervous system into sympathetic neurons that release noradrenaline versus parasympathetic neurons that release acetylcholine, and these two systems often act in antagonistic manners, though in some cases, these two systems can work synergistically. It also does not matter whether or not pelvic cholinergic neurons could receive inputs from thoracic-lumbar preganglionic neurons (PGNs), not just sacral PGNs; such occurrence only represents a minor revision of the anatomy. In fact, it makes much more sense to call those cholinergic neurons located in the sympathetic chain ganglia parasympathetic.

      This “minor revision of the anatomy” would make spinal preganglionic neurons which are universally considered sympathetic (in the thoraco-lumbar cord) synapse onto large numbers of parasympathetic neurons (in the paravertebral chains for sweat glands and periosteum, and in the pelvic ganglion), robbing these terms of any meaning.

      Thus, from the functionality point of view, it is not justified to claim that "pelvic organs receive no parasympathetic innervation".

      There never was any general or rigorous functional definition of the sympathetic and parasympathetic nervous systems. It is striking, almost ironic, that Langley, creator of the term parasympathetic and the ultimate physiologist, provides an exclusively anatomic definition in his Autonomic Nervous System, Part I. Hence, our definition cannot clash with any “functionality point of view”. In fact, as we briefly say in the discussion and explore in (Espinosa-Medina et al., 2018), it is the “sacral parasympathetic” paradigm which is unjustified from a functionality point of view. It implies a functional antagonism across the lumbo-sacral gap, and that antagonism has been disproven repeatedly. It remains to be determined which neurons are antagonistic to which on the blood vessels of the external genitals. Antagonism within one division of the autonomic nervous system would not be without precedent: there exist both vasoconstrictor and vasodilator sympathetic neurons, and both inhibitory and excitatory enteric motoneurons. The way to this question is finally open to research, and, as Referee #2 says, “it is early days”.

      Public review by Referee #2

      This work further documents differences between the cranial and sacral parasympathetic outflows that have been known since the time of Langley - 100 years ago.

      We assume that the referee means that it is the “cranial and sacral parasympathetic outflows” which “have been known since the time of Langley”, not their differences (that we would “further document”): the differences were explicitly negated by Langley. As a matter of fact, the sacral and cranial outflows were first likened to each other by Gaskell, 140 years ago (Gaskell, 1886). This anatomic parallel (which is deeply flawed (Espinosa-Medina et al., 2018)) was inherited wholesale by Langley, who added one physiological argument (Langley and Anderson, 1895) (which has been contested many times (Espinosa-Medina et al., 2018) and references within).

      In addition, the sphenopalatine and other cranial ganglia develop from placodes and the neural crest, while sympathetic and sacral ganglia develop from the neural crest alone.

      Contrary to what the referee says, the sphenopalatine has no placodal contribution. There is no placodal contribution to any autonomic ganglion, sympathetic or parasympathetic (except an isolated claim concerning the ciliary ganglion (Lee et al., 2003)). All autonomic ganglia derive from the neural crest as determined a long time ago in chicken. For the sphenopalatine in mouse, see our own work (Espinosa-Medina et al., 2014).

      One feature that seems to set the pelvic ganglion apart is […] the convergence of preganglionic sympathetic and parasympathetic synapses on individual ganglion cells (Figure 3). This unusual organization has been reported before using microelectrode recordings (see Crowcroft and Szurszewski, J Physiol (1971) and Janig and McLachlan, Physiol Rev (1987)). Anatomical evidence of convergence in the pelvic ganglion has been reported by Keast, Neuroscience (1995).

      Contrary to what the referee says, we do not provide in Figure 3 any evidence for anatomic convergence, i.e. for individual pelvic ganglion cells receiving dual lumbar and sacral inputs. We simply show that cholinergic neurons figure prominently among targets of the lumbar pathway. This said, the convergence of both pathways on the same pelvic neurons, described in the references cited by the referee, is another major problem in the theory of the “sacral parasympathetic” (as we discussed previously (Espinosa-Medina et al., 2018)).

      It should also be noted that the anatomy of the pelvic ganglion in male rodents is unique. Unlike other species where the ganglion forms a distributed plexus of mini-ganglia, in male rodents the ganglion coalesces into one structure that is easier to find and study. Interestingly the image in Figure 3A appears to show a clustering of Chat-positive and Th-positive neurons. Does this result from the developmental fusion of mini ganglia having distinct sympathetic and parasympathetic origins?

      The clustering of Chat-positive and Th-positive cells could arise from a number of developmental mechanisms, about which we know nothing at present. This has no bearing on the sympathetic versus parasympathetic question.

      In addition, Brunet et al. dismiss the cholinergic and noradrenergic phenotypes as a basis for defining sympathetic and parasympathetic neurons. However, see the bottom of Figure S4 and further counterarguments in Horn (Clin Auton Res (2018)).

      The bottom of Figure S4 simply indicates which cells are cholinergic and adrenergic. We have already expounded many times that noradrenergic and cholinergic do not coincide with sympathetic and parasympathetic. Henry Dale (Nobel Prize 1936) demonstrated this. Langley himself devoted several pages of his final treatise (Langley, 1921, p. 43) to this exception to his “Theory on the relation of drugs to nerve system”. The exception was actually a bigger problem for him than it is for us, for reasons too long to recount here. It is as if Langley’s theoretical difficulties had been internalized to this day as a dismissal of cholinergic sympathetic neurons as a slightly scandalous but altogether forgettable oddity. Horn (2018) reviews the evidence that the thoracic cholinergic sympathetic phenotype is brought about by a secondary switch upon interaction with the target. He argues that this would be a fundamental difference from the sacral “parasympathetic”. But in fact the secondary switch is preceded by co-expression of ChAT and VAChT with Th in most sympathetic neurons (reviewed in Ernsberger and Rohrer, 2018), and we have no idea of the dynamics in the pelvic ganglion. It may also be mentioned in this context that target-dependent specification of neuronal identity has also been demonstrated for other types of sympathetic neurons (Furlan et al., 2016).

      What then about neuropeptides, whose expression pattern is incompatible with the revised nomenclature proposed by Brunet et al.?

      There was never any neuropeptide-inspired criterion for a nomenclature of the autonomic nervous system.

      Figure 1B indicates that VIP is expressed by sacral and cranial ganglion cells, but not thoracolumbar ganglion cells.

      Contrary to what the referee says, there are VIP-positive cells in our sympathetic data set, and even strongly positive ones, except they are scattered and few (red bars on the UMAP). They correspond to cholinergic sympathetics, likely sudomotor, which are known to contain VIP (e.g. Anderson et al., 2006; Stanke et al., 2006). In other words, VIP is probably part of what we call the cholinergic synexpression group. It was not placed in that group by our calculations, probably because its expression level is low even in sympathetic noradrenergic cells.

      The authors do not mention neuropeptide Y (NPY). The immunocytochemistry literature indicates that NPY is expressed by a large subpopulation of sympathetic neurons but never by sacral or cranial parasympathetic neurons.

      Contrary to what the referee says, Keast (Keast, 1995) finds 3.7% of pelvic neurons double-stained for NPY and VIP in male rats. Keast (2006) also reports that in females “co-expression of NPY and VIP is common”, that is, in the cholinergic neurons that the referee calls “parasympathetic”. Single-cell transcriptomics is probably more sensitive than immunochemistry, and in our dichotomized data set (table S1), NPY is expressed in all pelvic clusters and all sympathetic ganglia. In other words, it is one more argument for their kinship. It does not appear in the heatmap because it ranks below the top 100 genes.

      References

      Anderson, C. R., Bergner, A. and Murphy, S. M. (2006). How many types of cholinergic sympathetic neuron are there in the rat stellate ganglion? Neuroscience 140, 567–576.

      Ernsberger, U. and Rohrer, H. (2018). Sympathetic tales: subdivisons of the autonomic nervous system and the impact of developmental studies. Neural Dev 13, 20.

      Espinosa-Medina, I., Outin, E., Picard, C. A., Chettouh, Z., Dymecki, S., Consalez, G. G., Coppola, E. and Brunet, J. F. (2014). Neurodevelopment. Parasympathetic ganglia derive from Schwann cell precursors. Science 345, 87–90.

      Espinosa-Medina, I., Saha, O., Boismoreau, F. and Brunet, J.-F. (2018). The “sacral parasympathetic”: ontogeny and anatomy of a myth. Clin Auton Res 28, 13–21.

      Furlan, A., La Manno, G., Lübke, M., Häring, M., Abdo, H., Hochgerner, H., Kupari, J., Usoskin, D., Airaksinen, M. S., Oliver, G., et al. (2016). Visceral motor neuron diversity delineates a cellular basis for nipple- and pilo-erection muscle control. Nat Neurosci 19, 1331–1340.

      Gaskell, W. H. (1886). On the Structure, Distribution and Function of the Nerves which innervate the Visceral and Vascular Systems. J Physiol 7, 1-80.9.

      Horn, J. P. (2018). The sacral autonomic outflow is parasympathetic: Langley got it right. Clin Auton Res 28, 181–185.

      Jänig, W. (2006). The Integrative Action of the Autonomic Nervous System: Neurobiology of Homeostasis. Cambridge: Cambridge University Press.

      Keast, J. R. (1995). Visualization and immunohistochemical characterization of sympathetic and parasympathetic neurons in the male rat major pelvic ganglion. Neuroscience 66, 655–662.

      Keast, J. R. (2006). Plasticity of pelvic autonomic ganglia and urogenital innervation. Int Rev Cytol 248, 141–208.

      Langley, J. N. (1921). The Autonomic Nervous System, Part I. Cambridge: Heffer & Sons Ltd.

      Langley, J. N. and Anderson, H. K. (1895). The Innervation of the Pelvic and adjoining Viscera: Part II. The Bladder. Part III. The External Generative Organs. Part IV. The Internal Generative Organs. Part V. Position of the Nerve Cells on the Course of the Efferent Nerve Fibres. J Physiol 19, 71–139.

      Lee, V. M., Sechrist, J. W., Luetolf, S. and Bronner-Fraser, M. (2003). Both neural crest and placode contribute to the ciliary ganglion and oculomotor nerve. Developmental biology 263, 176–190.

      Stanke, M., Duong, C. V., Pape, M., Geissen, M., Burbach, G., Deller, T., Gascan, H., Parlato, R., Schütz, G. and Rohrer, H. (2006). Target-dependent specification of the neurotransmitter phenotype: cholinergic differentiation of sympathetic neurons is mediated in vivo by gp130 signaling. Development 133, 141–150.

      Zeisel, A., Hochgerner, H., Lönnerberg, P., Johnsson, A., Memic, F., van der Zwan, J., Häring, M., Braun, E., Borm, L. E., La Manno, G., et al. (2018). Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Response: Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below.

      Briefly, to make the methods clearer, we added analyses to address some of the concerns, for example commonality analyses on ridge regression and on multiple regressions with a quadratic term for chronological age. We also added details in the text and figures so that the reader can fully understand our methodological procedures. To evaluate the conceptual basis of the different models more critically, we added discussion to help with interpretation and to define the scope of the generalisability of our findings. For instance, we no longer treat Brain Cognition and Brain Age as separate biomarkers and compare their ability to explain fluid cognition. Instead, we treat the capability of Brain Cognition to capture fluid cognition as the upper limit of Brain Age’s capability to do so. In other words, we now examine the extent to which Brain Age misses the variation in brain MRI that could explain fluid cognition (for this particular issue, please see our response to Reviewer 3 Public Review #4).

      Reviewer 1:

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address which mostly relate to clarity and interpretation.
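      As background for the commonality analysis mentioned above, the two-predictor case shows how explained variance is partitioned. The labels are illustrative assumptions: y stands for fluid cognition, BA for a Brain Age index and CA for chronological age. The models in the manuscript may include more predictors, such as a quadratic age term, and then the decomposition has more unique and common components. Each term is built from the R² values of the full and reduced regressions, and the components sum to the full-model R². A common component can be negative when one predictor acts as a suppressor.

```latex
\begin{aligned}
U_{\mathrm{BA}} &= R^2_{y\cdot \mathrm{BA},\mathrm{CA}} - R^2_{y\cdot \mathrm{CA}} \\
U_{\mathrm{CA}} &= R^2_{y\cdot \mathrm{BA},\mathrm{CA}} - R^2_{y\cdot \mathrm{BA}} \\
C_{\mathrm{BA},\mathrm{CA}} &= R^2_{y\cdot \mathrm{BA}} + R^2_{y\cdot \mathrm{CA}} - R^2_{y\cdot \mathrm{BA},\mathrm{CA}} \\
U_{\mathrm{BA}} + U_{\mathrm{CA}} + C_{\mathrm{BA},\mathrm{CA}} &= R^2_{y\cdot \mathrm{BA},\mathrm{CA}}
\end{aligned}
```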

      Reviewer 1 Public Review #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and of the limits of interpretation of brain-age models more generally. Further, since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, there may be limits to the interpretation of (e.g.) the brain-age gap as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest that the authors consider and comment on these issues.

      Response: Thank you Reviewer 1 for pointing out these important issues. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 (see below).

      Reviewer 1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. Stacked models can be prone to overfitting when combined with cross-validation. This is because the predictions from the first-level models (i.e. the features that are provided to the second-level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Please provide more information to enable the reader to better understand the stacked regression models. If the authors are not using an approach that fully preserves training and test separability, they need to do so.

      Response: Thank you, Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #2 (see below). Briefly, we have now made it clearer that training of both the non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.
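      One common way to keep training and test data separate in stacking is to train the second level only on out-of-fold first-level predictions computed within the training set. The first-level models are then refit on the whole training set before the untouched test set is scored. The Python sketch below illustrates this scheme under loud assumptions. Plain least squares stands in for the elastic net, the two single-feature "modalities" and the data are synthetic, and the helper names are hypothetical. It is not the pipeline used in the manuscript.

```python
import random


def ols_fit(X, y):
    """Least-squares fit with an intercept via the normal equations.
    X is a list of rows; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the small p x p system.
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(gram[i][c]))
        gram[c], gram[piv] = gram[piv], gram[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for i in range(c + 1, p):
            f = gram[i][c] / gram[c][c]
            for j in range(c, p):
                gram[i][j] -= f * gram[c][j]
            rhs[i] -= f * rhs[c]
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (rhs[i] - sum(gram[i][j] * coef[j] for j in range(i + 1, p))) / gram[i][i]
    return coef


def ols_predict(coef, X):
    return [coef[0] + sum(w * v for w, v in zip(coef[1:], r)) for r in X]


def fit_stacked(feature_sets, y, k=5):
    """Only training data are passed in. The second level is trained on
    out-of-fold first-level predictions, so it never sees in-sample fits."""
    n = len(y)
    oof = [[0.0] * len(feature_sets) for _ in range(n)]
    for fold in range(k):
        held = list(range(fold, n, k))
        held_set = set(held)
        train = [i for i in range(n) if i not in held_set]
        for m, X in enumerate(feature_sets):
            coef = ols_fit([X[i] for i in train], [y[i] for i in train])
            for i, pred in zip(held, ols_predict(coef, [X[i] for i in held])):
                oof[i][m] = pred
    second = ols_fit(oof, y)
    # Refit each first-level model on the full training set for use at test time.
    first = [ols_fit(X, y) for X in feature_sets]
    return first, second


def predict_stacked(model, feature_sets):
    first, second = model
    level1 = [ols_predict(c, X) for c, X in zip(first, feature_sets)]
    return ols_predict(second, [list(row) for row in zip(*level1)])


# Hypothetical data: two "modalities" with one feature each.
random.seed(0)
n = 200
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
y = [2 * ai + 3 * bi + random.gauss(0, 0.1) for ai, bi in zip(a, b)]
sets = [[[v] for v in a], [[v] for v in b]]

# Outer split: the test rows are never passed to fit_stacked.
train_idx, test_idx = range(150), range(150, n)
model = fit_stacked([[X[i] for i in train_idx] for X in sets], [y[i] for i in train_idx])
test_pred = predict_stacked(model, [[X[i] for i in test_idx] for X in sets])
test_mse = sum((p - y[i]) ** 2 for p, i in zip(test_pred, test_idx)) / len(test_idx)
```

The key design choice is that the second-level model is fitted only on out-of-fold predictions. It therefore learns how first-level predictions behave on data those models did not see, which is also the situation at test time.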

      Reviewer 1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response: Thank you Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 1 Public Review #4:

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods, and bias-correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Response: Thank you, Reviewer 1. We addressed this issue in our response to Reviewer 1 Recommendations For The Authors #5-#6. Briefly, we followed your advice and added all of the suggested details.
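      Reviewer 1 notes that the elastic net has several parameterisations. For reference, two widely used ones are shown below. Which one the manuscript follows is not stated in this excerpt, so neither should be read as the authors' exact specification. The two map onto each other as follows: scikit-learn's alpha plays the role of glmnet's λ, and scikit-learn's l1_ratio plays the role of glmnet's α. The same name, "alpha", therefore means different things in the two conventions.

```latex
\begin{aligned}
\text{glmnet:}\quad
&\min_{\beta_0,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+ \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr] \\
\text{scikit-learn ElasticNet:}\quad
&\min_{w}\; \frac{1}{2N}\,\lVert y - Xw\rVert_2^2
+ \mathtt{alpha}\cdot\mathtt{l1\_ratio}\,\lVert w\rVert_1
+ \tfrac{1}{2}\,\mathtt{alpha}\,(1-\mathtt{l1\_ratio})\,\lVert w\rVert_2^2
\end{aligned}
```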

      Reviewer 2 (Public Review):

      Reviewer 2 Public Review Overall:

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration. The study employs suitable data and methods, albeit with some limitations, to address the research questions. A more detailed discussion of methodological limitations in relation to the study's aims is required. For instance, the current commonality analysis may not sufficiently address potential multicollinearity issues, which could confound the findings. Importantly, given that the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. This is particularly relevant to their novel index, brain-cognition, given that brain-age has been validated extensively elsewhere. In addition, the paper's rationale for using elastic net, which references previous fMRI studies, seemed somewhat unclear. The discussion could be more nuanced and certain conclusions appear speculative.

      Response: Thank you for your encouragement. We have now added a discussion of methodological limitations (see below). We addressed potential multicollinearity issues using ridge regressions (see our response to Reviewer 2 Recommendations For The Authors #2). On external validation, we now discuss the consistency between our results and several recent studies that investigated similar issues with Brain Age in different populations (see Reviewer 2 Recommendations For The Authors #1). For Brain Cognition, we also added previous studies showing similarly high prediction of cognitive functioning (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We also added a discussion of the elastic net (see Reviewer 1 Recommendations For The Authors #6).

      Discussion

      “There are several potential limitations of this study. First, our investigation relied on only one dataset, the Human Connectome Project in Aging (HCP-A) (Bookheimer et al., 2019). While HCP-A used state-of-the-art MRI methodologies, covered a wide age range from 36 to 100 years old and included several task-based fMRI paradigms that are harder to find in other, bigger databases (e.g., UK Biobank; Sudlow et al., 2015), several characteristics of HCP-A might limit the generalisability of our findings. For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here. Similarly, HCP-A excluded participants with neurological conditions, possibly making its participants not representative of the general population. Next, while HCP-A’s sample size is not small (n = 725 and 504 people, before and after exclusion, respectively), other datasets provide much larger samples (Horien et al., 2020). Moreover, HCP-A does not include younger populations. But as mentioned above, a study with a larger sample of older adults (Cole, 2020) and studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) also found small effects of the adjusted Brain Age Gap in explaining cognitive functioning. And the disagreement between the predictive performance of age-prediction models and the utility of Brain Age found here is largely in line with the findings across different phenotypes reported in a recent systematic review (Jirsaraie, Gorelik, et al., 2023).”

      Reviewer 2 Public Review #1:

      The authors aimed to evaluate how brain-age and brain-cognition indices capture cognitive decline (as mentioned in their title) but did not employ longitudinal data, essential for calculating 'decline'. As a result, 'cognition-fluid' should not be used interchangeably with 'cognitive decline,' which is inappropriate in this context.

      Response: Thank you for raising this issue. We no longer use the term ‘cognitive decline’.

      Reviewer 2 Public Review #2:

      In their first aim, the authors compared the contributions of brain-age and chronological age in explaining variance in cognition-fluid. Results revealed much smaller effect sizes for brain-age indices compared to the large effects for chronological age. While this comparison is noteworthy, it highlights a well-known fact: chronological age is a strong predictor of disease and mortality. Has the brain-age literature systematically overlooked this effect? If so, please provide relevant examples. They conclude that due to the smaller effect size, brain-age may lack clinical significance, for instance, in associations with neurodegenerative disorders. However, caution is required when speculating on what brain-age may fail to predict in the absence of direct empirical testing. This conclusion also overlooks extant brain-age literature: although effect sizes vary across psychiatric and neurological disorders, brain-age has demonstrated significant effects beyond those driven by chronological age, supporting its utility.

      Response: For aim 1, we focused our claims on cognitive functioning and not on any clinical significance for neurodegenerative disorders. We have now made it clearer that the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with a study with a larger sample of older adults (Cole, 2020) and studies in younger populations (8-22 years old) (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).

      We believe that the question of the utility of Brain Age for cognitive functioning vs. neurological/psychological disorders requires a further consideration, namely the discrepancy between the training and test samples typically used in studies focusing on neurological/psychological disorders. We now make this point in the Discussion (see below).

      Discussion

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023), and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. Thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this a strength, not a weakness, of our study.”

      Reviewer 2 Public Review #3:

      The second aim's results reveal a discrepancy between the accuracy of their brain-age models in estimating age and the brain-age's capacity to explain variance in cognition-fluid. The authors suggest that if the ultimate goal is to capture cognitive variance, brain-age predictive models should be optimized to predict this target variable rather than age. While this finding is important and noteworthy, additional analyses are needed to eliminate potential confounding factors, such as correlated noise between the data and cognitive outcome, overfitting, or the inclusion of non-healthy participants in the sample. Optimizing brain-age models to predict the target variable instead of age could ultimately shift the focus away from the brain-age paradigm, as it might optimize for a factor differing from age.

      Response: We discuss the discrepancy between the accuracy of our brain-age models in estimating age and the brain-age's capacity to explain variance in fluid cognition in our response to Reviewer 3 Public Review #9 (see below). This issue was found to be widespread in a recent systematic review (Jirsaraie, Gorelik, et al., 2023). We now provide several strategies to mitigate this issue and improve the utility of Brain Age in explaining other phenotypes, based on our current work and that of others, using different MRI modalities as well as modelling techniques (Bashyam et al., 2020; Jirsaraie, Kaufmann, et al., 2023; Rokicki et al., 2021).

      Regarding potential confounding factors, we are not sure what the reviewer meant by “correlated noise between the data and cognitive outcome”. The current study, for instance, used ICA-FIX (Glasser et al., 2016) to remove noise in functional MRI. It is unclear how much ‘noise’ is still left and might confound our findings. More importantly, we are not sure how to define ‘noise’ as referred to by Reviewer 2 here. As for overfitting, we used nested cross-validation to ensure that training and test sets were separate from each other (see Reviewer 1 Recommendations For The Authors #2). If overfitting had occurred, we would expect lower predictive performance of the age-prediction and cognition-prediction models, since the models would fit the training set well but would not generalise to the test set. This is not what we found: the predictive performance of our age-prediction and cognition-prediction models was high and consistent with the literature. Regarding the inclusion of non-healthy participants in the sample, we discussed this above in our response to Reviewer 2 Public Review #2.
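      To make the separation between training and test data concrete, here is a minimal sketch of how nested cross-validation keeps each outer test fold out of both model fitting and hyperparameter tuning. It only generates index splits and does not shuffle; the helper names (`kfold_indices`, `nested_cv_splits`) and the fold counts are illustrative assumptions, not the exact configuration used in the study.

```python
def kfold_indices(indices, k):
    """Split a list of indices into k contiguous folds of near-equal size."""
    n = len(indices)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def nested_cv_splits(n_samples, outer_k=5, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so hyperparameters are tuned without touching outer_test.
    """
    all_idx = list(range(n_samples))
    for outer_test in kfold_indices(all_idx, outer_k):
        test_set = set(outer_test)
        outer_train = [i for i in all_idx if i not in test_set]
        inner_splits = []
        for inner_val in kfold_indices(outer_train, inner_k):
            val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


if __name__ == "__main__":
    for outer_train, outer_test, inner in nested_cv_splits(12, outer_k=3, inner_k=2):
        print(len(outer_train), outer_test, [len(v) for _, v in inner])
```

      In practice the indices would be shuffled first, and the model refitted on the full outer training set with the tuned hyperparameters before being scored once on the outer test fold.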

      Reviewer 2 Public Review #4:

      While a primary goal in biomarker research is to obtain indices that effectively explain variance in the outcome variable of interest, thus favouring models optimized for this purpose, the authors' conclusion overlooks the potential value of 'generic/indirect' models, despite sacrificing some additional explained variance provided by ad-hoc or 'specific/direct' models. In this context, we could consider brain-age as a 'generic' index due to its robust out-of-sample validity and significant associations across various health outcome variables reported in the literature. In contrast, the brain-cognition index proposed in this study is presumed to be 'specific' as, without out-of-sample performance metrics and testing with different outcome variables (e.g., neurodegenerative disease), it remains uncertain whether the reported effect would generalize beyond predicting cognition-fluid, the same variable used to condition the brain-cognition model in this study. A 'generic' index like brain-age enables comparability across different applications based on a common benchmark (rather than numerous specific models) and can support explanatory hypotheses (e.g., "accelerated ageing") since it is grounded in its own biological hypothesis. Generic and specific indices are not mutually exclusive; instead, they may offer complementary information. Their respective utility may depend heavily on the context and research or clinical question.

      Response: We thank Reviewer 2 for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 3 (Public Review #4) brought up a similar issue. We agree with Reviewer 2 that both a 'specific/direct' index and Brain Age as a 'generic/indirect' index have merit in their own right. We discuss this issue in our response to Reviewer 3 Public Review #4 (please see this response below).

      Briefly, in the revision, rather than treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treat the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, we now examine the extent to which Brain Age misses the variation in the brain MRI that could explain fluid cognition. We have also added a discussion about using our commonality approach to test for this missing variation in future work:

      Discussion

      “Finally, researchers should test how much of the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest is missed by Brain Age. As demonstrated here, one straightforward method is to build a prediction model using a phenotype of interest as the target (e.g., fluid cognition) and to enter the predicted value of this model (e.g., Brain Cognition), along with Brain Age and chronological age, into a multiple regression for commonality analyses. The unique effect of this predicted value indicates the variation in the brain MRI that is missing from Brain Age. If this unique effect is large, then researchers might need to reconsider whether using Brain Age is appropriate for a particular phenotype of interest.”
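      A minimal sketch of the unique-effect part of this commonality step, assuming ordinary least squares and in-sample R² for simplicity. The unique effect of each predictor is taken as the drop in R² when that predictor is removed from the full model, and everything not unique is lumped into a single shared term (a full commonality analysis would split the shared part further). The predictor names and the toy data are hypothetical placeholders.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def r_squared(y, predictors):
    """In-sample R^2 of an OLS regression of y on an intercept plus predictors."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def unique_effects(y, named_predictors):
    """Return full-model R^2, each predictor's unique R^2, and the shared remainder."""
    names = list(named_predictors)
    full = r_squared(y, [named_predictors[nm] for nm in names])
    unique = {}
    for nm in names:
        others = [named_predictors[o] for o in names if o != nm]
        unique[nm] = full - r_squared(y, others)
    shared = full - sum(unique.values())
    return full, unique, shared


if __name__ == "__main__":
    age = [40, 48, 55, 61, 67, 72, 80, 88]                      # hypothetical
    brain_age = [47, 50, 58, 60, 64, 70, 74, 80]                # hypothetical
    brain_cognition = [1.2, 0.9, 0.8, 0.3, 0.1, -0.4, -0.6, -1.1]
    fluid = [1.1, 1.0, 0.6, 0.4, 0.0, -0.2, -0.8, -1.0]
    print(unique_effects(fluid, {"brain_cognition": brain_cognition,
                                 "brain_age": brain_age,
                                 "chronological_age": age}))
```

      Under this sketch, a large unique R² for `brain_cognition` is the signal described in the paragraph above: variation in the brain MRI that relates to the phenotype but is not carried by Brain Age or chronological age.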

      Reviewer 2 Public Review #5:

      The study's third aim was to evaluate the authors' new index, brain-cognition. The results and conclusions drawn appear similar: compared to brain-age, brain-cognition captures more variance in the outcome variable, cognition-fluid. However, greater context and discussion of limitations is required here. Given the nature of the input variables (a large proportion of models in the study were based on fMRI data using cognitive tasks), it is perhaps unsurprising that optimizing these features for cognition-fluid generates an index better at explaining variance in cognition-fluid than the same features used to predict age. In other words, it is expected that brain-cognition would outperform brain-age in explaining variance in cognition-fluid since the former was optimized for the same variable in the same sample, while brain-age was optimized for age. Consequently, it is unclear if potential overfitting issues may inflate the brain-cognition's performance. This may be more evident when the model's input features are the ones closely related to cognition, e.g., fMRI tasks. When features were less directly related to cognitive tasks, e.g., structural MRI, the effect sizes for brain-cognition were notably smaller (see 'Total Brain Volume' and 'Subcortical Volume' models in Figure 6). This observation raises an important feasibility issue that the authors do not consider. Given the low likelihood of having task-based fMRI data available in clinical settings (such as hospitals), estimating a brain-cognition index that yields the large effects discussed in the study may be challenged by data scarcity.

      Response: Given the use of nested cross-validation, we do not consider the good predictive performance of Brain Cognition found here to be a result of overfitting. In fact, we previously found a similar level of predictive performance of Brain Cognition in another database with younger participants (Tetereva et al., 2022). However, we agree with Reviewer 2 that the prediction of fluid cognition might be driven by MRI modalities that are different from those that drive the prediction of chronological age. In our own work with other age groups, including young adults (Tetereva et al., 2022) and children (Pat, Wang, Anney, et al., 2022), cognitive functioning seems to be predicted well from task-based functional MRI. Reviewer 2 is also right that task-based fMRI is not commonly used in clinics, making it harder to translate our results. However, given our results, clinicians should be encouraged to use task-based fMRI if their goal is to predict cognitive functioning. Nevertheless, as suggested, we now list data scarcity as one of the limitations of our approach.

      Discussion

      “For instance, the tasks used in task-based fMRI in HCP-A are not used widely in clinical settings (Horien et al., 2020). This might make it challenging to translate the approaches used here.”

      Reviewer 2 Public Review #6:

      This study is valuable and likely to be useful in two main ways. First, it can spur further research aimed at disentangling the lack of correspondence reported between the accuracy of the brain-age model and the brain-age's capacity to explain variance in fluid cognitive ability. Second, the study may serve, at least in part, as an illustration of the potential pros and cons of using indices that are specific and directly related to the outcome variable versus those that are generic and only indirectly related.

      Response: We are thankful for the encouragement. Regarding the discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker for fluid cognition, we provide a detailed discussion in our response to Reviewer 3 Public Review #9. More specifically, to help readers benefit from our findings, we make suggestions on how to ensure the utility of Brain Age indices as a biomarker for other phenotypes, drawing on our own strategy as well as strategies used by Rokicki and colleagues (2021), Jirsaraie and colleagues (2023) and Bashyam and colleagues (2020).

      As for the pros and cons of generic vs. specific biomarkers, we provide a detailed discussion in our response to Reviewer 3 Public Review #4. We also make some suggestions on how to make use of the difference in ability between generic vs. specific biomarkers (see Reviewer 2 Public Review #4, above).

      Reviewer 2 Public Review #7:

      Overall, the authors effectively present a clear design and well-structured procedure; however, their work could have been enhanced by providing more context for both the brain-age and brain-cognition indices, including a discussion of key concepts in the brain-age paradigm, which acknowledges that chronological age strongly predicts negative health outcomes, but crucially, recognizes that ageing does not affect everyone uniformly. Capturing this deviation from a healthy norm of ageing is the key brain-age index. This lack of context was mirrored in the presentation of the four brain-age indices provided, as it does not refer to how these indices are used in practice. In fact, there is no mention of a more common way in which brain-age is implemented in statistical analyses, which involves the use of brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates. The latter is used to account for the regression-to-the-mean effect. The 'corrected brain-age delta' the authors use does not include a non-linear term, which perhaps is an additional reason (besides the one provided by the authors) as to why there may be small, but non-zero, common effects of both age and brain-age in the 'corrected brain-age delta' index commonality analysis. The context for brain-cognition was even more limited, with no reference to any existing literature that has explored direct brain-cognitive markers, such as brain-cognition.

      Response: Regarding Brain Age and negative health outcomes, we addressed this in our response to Reviewer 1 Recommendations For The Authors #1 (see below). Briefly, we now discuss (1) the consistency between our findings on fluid cognition and other recent work on negative health outcomes, (2) the differences between Brain Age studies focusing on negative health outcomes vs. cognitive functioning and (3) possible solutions to optimise the utility of Brain Age for both cognitive functioning and negative health outcomes.

      Regarding how Brain Age is used in practice, we addressed this in our response to Reviewer 3 Public Review #2 (see below). Our argument resonates with Butler and colleagues’ (2021) suggestion that the common practice for Brain Age analysis should be re-evaluated: “The MBAG and performance on the complex cognition tasks were not associated (r = .01, p = 0.71). These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016)” (p. 4097).

      Importantly, we also implemented “brain-age delta as the variable of interest, along with linear and non-linear terms of age as covariates” in our additional analyses along with other implementations (see Reviewer 2 Recommendations For The Authors #3). Of particular note, we found that adding a non-linear term (i.e., a quadratic term for chronological age) barely changed the results of commonality analyses.

      We have now written the following paragraph to recommend how future research should implement Brain Age:

      Discussion

      “First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point: “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice” (p. 4097). Similar to their recommendation (Butler et al., 2021), we suggest that future work focus on Corrected Brain Age Gap or, better, on the unique effects of Brain Age indices after controlling for chronological age in multiple regressions. In the case of fluid cognition, the unique effects might be too small to be clinically meaningful, as shown here and previously (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023).”

      Regarding Brain Cognition, we have now expanded our explanation of how Brain Cognition relates to Brain Age and of the predictive performance of Brain Cognition found in previous studies.

      Introduction

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performance for these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we call the predicted values from these cognition-prediction models Brain Cognition. The strength of the out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age: the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      Discussion

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022).”

      Reviewer 2 Public Review #8:

      While this paper delivers intriguing and thought-provoking results, it would benefit from recognizing the value that both approaches--brain-age indices and more direct, specific markers like brain-cognition--can contribute to the field.

      Response: Thank you so much for recognising the value of our work. As mentioned above in our responses to Reviewer 2 Public Review #4 and #6, we make some suggestions on how to make use of the difference in ability between generic vs. specific biomarkers.

      Reviewer 3 (Public Review):

      Reviewer 3 Public Review Overall:

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" While this question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age, the authors are currently missing an opportunity to convey the inevitability of their results, given how brain-age and the brain-age gap are calculated. They also argue that brain-cognition is somehow superior to brain-age, but insufficient evidence is provided in support of this claim.

      Response: We address these concerns below. The inevitability of our results is not obvious to many researchers who might be interested in Brain Age. We hope our findings make many issues surrounding Brain Age more obvious, and we now make several suggestions on how to address some of these issues. We no longer argue that Brain Cognition is superior to Brain Age (see Reviewer 3 Public Review #4). Rather, we treat the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We use the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much of the variation in the brain MRI that could explain fluid cognition is missed by Brain Age.

      Specific comments follow:

      Reviewer 3 Public Review #1:

      • "There are many adjustments proposed to correct for this estimation bias" (p3). Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including "correcting" the brain age gap by regressing out age.

      Response: Thank you so much for raising this issue. We used the word ‘bias’ following many articles in the field. For instance,

      de Lange and Cole (2020) wrote: “brain-age estimation also involves a frequently observed bias: brain age is overestimated in younger subjects and underestimated in older subjects, while brain age for participants with an age closer to the mean age (of the training dataset) are predicted more accurately (Cole, Le, Kuplicki, McKinney, Yeh, Thompson, Paulus, Investigators, et al., 2018, Liang, Zhang, Niu, 2019, Niu, Zhang, Kounios, Liang, 2019, Smith, Vidaurre, Alfaro-Almagro, Nichols, Miller, 2019).”

      Cole (2020) wrote: “As recent research has highlighted a proportional bias in brain-age calculation, whereby the difference between chronological age and brain-predicted age is negatively correlated with chronological age (Le et al., 2018, Liang et al., 2019, Smith et al., 2019), an age-bias correction procedure was used. This entailed calculating the regression line between age (predictor) and brain-predicted age (outcome) in the training set, then using the slope (i.e., coefficient) and intercept of that line to adjust brain-predicted age values in the testing set (by subtracting the intercept and then dividing by the slope). After applying the age-bias correction the brain-predicted age difference (brain-PAD) was calculated; chronological age subtracted from brain-predicted age.”

      Beheshti and colleagues (2019) used ‘bias’ in their title: “Bias-adjustment in neuroimaging-based brain age frameworks: a robust scheme”

      More recently, Cumplido-Mayoral and colleagues (2023) wrote: “As recent research has shown that brain-age estimation involves a proportional bias (de Lange et al., 2020a; Le et al., 2018; Liang et al., 2019; Smith et al., 2019), we applied a well-established age-bias correction procedure to our data (de Lange et al., 2020a; Le et al., 2018).”

      Still, we agree with Reviewer 3 that using ‘bias’ might lead to misinterpretation. As Butler and colleagues (2021) pointed out, “It is important to note that regression toward the mean is not a failure, but a feature, of regression and related methods.” We have rewritten the paragraph to clarify the “regression towards the mean” issue and no longer use the word “bias” here:

      Introduction

      “Note researchers often subtract chronological age from Brain Age, creating an index known as Brain Age Gap (Franke & Gaser, 2019). A higher value of Brain Age Gap is thought to reflect accelerated/premature aging. Yet, given that Brain Age Gap is calculated based on both Brain Age and chronological age, Brain Age Gap still depends on chronological age (Butler et al., 2021). If, for instance, Brain Age was based on prediction models with poor performance and made a prediction that everyone was 50 years old, individual differences in Brain Age Gap would then depend solely on chronological age (i.e., 50 minus chronological age). Moreover, Brain Age is known to demonstrate the “regression towards the mean” phenomenon (Stigler, 1997). More specifically, because Brain Age is a predicted value of a regression model that predicts chronological age, Brain Age is usually shrunk towards the mean age of samples used for training the model (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018). Accordingly, Brain Age predicts chronological age more accurately for individuals who are closer to the mean age while overestimating younger individuals’ chronological age and underestimating older individuals’ chronological age. There are many adjustments proposed to correct for the age dependency, but the outcomes tend to be similar to each other (Beheshti et al., 2019; de Lange & Cole, 2020; Liang et al., 2019; Smith et al., 2019). These adjustments can be applied to Brain Age and Brain Age Gap, creating Corrected Brain Age and Corrected Brain Age Gap, respectively. Corrected Brain Age Gap in particular is viewed as being able to control for age dependency (Butler et al., 2021). Here, we tested the utility of different Brain Age calculations in capturing fluid cognition, over and above chronological age.”
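      The “regression towards the mean” behaviour and the resulting age dependency of Brain Age Gap can be reproduced with a toy example. The sketch below assumes a single hypothetical “brain feature” equal to age plus a fixed noise pattern, and treats the in-sample OLS fitted values as Brain Age; it is an illustration of the mechanism, not the models used in the study. Because OLS residuals are uncorrelated with the fitted values, the gap (fitted minus true age) is necessarily negatively correlated with age whenever the fit is imperfect. The last lines reproduce the constant-prediction case from the paragraph above, where the gap is simply 50 minus chronological age.

```python
def fit_line(x, y):
    """OLS intercept and slope for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)


def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    return cov / (variance(a) ** 0.5 * variance(b) ** 0.5)


ages = [36, 42, 48, 55, 61, 67, 73, 80, 88, 95]
noise = [6, -4, 5, -7, 3, -2, 8, -5, 4, -6]          # hypothetical, fixed noise
brain_feature = [a + e for a, e in zip(ages, noise)]

# Brain Age = fitted values of an OLS model predicting age from the feature
intercept, slope = fit_line(brain_feature, ages)
brain_age = [intercept + slope * f for f in brain_feature]
brain_age_gap = [b - a for b, a in zip(brain_age, ages)]

# Degenerate model that predicts 50 for everyone: gap = 50 - age
constant_gap = [50 - a for a in ages]

if __name__ == "__main__":
    print("variance of age vs Brain Age:", variance(ages), variance(brain_age))
    print("corr(gap, age):", correlation(brain_age_gap, ages))
    print("corr(constant gap, age):", correlation(constant_gap, ages))
```

      The shrunken variance of `brain_age` relative to `ages` is the regression towards the mean; the negative gap-age correlation is why the gap cannot be interpreted without accounting for chronological age.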

      Reviewer 3 Public Review #2:

      • "Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021)" (p3). This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading the Methods, I noticed that the authors use a metric from Le et al. (2018) for the "Corrected Brain Age Gap". If they cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of the present manuscript, and cross-comparisons between the two.

      Response: We thank Reviewer 3 for pointing out the issues surrounding our choice of the words "corrected" and "biases". We share Reviewer 3’s frustration that different brain-age articles use different terminologies, and we have tried to make sure our readers understand how we calculated the Brain Age indices so that our results can be compared with previous work.

      We commented on the word “bias” in our response to Reviewer 3 Public Review #1 above and have refrained from using this word in the revised manuscript. Here we comment on the term “Corrected Brain Age Gap” and, in doing so, clarify how we calculated it.

      Reviewer 3 is right that we cited the work of Butler and colleagues (2021), but it is not accurate to say that we used “a metric from Le et al. (2018) for the "Corrected Brain Age Gap"”. Instead, we used a method described in de Lange and Cole’s (2020) work. We have now added equations to explain this method in our Materials and Methods section (see below).

      It is important to note that Butler and colleagues (2021) did not come up with any adjustment methods. Instead, Butler and colleagues (2021) discussed three adjustment methods:

      1) A method proposed by Beheshti and colleagues (2019). Butler and colleagues (2021) called the result of this method the Modified Brain Age Gap (MBAG). Importantly, Butler and colleagues (2021) discouraged the use of this method due to “researchers misinterpreting the reduced variability of the MBAG as an improvement in prediction accuracy.” Accordingly, in our article we performed methods (2) and (3) below.

      2) A method proposed by de Lange and Cole (2020), which we used in our article (see below for the equations). Briefly, we first fit a regression line predicting Brain Age from chronological age in each training set. We then used the slope and intercept of this regression line to adjust Brain Age in the corresponding test set, resulting in an adjusted index of Brain Age. de Lange and Cole (2020) originally called this index “Corrected Predicted Age”, while Butler and colleagues (2021) called it “Revised Predicted Age”. Butler and colleagues (2021) then subtracted chronological age from this index and called the result the “Revised Brain Age Gap (RBAG)”. We would have liked to follow the original terminology, but we prefer to avoid the term “Predicted Age”, since chronological age can also be predicted by variables other than the brain. We therefore settled on the terms “Corrected Brain Age” and “Corrected Brain Age Gap”. We list the terminologies used in previous work in our article (see below).

      3) A method proposed by Le and colleagues (2018). Here, Butler and colleagues (2021) referred to one of the approaches taken by Le and colleagues: “include age as a regressor when doing follow-up analyses.” This is essentially what we did in the commonality analysis: Le and colleagues’ (2018) approach is equivalent to examining the unique effect of Brain Age in a multiple regression with chronological age and Brain Age as regressors.
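      The approach in point 3 amounts to an increment in R². The Python sketch below (NumPy only) computes the unique effect of Brain Age as R²(age + Brain Age) minus R²(age) with ordinary least squares. The data are simulated and hypothetical, the helper names (`r_squared`, `unique_effect`) are ours, and the sketch uses in-sample R² for brevity, whereas the study itself evaluated models out of sample.

```python
import numpy as np


def r_squared(y, *predictors):
    """In-sample R^2 of an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


def unique_effect(y, focal, *covariates):
    """Variance in y explained by `focal` over and above the covariates (Delta R^2)."""
    return r_squared(y, focal, *covariates) - r_squared(y, *covariates)


# Simulated, hypothetical data: Brain Age = age + noise that is unrelated to cognition
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(36, 100, n)
brain_age = age + rng.normal(0, 5, n)
fluid_cog = -0.04 * age + rng.normal(0, 1, n)

r2_age = r_squared(fluid_cog, age)
delta_r2 = unique_effect(fluid_cog, brain_age, age)
print(f"R^2 (age only): {r2_age:.3f}; unique effect of Brain Age: {delta_r2:.4f}")
```

      In this toy setting Brain Age carries no information beyond age, so its unique effect is close to zero even though Brain Age on its own would correlate strongly with fluid cognition.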

      While the indices from de Lange and Cole’s (2020) and Le and colleagues’ (2018) methods performed poorly in capturing fluid cognition in the current work, we stress that many research groups do not regard these methods as meaningless. In fact, de Lange and Cole’s (2020) method is one of the most commonly implemented (e.g., Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). It simply does not seem to work well in the case of fluid cognition.

      Here is how we now describe the calculation of the Brain Age indices in the revised manuscript:

      Methods

      “Brain Age calculations: Brain Age, Brain Age Gap, Corrected Brain Age and Corrected Brain Age Gap

      In addition to Brain Age, which is the predicted value from the models predicting chronological age in the test sets, we calculated three other indices to reflect the estimation of brain aging. First, Brain Age Gap reflects the difference between the age predicted by brain MRI and the actual, chronological age. Here we simply subtracted the chronological age from Brain Age:

      $$\text{Brain Age Gap}_i = \text{Brain Age}_i - \text{chronological age}_i, \tag{2}$$

      where i is the individual. Next, to reduce the dependency on chronological age (Butler et al., 2021; de Lange & Cole, 2020; Le et al., 2018), we applied a method described by de Lange and Cole (2020), which has been implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022):

      In each outer-fold training set: $$\text{Brain Age}_i = \beta_0 + \beta_1\,\text{chronological age}_i + \varepsilon_i, \tag{3}$$

      Then in the corresponding outer-fold test set: $$\text{Corrected Brain Age}_i = \frac{\text{Brain Age}_i - \beta_0}{\beta_1}, \tag{4}$$

      That is, we first fit a regression line predicting Brain Age from chronological age in each outer-fold training set. We then used the slope ($\beta_1$) and intercept ($\beta_0$) of this regression line to adjust Brain Age in the corresponding outer-fold test set, resulting in Corrected Brain Age. Note that de Lange and Cole (2020) called Corrected Brain Age “Corrected Predicted Age”, while Butler and colleagues (2021) called it “Revised Predicted Age”.

      Lastly, we computed Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Cole et al., 2020; de Lange & Cole, 2020; Denissen et al., 2022):

      $$\text{Corrected Brain Age Gap}_i = \text{Corrected Brain Age}_i - \text{chronological age}_i, \tag{5}$$

      Note that Cole and colleagues (2020) called Corrected Brain Age Gap the “brain-predicted age difference (brain-PAD)”, while Butler and colleagues (2021) called it the “Revised Brain Age Gap”.”
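      Equations (2) to (5) can be sketched in Python as follows, with the bias-correction coefficients estimated only on the training fold and then applied to the held-out fold. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, a single train/test split stands in for the outer folds, and the simulated Brain Age shows the typical regression toward the mean age. Note that `np.polyfit` returns the slope before the intercept.

```python
import numpy as np


def fit_age_bias(age_train, brain_age_train):
    """Eq. (3): regress Brain Age on chronological age within a training fold."""
    beta1, beta0 = np.polyfit(age_train, brain_age_train, deg=1)  # slope first
    return beta0, beta1


def brain_age_indices(age_test, brain_age_test, beta0, beta1):
    """Eqs. (2), (4) and (5), applied to the held-out test fold."""
    gap = brain_age_test - age_test                # Brain Age Gap
    corrected = (brain_age_test - beta0) / beta1   # Corrected Brain Age
    corrected_gap = corrected - age_test           # Corrected Brain Age Gap
    return gap, corrected, corrected_gap


# Simulated, hypothetical predictions that regress toward the mean age
rng = np.random.default_rng(42)
n = 2000
age = rng.uniform(36, 100, n)
brain_age = 27 + 0.6 * age + rng.normal(0, 4, n)

train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2
beta0, beta1 = fit_age_bias(age[train], brain_age[train])
gap, corrected, corrected_gap = brain_age_indices(age[test], brain_age[test], beta0, beta1)

r_gap = np.corrcoef(gap, age[test])[0, 1]
r_corrected_gap = np.corrcoef(corrected_gap, age[test])[0, 1]
print(f"corr(Brain Age Gap, age) = {r_gap:.2f}; "
      f"corr(Corrected Brain Age Gap, age) = {r_corrected_gap:.2f}")
```

      In this simulation the raw Brain Age Gap is strongly negatively correlated with age, while the corrected gap is close to uncorrelated with age in the held-out fold. Dividing by $\beta_1$ assumes the training-fold slope is not close to zero.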

      Reviewer 3 Public Review #3:

      • "However, the improvement in predicting chronological age may not necessarily make Brain Age to be better at capturing Cognition_fluid. If, for instance, the age-prediction model had the perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognition_fluid beyond chronological age" (p3). I largely agree with this statement. I would be really careful to distinguish between brain-age and the brain-age gap here, as the former is a predicted value, and the latter is the residual times -1 (i.e., predicted age - age). Therefore, together they explain all of the variance in age. Changing the first sentence to refer to the brain-age gap would be more accurate in this context. The brain-age gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response: Thank you so much for pointing this out. We agree and have changed “Brain Age” to “Brain Age Gap” in this sentence.

      Reviewer 3 Public Review #4:

      • "Can we further improve our ability to capture the decline in Cognition_fluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?". This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. Upon reading the Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as the authors refer to it, brain-cognition) is the same as the measure of fluid cognition that you are trying to assess how well brain-cognition can predict. Assuming the brain parameters can predict fluid cognition at all, it is then inevitable that brain-cognition will predict fluid cognition. Therefore, it is inappropriate to use predicted values of a variable to predict the same variable.

      Response: Thank you, Reviewer 3, for pointing out this important issue. Reviewer 1 (Recommendations For The Authors #4) and Reviewer 2 (Public Review #4) brought up a similar issue. While Reviewer 3 felt that “it is inappropriate to use predicted values of a variable to predict the same variable,” Reviewer 2 viewed Brain Cognition as a 'specific/direct' index and Brain Age as a 'generic/indirect' index, each with merit in its own right.

      Like Reviewer 2, we believe that the specific index is equally important, and such indices are commonly used elsewhere in the context of biomarkers. For instance, to obtain neuroimaging biomarkers for Alzheimer’s disease, neuroimaging researchers often build a predictive model to predict Alzheimer's diagnosis (Khojaste-Sarakhsi et al., 2022). Outside of neuroimaging, polygenic risk scores (PRSs) in genomics likewise use predicted values of a variable to predict the same variable (Choi et al., 2020). For instance, a PRS for ADHD, which indicates the genetic liability to develop ADHD, is based on genome-wide association studies of ADHD (Demontis et al., 2019).

      Still, we now agree that it may not be fair to compare the performance of a specific index (Brain Cognition) and a generic index (Brain Age) directly (as pointed out by Reviewer 3 Public Review #6 below). Accordingly, in the revision, rather than treating Brain Cognition and Brain Age as separate biomarkers and comparing them, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. In other words, the strength of the out-of-sample relationship between Brain Cognition and fluid cognition reflects the variation in brain MRI that is related to fluid cognition. Consequently, the unique effects of Brain Cognition in explaining fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age: the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age. According to Reviewer 2, a generic index (Brain Age) “sacrificed some additional explained variance provided” compared to a specific index (Brain Cognition). Here, we used commonality analyses to quantify how much Brain Age sacrificed. See below for the re-conceptualisation of Brain Age vs. Brain Cognition in the revision:

      Abstract

      “Lastly, we tested how much Brain Age missed the variation in the brain MRI that could explain fluid cognition. To capture this variation in the brain MRI that explained fluid cognition, we computed Brain Cognition, or a predicted value based on prediction models built to directly predict fluid cognition (as opposed to chronological age) from brain MRI data. We found that Brain Cognition captured up to an additional 11% of the total variation in fluid cognition that was missing from the model with only Brain Age and chronological age, leading to an improvement of around one-third in the total variation explained.”

      Introduction:

      “Third and finally, certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in the brain MRI that is related to fluid cognition and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. Consequently, the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age indicate what is missing from Brain Age -- the amount of co-variation between brain MRI and fluid cognition that cannot be captured by Brain Age.”

      “Finally, we investigated the extent to which Brain Age indices missed the variation in the brain MRI that could explain fluid cognition. Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition.“

      Discussion

      “Third, how much does Brain Age miss the variation in the brain MRI that could explain fluid cognition? Brain Age and chronological age by themselves captured around 32% of the total variation in fluid cognition. But, around an additional 11% of the variation in fluid cognition could have been captured if we used the prediction models that directly predicted fluid cognition from brain MRI.”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
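      For readers unfamiliar with commonality analysis, the Python sketch below partitions R² across three regressors (a Brain Age index, chronological age and Brain Cognition) into seven unique and common components, following the standard expansion described by Nimon et al. (2008). The data are simulated and hypothetical, the helper names are ours, and in-sample OLS R² stands in for the out-of-sample estimates used in the study.

```python
import itertools

import numpy as np


def r_squared(y, columns):
    """In-sample R^2 of an OLS fit of y on an intercept plus the given columns."""
    if not columns:
        return 0.0
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


def commonality(y, predictors):
    """Unique and common components for every non-empty subset of predictors.

    Each component expands -prod_{j in S}(1 - x_j) * prod_{j not in S} x_j and
    replaces every product of x's by the R^2 of that predictor set.
    """
    names = list(predictors)
    r2 = {}
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            r2[frozenset(subset)] = r_squared(y, [predictors[name] for name in subset])
    components = {}
    for k in range(1, len(names) + 1):
        for S in itertools.combinations(names, k):
            others = frozenset(names) - frozenset(S)
            total = 0.0
            for m in range(len(S) + 1):
                for A in itertools.combinations(S, m):
                    total -= (-1) ** m * r2[others | frozenset(A)]
            components[frozenset(S)] = total
    return components, r2[frozenset(names)]


# Simulated, hypothetical data
rng = np.random.default_rng(7)
n = 1000
age = rng.uniform(36, 100, n)
brain_signal = rng.normal(0, 1, n)  # cognition-relevant brain variation
fluid_cog = -0.04 * age + brain_signal + rng.normal(0, 1, n)
brain_age = age + rng.normal(0, 5, n)
brain_cog = -0.04 * age + brain_signal + rng.normal(0, 0.5, n)

components, total_r2 = commonality(
    fluid_cog, {"brain_age": brain_age, "age": age, "brain_cognition": brain_cog})
for subset, value in sorted(components.items(), key=lambda kv: len(kv[0])):
    print(sorted(subset), round(value, 3))
```

      The seven components always sum to the R² of the full model, which is a useful sanity check when implementing the analysis.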

      Reviewer 3 Public Review #5:

      • "However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognition_fluid. For instance, the top performing age-prediction model, "Stacked: All excluding Task Contrast", generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognition_fluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognition_fluid" (p7). This is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): y = (y - ŷ) + ŷ. Let's say that age explains 60% of the variance in fluid cognition, and predicted age (ŷ) explains 40% of the variance in fluid cognition. Then the brain age gap (-(y - ŷ)) should explain 20% of the variance in fluid cognition. If by "Corrected Brain Age" you mean the modified predicted age from Butler et al (2021), the "Corrected Brain Age" result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel (a) should be flat and high (about as high as the predictive value of age for fluid cognition). So it is unclear how "Corrected Brain Age" is calculated. It looks like you might be regressing age out of brain-age, though from your description in the Methods section, it is not totally clear. Again, I highly recommend using the terminology and metrics of Butler et al (2021) throughout to reduce confusion. Please also clarify how you used the slope and intercept. In general, given how brain-age metrics tend to be calculated, the following conclusion is inevitable: "As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models" (p10).

      Response: We agree that these results are ‘inevitable’ given the transformations from Brain Age to the other Brain Age indices. However, the consequences of these transformations may not be clear to readers who are unfamiliar with the Brain Age literature, or to the wider community that considers the implications of Brain Age. Reviewer 1 appreciated this, noting: “While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community.”

      Note that we have clarified how we calculated each of the Brain Age indices above (see Reviewer 3 Public Review #2), including how we used the slope and intercept. We chose terminology close to that originally used by de Lange and Cole (2020) and now list the various terms others have used to refer to this transformation.
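      One way to see why these results are inevitable: within a single test fold with fixed $\beta_0$ and $\beta_1$, Brain Age Gap, Corrected Brain Age and Corrected Brain Age Gap are all affine functions of Brain Age and chronological age. Once chronological age is in the regression, the four indices therefore span the same column space, and their unique effects beyond age coincide, while their R² on their own can differ considerably. The Python sketch below checks this on simulated, hypothetical data; the coefficient values are invented for illustration. When indices are pooled across folds with different fold-specific coefficients, exact equality need not hold.

```python
import numpy as np


def r_squared(y, *predictors):
    """In-sample R^2 of an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


# Simulated, hypothetical test-fold data
rng = np.random.default_rng(3)
n = 400
age = rng.uniform(36, 100, n)
brain_age = 27 + 0.6 * age + rng.normal(0, 4, n)
fluid_cog = -0.04 * age - 0.03 * (brain_age - age) + rng.normal(0, 1, n)

beta0, beta1 = 27.5, 0.58  # hypothetical coefficients from a training fold
corrected = (brain_age - beta0) / beta1
indices = {
    "Brain Age": brain_age,
    "Brain Age Gap": brain_age - age,
    "Corrected Brain Age": corrected,
    "Corrected Brain Age Gap": corrected - age,
}

r2_age = r_squared(fluid_cog, age)
alone = {name: r_squared(fluid_cog, idx) for name, idx in indices.items()}
unique = {name: r_squared(fluid_cog, idx, age) - r2_age for name, idx in indices.items()}
for name in indices:
    print(f"{name:24s} R^2 alone = {alone[name]:.3f}  unique beyond age = {unique[name]:.4f}")
```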

      Reviewer 3 Public Review #6:

      "On the contrary, the unique effects of Brain Cognition appeared much larger" (p10). This is not a fair comparison if you do not look at the unique effects above and beyond the cognitive variable you predicted in your brain-cognition model. If your outcome measure had been another metric of cognition other than fluid cognition, you would see that brain-cognition does not explain any additional variance in this outcome when you include fluid cognition in the model, just as brain-age would not when including age in the model (minus small amounts due to penalization and out-of-sample estimates). This highlights the fact that using a predicted value to predict anything is worse than using the value itself.

      Response: Please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age misses the variation in brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #7:

      "First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little" (p12). This is a really important point, but the paper requires an in-depth discussion of the inevitability of this result, as discussed above.

      Response: We agree that the tight relationship between Brain Age and chronological age is inevitable. We stated this from the outset in the Introduction:

      Introduction: “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      To make this point explicit, we quantified the overlap between Brain Age and chronological age using commonality analysis. We hope that demonstrating the inevitability of this overlap will encourage researchers to be more careful when designing studies involving Brain Age.

      Reviewer 3 Public Review #8:

      "Third, do we have a solution that can improve our ability to capture Cognition_fluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognition_fluid than only using Brain Age" (p12). I suggest controlling for the cognitive measure you predicted in your brain-cognition model. This will show that brain-cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response: This point is similar to Reviewer 3 Public Review #6; again, please see our response to Reviewer 3 Public Review #4 above. Briefly, we no longer make this comparison or claim that Brain Cognition is ‘better’ than Brain Age. Instead, we now view the unique effects of Brain Cognition as a way to test how much Brain Age misses the variation in brain MRI that could explain fluid cognition.

      Reviewer 3 Public Review #9:

      "Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognition_fluid. This calls for a new paradigm. Future research should aim to build prediction models for Brain Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognition_fluid and beyond" (p13). I whole-heartedly agree with the first two sentences, but strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, calls for a new paradigm (or, one might argue, a pre-brain-age paradigm). As of now, your results do not suggest that researchers should keep going down the brain-age path. While it is difficult to prove that there is no transformation of brain-age or the brain-age gap that will be useful, I am nearly sure this is true from the research I have done. If you would like to suggest that the field should continue down this path, I suggest presenting a very good case to support this view.

      Response: Thank you for your comments on this issue.

      Since the submission of our manuscript, other researchers have made a similar observation regarding the disagreement between the predictive performance of age-prediction models and the utility of Brain Age. For instance, in their systematic review, Jirsaraie and colleagues (2023, p7) wrote: “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest. As a point of illustration, seven of the twenty studies in this review only evaluated the utility of their most accurate model, which in all cases was trained using multimodal features. This approach has also led to researchers to exclusively use T1-weighted and diffusion-weighted MRI scans when developing brain age models [36] since such modalities have been shown to have the largest contribution to a model’s predictive power [2,67]. However, our review suggests that model accuracy does not necessarily provide meaningful insight about clinical utility (e.g., detection of age-related pathology). Taken with prior studies [16,17], it appears that the most accurate models tend to not be the most useful.”

      We now discuss the disagreement between the predictive performance of age-prediction models and the utility of Brain Age, not only in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) but also in the context of neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). Following Reviewer 3’s suggestion, we have also added several possible strategies, used by us and by other groups, to mitigate this problem with Brain Age. Please see below.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is unlikely to explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that best explain phenotypes of interest. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”
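      The selection strategy described above can be illustrated with a small simulation: rank candidate age-prediction models by their utility for the phenotype (here, the unique R² of Brain Age for fluid cognition beyond chronological age) rather than by age-prediction error. Everything in the sketch is hypothetical: the data are simulated, `model_A` and `model_B` are invented stand-ins for candidate models, and unique R² is only one possible utility criterion. It is not a reproduction of the analyses by Rokicki and colleagues, Jirsaraie and colleagues or Bashyam and colleagues.

```python
import numpy as np


def r_squared(y, *predictors):
    """In-sample R^2 of an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


rng = np.random.default_rng(2)
n = 600
age = rng.uniform(36, 100, n)
brain_signal = rng.normal(0, 1, n)  # hypothetical cognition-relevant brain variation
fluid_cog = -0.04 * age + brain_signal + rng.normal(0, 1, n)

# Test-fold Brain Age from two hypothetical candidate age-prediction models
candidates = {
    "model_A": age + rng.normal(0, 2, n),                     # tight fit to age
    "model_B": age - 4 * brain_signal + rng.normal(0, 3, n),  # looser fit to age
}

r2_age = r_squared(fluid_cog, age)
report = {}
for name, brain_age in candidates.items():
    mae = np.mean(np.abs(brain_age - age))
    utility = r_squared(fluid_cog, brain_age, age) - r2_age
    report[name] = (mae, utility)
    print(f"{name}: MAE = {mae:.2f} years, unique R^2 for fluid cognition = {utility:.3f}")

best_by_accuracy = min(report, key=lambda k: report[k][0])
best_by_utility = max(report, key=lambda k: report[k][1])
print("best by age accuracy:", best_by_accuracy, "| best by utility:", best_by_utility)
```

      In practice the utility criterion should be computed on data not used to train the age-prediction models, so that model selection itself does not inflate the reported utility.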

      Reviewer #1 (Recommendations For The Authors):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline using the HCP aging dataset by performing a commonality analysis in a downstream regression. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain-cognition') as an alternative that explains more unique variance in the downstream regression.

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. While the main message will not come as a surprise to anyone with hands-on experience of using brain-age models, I think it is nonetheless an important message to convey to the community. With that said, I have some comments that I believe the authors ought to address before publication.

      Reviewer 1 Recommendations For The Authors #1:

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. This is undeniably important, but is only one application area for brain age models. They are also used for example to provide biomarkers for many brain disorders. What would the results presented here have to say about these application areas? Further, I think that since brain-age models by construction confound relevant biological variation with the accuracy of the regression models used to estimate them, my own opinion about the limits of interpretation of (e.g.) the brain-age gap is as a dimensionless biomarker. This has also been discussed elsewhere (see e.g. https://academic.oup.com/brain/article/143/7/2312/5863667). I would suggest the authors nuance their discussion to provide considerations on these issues.

      Response: Thank you, Reviewer 1, for pointing out two important issues.

      The first issue was about applications to brain disorders. We now provide a detailed discussion of this, which also addresses Reviewer 3 Public Review #9. Briefly, we now discuss:

      1) the consistency between our findings on fluid cognition and other recent work on brain disorders,

      2) the possibility that age-prediction models in Brain Age studies of neurological/psychological disorders are under-fitted when applied to participants with these disorders, because the models were built from largely healthy participants, and

      3) solutions that we and others have proposed to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is unlikely to explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, those Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders because they were built from largely healthy participants. And thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from being under-fitted. We consider this as a strength, not a weakness of our study.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. 
In this case, a model from the same algorithm with better utility (e.g., the DBN with a moderate number of epochs) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p. 7): “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest.” Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      The second issue was about “the brain-age gap as a dimensionless biomarker.” We are not entirely sure what the reviewer meant by “the dimensionless biomarker.” One possible meaning is that Brain Age from the same algorithm and the same modality can be computed such that it fits chronological age either tightly or loosely. This is what Bashyam and colleagues (2020) did in the article Reviewer 1 referred to. We now discuss this strategy in the Discussion paragraph above.

      Alternatively, “the dimensionless biomarker” might be closer to Reviewer 2’s view of Brain Age as a “generic/indirect” index (as opposed to a ‘specific/direct’ index in the case of Brain Cognition) (see Reviewer 2 Public Review #4). We discussed this in our response to Reviewer 3 Public Review #4.

      Reviewer 1 Recommendations For The Authors #2:

      Second, from a methods perspective, I am quite suspicious of the stacked regression models the authors are using to combine regression models and I suspect they may be overfit. In my experience, stacked models are very prone to overfitting when combined with cross-validation. This is because the predictions from the first level models (i.e., the features that are provided to the second-level 'stacked' models) contain information about the training set and the test set. If cross-validation is not done very carefully (e.g. using multiple hold-out sets), information leakage can easily occur at the second level. Unfortunately, there is not sufficient explanation of the methodological procedures in the current manuscript to fully understand what was done. First, please provide more information to enable the reader to better understand the stacked regression models and if the authors are not using an approach that fully preserves training and test separability, please do so.

      Response: We would like to thank Reviewer 1 for the suggestion. We have now made it clearer, in the text and in a new figure (see below), that we used nested cross-validation to ensure no information leakage between training and test sets. Regarding the stacked models more specifically, the hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked models (see Figure 7 below). That is, training both non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets.

      Methods:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models had either chronological age or fluid cognition as the target and standardised brain MRI as the features (Denissen et al., 2022). We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This was to ensure the stability of the test performance across folds. In each outer-fold CV, one of the outer folds was treated as a test set, and the rest was treated as a training set, which was further divided into five inner folds. In each inner-fold CV, one of the inner folds was treated as a validation set and the rest was treated as a training set. We used the inner-fold CV to tune the hyperparameters of the models and the outer-fold CV to evaluate the predictive performance of the models.

      In addition to using each of the 18 sets of features in separate prediction models, we drew information across these sets via stacking. Specifically, we computed predicted values from each of the 18 sets of features in the training sets. We then treated different combinations of these predicted values as features to predict the targets in separate “stacked” models. The hyperparameters of the stacked models were tuned in the same inner-fold CV as the non-stacked models (see Figure 7). That is, training both non-stacked and stacked models did not involve the test set, ensuring that there was no data leakage between training and test sets. We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, in total, there were 26 prediction models for Brain Age and Brain Cognition.”
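      To illustrate what a leakage-safe combination of nested CV and stacking can look like, here is a minimal sketch assuming scikit-learn; it is not the authors' code. Each base Elastic Net sees only one (hypothetical) block of columns and is tuned by inner-fold CV; StackingRegressor trains the second-level Elastic Net on out-of-fold base predictions from the outer training set only, and each outer test fold is used solely for scoring. The grids, simulated data and three feature blocks are illustrative assumptions, far smaller than the 18 sets and grids described above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline


def base_model(cols, inner_cv):
    """Elastic Net restricted to one feature set, tuned by inner-fold CV."""
    pipe = Pipeline([
        ("select", ColumnTransformer([("keep", "passthrough", cols)])),
        ("enet", ElasticNet(max_iter=10_000)),
    ])
    grid = {"enet__alpha": [0.01, 0.1, 1.0], "enet__l1_ratio": [0.5, 1.0]}
    return GridSearchCV(pipe, grid, cv=inner_cv)


def stacked_model(feature_sets, inner_cv):
    """Second-level Elastic Net fitted on out-of-fold predictions of the base models."""
    bases = [(f"set{i}", base_model(cols, inner_cv)) for i, cols in enumerate(feature_sets)]
    final = GridSearchCV(ElasticNet(max_iter=10_000),
                         {"alpha": [0.001, 0.01, 0.1], "l1_ratio": [0.5, 1.0]},
                         cv=inner_cv)
    # cv=inner_cv: the final estimator is trained on cross-validated base predictions,
    # so it never sees base predictions made on rows those base models were fitted to.
    return StackingRegressor(estimators=bases, final_estimator=final, cv=inner_cv)


def nested_cv_scores(X, y, feature_sets, seed=0):
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    inner = KFold(n_splits=5, shuffle=True, random_state=seed + 1)
    scores = []
    for train_idx, test_idx in outer.split(X):
        model = stacked_model(feature_sets, inner)
        model.fit(X[train_idx], y[train_idx])  # only the outer training set is used
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return scores


# Simulated, hypothetical data: three feature sets of 10 columns each.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 30))
y = X[:, :3].sum(axis=1) + X[:, 10:13].sum(axis=1) + rng.normal(scale=0.5, size=250)
feature_sets = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
outer_scores = nested_cv_scores(X, y, feature_sets)
print(np.round(outer_scores, 2))
```

      Relying on StackingRegressor's internal cross-validation is one design choice among several; the key property is that nothing fitted inside an outer training set is ever evaluated on, or trained with, the corresponding outer test fold.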

      Reviewer 1 Recommendations For The Authors #3:

      Third, the authors standardize the elastic net regression coefficients post-hoc. Why did the authors not perform the more standard approach of standardizing the covariates and responses, prior to model estimation, which would yield standardized regression coefficients (in the classical sense) by construction? Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      Response: For model fitting, we did not “standardize the elastic net regression coefficients post-hoc.” Instead, we did all of the standardisation steps prior to model fitting (see Methods below). Regarding regression strengths across different models and cross-validation splits, we now provide the predictive performance for each of the five outer-fold test sets in Figure 1 (below). As shown there, the predictive performance was quite stable across the cross-validation splits.

      For visualising feature importance, we originally standardised the elastic net regression coefficients post-hoc, so that feature importance plots were on the same scale across folds. However, as mentioned by Reviewer 3 (Recommendations for the Authors #7, below), this might make it difficult to interpret the directionality of the coefficients. In the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images (see below).

      Methods

      “We controlled for the potential influences of biological sex on the brain features by first residualising biological sex from brain features in each outer-fold training set. We then applied the regression fitted in this residualisation step to the corresponding test set. We also standardised the brain features in each outer-fold training set and then used the mean and standard deviation of this outer-fold training set to standardise the test set. All of the standardisation was done prior to fitting the prediction models.”
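      The train-only residualisation and standardisation described above can be sketched as follows, assuming scikit-learn and a single binary sex covariate; the function name and simulated data are hypothetical. The regression on sex and the scaler are both fitted on the outer training set and then applied unchanged to the test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def residualise_and_standardise(X_train, X_test, sex_train, sex_test):
    """Remove sex effects and standardise, using training-set parameters only."""
    s_train = sex_train.reshape(-1, 1)
    s_test = sex_test.reshape(-1, 1)
    # One regression per brain feature (multi-output), fitted on the training set.
    reg = LinearRegression().fit(s_train, X_train)
    R_train = X_train - reg.predict(s_train)
    R_test = X_test - reg.predict(s_test)  # training-set fit applied to the test set
    # Mean and SD come from the training residuals only.
    scaler = StandardScaler().fit(R_train)
    return scaler.transform(R_train), scaler.transform(R_test)


# Hypothetical example: 4 brain features with a built-in sex effect.
rng = np.random.default_rng(0)
sex = rng.integers(0, 2, size=120).astype(float)
X = rng.normal(size=(120, 4)) + 2.0 * sex[:, None]
Z_train, Z_test = residualise_and_standardise(X[:100], X[100:], sex[:100], sex[100:])
print(Z_train.mean(axis=0).round(6), Z_train.std(axis=0).round(6))
```

      Only the training-set features are guaranteed to have zero mean, unit SD and zero correlation with sex; the test-set features will only be approximately so, which is the intended behaviour for avoiding leakage.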

      “To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we can still infer that a brain feature with a higher magnitude carries relatively more weight in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features, as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross-validation, different outer folds may have different values of ‘α’ and ‘l1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ values for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the BrainSpace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA loadings (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) by the Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
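      The back-projection from component-level coefficients to ROI pairs described above can be written as a single matrix product. The sketch below assumes scikit-learn's PCA, whose components_ array has shape (n_components, n_edges); the function name, sizes and the random stand-in coefficients are hypothetical (the text uses 75 components and 71,631 edges).

```python
import numpy as np
from sklearn.decomposition import PCA


def edge_importance(pca, enet_coef):
    """Importance of each FC edge: sum over components of |loading| * coefficient.

    pca.components_ has shape (n_components, n_edges); enet_coef has shape (n_components,).
    Taking absolute loadings removes PCA's arbitrary sign, so direction comes from the coefficient.
    """
    return np.abs(pca.components_).T @ enet_coef


rng = np.random.default_rng(0)
n_regions = 12
n_edges = n_regions * (n_regions - 1) // 2  # 66 here; 379 regions would give 71,631
edges = rng.normal(size=(80, n_edges))     # hypothetical subjects x edges
pca = PCA(n_components=5).fit(edges)
coef = rng.normal(size=5)                  # stand-in for fitted Elastic Net coefficients
importance = edge_importance(pca, coef)
print(importance.shape)
```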

      Reviewer 1 Recommendations For The Authors #4:

      I do not really find it surprising that the level of unique explained variance provided by a brain-cognition model is higher than a brain-age model, given that the latter is considerably more accurate (also, in view of the comment above). As such I would recommend to tone down the claims about the utility of this method, also because it is only really applicable to one application area for brain age.

      Response: Thank you for bringing this issue to our attention. We have now toned down the claims about the utility of Brain Cognition and, importantly, treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. Please see Reviewer 3 Public Review #4 above for a detailed discussion of this issue.

      Reviewer 1 Recommendations For The Authors #5:

      Please provide more details about the task designs and MRI processing procedures that were employed on this sample so that the reader is not forced to dig through the publications from the consortia contributing the data samples used. For example, comments such as "Here we focused on the pre-processed task fMRI files with a suffix "_PA_Atlas_MSMAll_hp0_clean.dtseries.nii." are not particularly helpful to readers not already familiar with this dataset.

      Response: Thank you for raising this important point about the clarity of the description of our MRI methodology. We have now added further details about the data processing done by the HCP-A and by us. For instance, we explained the meaning of the HCP-A suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”. Please see below.

      Methods

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 19 sets of features.

      Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with the suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase-encoding direction, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned of potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff of 200 s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Glasser et al., 2016) for cortical surface regions and Freesurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This yielded 379 regions, which was therefore the number of features in each Task Contrast set of features.
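      As a rough illustration of what a task contrast (‘cope’) is, the sketch below builds HRF-convolved block regressors, fits them to parcel time series by ordinary least squares and returns c'β. It assumes NumPy/SciPy and an SPM-style double-gamma HRF (gamma shapes 6 and 16, undershoot ratio 1/6); FSL FEAT's own HRF parameters, prewhitening and high-pass filtering differ and are omitted here. The TR, block timings and simulated parcels are hypothetical.

```python
import numpy as np
from scipy.stats import gamma


def double_gamma_hrf(tr, duration=32.0):
    """SPM-style double-gamma HRF sampled every `tr` seconds (assumed parameters)."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # peak near 5 s, undershoot near 15 s
    return hrf / hrf.sum()


def task_design(onsets_by_condition, block_len, n_vols, tr):
    """One HRF-convolved boxcar regressor per condition, plus an intercept column."""
    hrf = double_gamma_hrf(tr)
    columns = []
    for onsets in onsets_by_condition:
        boxcar = np.zeros(n_vols)
        for onset in onsets:
            start = int(round(onset / tr))
            boxcar[start:start + int(round(block_len / tr))] = 1.0
        columns.append(np.convolve(boxcar, hrf)[:n_vols])
    columns.append(np.ones(n_vols))
    return np.column_stack(columns)


def contrast_estimate(Y, X, c):
    """OLS fit of every parcel time series (columns of Y); returns c' beta per parcel."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return c @ beta


tr, n_vols = 0.8, 300
X = task_design([[0, 64, 128], [32, 96, 160]], block_len=16, n_vols=n_vols, tr=tr)
rng = np.random.default_rng(0)
true_beta = np.array([[2.0] * 5, [0.5] * 5, [10.0] * 5])     # condition A, condition B, intercept
Y = X @ true_beta + rng.normal(scale=0.1, size=(n_vols, 5))  # 5 hypothetical parcels
cope = contrast_estimate(Y, X, np.array([1.0, -1.0, 0.0]))   # [A vs. B]
print(np.round(cope, 2))
```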

      HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of the faces shown. These faces were then shown again in the recall block [Recall], in which participants were asked whether they could remember the names of the previously shown faces. There was also a distractor block [Distractor] between the encoding and recall blocks, in which participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button in response to all shapes [Go] except two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.

      Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI files “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii” as for the Task Contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as for Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations for each pair of the 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied an r-to-z transformation and principal component analysis (PCA) with 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied the fitted transformation to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.”
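      The Task FC pipeline above (residualise task regressors, correlate every region pair, r-to-z transform, then PCA fitted on the training set only) can be sketched with NumPy and scikit-learn as follows. High-pass filtering is omitted, and the stand-in design matrix, region count and number of components are hypothetical and much smaller than the 379 regions and 75 components in the text.

```python
import numpy as np
from sklearn.decomposition import PCA


def residualise(ts, design):
    """Keep the residuals of the time series regressed on the task design (regressors of no interest)."""
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta


def fc_edges(ts):
    """Fisher r-to-z transformed Pearson correlations for every unique region pair."""
    r = np.corrcoef(ts.T)              # ts: (n_timepoints, n_regions)
    iu = np.triu_indices_from(r, k=1)  # upper triangle, diagonal excluded
    return np.arctanh(r[iu])


rng = np.random.default_rng(0)
n_subj, n_tp, n_regions = 40, 120, 20
design = np.column_stack([np.ones(n_tp), rng.normal(size=n_tp)])  # stand-in task regressors
edges = np.array([fc_edges(residualise(rng.normal(size=(n_tp, n_regions)), design))
                  for _ in range(n_subj)])

train, test = edges[:30], edges[30:]
pca = PCA(n_components=5).fit(train)  # PCA definition learned from the training set only
train_feats, test_feats = pca.transform(train), pca.transform(test)
print(edges.shape, train_feats.shape, test_feats.shape)
```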

      Reviewer 1 Recommendations For The Authors #6:

      Similarly, please be more specific about the regression methods used. There are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted. The same goes for the methods used for correcting bias, e.g. what is "de Lange and Cole's (2020) 5th equation"?

      Response: Thank you. We have now added a detailed description of Elastic Net, including its equation (see below). We also added more specific details about the methods used for correcting bias in Brain Age indices (see our response to Reviewer 3 Public Review #2 above).

      Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regression that includes Lasso and Ridge regression as special cases, allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously, we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, that of many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net minimises the squared prediction error plus a weighted penalty on the features’ coefficients. The strength of this penalty is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio = 0) or absolute (known as ‘Lasso’; l1 ratio = 1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

$$\underset{w}{\operatorname{arg\,min}} \left( \frac{\lVert y - Xw \rVert_2^2}{2 \times n_{\text{samples}}} + \alpha \times \text{l1\_ratio} \times \lVert w \rVert_1 + 0.5 \times \alpha \times (1 - \text{l1\_ratio}) \times \lVert w \rVert_2^2 \right), \qquad (1)$$

      where X is the features, y is the target, and w is the vector of coefficients. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l1 ratio using 25 numbers in linear space, ranging from 0 to 1.”
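      A quick numerical check of Equation (1) is sketched below, assuming scikit-learn's ElasticNet with fit_intercept=False, so that the fitted objective is exactly the one written above, and simulated, hypothetical data. It evaluates the objective at the fitted coefficients and at a perturbed vector, and it also builds the α and l1 ratio grids described in the text (the grids are only constructed here, not searched).

```python
import numpy as np
from sklearn.linear_model import ElasticNet


def enet_objective(X, y, w, alpha, l1_ratio):
    """Equation (1): squared error / (2 n) + alpha * l1_ratio * |w|_1 + 0.5 * alpha * (1 - l1_ratio) * |w|_2^2."""
    n = X.shape[0]
    resid = y - X @ w
    return (resid @ resid / (2 * n)
            + alpha * l1_ratio * np.abs(w).sum()
            + 0.5 * alpha * (1 - l1_ratio) * (w @ w))


# Grids as described in the text: 70 log-spaced alphas in [0.1, 100], 25 l1 ratios in [0, 1].
alphas = np.logspace(-1, 2, 70)
l1_ratios = np.linspace(0, 1, 25)
param_grid = {"alpha": alphas, "l1_ratio": l1_ratios}  # e.g. for sklearn's GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([1.5, -2.0, 0, 0, 0.5, 0, 0, 0]) + rng.normal(size=100)
model = ElasticNet(alpha=alphas[0], l1_ratio=0.5, fit_intercept=False,
                   tol=1e-10, max_iter=100_000).fit(X, y)

at_fit = enet_objective(X, y, model.coef_, alphas[0], 0.5)
perturbed = enet_objective(X, y, model.coef_ + 0.05 * rng.normal(size=8), alphas[0], 0.5)
print(at_fit < perturbed)
```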

      Additional minor points:

      Reviewer 1 Recommendations For The Authors #7:

      • Please provide more descriptive figure legends, especially for Figs 5 and 6. For example, what do the boldface numbers reflect? What do the asterisks reflect?

      Response: Thank you for the suggestion. We have revised the figure legends to make clear what the numbers and asterisks reflect.

      Reviewer 1 Recommendations For The Authors #8:

      • Perhaps this is a personal thing, but I find the nomenclature cognition_{fluid} to be quite awkward. Why not just define FC as an acronym?

      Response: Thank you for the suggestion. We now use the term ‘fluid cognition’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      Reviewer 2 Recommendations For The Authors #1:

      • Since the study did not provide external validation for the indices, it is unclear how well the models would perform and generalize to other samples. Therefore, it is recommended to conduct out-of-sample testing of the models.

      Response: Thank you for the suggestion. We have now added a discussion of the consistency between our results and those of several recent studies that investigated similar issues with Brain Age in different populations, e.g., large samples of older adults in the UK Biobank (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023), and, in a broader context, extending to neurological and psychological disorders (for review, see Jirsaraie, Gorelik, et al., 2023). Please see below.

      Please also note that all of the analyses were out-of-sample. We used nested cross-validation to evaluate the predictive performance of age- and cognition-prediction models on the outer-fold test sets, which are out-of-sample from the training sets (please see Reviewer 1 Recommendations For The Authors #2). Similarly, we also conducted all of the commonality analyses on the outer-fold test sets.

      Discussion

      “The small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023). Cole (2020) studied the utility of Brain Age in explaining cognitive functioning in large samples (n>17,000) of older adults, aged 45-80 years, from the UK Biobank (Sudlow et al., 2015). He constructed age-prediction models using LASSO, a penalised regression similar to ours, and applied the same age-dependency adjustment as ours. Cole (2020) then conducted a multiple regression explaining cognitive functioning from Corrected Brain Age Gap while controlling for chronological age and other potential confounds. He found Corrected Brain Age Gap to be significantly related to performance in four out of six cognitive measures, and among those significant relationships, the effect sizes were small, with a maximum partial eta-squared of .0059. Similarly, Jirsaraie and colleagues (2023) studied the utility of Brain Age in explaining cognitive functioning in youths aged 8-22 years old from the Human Connectome Project in Development (Somerville et al., 2018) and the Preschool Depression Study (Luby, 2010). They built age-prediction models using gradient tree boosting (GTB) and deep-learning brain network (DBN) and adjusted the age dependency of Brain Age Gap using Smith and colleagues’ (2019) method. Using multiple regressions, Jirsaraie and colleagues (2023) found weak effects of the adjusted Brain Age Gap on cognitive functioning across five cognitive tasks, five age-prediction models and the two datasets (mean of standardised regression coefficient = -0.09, see their Table S7). Next, Butler and colleagues (2021) studied the utility of Brain Age in explaining cognitive functioning in another group of youths aged 8-22 years old from the Philadelphia Neurodevelopmental Cohort (PNC) (Satterthwaite et al., 2016).
Here they used Elastic Net to build age-prediction models and applied another age-dependency adjustment method, proposed by Beheshti and colleagues (2019). Similar to the aforementioned results, Butler and colleagues (2021) found a weak, statistically non-significant correlation between the adjusted Brain Age Gap and cognitive functioning at r=-.01, p=.71. Accordingly, the utility of Brain Age in explaining cognitive functioning beyond chronological age appears to be weak across age groups, different predictive modelling algorithms and age-dependency adjustments.”

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is not likely to explain the most variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation of brain MRI that is related to fluid cognition. Brain Cognition, from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. More importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. The unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index were used as a biomarker along with chronological age, we would miss an opportunity to improve the performance of the model by around one-third of the variation explained.”
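      To make the decomposition concrete, here is a minimal sketch of commonality analysis (Nimon et al., 2008) computed from all-subsets R², assuming ordinary least squares with an intercept and NumPy only; it is not the authors' code. For a set S of regressors, the commonality is an inclusion-exclusion sum over the R² of models containing every regressor outside S plus each subset T of S, C(S) = -Σ_T (-1)^|T| R²(outside S ∪ T), so all components sum to the full-model R². The simulated variables (age, brain_age, brain_cog, fluid) are hypothetical.

```python
import itertools

import numpy as np


def r2(X, y):
    """In-sample R^2 of an OLS fit with intercept; zero regressors gives R^2 = 0."""
    if X.shape[1] == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def commonality(X, y, names):
    """Unique and common effects for every non-empty subset of regressors."""
    k = X.shape[1]
    everything = frozenset(range(k))
    cache = {}

    def r2_of(subset):
        key = frozenset(subset)
        if key not in cache:
            cache[key] = r2(X[:, sorted(key)], y)
        return cache[key]

    parts = {}
    for size in range(1, k + 1):
        for S in itertools.combinations(range(k), size):
            outside = everything - set(S)
            total = 0.0
            for t in range(len(S) + 1):
                for T in itertools.combinations(S, t):
                    total -= (-1) ** t * r2_of(outside | set(T))
            parts[tuple(names[i] for i in S)] = total
    return parts


rng = np.random.default_rng(1)
n = 500
age = rng.uniform(36, 100, n)
brain_age = age + rng.normal(scale=8, size=n)
brain_cog = -0.5 * age + rng.normal(scale=10, size=n)
fluid = -0.4 * age + 0.6 * brain_cog + rng.normal(scale=8, size=n)
X = np.column_stack([age, brain_age, brain_cog])
parts = commonality(X, fluid, ["age", "brain_age", "brain_cog"])
for label, value in parts.items():
    print(label, round(100 * value, 1))  # effects in % of variance in fluid
```

      The single-regressor entries are the unique effects (full R² minus R² without that regressor); the remaining entries are the common effects that a regression coefficient alone cannot apportion.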

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023), and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). That is, Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. This means that age-prediction models from Brain Age studies focusing on neurological/psychological disorders might be under-fitted when applied to participants with neurological/psychological disorders, because they were built from largely healthy participants. Thus, the difference in Brain Age indices between participants without vs. with neurological/psychological disorders might be confounded by the under-fitted age-prediction models (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other Brain Age studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning do not suffer from this under-fitting. We consider this a strength, not a weakness, of our study.”

      Reviewer 2 Recommendations For The Authors #2:

      • Employ Variance Inflation Factor (VIF) to empirically test for multicollinearity.

      Response: Given the high common effects between many of the regressors in the models (e.g., between Brain Age and chronological age), VIF will be high, but this is not a concern for the commonality analysis. We now show that applying the commonality analysis to multiple regressions gives results that are robust against multicollinearity, as demonstrated elsewhere (Ray-Mukherjee et al., 2014, Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity). Specifically, when using multiple regressions by themselves without the commonality analysis, researchers have to rely on beta estimates, which are strongly affected by multicollinearity (e.g., a phenomenon known as the suppression effect). However, by applying the commonality analysis on top of multiple regressions, researchers can instead rely on R² estimates, which are less affected by multicollinearity. This can be seen in our case (Figures 5 and 6), where Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models).
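      For reference, the VIF mentioned by the reviewer is 1 / (1 - R²_j), where R²_j comes from regressing regressor j on all the other regressors. The sketch below assumes NumPy only and simulated, hypothetical data; it illustrates the point above that VIF is high for chronological age and a Brain Age index that closely tracks it, while an unrelated regressor stays near 1.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (OLS with intercept)."""
    values = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(target)), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2_j = 1.0 - (resid @ resid) / ((target - target.mean()) @ (target - target.mean()))
        values.append(1.0 / (1.0 - r2_j))
    return np.array(values)


rng = np.random.default_rng(0)
n = 500
age = rng.uniform(36, 100, n)
brain_age = age + rng.normal(scale=3, size=n)  # hypothetical Brain Age tracking age closely
unrelated = rng.normal(size=n)
v = vif(np.column_stack([age, brain_age, unrelated]))
print(np.round(v, 1))
```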

      To directly demonstrate the robustness of the current commonality analysis to multicollinearity, we applied the commonality analysis to Ridge regressions (see Supplementary Figures 3 and 5 below). Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). As seen below, the results from commonality analyses applied to Ridge regressions closely match our original results.

      Methods

      “Note that, to ensure that the commonality analysis results were robust against multicollinearity (Ray-Mukherjee et al., 2014), we also repeated the same commonality analyses on Ridge regression, as opposed to multiple regression. Ridge regression is a method designed to deal with multicollinearity (Dormann et al., 2013). See Supplementary Figure 3 for the Ridge regression with chronological age and each Brain Age index as regressors and Supplementary Figure 5 for the Ridge regression with chronological age, each Brain Age index and Brain Cognition as regressors. Briefly, the results from commonality analyses applied to Ridge regressions closely matched those obtained using multiple regression.”

      Reviewer 2 Recommendations For The Authors #3:

      • Incorporate non-linearities in the correction of brain-age indices, such as separate terms in the regression or statistical analyses.

      Response: Thank you for the suggestion. We have now added a non-linear term of chronological age to our multiple-regression models explaining fluid cognition (see Supplementary Figures 4 and 6 below). Originally, we did not include a quadratic term for chronological age in our model, since the relationship between chronological age and fluid cognition was relatively linear (see Figure 1 above). Accordingly, as expected, adding the quadratic term for chronological age as suggested did not change the pattern of the results of the commonality analyses.

      Methods

      “Similarly, to ensure that we were able to capture the non-linear pattern of chronological age in explaining fluid cognition, we added a quadratic term of chronological age to our multiple-regression models in the commonality analyses. See Supplementary Figure 4 for the multiple regression with chronological age, squared chronological age and each Brain Age index as regressors and Supplementary Figure 6 for the multiple regression with chronological age, squared chronological age, each Brain Age index and Brain Cognition as regressors. Briefly, adding the quadratic term for chronological age did not change the pattern of the results of the commonality analyses.”

      Reviewer 2 Recommendations For The Authors #4:

      • It would be helpful to include the complete set of results in the appendix - for instance, the statistical significance for each component for the final commonality analysis.

      Response: Figures 5 and 6 (see above) already have asterisks to indicate the statistical significance of the unique effects. We therefore do not believe additional figures/tables in the appendix are needed to show statistical significance.

      Recommendations for improving the writing and presentation.

      Reviewer 2 Recommendations For The Authors #5:

      • The authors are encouraged to refrain from using terms such as 'fortunately', 'unfortunately', and 'unsettling', as they may appear inappropriate when referring to empirical findings.

      Response: We agree with this suggestion and no longer use those words.

      Reviewer 2 Recommendations For The Authors #6:

      • It would be helpful to clarify in the methods that you end up with 5 test folds.

      Response We have now clarified why we chose five test folds.

      Methods

      “We used nested cross-validation (CV) to build these models (see Figure 7). We first split the data into five outer folds. We used five outer folds so that each outer fold had around 100 participants. This is to ensure the stability of the test performance across folds.”
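      A minimal sketch of how such a nested split could be generated, assuming a hypothetical sample of 504 participants and five inner folds (the number of inner folds is our assumption for illustration). It only produces index splits; model fitting and hyperparameter tuning would happen inside the inner loop.

```python
import random

def make_folds(indices, n_folds, seed=0):
    """Shuffle indices and deal them into n_folds near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::n_folds] for k in range(n_folds)]

def nested_cv_splits(n_participants, n_outer=5, n_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation."""
    outer_folds = make_folds(range(n_participants), n_outer, seed)
    for k, test in enumerate(outer_folds):
        # Outer training set: every participant not in the current test fold.
        train = [i for j, fold in enumerate(outer_folds) if j != k for i in fold]
        # Inner folds are drawn from the outer training set only (used for tuning).
        inner_folds = make_folds(train, n_inner, seed + k + 1)
        inner_splits = [
            ([i for j, f in enumerate(inner_folds) if j != m for i in f], val)
            for m, val in enumerate(inner_folds)
        ]
        yield train, test, inner_splits

fold_sizes = [len(test) for _, test, _ in nested_cv_splits(504)]  # [101, 101, 101, 101, 100]
```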

      Minor corrections to the text and figures.

      Reviewer 2 Recommendations For The Authors #7:

      • Why use months, not years for chronological age? This seems inappropriate given the age range.

      Response We originally used months because they were the units used in our prediction modelling. However, to make the figures easier to understand, we now use years.

      Reviewer 2 Recommendations For The Authors #8:

      • The formatting, especially regarding the text embedded within the figures, could benefit from significant improvements.

      Response Thank you for the suggestion. We have made changes to the text embedded within the figures, which should now be more readable.

      Reviewer 2 Recommendations For The Authors #9:

      • The legend for the neuroimaging feature labels is missing, and the captions are incomplete.

      Response Please see Figure 2 above. We have revised it by adding the letters L and R to indicate the laterality of the brain images. We also revised the captions to make sure they are complete.

      Reviewer 2 Recommendations For The Authors #10:

      • Figure 5's caption: SD has a missing decimal point.

      Response The numbers are not SDs. The numbers on the left of the figure represent the unique effects of chronological age (%), the numbers in the middle represent the common effects of chronological age and the Brain Age index (%), and the numbers on the right represent the unique effects of the Brain Age index (%). We now report all of these numbers to one decimal place.

      Reviewer #3 (Recommendations For The Authors):

      The main question of this article is as follows: “To what extent does having information on Brain Age improve our ability to capture declines in fluid cognition beyond knowing a person’s chronological age?” While this question is worthwhile, considering most of the field is confused about the nature of brain age, the authors are currently missing an opportunity to convey the inevitability of their results given how Brain Age and the Brain Age Gap are calculated. They also misleadingly convey that Brain Cognition is somehow superior to Brain Age. If the authors work on conveying the inevitability of their results and redo (or remove) their section on Brain Cognition, I can see how their results would be enlightening to the general neuroimaging community that is interested in the concept of brain age. See below for specific critiques.

      Response Please see our response to Reviewer 3 Public Review Overall. Note we no longer argue that Brain Cognition is superior to Brain Age (Reviewer 3 Public Review #4). Rather, we treated the capability of Brain Cognition in capturing fluid cognition as the upper limit of Brain Age’s capability in capturing fluid cognition. We used the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age to indicate how much Brain Age misses the variation in the brain MRI that could explain fluid cognition.

      Reviewer 3 Recommendations For The Authors #1:

      “There are many adjustments proposed to correct for this estimation bias” (p3) → Regression to the mean is not a sign of bias. Any decent loss function will result in over-predicting the age of younger individuals and under-predicting the age of older individuals. This is a direct result of minimizing an error term (e.g., mean squared error). Therefore, it is inappropriate to refer to regression to the mean as a sign of bias. This misconception has led to a great deal of inappropriate analyses, including “correcting” the brain age gap by regressing out age.

      Response Please see our response to Reviewer 3 Public Review#1

      Reviewer 3 Recommendations For The Authors #2:

      “Corrected Brain Age Gap in particular is viewed as being able to control for both age dependency and estimation biases (Butler et al., 2021).” (p3) → This summary is not accurate as Butler and colleagues did not use the words "corrected" and "biases" in this context. All that authors say in that paper is that regressing out age from the brain age gap - which is referred to as the modified brain age gap (MBAG) - makes it so that the modified brain age gap is not dependent on age, which is true. This metric is meaningless, though, because it is the variance left over after regressing out age from residuals from a model that was predicting age. If it were not for the fact that regression on residuals is not equivalent to multiple regression (and out of sample estimates), MBAG would be a vector of zeros. Upon reading your Methods, I noticed that you are using a metric from Le et al. (2018) for your “Corrected Brain Age Gap”. If you cite the Butler et al. (2021) paper, I highly recommend sticking with the same notation, metrics and terminology throughout. That would greatly help with the interpretability of your paper, and cross-comparisons between the two.

      Response Please see our response to Reviewer 3 Public Review #2.

      Reviewer 3 Recommendations For The Authors #3:

      “However, the improvement in predicting chronological age may not necessarily make Brain Age better at capturing Cognition_fluid. If, for instance, the age-prediction model had perfect performance, Brain Age Gap would be exactly zero and would have no utility in capturing Cognition_fluid beyond chronological age.” (p3) → I largely agree with this statement. I would be really careful to distinguish between Brain Age and the Brain Age Gap here, as the former is a predicted value, and the latter is the residual times -1 (predicted age - age). Therefore, together they explain all of the variance in age. If you change the first sentence to refer to the Brain Age Gap, this statement makes more sense. The Brain Age Gap will never be exactly zero, though, even with perfect prediction on the training set, because subjects in the testing set are different from the subjects in the training set.

      Response Please see our response to Reviewer 3 Public Review #3.

      Reviewer 3 Recommendations For The Authors #4:

      “Can we further improve our ability to capture the decline in cognition_fluid by using, not only Brain Age and chronological age, but also another biomarker, Brain Cognition?” → This question is fundamentally getting at whether a predicted value of cognition can predict cognition. Assuming the brain parameters can predict cognition decently, and the original cognitive measure that you were predicting is related to your measure of fluid cognition, the answer should be yes. This seems like an uninteresting question to me. Upon reading your Methods, it became clear that the cognitive variable in the model predicting cognition using brain features (to get predicted cognition, or as you refer to it, Brain Cognition) is the same as the measure of fluid cognition that you are trying to assess how well Brain Cognition can predict. Assuming the brain parameters can predict fluid cognition at all, of course Brain Cognition will predict fluid cognition. This is inevitable. You should never use predicted values of a variable to predict the same variable.

      Response Please see our response to Reviewer 3 Public Review #4.

      Reviewer 3 Recommendations For The Authors #5:

      “We also examined if these better-performing age-prediction models improved the ability of Brain Age in explaining Cognition_fluid.” → Improved above and beyond what?

      Response We were referring to whether better-performing age-prediction models improved the ability of Brain Age to explain fluid cognition over and above lower-performing age-prediction models. We have revised the Introduction to clarify this point.

      Reviewer 3 Recommendations For The Authors #6:

      Figure 1 b & c → It is a little difficult to read the text by the horizontal bars in your plots. Please make the text smaller so that there is more space between the words vertically, or even better, make the plots slightly bigger. Please also put the predicted values on the y-axis. This is standard practice for displaying regression results. To make more room, you can get rid of your r_Pearson or your R² plot, considering the latter is simply the square of the former. If you want to make it clear that the association is positive between all of your variables, I would keep r_Pearson.

      Response Thank you so much for the suggestions.

      1) We have now made sure that the text by the horizontal bars in Figures 1b and 1c is readable.

      2) Note that in the prediction-modelling/machine-learning literature, it is more common to plot observed/real values on the y-axis. The logic of this practice is that the x-axis shows the values predicted by the model, and we want to see whether changes in the predicted values correspond to changes in the observed/real values on the y-axis.

      3) Regarding Pearson correlation vs R², please note that we wrote “for R², we used the sum of squares definition (i.e., R² = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020).” As such, R² is NOT the square of the Pearson correlation. In fact, in their “Establishment of Best Practices for Evidence for Prediction” paper, Poldrack and colleagues (2020) discourage 1) the use of the Pearson correlation by itself and 2) the use of the squared correlation coefficient as R² (as opposed to the sum-of-squares definition):

      “It is common in the literature to use the correlation between predicted and actual values as a measure of predictive performance; of the 64 studies in our literature review that performed prediction analyses on continuous outcomes, 30 reported such correlations as a measure of predictive performance. This reporting is problematic for several reasons. First, correlation is not sensitive to scaling of the data; thus, a high correlation can exist even when predicted values are discrepant from actual values. Second, correlation can sometimes be biased, particularly in the case of leave-one-out cross-validation. As demonstrated in Figure 4, the correlation between predicted and actual values can be strongly negative when no predictive information is present in the model. A further problem arises when the variance explained (R²) is incorrectly computed by squaring the correlation coefficient. Although this computation is appropriate when the model is obtained using the same data, it is not appropriate for out-of-sample testing²³; instead, the amount of variance explained should be computed using the sum-of-squares formulation (as implemented in software packages such as scikit-learn).”

      Accordingly, we decided to keep both R² and the Pearson correlation (along with MAE) in Figure 1.
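      To illustrate the point in the Poldrack et al. (2020) passage, the sketch below (plain Python, toy numbers chosen only for illustration) compares the Pearson correlation with the sum-of-squares R² for predictions that are perfectly correlated with the observed values but offset by 10. scikit-learn's `sklearn.metrics.r2_score` uses the same sum-of-squares definition.

```python
from statistics import mean

def pearson_r(y, y_pred):
    """Pearson correlation between observed and predicted values."""
    my, mp = mean(y), mean(y_pred)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_pred))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / (var_y * var_p) ** 0.5

def r2_sum_of_squares(y, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    my = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

observed = [1, 2, 3, 4, 5]
predicted = [11, 12, 13, 14, 15]  # perfectly correlated, but every prediction is 10 too high

r = pearson_r(observed, predicted)                # 1.0, so r squared is also 1.0
r2 = r2_sum_of_squares(observed, predicted)       # -49.0
```

      Squaring r would suggest a perfect model, whereas the sum-of-squares R² is strongly negative because every prediction is far from the observed value.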

      Reviewer 3 Recommendations For The Authors #7:

      Figure 2 “We calculated feature importance by, first, standardizing Elastic Net weights across brain features of each set of features from each test fold.” → What do you mean by “standardize” here? Rescale to be mean 0, variance 1? If so, this seems like a misleading transformation, because it gives the impression that the relationships are negative, when they are not necessarily. Also, why did you choose to use elastic net weights in any form as measures of effect size (or importance)? The raw values are inherently penalized, which means they are under-estimates of the true effect size. It would be more meaningful (and less biased) to plot the raw correlations.

      Response For the first question regarding standardisation, we addressed this issue in our response to Reviewer 1 Recommendations For The Authors #3. Briefly, we agree with Reviewer 3 that standardisation (with mean = 0, SD = 1) might make it difficult to interpret the directionality of the coefficients. For visualising feature importance in the revised manuscript, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images (see below).

      For the second question, regarding why we used Elastic Net coefficients (as opposed to correlations) as feature importance, it helps to restate the goal of feature importance: to understand how the model makes a prediction based on different brain features (Molnar, 2019). Correlations between the target and each brain feature do not achieve this; they show univariate/marginal relationships between the target and a single brain feature. What we want to visualise is how the model made its prediction, which, in the case of Elastic Net, is the weighted sum of the brain features, with the coefficients as the weights. In other words, multivariate models (including Elastic Net) capture conditional relationships that take into account all brain features within each set of features.

      Elastic Net coefficients can be treated as feature importance: a feature with a more positive coefficient pushes the predicted value up as the feature's value increases, and a feature with a more negative coefficient pushes it down (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (which makes the magnitude itself difficult to interpret directly), a brain feature with a larger coefficient magnitude still carries relatively more weight in making a prediction. Another benefit of Elastic Net as a penalised regression is that its coefficients are less susceptible to collinearity among features because they are regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).
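      The difference between marginal and conditional relationships can be sketched with a toy example. Assumptions: ordinary least squares stands in for Elastic Net (a penalised fit would additionally shrink the coefficients), and the two features and the target are hypothetical. Feature b is strongly correlated with the target on its own, yet receives a coefficient of about zero once feature a is in the model; the prediction is the intercept plus each feature multiplied by its coefficient.

```python
import numpy as np

# Hypothetical toy data: feature_b tracks feature_a but adds no information
# about the target once feature_a is known.
feature_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
feature_b = feature_a + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
target = 2.0 * feature_a

# Marginal (univariate) view: correlation of feature_b with the target.
marginal_b = np.corrcoef(feature_b, target)[0, 1]

# Conditional (multivariate) view: joint linear fit of both features.
design = np.column_stack([np.ones_like(feature_a), feature_a, feature_b])
intercept, coef_a, coef_b = np.linalg.lstsq(design, target, rcond=None)[0]

# The prediction is the intercept plus each feature times its coefficient.
contributions = {"feature_a": coef_a * feature_a, "feature_b": coef_b * feature_b}
prediction = intercept + contributions["feature_a"] + contributions["feature_b"]
```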

      Reviewer 3 Recommendations For The Authors #8:

      Figure 3 → Again, what exactly do you mean by “standardised” here?

      Response It means subtracting the mean and then dividing by the SD. However, we no longer apply standardisation for feature importance. See our response to Reviewer 1 Recommendations For The Authors #3 and Reviewer 3 Recommendations For The Authors #7.

      Reviewer 3 Recommendations For The Authors #9:

      “However, Brain Age Gap created from the lower-performing age-prediction models explained a higher amount of variation in Cognition_fluid. For instance, the top performing age-prediction model, “Stacked: All excluding Task Contrast”, generated Brain Age and Corrected Brain Age that explained the highest amount of variation in Cognition_fluid, but, at the same time, produced Brain Age Gap that explained the least amount of variation in Cognition_fluid.” (p7) → Yes, but you did not need to run any models to show this, considering it is an inevitable consequence of the following relationship between predicted values and residuals (or residuals times -1): y = (y − ŷ) + ŷ. Let’s say that age explains 60% of the variance in fluid cognition, and predicted age (ŷ) explains 40% of the variance in fluid cognition. Then the brain age gap (−(y − ŷ)) should explain 20% of the variance in fluid cognition. If by “Corrected Brain Age” you mean the modified predicted age from the Butler paper, the “Corrected Brain Age” result is inevitable because the modified predicted age is essentially just age with a tiny bit of noise added to it. From Figure 4, though, this does not seem to be the case, because the lower left quadrant in panel a should be flat and high (about as high as the predictive value of age for fluid cognition). So how are you calculating “Corrected Brain Age”? It looks like you might be regressing age out of Brain Age, though from your description in the Methods (How exactly do you use the slope and intercept? You need equations if you are going to stick with this terminology), it is not totally clear. I highly recommend using terminology and metrics from the Butler et al. (2021) paper throughout to reduce confusion.

      Response Please see our response to Reviewer 3 Public Review #5

      Reviewer 3 Recommendations For The Authors #10:

      “On the contrary, an amount of variation in Cognition_fluid explained by Corrected Brain Age Gap was relatively small (maximum R² = .041) across age-prediction models and did not relate to the predictive performance of the age-prediction models.” (p7) → If by “Corrected Brain Age Gap” you mean MBAG from The Butler paper, yes, this is also inevitable, considering MBAG would be a vector of zeros if it were not for regression on residuals (and out of sample estimates), as I mentioned earlier. Also, it is not clear why you used “on the contrary” as a transition here.

      Response Please see our response to Reviewer 3 Public Review #2 for the ‘MBAG’ term. Briefly, we didn’t use Butler and colleagues’ (2021) MBAG; rather, we used the method described by de Lange and Cole (2020), which Butler and colleagues called RBAG.

      de Lange and Cole’s (2020) method has been commonly implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022). Accordingly, researchers who use Brain Age do not usually view this method as capturing a meaningless biomarker. Yet, the small effects of the Corrected Brain Age Gap in explaining fluid cognition of aging individuals found here are consistent with studies in older adults (Cole, 2020) and younger populations (Butler et al., 2021; Jirsaraie, Kaufmann, et al., 2023) (see our response to Reviewer 2 Recommendations For The Authors #1).

      “On the contrary” refers to the fact that the other three Brain Age indices (i.e., those that did not account for the relationship between Brain Age and chronological age) showed a much higher amount of variation in fluid cognition explained. As mentioned above (our response to Reviewer 2 Public Review #7), our argument resonates with Butler and colleagues’ (2021) suggestion (p. 4097): “As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (e.g., Beheshti et al., 2018; Cole, Underwood, et al., 2017; Franke et al., 2015; Gaser et al., 2013; Liem et al., 2017; Nenadić et al., 2017; Steffener et al., 2016)”.

      Reviewer 3 Recommendations For The Authors #11:

      “As before, the unique effects of Brain Age indices were all relatively small across the four Brain Age indices and across different prediction models.” (p10) → Yes, again, this is inevitable considering how they are calculated. You can show these analyses to demonstrate your results in data, if you want, but ignoring the inevitability given how these variables are calculated is misleading.

      Response Accounting for the relationship between Brain Age and chronological age when examining the utility of Brain Age is not misleading. Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we believe that not doing so is misleading. That is, without accounting for the relationship between Brain Age and chronological age, Brain Age will likely explain the same variation of the phenotype of interest as chronological age. Please see our response to Reviewer 3 Recommendations For The Authors #18 below.

      Reviewer 3 Recommendations For The Authors #12:

      “On the contrary, the unique effects of Brain Cognition appeared much larger.” (p10) → This is not a fair comparison if you don’t look at the unique effects above and beyond the cognitive variable you predicted (fluid cognition) in your Brain Cognition model. When you do this, you will see that Brain Cognition is useless when you include fluid cognition in the model, just as Brain Age would be in predicting age when you include age in the model. This highlights the fact that using predicted values of a metric to predict that metric is a pointless path to take, and that using a predicted value to predict anything is worse than using the value itself.

      Response Please see our response to Reviewer 3 Public Review #6.

      Reviewer 3 Recommendations For The Authors #13:

      “First, how much does Brain Age add to what is already captured by chronological age? The short answer is very little.” (p12) → This is a really important point, but your paper requires an in-depth discussion of the inevitability of this result, which I have discussed previously in this review.

      Response Please see our response to Reviewer 3 Public Review #7.

      Reviewer 3 Recommendations For The Authors #14:

      “Second, do better-performing age-prediction models improve the ability of Brain Age to capture Cognition_fluid? Unfortunately, the answer is no.” (p12) → You need to be clear that you are talking about above and beyond age here.

      Response Thank you so much for your suggestion. We have now revised this sentence accordingly.

      Discussion

      “Second, do better-performing age-prediction models improve the utility of Brain Age to capture fluid cognition above and beyond chronological age? The answer is also no.”

      Reviewer 3 Recommendations For The Authors #15:

      “Third, do we have a solution that can improve our ability to capture Cognition_fluid from brain MRI? The answer is, fortunately, yes. Using Brain Cognition as a biomarker, along with chronological age, seemed to capture a higher amount of variation in Cognition_fluid than only using Brain Age.” (p12) → Again, try controlling for the cognitive measure you predicted in your Brain Cognition model. This will show that Brain Cognition is not useful above and beyond cognition, highlighting the fact that it is not a useful endeavor to be using predicted values.

      Response Please see our response to Reviewer 3 Public Review #8.

      Reviewer 3 Recommendations For The Authors #16:

      “Accordingly, a race to improve the performance of age-prediction models (Baecker et al., 2021) does not necessarily enhance the utility of Brain Age indices as a biomarker for Cognition_fluid. This calls for a new paradigm. Future research should aim to build prediction models for Brain Age indices that are not necessarily good at predicting age, but at capturing phenotypes of interest, such as Cognition_fluid and beyond.” (p13) → I whole-heartedly agree with the first two sentences, and strongly disagree with the last. Certainly your results, and the underlying reason as to why you found these results, calls for a new paradigm (or, one might argue, a pre-brain age paradigm). They do not, however, suggest that we should keep going down the Brain Age path. In fact, I think it should be abandoned all together. While it is difficult to prove that there is no transformation of Brain Age or the Brain Age Gap that will be useful, I am nearly sure this is true from the research I have done. Therefore, if you would like to suggest that the field should continue down this path, you need to present a very good case to support this view.

      Response Please see our response to Reviewer 3 Public Review #9.

      Reviewer 3 Recommendations For The Authors #17:

      “Perhaps this is because the estimation of the influences of chronological age was done in the training set.” (p13) → I believe this is the case, and it is testable. Try re-running your analyses where parameters are estimated and performance is evaluated on the same data.

      Response Yes, we agree with this. Given the equations we used, this is inevitable.

      Reviewer 3 Recommendations For The Authors #18:

      “Similar to a previous recommendation (Butler et al., 2021), we suggest focusing on Corrected Brain Age Gap.” (p13) → To be clear, the authors did not use the term “Corrected” because it is very misleading. The authors also did not suggest that we proceed with any brain age metric; rather they mentioned that the modified brain age gap is independent of age. Note the following passage: “Further, the interpretability of the modified brain age gap (MBAG) itself is limited by the fact that it is a prediction error from a regression to remove the effects of age from a residual obtained through a regression to predict age. By virtue of these limitations, we suggest that the modified version may not provide useful information about precocity or delay in brain development. In light of this, as well as the complexities associated with interpretations of the BAG and its dependence on age, we suggest that further methodological and theoretical work is warranted.” I recognize that that this statement is hedged, as is often required in the publication process, but I am all but certain that MBAG/BAG/modified predicted age are useless constructs. Therefore, if you are going to suggest that people continue to use them, opposed to suggesting that further methodological or theoretical work is warranted, you need to make a strong case, which you did not try to make here. If anything, your results support abandoning the age-prediction endeavor altogether.

      Response Please see our response to Reviewer 3 Public Review #2 for the term. Briefly, we didn’t use Butler and colleagues’ (2021) MBAG, but rather RBAG. This index was originally described by de Lange and Cole (2020) and has since been implemented elsewhere (Cole et al., 2020; Cumplido-Mayoral et al., 2023; Denissen et al., 2022).

      We do not intend to encourage people to abandon the Brain Age endeavour altogether. However, we made three main suggestions for future research on Brain Age to ensure its utility. First, researchers should account for the relationship between Brain Age and chronological age either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining the unique effects of Brain Age indices after controlling for chronological age through commonality analyses (see below). This is similar to the suggestion made by Le and colleagues (2018) and later rephrased by Butler and colleagues (2021). More specifically, Le and colleagues (2018) mentioned (p. 10): “Based on our observations in both real and simulated data, we recommend that the relationship between chronological age and BrainAGE should be accounted for. The two methods proposed in this study are either: (1) regress age on BrainAGE, producing BrainAGER, which is centered on 0 regardless of a participant's actual age or (2) include age as a regressor when doing follow-up analyses.”

      Second, we suggested that researchers should not select age-prediction models based solely on age-prediction performance (see our response to Reviewer 1 Recommendations For The Authors #1).

      Third, we suggested that researchers should test how much Brain Age misses the variation in the brain MRI that could explain fluid cognition or other phenotypes of interest (see our response to Reviewer 2 Public Review #4).

      Discussion

      “What does it mean then for researchers/clinicians who would like to use Brain Age as a biomarker? First, they have to be aware of the overlap in variation between Brain Age and chronological age and should focus on the contribution of Brain Age over and above chronological age. Using Brain Age Gap will not fix this. Butler and colleagues (2021) recently highlighted this point, “These results indicate that the association between cognition and the BAG are driven by the association between age and cognitive performance. As such, it is critical that readers of past literature note whether or not age was controlled for when testing for effects on the BAG, as this has not always been common practice (p. 4097).” Similar to previous recommendations (Butler et al., 2021; Le et al., 2018), we suggest future work should account for the relationship between Brain Age and chronological age, either using Corrected Brain Age Gap (or other similar adjustments) or, better, examining unique effects of Brain Age indices after controlling for chronological age through commonality analyses. Note we prefer using unique effects over beta estimates from multiple regressions, given that unique effects do not change as a function of collinearity among regressors (Ray-Mukherjee et al., 2014). In our case, Brain Age indices had the same unique effects regardless of the level of common effects they had with chronological age (e.g., Brain Age vs. Corrected Brain Age Gap from stacked models). In the case of fluid cognition, the unique effects might be too small to be clinically meaningful as shown here and previously (Butler et al., 2021; Cole, 2020; Jirsaraie, Kaufmann, et al., 2023).”

      Reviewer 3 Recommendations For The Authors #19:

      “To compute Brain Age and Brain Cognition, we ran two separate prediction models. These prediction models either had chronological age or Cognition_fluid as the target.” (p16) → You should make it clear in the main text of your paper that the cognition variable in your Brain Cognition models is the same as what you refer to as Cognition_fluid. Some of your analyses would have been much more reasonable if you had two different measures of cognition.

      Response Thank you so much for the suggestion. We believe, given the re-conceptualisation of Brain Cognition as the main text

      Introduction

      “certain variation in the brain MRI is related to fluid cognition, but to what extent does Brain Age not capture this variation? To estimate the variation in the brain MRI that is related to fluid cognition, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data.”

      Reviewer 3 Recommendations For The Authors #20:

      “We controlled for the potential influences of biological sex on the brain features by first residualizing biological sex from brain features in the training set.” (p16) → Why? Your question is about prediction, not causal inference.

      Response While the question is about prediction, we still would like to, as much as possible, be confident about what kind of information we drew from. Here we focused on brain data and controlled for other variables that might not be neuronal. For instance, we controlled for movement and physiological noise using ICA-FIX (Glasser et al., 2016). Following conventional practices in brain-based predictive modelling, we also treated biological sex as another sort of noise (Vieira et al., 2022). The difference between movement/physiological noise and biological sex is that the former varies across TRs, and the latter varies across individuals. Thus we controlled for movement and physiological noise within each participant and controlled for biological sex within a group of participants who belonged to the same training set.
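      The train-only residualisation of biological sex can be sketched as below. With a binary sex variable, regressing a brain feature on sex (with an intercept) gives the per-sex means as fitted values, so residualising amounts to subtracting the training-set mean for each sex; test-set participants are adjusted with those training-set means, so no test information leaks into the adjustment. Function names, feature values and labels are hypothetical.

```python
from statistics import mean

def fit_sex_means(feature_train, sex_train):
    """Per-sex mean of one brain feature, estimated in the training set only."""
    return {
        s: mean(f for f, g in zip(feature_train, sex_train) if g == s)
        for s in set(sex_train)
    }

def residualise_sex(feature, sex, sex_means):
    """Remove the sex effect using means estimated in the training set."""
    return [f - sex_means[s] for f, s in zip(feature, sex)]

# Hypothetical values for a single brain feature.
feature_train, sex_train = [2.0, 4.0, 5.0, 7.0], ["F", "F", "M", "M"]
feature_test, sex_test = [3.5, 6.5], ["F", "M"]

sex_means = fit_sex_means(feature_train, sex_train)
train_resid = residualise_sex(feature_train, sex_train, sex_means)  # [-1.0, 1.0, -1.0, 1.0]
test_resid = residualise_sex(feature_test, sex_test, sex_means)     # [0.5, 0.5]
```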

      Reviewer 3 Recommendations For The Authors #20:

      “Lastly, we compute Corrected Brain Age Gap by subtracting the chronological age from the Corrected Brain Age (Butler et al., 2021; Le et al., 2018).” (p17) → The modified brain age gap in that paper is the residuals from regressing BAG on age (see equation 6). I highly recommend using that terminology and notation throughout to provide consistency and interpretability across papers.

      Response Please see our response to Reviewer 3 Public Review #2 for the term.

      Reviewer 3 Recommendations For The Authors #21: Equations (pgs 17-19) → Please use statistical notation instead of pseudo-R code.

      Response: We rewrote all of the equations using statistical notation.

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the Preclinical AD Consortium, ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Beheshti, I., Nugent, S., Potvin, O., & Duchesne, S. (2019). Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme. NeuroImage: Clinical, 24, 102063. https://doi.org/10.1016/j.nicl.2019.102063

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Cole, J. H., Raffel, J., Friede, T., Eshaghi, A., Brownlee, W. J., Chard, D., De Stefano, N., Enzinger, C., Pirpamer, L., Filippi, M., Gasperini, C., Rocca, M. A., Rovira, A., Ruggieri, S., Sastre-Garriga, J., Stromillo, M. L., Uitdehaag, B. M. J., Vrenken, H., Barkhof, F., … MAGNIMS Study Group. (2020). Longitudinal Assessment of Multiple Sclerosis with the Brain-Age Paradigm. Annals of Neurology, 88(1), 93–105. https://doi.org/10.1002/ana.25746

      Cumplido-Mayoral, I., García-Prat, M., Operto, G., Falcon, C., Shekari, M., Cacciaglia, R., Milà-Alomà, M., Lorenzini, L., Ingala, S., Meije Wink, A., Mutsaerts, H. J., Minguillón, C., Fauria, K., Molinuevo, J. L., Haller, S., Chetelat, G., Waldman, A., Schwarz, A. J., Barkhof, F., … OASIS study. (2023). Biological brain age prediction using machine learning on structural neuroimaging data: Multi-cohort validation against biomarkers of Alzheimer’s disease and neurodegeneration stratified by sex. eLife, 12, e81067. https://doi.org/10.7554/eLife.81067

      de Lange, A.-M. G., & Cole, J. H. (2020). Commentary: Correction procedures in brain-age prediction. NeuroImage: Clinical, 26, 102229. https://doi.org/10.1016/j.nicl.2020.102229

      Demontis, D., Walters, R. K., Martin, J., Mattheisen, M., Als, T. D., Agerbo, E., Baldursson, G., Belliveau, R., Bybjerg-Grauholm, J., Bækvad-Hansen, M., Cerrato, F., Chambert, K., Churchhouse, C., Dumont, A., Eriksson, N., Gandal, M., Goldstein, J. I., Grasby, K. L., Grove, J., … Neale, B. M. (2019). Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nature Genetics, 51(1), Article 1. https://doi.org/10.1038/s41588-018-0269-7

      Denissen, S., Engemann, D. A., De Cock, A., Costers, L., Baijot, J., Laton, J., Penner, I., Grothe, M., Kirsch, M., D’hooghe, M. B., D’Haeseleer, M., Dive, D., De Mey, J., Van Schependom, J., Sima, D. M., & Nagels, G. (2022). Brain age as a surrogate marker for cognitive performance in multiple sclerosis. European Journal of Neurology, 29(10), 3039–3049. https://doi.org/10.1111/ene.15473

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Franke, K., & Gaser, C. (2019). Ten Years of BrainAGE as a Neuroimaging Biomarker of Brain Aging: What Insights Have We Gained? Frontiers in Neurology, 10, 789. https://doi.org/10.3389/fneur.2019.00789

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Horien, C., Noble, S., Greene, A. S., Lee, K., Barron, D. S., Gao, S., O’Connor, D., Salehi, M., Dadashkarimi, J., Shen, X., Lake, E. M. R., Constable, R. T., & Scheinost, D. (2020). A hitchhiker’s guide to working with large, open-source neuroimaging datasets. Nature Human Behaviour, 5(2), 185–193. https://doi.org/10.1038/s41562-020-01005-4

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Khojaste-Sarakhsi, M., Haghighi, S. S., Ghomi, S. M. T. F., & Marchiori, E. (2022). Deep learning for Alzheimer’s disease diagnosis: A survey. Artificial Intelligence in Medicine, 130, 102332. https://doi.org/10.1016/j.artmed.2022.102332

      Le, T. T., Kuplicki, R. T., McKinney, B. A., Yeh, H.-W., Thompson, W. K., Paulus, M. P., Tulsa 1000 Investigators, Aupperle, R. L., Bodurka, J., Cha, Y.-H., Feinstein, J. S., Khalsa, S. S., Savitz, J., Simmons, W. K., & Victor, T. A. (2018). A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE. Frontiers in Aging Neuroscience, 10. https://www.frontiersin.org/articles/10.3389/fnagi.2018.00317

      Liang, H., Zhang, F., & Niu, X. (2019). Investigating systematic bias in brain age estimation with application to post-traumatic stress disorders. Human Brain Mapping, 40(11), 3143–3152. https://doi.org/10.1002/hbm.24588

      Luby, J. L. (2010). Preschool Depression: The Importance of Identification of Depression Early in Development. Current Directions in Psychological Science, 19(2), 91–95. https://doi.org/10.1177/0963721410364493

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Ray-Mukherjee, J., Nimon, K., Mukherjee, S., Morris, D. W., Slotow, R., & Hamer, M. (2014). Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity. Methods in Ecology and Evolution, 5(4), 320–328. https://doi.org/10.1111/2041-210X.12166

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Satterthwaite, T. D., Connolly, J. J., Ruparel, K., Calkins, M. E., Jackson, C., Elliott, M. A., Roalf, D. R., Hopson, R., Prabhakaran, K., Behr, M., Qiu, H., Mentch, F. D., Chiavacci, R., Sleiman, P. M. A., Gur, R. C., Hakonarson, H., & Gur, R. E. (2016). The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. NeuroImage, 124, 1115–1119. https://doi.org/10.1016/j.neuroimage.2015.03.056

      Smith, S. M., Vidaurre, D., Alfaro-Almagro, F., Nichols, T. E., & Miller, K. L. (2019). Estimation of brain age delta from brain imaging. NeuroImage, 200, 528–539. https://doi.org/10.1016/j.neuroimage.2019.06.017

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Stigler, S. M. (1997). Regression towards the mean, historically considered. Statistical Methods in Medical Research, 6(2), 103–114. https://doi.org/10.1177/096228029700600202

      Sudlow, C., Gallacher, J., Allen, N., Beral, V., Burton, P., Danesh, J., Downey, P., Elliott, P., Green, J., Landray, M., Liu, B., Matthews, P., Ong, G., Pell, J., Silman, A., Young, A., Sprosen, T., Peakman, T., & Collins, R. (2015). UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLOS Medicine, 12(3), e1001779. https://doi.org/10.1371/journal.pmed.1001779

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript titled "Disease modeling and pharmacological rescue of autosomal dominant Retinitis Pigmentosa associated with RHO copy number variation" the authors describe the use of patient iPSC-derived retinal organoids to evaluate the pathobiology of a RHO-CNV in a family with dominant retinitis pigmentosa (RP). They find significantly increased expression of rhodopsin, especially within the photoreceptor cell body, and defects in photoreceptor cell outer segment formation/maturation. In addition, they demonstrate how an inhibitor of NR2E3 (a rod transcription factor required for inducing rhodopsin expression), can be used to rescue the disease phenotype.

      Strengths:

      The manuscript is very well written, the illustrations and data presented are compelling, and the authors' interpretation/discussion of their findings is logical.

      Weaknesses:

      A weakness, which the authors have addressed in the discussion section, is the lack of an isogenic control, which would allow for direct analysis of the RHO-CNV in the absence of the other genetic sequence contained within the duplicated region. As the authors suggest, CRISPR correction of a large CNV without inducing unwanted on-target editing events in patient iPSCs is often very challenging. Given that they have used a no-disease iPSC line obtained from a family member, controlled for organoid differentiation kinetics/maturation state, and that no other complete disease-causing gene is contained within the duplicated region, it is unlikely that the addition of an isogenic control would yield significantly different results.

      Aims and conclusions:

      This reviewer is of the opinion that the authors have achieved their aims and that their results support their conclusions.

      Discussion:

      The authors have provided adequate discussion on the utility of the methods and data as well as the impact of their work on the field.

      We thank the reviewer for their insightful and encouraging review of our work, which has taken several years to reach its current stage.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kandoi et al. describes a new 3D retinal organoid model of a mono-allelic copy number variant of the rhodopsin gene that was previously shown to induce autosomal dominant retinitis pigmentosa via a dominant negative mechanism in patients. With advances in low-cost genomic applications for detecting copy number variations, this is a timely article that highlights a potential disease mechanism that goes beyond the retina field. The evidence is relatively strong that the rod photoreceptor phenotype observed in an adult patient with RP in vivo is similar to the phenotype observed in human stem cell-derived retinal organoids. Increases in RHO expression detected by qPCR, RNA-seq, and IHC support this phenotype. Importantly, the amelioration of photoreceptor rhodopsin mislocalization and related defects using the small molecule drug photoregulin demonstrates an important potential clinical application.

      Overall, the authors succeeded in providing solid evidence that copy number variation via a genomic RHO duplication leads to abnormalities in rod photoreceptors that can be partially blocked by photoregulin. However, there are several points that should be addressed that will enhance this paper.

      Strengths:

      • The use of patient-derived organoids from patients that have visual defects is a major strength of this work and adds relevance to the disease phenotype.

      • The rod phenotype assessed by qPCR, RNA-seq, and IHC supports a phenotype that shares similarities with the patient.

      • The use of a small molecule drug that selectively targets rod photoreceptors, as opposed to cones, is a noteworthy strength.

      We thank the reviewers for highlighting the key strengths of the paper.

      Weaknesses:

      1) The chromosomal segment that was duplicated had 3 copies of RHO in addition to three copies of each of the flanking genes (IFT122, H1FOO, PLXND1). Discussion of the involvement of these genes would be helpful. Would duplication of any of these genes alone cause or contribute to adRP? As an example, a missense mutation in IFT122 was previously implicated in photoreceptor loss (PMID: 33606121; PMCID: PMC8519925).

      Thank you for your comment. The contribution of the other duplicated genes is an interesting question; of these, IFT122 is particularly interesting, as you point out. We conducted a thorough survey of the literature and of our genetic testing partner’s database (BluePrint Genetics), and we did not find any human retinal degeneration cases with variants in IFT122. IFT122 has been shown to cause a recessive phenotype in dogs and in a complete-knockout zebrafish model, but neither dominant variants nor overexpression has been shown to produce a phenotype. Interestingly, recessive biallelic IFT122 mutations can cause cranioectodermal dysplasia (Sensenbrenner syndrome, PMID: 24689072), and none of these patients exhibited retinal dystrophy. H1FOO is an epigenetic modifier gene, while PLXND1 is expressed in endothelial cells. We will include a discussion of this in the revised manuscript.

      2) Related to #1, have the authors considered inserting extra copies of RHO (and/or the flanking genes) of these at a genomic safe harbor site? Although not required, this would allow one to study cells with isogenic-matched genetic backgrounds and would partially address the technical challenge of repairing a 188kb duplication, which as the authors note would be difficult to do. Demonstrating that excess copy numbers in different genetic backgrounds would be a huge contribution to the field. At a minimum, a discussion of the role of the nearby genes should be included.

      Thank you for your suggestion. In our future mechanistic studies, we plan to test the relative role of one to three extra copies of RHO driven by an NRL promoter, so that expression is restricted to rods. We will include a discussion of the potential role of the other genes in the revised manuscript.

      3) In the patient, the central foveal region was spared suggesting that cones were normal. Was there a similar assessment that cones are unaffected in retinal organoids?

      We will include these data in the revised manuscript; overall, we did not see a cone defect in RHO-CNV organoids. Additionally, although the central foveal region was relatively spared in this patient, the cones are definitely not normal: the macular cones that remain have been damaged by chronic edema, and photoreceptor and RPE atrophy has progressed into the macula, sparing only the foveal cones.

      4) Pathway analysis indicated that glycosylation was perturbed and this was proposed as an explanation as to why rhodopsin was mislocalized. Have the authors verified that there is an actual decrease in glycosylation?

      These studies are ongoing. We are currently investigating the cellular pathophysiology of RHO-CNV in detail, focusing on RHO trafficking, including the role of glycosylation and other post-translational modification defects.

      5) Line 182: by what criteria are the authors able to state that "there were no clear visible anatomical changes in apical-basal retinal cell type distribution during the early differentiation timeframe (data not shown)." Was this based on histological staining with antibodies, nuclear counter-staining, or some other evaluation?

      This was based on both IHC for various cell type markers and nuclear (DAPI) staining.

      6) Figure 2C - the appearance of the inner segments in RC and RM looks very different from one another. Have the authors ruled out the possibility that the RC organoid cell is a cone? In addition, the RM structure has what appears to be a well-defined OLM, which would suggest well-formed Muller glia. Do these structures also exist in RC organoids? Typically the OLM does form in older organoids. In addition, was this representative in numerous EM preparations?

      For clarification on EM data, we will include additional images in the revision as supplementary data. We have not carefully compared OLM between the patient and control organoids but do observe them in both conditions in the older organoids. The EM preparations were made from multiple organoids from two different batches with consistent results.

      7) What criteria were used to assess cell loss? Has any TUNEL labeling been performed to confirm cell loss? From the existing data, it seems that rod outer segments appear to be affected in organoids. However, it's not clear if the photoreceptors themselves actually die in this model.

      TUNEL was used to assess cell loss and it was not significantly different between the control and patient organoids at the timepoints examined. We did not expect a change as the disease in the patient developed over decades.

      8) Figure 5B. The RHO staining in the vehicle-treated sample is perturbed relative to the PR3 treatments as indicated in the text. In the vehicle-treated sample, the number of DAPI-positive cells that are completely negative proximal to the inner segments suggests that there might be non-rod cells there. Have the authors confirmed whether these are cones? Labels would be helpful in the left vehicle panel as the morphology looks very different than the treated samples.

      Thank you for these suggestions, which will be incorporated into the revised manuscript. A number of the cells in the negative regions are OTX2+/NRL- and are likely to be cones (Figure 4A and B). Unfortunately, we do not have a reliable cone nuclear marker, as RXRγ does not consistently stain mature cones.

      9) It is interesting that in addition to increases in RHO, and photo-transduction, there are also increases in PTPRT which is related to synaptic adhesion. Is there evidence of ectopic neurites that result from PTPRT over-expression?

      You are absolutely correct that the PTPRT data are very interesting. Like RHO, PTPRT requires post-translational modifications (PTMs) in photoreceptors for its synaptic localization. We did not specifically look for ectopic neurites but will test for them in the revision. It will be interesting to follow up on its expression pattern to see whether it is processed and localized normally, if we can find a working antibody. It is also possible that the increase in gene expression is due to feedback upregulation secondary to improper protein processing.

      Reviewer #3 (Public Review):

      This manuscript reports a novel pedigree with four intact copies of RHO on a single chromosome which appears to lead to overexpression of rhodopsin and a corresponding autosomal dominant form of RP. The authors generate retinal organoids from patient- and control-derived cells, characterize the phenotypes of the organoids, and then attempt to 'treat' aberrant rhodopsin expression/mislocalization in the patient organoids using a small molecule called photoregulin 3 (PR3). While this novel genetic mechanism for adRP is interesting, the organoid work is not compelling. There are multiple problems related to the technical approaches, the presentation of the results, and the interpretations of the data. I will present my concerns roughly in the order in which they appear in the manuscript.

      Major concerns:

      (1) Individual human retinal organoids in culture can show a wide range of differentiation phenotypes with respect to the expression of specific markers, percentages of given cell types, etc. For this reason, it can be very difficult to make rigorous, quantitative comparisons between 'wild-type' and 'mutant' organoids. Despite this difficulty, the authors of the present manuscript frequently present results in an impressionistic manner without quantitation. Furthermore, there is no indication that the investigator who performed the phenotypic analyses was blind with respect to the genotype. In my opinion, such blinding is essential for the analysis of phenotypes in retinal organoids. To give an example, in lines 193-194 the authors write "we observed that while the patient organoids developing connecting cilium and the inner segments similar to control organoids, they failed to extend outer segments". Outer segments almost never form normally in human retinal organoids, even when derived from 'wild-type' cells. Thus, I consider it wholly inadequate to simply state that outer segment formation 'failed' without a rigorous, quantitative, and blinded comparison of patient and control organoids.

      We agree that it is challenging to generate outer segments in retinal organoids, but we are not the first to show their formation. This has been demonstrated by multiple independent labs (Mayerl et al., PMID: 36206764; Wahlin et al., PMID: 28396597; West et al., PMID: 35334217), including ours (Chirco et al., PMID: 34653402). To clarify, we did not observe any OS-like tissue in the patient organoids across multiple EM preparations of a number of organoids from two independent 300+ day experiments, which matched the phase microscopy data presented in Fig. 2B.

      (2) The presentation of qPCR results in Figure 3A is very confusing. First, the authors normalize expression to that of CRX, but they don't really explain why. In lines 210-211, they write "CRX, a ubiquitously expressing photoreceptor gene maintained from development to adulthood." Several parts of this sentence are misleading or incomplete. First, CRX is not 'ubiquitously expressed' (which usually means 'in all cell types') nor is it photoreceptor-specific: CRX is expressed in rods, cones, and bipolar cells. Furthermore, CRX expression levels are not constant in photoreceptors throughout development/adulthood. So, for these reasons alone, CRX is a poor choice for the normalization of photoreceptor gene expression.

      As you are aware, all housekeeping genes have shortcomings when used to normalize PCR data. We chose CRX because, within the timepoints chosen, it is not expected to change much and thus represents a good equalizer for relative photoreceptor numbers between organoids and conditions. While we agree that CRX is weakly expressed in bipolar cells (Yamamoto et al., 2020), it is not expected to bias the data much, as neither we nor others have observed a large relative difference in bipolar cell number in organoids. We also confirm this by showing equivalent expression of OTX2, RCVRN and NRL across all conditions.
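
      Normalizing to CRX in this way is commonly done with the 2^-ΔΔCt method. The sketch below assumes that method and roughly 100% amplification efficiency; the response does not state the exact calculation, the function name is hypothetical, and the Ct values are illustrative rather than data from the manuscript. It shows why a reference gene that tracks photoreceptor content can absorb differences in photoreceptor number between organoids.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    # Fold change of a target gene (e.g. RHO) normalized to a reference gene
    # (here CRX) and expressed relative to control organoids (2^-ddCt).
    delta_ct = ct_target - ct_ref
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_ct - delta_ct_ctrl)


# Hypothetical Ct values: RHO and CRX in a patient vs a control organoid.
print(relative_expression(20.0, 18.0, 22.0, 18.0))  # 4.0
# Twice as many photoreceptors shifts both Ct values down by ~1 cycle;
# the CRX-normalized fold change is unchanged.
print(relative_expression(19.0, 17.0, 22.0, 18.0))  # 4.0
```

      The approach only works if the reference gene scales with the cell population of interest, which is the assumption the response defends for CRX and cross-checks with OTX2, RCVRN and NRL.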

      Second, the authors' interpretation of the qPCR results (lines 216-218) is very confusing. The authors appear to be saying that there is a statistically significant increase in RHO levels between D120 and D300. However, the same change is observed in both control and patient organoids and is not unexpected, since the organoids are more mature at D300. The key comparison is between control and patient organoids at D300. At this time point, there appears to be no difference between control and patient. The authors don't even point this out in the main text.

      Thank you for the comment and we apologize if this confused you. However, as can be seen in the graph in Figure 3A, we do compare expression of genes including RHO between control and patient organoids at two different time points. There are four conditions: D120-RC, D120-RM, D300-RC and D300-RM, with individual data points and error bars for each condition. There is a statistically significant increase in RHO at both time points when comparing the control and patient organoids. We also compared RHO expression between patient organoids at the two time points, and it was not statistically different.

      Third, the variability in the number of photoreceptor cells in individual organoids makes a whole-organoid comparison by qPCR fraught with difficulty. It seems to me that what is needed here is a comparison of RHO transcript levels in isolated rod photoreceptors.

      We agree that this makes it challenging. This was the exact reasoning for using CRX for normalization since it is predominantly present in photoreceptors. This was validated by the data showing no difference in expression of photoreceptor markers OTX2, RCVRN or NRL between the organoids.
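      To make the normalization argument concrete, the sketch below assumes the widely used 2^-ΔΔCt (Livak) relative-quantification model; the responses do not state which model was used. CRX serves as the reference gene and control organoids as the calibrator, all Ct values are invented, and the function name is hypothetical. The model assumes near-100% amplification efficiency for both genes.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene versus a calibrator group (2^-ddCt).

    Each argument is a list of Ct values (e.g. technical replicates).
    The reference gene is assumed to be CRX and the calibrator the
    control organoids; both choices are illustrative.
    """
    delta_ct_sample = mean(ct_target) - mean(ct_ref)              # dCt in the sample
    delta_ct_calibrator = mean(ct_target_cal) - mean(ct_ref_cal)  # dCt in the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: RHO in a patient organoid vs control, normalised to CRX.
print(relative_expression([22.0, 22.0], [20.0, 20.0], [24.0, 24.0], [20.0, 20.0]))
# Same organoid with fewer photoreceptors: RHO and CRX Ct both shift by one cycle.
print(relative_expression([23.0, 23.0], [21.0, 21.0], [24.0, 24.0], [20.0, 20.0]))
```

      Both calls return the same ratio: a uniform drop in photoreceptor content shifts the target and reference Ct values together and cancels out. This does not correct for a change in CRX expression per cell, which is the substance of the reviewer's concern.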

      (3) I cannot understand what the authors are comparing in the bulk RNA-seq analysis presented in the paragraph starting with line 222 and in the paragraph starting with line 306. They write "we performed bulk-RNA sequencing on 300-days-old retinal organoids (n=3 independent biological replicates). Patient retinal organoids demonstrated upregulated transcriptomic levels of RHO... comparable to the qRT-PCR data." From the wording, it suggests that they are comparing bulk RNA-seq of patients and control organoids at D300. However, this is not stated anywhere in the main text, the figure legend, or the Methods. Yet, the subsequent line "comparable to the qRT-PCR data" makes no sense, because the qPCR comparison was between patient samples at two different time points, D120 and D300, not between patient and control. Thus, the reader is left with no clear idea of what is even being compared by RNA-seq analysis.

      We apologize if the conditions were not obvious and will clarify this in the revised version. The conditions compared are control and patient organoids at D300. Regarding the comparison to the qRT-PCR data, as stated above, the qRT-PCR comparison is between patient and control organoids at two different time points.

      Remarkably, the exact same lack of clarity as to what is being compared is found in the second RNA-seq analysis presented in the paragraph starting with line 306. Here the authors write "We further carried out bulk RNA-sequencing analysis to comprehensively characterize three different groups of organoids, 0.25 μM PR3-treated and vehicle-treated patient organoids and control (RC) organoids from three independent differentiation experiments. Consistent with the qRT-PCR gene expression analysis, the results showed a significant downregulation in RHO and other rod phototransduction genes." Here, the authors make it clear that they have performed RNA-seq on three types of samples: PR3-treated patient organoids, vehicle-treated patient organoids, and control organoids (presumably not treated). Yet, in the next sentence, they state "the results showed a significant downregulation in RHO", but they don't state what two of the three conditions are being compared! Although I can assume that the comparison presented in Fig. 6A is between patient vehicle-treated and PR3-treated organoids, this is nowhere explicitly stated in the manuscript.

      Thank you for the comment and we will explicitly state various comparisons in the revised version.

      (4) There are multiple flaws in the analysis and interpretation of the PR3 treatment results. The authors wrote (lines 289-2945) "We treated long-term cultured 300-days-old, RHO-CNV patient retinal organoids with varying concentrations of PR3 (0.1, 0.25 and 0.5 μM) for one week and assessed the effects on RHO mRNA expression and protein localization. Immunofluorescence staining of PR3-treated organoids displayed a partial rescue of RHO localization with optimal trafficking observed in the 0.25 μM PR3-treated organoids (Figure 5B). None of the organoids showed any evidence of toxicity post-treatment."

      There are multiple problems here. First, the results are impressionistic and not quantitative. Second, it's not clear that the investigator was blinded with respect to the treatment condition. Third, in the sections presented, the organoids look much more disorganized in the PR3-treated conditions than in the control. In particular, the ONL looks much more poorly formed. Overall, I'd say the organoids looked considerably worse in the 0.25 and 0.5 microM conditions than in the control, but I don't know whether or not the images are representative. Without rigorously quantitative and blinded analysis, it is impossible to draw solid conclusions here. Lastly, the authors state that "none of the organoids showed any evidence of toxicity post-treatment," but do not explain what criteria were used to determine that there was no toxicity.

      Thank you for your critical insight. The RHO localization data are qualitative, as it is very difficult to accurately quantify rhodopsin trafficking within individual cells in the organoid. Thus, for quantitative comparison, we have provided expression level changes. Regarding toxicity, we analyzed the organoids by morphology and TUNEL staining and did not observe a significant difference between the conditions. This is consistent with mouse data in which PR3 suppressed rod function following IP injection without any obvious toxicity.

      (5) qPCR-based quantitation of rod gene expression changes in response to PR3 treatment is not well-designed. In lines 294-297 the authors wrote "PR3 drove a significant downregulation of RHO in a dose-dependent manner. Following qRT-PCR analysis, we observed a 2-to-5 log2FC decrease in RHO expression, along with smaller decreases in other rod-specific genes including NR2E3, GNAT1 and PDE6B." I assume these analyses were performed on cDNA derived from whole organoids. There are two problems with this analysis/interpretation. First, a decrease in rod gene expression can be caused by a decrease in the number of rods in the treated organoids (e.g., by cell death) or by a decrease in the expression of rod genes within individual rods. The authors do not distinguish between these two possibilities. Second, as stated above, the percentage of cells that are rods in a given organoid can vary from organoid to organoid. So, to determine whether there is downregulation of rod gene expression, one should ideally perform the qPCR analysis on purified rods.

      The reviewer is correct in pointing the potential reasons for reduction in RHO levels following PR3 treatment. Thus, we have provided NRL expression levels in the graph to show that this key rod-specific gene does not change suggesting equivalent number of rod photoreceptor cells. The suggestion of using purified rods is not practical here, as we do not have any way to sort human rods due to the lack of a rod-specific cell surface marker.
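      For reference, a log2 fold change (log2FC) of -2 to -5 corresponds to a 4- to 32-fold reduction in RHO transcript. The sketch below shows that conversion and, as an illustrative assumption, the logic of the NRL argument: if NRL tracks rod number, subtracting the NRL log2FC from the RHO log2FC approximates the per-rod change. The function names and input values are hypothetical.

```python
def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


def per_rod_log2fc(log2fc_rho: float, log2fc_nrl: float) -> float:
    """RHO log2FC after normalising to NRL, used here as a proxy for rod number.

    If rods were simply lost, RHO and NRL would fall together and the result
    would be close to 0; a negative value with stable NRL points to lower
    RHO expression within the remaining rods.
    """
    return log2fc_rho - log2fc_nrl


for lfc in (-2.0, -5.0):
    print(f"log2FC {lfc}: {1 / fold_change(lfc):.0f}-fold decrease")
print(per_rod_log2fc(-3.0, 0.0))   # NRL unchanged: the decrease is per rod
print(per_rod_log2fc(-3.0, -3.0))  # NRL falls equally: consistent with rod loss
```

      This is only an approximation, since NRL expression per rod could itself change; measurements on purified rods would avoid that assumption.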

      (6) In Figure 4B 'RM' panels, the authors show RHO staining around the somata of 'rods' but the inset images suggest that several of these cells lack both NRL and OTX2 staining in their nuclei. All rods should be positive for NRL. Conversely, the same image shows a layer of cells scleral to the cells with putative RHO somal staining which do not show somal staining, and yet they do appear to be positive for NRL and OTX2. What is going on here? The authors need to provide interpretations for these findings.

      Since RHO is a cytoplasmic marker and photoreceptors are tightly packed, it is difficult to make a one-to-one match between RHO staining and the nuclear NRL/OTX2 markers. Additionally, as the RHO+ cytoplasm extends towards the scleral surface, it is expected to pass adjacent to other nuclei. A few of the rods still have normal rhodopsin trafficking, and these are likely to lack somal RHO, as in control conditions. We do observe these cells, albeit rarely, as highlighted by the occasional RHO in the IS/OS of RM organoids in the figure. We agree that the NRL staining in Figure 4B (>D250) is not very crisp, and we will include an updated figure in the revised version.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: This study presents fundamental new insights into vesicular monoamine transport and the binding pose of the clinical drug tetrabenazine (TBZ) to the mammalian VMAT2 transporter. Specifically, this study reports the first structure for the mammalian VMAT (SLC18) family of vesicular monoamine transporters. It provides insights into the mechanism by which this inhibitor traps VMAT2 into a 'dead-end' conformation. The structure also provides some evidence for a novel gating mechanism within VMAT2, which may have wider implications for understanding the mechanism of transport in the wider SLC18 family.

      Strengths: The structure is high quality, and the method used to determine the structure via fusing mVenus and the anti-GFP nanobody to the amino and carboxyl termini is novel. The binding and transport data are convincing, although limited. The binding position of TBZ is of high value, given its role in treating Huntington's chorea and for being a 'dead-end' inhibitor for VMAT2.

      Weaknesses: The lack of additional mutational data and/or analyses on the impact of pH on ligand binding reduces the insights from these experiments. This reduces the strength of the conclusions that can be drawn about the mechanism of binding and transport or the novelty of the gating mechanism discussed above.

      We greatly appreciate this summary and thank reviewer #1 for their comments and suggested experiments which we believe will further strengthen this work. We agree with these comments and plan to include more mutagenesis data in a revised manuscript in order to address this point and expand further on the mechanistic details of transport.

      Reviewer #2 (Public Review):

      Overview:

      As a report of the first structure of VMAT2, indeed the first structure of any vesicular monoamine transporter, this manuscript represents an important milestone in the field of neurotransmitter transport. VMAT2 belongs to a large family (the major facilitator superfamily, MFS) containing transporters from all living species. There is a wealth of information relating to the way that MFS transporters bind substrates, undergo conformational changes to transport them across the membrane, and couple these events to the transmembrane movement of ions. VMAT2 couples the movement of protons out of synaptic vesicles to the vesicular uptake of biogenic amines (serotonin, dopamine, and norepinephrine) from the cytoplasm. The new structure presented in this manuscript can be expected to contribute to an understanding of this proton/amine antiport process.

      The structure contains a molecule of the inhibitor TBZ bound in a central cavity, with no access to either luminal or cytoplasmic compartments. The authors carefully analyze which residues interact with bound TBZ and measure TBZ binding to VMAT2 mutated at some of those residues. These measurements allow well-reasoned conclusions about the differences in inhibitor selectivity between VMAT1 and VMAT2 and differences in affinity between TBZ derivatives.

      The structure also reveals polar networks within the protein and hydrophobic residues in positions that may allow them to open and close pathways between the central binding site and the cytoplasm or the vesicle lumen. The authors propose the involvement of these networks and hydrophobic residues in the coupling of transport to proton translocation and conformational changes. However, these proposals are quite speculative in the absence of supporting structures and experimentation that would test specific mechanistic details.

      Thank you for these comments and summary describing this work. We agree that the involvement of polar networks has not been experimentally tested; these are proposed as a possible mechanism, but we have not made mechanistic conclusions on how protons are translocated and coupled to transport. We believe we have made it clear in the manuscript when describing the polar networks that the corresponding discussion is largely descriptive and speculative and will further stress that in a future revision. We would like to point out however, that many of the polar and charged residues which make up these networks have been studied and that there is a wealth of biochemical and functional experiments in the literature which implicate these residues in this process. Yet, we agree that establishing the precise mechanistic details will require additional structures and likely also extensive computational experiments. We have cited these papers that have characterized these polar residues extensively throughout the text (30-32,37,49,55).

      We would like to submit that we have not proposed that the hydrophobic gates are involved in proton translocation. Gating residues, by definition, block access to the binding site (29,30,48); and since our structure is occluded, we directly observe the residues which participate in both gates. We have also performed extensive mutagenesis studies of many of these hydrophobic gating residues, and our binding data are consistent with this conclusion. Transport experiments with mutations at these gates might help toward a deeper understanding of the transport mechanism, but given the current structural data it is conceivable that these residues play a role in gating neurotransmitter access to the binding site.

      Critique:

      Although the structure presented in this MS is clearly important, I feel that the authors have overstated several of the conclusions that can be drawn from it. I don't agree that the structure clearly indicates why TBZ is a non-competitive inhibitor; the proposal that specific hydrophobic residues function as gates will depend on lumen- and cytoplasm-facing structures for verification; the polar networks could have any number of functions - indeed it would be surprising if they were all involved in proton transport. Several of these issues could be resolved by a clearer illustration of the data, but I believe that a more rigorous description of the conclusions and where they fall between firm findings and speculation would help the reader put the results in perspective.

      The central argument of this critique, repeated throughout, is that more structures of various states are needed to draw mechanistic conclusions about how TBZ binds and about alternating access. While additional structures would certainly add mechanistic detail, they are not required to draw these conclusions. In fact, as we point out throughout the text, these conclusions have already been made in various publications which we have cited and discussed. Decades of mutagenesis, binding, transport, inhibition, and accessibility measurements all support the conclusion that TBZ binds from the luminal side and that VMAT2 uses an alternating access mechanism to transport neurotransmitter (30-32,35-37,55). Structures are neither required nor sufficient to make such claims, and more structures of various apo states in different conformations would not provide any additional support on this question. Whether the predominant apo state were luminal-open, cytoplasm-open or occluded, this would not prove how TBZ enters VMAT2. Structural data alone do not provide these details; biochemical data do, and structures are useful for understanding the details of how these mechanisms work. Thus, our structure provides the molecular framework for understanding the binding site, conformation, gating, and polar networks, and we have interpreted our own biochemical data as well as the available biochemical data in the literature in the context of our structure.

      The structure indicates why TBZ is a non-competitive inhibitor (35,36): it is not possible for neurotransmitters to compete for binding to this state. Neurotransmitter initially binds to the cytosol-facing state, in which the intracellular gates are open, so inhibition by binding to that state would result in a competitive mechanism. Since TBZ is non-competitive, it must bind through the luminal-open state, in which the luminal gate is open. A further conformational change produces the occluded conformation with both the luminal and intracellular gates closed, which is what we observe in the structure. This finding is supported by numerous biochemical and functional experiments and by extensive analysis of mutants in the gates using binding assays, transport experiments and cysteine accessibility experiments. We have cited and discussed these key papers (30-32,35-37,55) throughout the text, and our results support the conclusions drawn from these works.

      Non-competitive inhibition occurs when the action of an inhibitor can't be overcome by increasing substrate concentration. The structure shows TBZ sequestered in the central cavity with no access to either cytoplasm or lumen. The explanation of competitive vs non-competitive inhibition depends entirely on how TBZ got there. If it is bound from the cytoplasm, cytoplasmic substrate should have been able to compete with TBZ and overcome the inhibition. If it is bound from the lumen, or from within the bilayer, cytoplasmic substrate would not be able to compete, and inhibition would be non-competitive. The structure does not tell us how TBZ got there, only that it was eventually occluded from both aqueous compartments and the bilayer.

      TBZ is accepted to be a non-competitive inhibitor, based on decades of research, and not based solely on our structure (30-32,35,36). Our structure provides insight into the molecular mechanism by which non-competitive inhibition occurs. Previous studies have shown that TBZ enters through the luminal side of the transporter, resulting in non-competitive inhibition by binding to a conformation of the transporter which does not bind cytosolic neurotransmitter. We agree our structure does not prove how TBZ ‘got there’, but other studies have addressed this question (30-32, 35, 36) and have been discussed in detail.
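      The operational distinction drawn in this exchange, whether raising substrate concentration can overcome the inhibitor, can be illustrated with the textbook Michaelis-Menten rate laws for pure competitive and pure non-competitive inhibition. The sketch below is generic enzyme kinetics with arbitrary parameter values and hypothetical function names; it is not a model of VMAT2 transport or of TBZ binding.

```python
def rate_competitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Inhibitor and substrate compete for the same state: apparent Km rises,
    Vmax is unchanged, so enough substrate restores the uninhibited rate."""
    return vmax * s / (km * (1 + i / ki) + s)


def rate_noncompetitive(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Inhibition is independent of substrate binding: apparent Vmax falls,
    Km is unchanged, so extra substrate cannot restore the rate."""
    return (vmax / (1 + i / ki)) * s / (km + s)


params = dict(i=1.0, vmax=1.0, km=1.0, ki=1.0)  # arbitrary units, [I] = Ki
for s in (1.0, 10.0, 1000.0):
    print(s, round(rate_competitive(s, **params), 3),
          round(rate_noncompetitive(s, **params), 3))
```

      As substrate rises, the competitive rate approaches Vmax while the non-competitive rate plateaus at Vmax/(1 + [I]/Ki), which matches the definition given by the reviewer.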

      The issue of how VMAT2 opens access to the central binding site from luminal and cytoplasmic sides is an important and interesting one, and comparison with other MFS structures in cytoplasmic-open or extracellular/luminal-open is a very reasonable approach. However, any conclusions for VMAT2 should be clearly indicated as speculative in the absence of comparable open structures of VMAT2. As a matter of presentation, I found the illustrations in ED Fig. 6 to be less helpful than they could have been. Specifically, illustrations that focus on the proposed gates, comparing that region of the new structure with the corresponding region of either VGLUT or GLUT4 would better help the reader to compare the position of the proposed gate residues with the corresponding region of the open structure. I realize that is the intended purpose of ED Fig. 6b and 6c, but currently, those show the entire protein, and a focus on the gate regions might make the proposed gate movements clearer. I also appreciate the difference between the Alphafold prediction and the new structure, but I'm not convinced that ED Fig. 6a adds anything helpful.

      Thank you for the suggestion. We will prepare a new figure that focuses on the gates to make this clearer. The comparison with AlphaFold is valuable because the luminal loops and gates are not well modeled in the prediction. Many groups are using these predicted structures for biochemical and computational experiments, and perhaps even to design small molecules. Since the AlphaFold model differs substantially in this region, the comparison may be of interest to those in the community doing this type of work.

      The polar networks described in the manuscript provide interesting possibilities for interactions with substrates and protons whose binding to VMAT2 must control conformational change. Aside from the description of these networks, there is little evidence presented to assess the role of these networks in transport. Are the networks conserved in other closely related transporters? How could the interaction of the networks with substrate or protons affect conformational change? Of course, any potential role proposed for the networks would be highly speculative at this point, and any discussion of their role should point out their speculative nature and the need for experimental verification. Some speculation, however, can be useful for focusing the field's attention on future directions. However, statements in the abstract (three distinct polar networks... play a role in proton transduction.) and the discussion (...are likely also involved in mediating proton transduction.) should be clearly presented as speculation until they are validated experimentally.

      We agree these statements are speculative, which we acknowledged in the text. We will further emphasize this point in a future revision. Please note, however, that many of these residues have been highlighted in other studies (30-32,37,49,55), and we have cited them in the text. Please see previous response.

      Most of these residues are indeed highly conserved. It is a good idea to highlight this in our sequence alignment of related transporters. We will do so in our revised manuscript.

      The strongest aspect of this work (aside from the structure itself) is the analysis of TBZ binding. There is a problematic aspect to this analysis. The discussion on how TBZ stabilizes the occluded conformation of VMAT2 is premature without structures of apo-VMAT2 and possibly structures with other ligands bound. We don't really know at this point whether VMAT2 might be in the same occluded conformation in the absence of TBZ. Any statements regarding the effect of interactions between VMAT2 and TBZ depend on demonstrating that TBZ has a conformational effect. The same applies to the discussion of the role of W318 on conformation and to the loops proposed to "occlude the luminal side of the transporter" (line 131).

      Please see the response to this argument presented earlier. The occluded structure clearly shows the residues serving as gates. To understand how the gates open is a separate question. This does require additional structures and computations which are beyond the scope of this work. Our structure is interpreted in the context of all available biochemical data.

      The description of VMAT2 mechanism makes many assumptions that are based on studies with other MFS transporters. Rather than stating these assumptions as fact (VMAT2 functions by alternating access...), it would be preferable to explain why a reader should believe these assumptions. In general, this discussion presents conclusions as established facts rather than proposals that need to be tested experimentally.

      Indeed, the structural details of alternating access in MFS transporters are based on structures of other related proteins and we have cited review articles that describe this (29,30,48). We would like to highlight that these assumptions are not without merit, as previous studies investigating predicted gating residues (the same residues resolved in our structure) were based on studies of other MFS transporters and the demonstrated biochemical results are consistent with an alternating access transporter. These biochemical experiments also clearly demonstrate that a broadly similar mechanism of alternating access is used by VMAT2, see (30-32,48) which we have cited extensively when discussing these mechanisms.

      The MD simulations are not described well enough for a general reader. What is the significance of the different runs? ED Fig. 4d is not high enough resolution to see the details.

      We plan to provide additional experimental details and data to support the computational experiments in a revision. See response to reviewer #3.

      Reviewer #3 (Public Review):

      Summary:

      The vesicular monoamine transporter is a key component in neuronal signaling and is implicated in diseases such as Parkinson's. Understanding of monoamine processing and our ability to target that process therapeutically has been to date provided by structural modeling and extensive biochemical studies. However, structural data is required to establish these findings more firmly.

      Strengths:

      Dalton et al resolved a structure of VMAT2 in the presence of an important inhibitor, tetrabenazine, with the protein in detergent micelles, using cryo-EM and with the aid of domains fused to its N- and C-terminal ends. The resolution of the maps allows clear assignment of the amino acids in the core of the protein. The structure is in good agreement with a wealth of experimental and structural prediction data and provides important insights into the binding site for tetrabenazine and selectivity relative to analogous compounds.

      Weaknesses:

      The authors follow up their structures with molecular dynamics simulations. The simulations resulted in repositioning of the ligand, which does not seem to be well founded, and raises questions about the methodological choices made for the simulations.

      We appreciate the comments of reviewer #3 and thank them for these suggestions regarding the MD simulations. We will be supplying additional information to address the questions of reviewers #2 and #3 regarding the MD simulations, including: 1) movies showing that there is no substantial repositioning of the ligand in any of the three runs; 2) a table showing the protonation states of residues and TBZ; 3) data showing that relatively few waters enter the binding site compared with simulations of dopamine-bound VMAT2; and 4) data showing that more waters entered the binding site in run 2 than in runs 1 and 3, which likely explains the small repositioning of TBZ in that run.
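      As an illustration of the third and fourth points, one simple way to count waters in a binding site is to count water oxygens within a distance cutoff of any ligand atom in each frame. The sketch below assumes coordinates already loaded as NumPy arrays in Å, uses a hypothetical 5 Å cutoff, and ignores periodic boundary conditions; the responses do not describe the analysis actually used, and the function names are hypothetical.

```python
import numpy as np


def waters_near_ligand(water_oxygens: np.ndarray, ligand_atoms: np.ndarray,
                       cutoff: float = 5.0) -> int:
    """Count water oxygens within `cutoff` Å of any ligand atom in one frame.

    water_oxygens: (N, 3) array; ligand_atoms: (M, 3) array; both in Å.
    Periodic boundary conditions are ignored in this sketch.
    """
    diff = water_oxygens[:, None, :] - ligand_atoms[None, :, :]  # (N, M, 3)
    dist = np.linalg.norm(diff, axis=-1)                          # (N, M)
    return int(np.count_nonzero((dist < cutoff).any(axis=1)))


def waters_per_frame(trajectory_waters, trajectory_ligand, cutoff=5.0):
    """Apply the count to every frame (parallel lists of coordinate arrays)."""
    return [waters_near_ligand(wat, lig, cutoff)
            for wat, lig in zip(trajectory_waters, trajectory_ligand)]


ligand = np.array([[0.0, 0.0, 0.0]])
waters = np.array([[1.0, 0.0, 0.0], [0.0, 6.0, 0.0], [3.0, 3.0, 0.0]])
print(waters_near_ligand(waters, ligand))
```

      Per-frame counts of this kind, compared between runs or against a dopamine-bound simulation, are one way to summarize how many waters reach the binding site.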

      We will also be providing a substantially improved map in a revised manuscript where the peripheral TMHs and loops are better resolved.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their helpful comments which we have addressed, point-by-point, below:

      Reviewer #1:

      1) It might be useful to add more details to the methods (especially lines 191-196) to make them a bit more user-friendly for an audience who still may be unfamiliar with the relatively new and complex Mendelian randomisation technique.

      The following information has been included in this section of the methods, to describe the different MR models in more detail:

      “The IVW MR model will produce biased effect estimates in the presence of horizontal pleiotropy, i.e. where one or more genetic variant(s) included in the instrument affect the outcome by a pathway other than through the exposure. In the weighted median model, each genetic variant is weighted according to its distance from the median effect of all genetic variants. Thus, the weighted median model will provide an unbiased estimate when at least 50% of the information in an instrument comes from genetic variants that are not horizontally pleiotropic. The weighted mode model uses a similar approach but weights genetic instruments according to the mean effect. In this model, over 50% of the weight of the genetic instrument can be contributed to by genetic variants which are horizontally pleiotropic, but the most common amount of pleiotropy must be zero (known as the Zero Modal Pleiotropy Assumption (ZEMPA))[Hartwig et al., 2017].”
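      The contrast between the IVW and weighted median estimators can be sketched from per-variant summary statistics: variant-exposure effects (bx), variant-outcome effects (by) and the standard errors of by (se_by). The sketch assumes the commonly used formulation with first-order inverse-variance weights and an interpolated weighted median; the weighted mode estimator is omitted, and the toy data and function names are invented for illustration.

```python
import numpy as np


def ivw_estimate(bx, by, se_by):
    """Fixed-effect inverse-variance weighted (IVW) estimate of the causal effect."""
    bx, by, se_by = map(np.asarray, (bx, by, se_by))
    ratio = by / bx                # per-variant Wald ratio estimates
    w = (bx / se_by) ** 2          # first-order inverse-variance weights
    return float(np.sum(w * ratio) / np.sum(w))


def weighted_median_estimate(bx, by, se_by):
    """Weighted median of the Wald ratios (requires at least two variants)."""
    bx, by, se_by = map(np.asarray, (bx, by, se_by))
    ratio = by / bx
    w = (bx / se_by) ** 2
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order]
    # Standardised cumulative weight at the midpoint of each variant's weight.
    p = (np.cumsum(w) - 0.5 * w) / np.sum(w)
    below = int(np.max(np.nonzero(p < 0.5)[0]))
    # Linear interpolation between the two ratios that straddle 50% of the weight.
    return float(ratio[below] + (ratio[below + 1] - ratio[below])
                 * (0.5 - p[below]) / (p[below + 1] - p[below]))


# Toy data: four variants give small effects (0.1 to 0.4), one is strongly pleiotropic.
bx = [1.0, 1.0, 1.0, 1.0, 1.0]
by = [0.1, 0.2, 0.3, 0.4, 10.0]
se_by = [1.0, 1.0, 1.0, 1.0, 1.0]
print(ivw_estimate(bx, by, se_by))              # pulled towards the outlier
print(weighted_median_estimate(bx, by, se_by))  # stays with the majority
```

      Because the interpolated median depends only on where half of the total weight falls, a minority of pleiotropic variants moves it little, whereas the IVW estimate averages over all variants.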

      2) I was just wondering why MR egger was not carried out as part of this analysis?

      We did consider also employing the MR Egger model as a further sensitivity analysis. However, given we were already employing the weighted median and weighted mode models, and given that MR-Egger suffers from reduced statistical power in comparison to the other models, we reasoned that adding in a further MR model would not add further clarity to our analyses, particularly given the relatively small sample size.
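      For comparison, MR-Egger can be sketched as a weighted regression of variant-outcome effects on variant-exposure effects with a free intercept, after orienting each variant so that its exposure effect is positive. The intercept estimates average directional pleiotropy and the slope is the causal estimate; having to estimate the extra intercept is one reason MR-Egger tends to be less precise than the IVW and median-based estimators. This is a minimal illustration with invented data and a hypothetical function name.

```python
import numpy as np


def mr_egger(bx, by, se_by):
    """Return (intercept, slope) from an MR-Egger regression.

    Weighted least squares of by on bx with an intercept, weights 1/se_by**2.
    Variants are first oriented so every exposure effect is positive.
    """
    bx, by, se_by = map(np.asarray, (bx, by, se_by))
    sign = np.sign(bx)
    bx, by = bx * sign, by * sign          # orientation step
    sw = 1.0 / se_by                       # square root of the regression weights
    design = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    (intercept, slope), *_ = np.linalg.lstsq(design, by * sw, rcond=None)
    return float(intercept), float(slope)


# Invented data generated with intercept 0.05 and slope 0.5; the second variant
# is coded with the opposite effect allele (negative bx) to exercise orientation.
bx = [0.1, -0.2, 0.3, 0.4]
by = [0.05 + 0.5 * 0.1, -(0.05 + 0.5 * 0.2), 0.05 + 0.5 * 0.3, 0.05 + 0.5 * 0.4]
print(mr_egger(bx, by, [1.0, 1.0, 1.0, 1.0]))
```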

      3) Although it is included in the Figure 1 flowchart, I think it is also important to explain clearly in the written text why only n=6,118 of the n=13,988 children in the ALSPAC study were included in this study.

      The following information has been included in the paragraph describing the ALSPAC study in the methods:

      “Sufficient information was available on 6,221 of these individuals to be included in our analysis, as metabolomics was not performed for all individuals in the ALSPAC study.”

      4) It is mentioned within the discussion 'the NMR metabolomics platform utilised in the analyses outlined here has limited coverage of fatty acids'. I think it might be useful to also add this detail into the methods section to aid readers when they are making their own interpretation whilst reading the results section.

      The following sentence has been included in the methods section:

      “This metabolomics platform has limited coverage of fatty acids.”

      5) However, I feel that the conclusion should be tempered slightly as although this study alongside other similar MR studies provides evidence of an association between genetic liability to CRC and levels of metabolites at certain ages, I do not think there is enough evidence at this stage to say that genetic liability for CRC actually alters the levels of metabolites.

      The first sentence of the conclusion has been changed to:

      “Our analysis provides evidence that genetic liability to CRC is associated with altered levels of metabolites at certain ages, some of which may have a causal role in CRC development.”

      Reviewer #2:

      1) The background is lacking introduction to the different components of the metabolic features tested. For instance, there is a broader discussion about polyunsaturated fatty acids (PUFA) in the discussion, however, this should have been introduced and defined already before that. What metabolites are included in that term (PUFA)? Are there other studies on PUFA and CRC?

      The following information has been included in the background section:

      “In particular, previous work has highlighted polyunsaturated fatty acids (PUFA) as potentially having a role in colorectal cancer development. The term PUFA includes omega-3 and -6 fatty acids. Recent MR work has highlighted a possible link between PUFAs, in particular omega 6 PUFAs, and colorectal cancer risk.”

      2) There seem to be indications for horizontal pleiotropy given the changed estimates when genetic variants in the FADS loci are removed. Could multivariable MR methods have been used to account for pleiotropy and differentiate individual fatty acid effects?

      Multivariable MR can be employed to investigate the effects of horizontal pleiotropy. However, the multiple exposures must have sufficiently distinct underlying genetic architecture in order to instrument each one whilst adjusting for the other, as determined by conditional F-statistics. Given the correlations across metabolite levels, this is unlikely to be the case.

      3) The ALSPAC sample sizes are decreasing across the different age groups, which is not strange given the longitudinal collection. However, does the altered sample composition affect the results? Have sensitivity analyses been done on the complete set of individuals from age 8-25?

      The altered sample composition could be affecting results. The limitations section of the discussion has been amended to reflect this:

      “Secondly, mostly due to the longitudinal nature of the ALSPAC study, our sample at each time point is composed of slightly different individuals. This could be influencing our results and should be taken into account when comparing across time points.”

      We have not completed any sensitivity analyses to investigate this.

      4) Although beyond the scope of this paper, sex-stratified GWAS analyses on metabolites can easily be done in UK Biobank.

      We thank the reviewer for this suggestion, and agree that this would be an interesting future analysis. We have amended the discussion to mention this:

      “Fourthly, our analysis would benefit from being repeated with sex-stratified data. Although such GWAS results for metabolites are not currently available, the data to perform such GWAS are available in UK Biobank for future analyses.”

      5) Very minor: the number of decimal places reported for the ALSPAC results is inconsistent. The reporting units also differ between the text and figures (per SD higher CRC liability or per doubling). Please include sample sizes and data sources in the figure legends, as they should be stand-alone items.

      We have amended the ALSPAC results so that all are reported to two decimal places, harmonized the reporting units, and updated the figure legends to include sample sizes and data sources.

    1. Author Response

      We thank the reviewers for their suggestions. We are confident in the model that predicts odor vs odor (OCT-MCH) preference using calcium activity, but we acknowledge the relative weakness of the model that predicts odor (OCT) vs air preference. We are preparing an updated manuscript that will prioritize our interpretation of the OCT-MCH results and more fully document uncertainties around our estimates of prediction capacity.

      Reviewer #1 (Public Review):

      Summary: The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths: Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses: The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are:

      We are working on a revision that overhauls the interpretations of the results. We recognize that the current version inadequately distinguishes the results that we have high confidence in (specifically, PC2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as the PC1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with r² values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavioral measure, captured in a 3-minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though, to quibble, it’s our measurement of odor preference that varies considerably over time. We are reasonably confident that more of the variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations between sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.
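      The point that noisy behavioral measurements cap the observable correlation can be made concrete with Spearman's classical correction for attenuation, r_latent = r_obs / sqrt(rel_x * rel_y), where each reliability is, for example, the test-retest correlation of repeated measurements. This is a swapped-in textbook technique shown for illustration only, not necessarily the latent-correlation analysis the authors performed, and the inputs below are hypothetical numbers rather than values from the study.

```python
import math


def disattenuated_r(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Spearman's correction for attenuation (illustrative, not the authors' method).

    Estimates the correlation between the error-free (latent) versions of two
    measures from their observed correlation and each measure's reliability,
    e.g. the test-retest correlation of repeated measurements.
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical inputs only, not values reported in the study:
r_obs = 0.35         # observed behavior vs Ca++ correlation
rel_behavior = 0.30  # repeatability of the odor-preference score
rel_calcium = 0.80   # repeatability of the Ca++ feature
print(round(disattenuated_r(r_obs, rel_behavior, rel_calcium), 2))
```

Because the reliabilities are themselves estimated from a limited number of repeat measurements, the corrected value can be unstable and can even exceed 1, which is one reason such latent-correlation estimates tend to carry wide error bars.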

      2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version, we in some sense tried to avoid discussing these challenges head-on, focusing instead on what we thought were the conclusions justified by our experiments, with sample sizes ranging from 20 to 60. We are working on a revision that is more candid about these challenges.

      That said, we believe the result we view as the most exciting, that PC2 of Ca++ responses predicts OCT-MCH preference, is robust. 1) It is based on a training set of 47 individuals and a test set of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (one per PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence consistent with this result: a PC2 that predicts behavior (p = 0.03 from a single test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.
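      A Bonferroni correction for m tests compares each nominal p-value with alpha/m. The short sketch below only re-checks the arithmetic quoted above (alpha = 0.05, m = 5 PCs, nominal p = 0.0063 and 0.0069); the helper names are illustrative and not taken from the authors' code.

```python
# Re-checks the Bonferroni arithmetic quoted in the response (illustrative helper names).
ALPHA = 0.05
N_TESTS = 5  # one test per principal component (PC1-PC5)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def passes(p_values: dict, alpha: float = ALPHA, n_tests: int = N_TESTS) -> dict:
    """Map each named nominal p-value to True if it survives the correction."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {name: p < threshold for name, p in p_values.items()}


# Nominal p-values for the Ca++ PC2 -> OCT-MCH preference model, as quoted.
nominal = {"training set (n=47)": 0.0063, "test set (n=22)": 0.0069}

print(f"per-test threshold: {bonferroni_threshold(ALPHA, N_TESTS):.3f}")
for name, ok in passes(nominal).items():
    print(f"{name}: {'significant' if ok else 'not significant'}")
```

With a per-test threshold of 0.01, both quoted p-values fall below it, which is the claim made in the text.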

      In the revision we are working on, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.

      3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, applying a Bonferroni correction for n=5 comparisons to either the training or test correlations from the Ca++ to OCT-MCH preference model remains significant at alpha=0.05.

      Our revision will include confidence limits.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the forthcoming revision will address the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn: i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We are working to ensure that all such words are used appropriately. “Predict” is often appropriate in this context, as a model predicts true data values. “Explain” can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and are revising the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.
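      The PC-based approach described in iii) can be sketched in a few lines. This minimal sketch assumes a matrix with one row per fly and one column per odor-glomerulus response (a hypothetical layout), fits the principal axes on the training flies only, and projects held-out flies onto those same axes before predicting behavior from a single PC score with a least-squares line. The data are synthetic placeholders, the feature count is arbitrary, and the `pc` argument is zero-based, so the PC2 predictor discussed in the text would correspond to `pc=1`. It is not the authors' code.

```python
import numpy as np


def fit_pca(train: np.ndarray, n_components: int = 5):
    """Return the training mean and top principal axes (rows, unit length)."""
    mean = train.mean(axis=0)
    # SVD of the centered data: rows of vt are principal axes ordered by
    # explained variance. The sign of each axis is arbitrary.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(data: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """PC scores for any flies, using the *training* mean and axes only."""
    return (data - mean) @ axes.T


def fit_single_pc_model(scores: np.ndarray, behavior: np.ndarray, pc: int):
    """Least-squares line predicting behavior from one (zero-based) PC score."""
    slope, intercept = np.polyfit(scores[:, pc], behavior, deg=1)
    return slope, intercept


def pearson_r(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


# --- Synthetic placeholder data (NOT the study's measurements) ---
rng = np.random.default_rng(0)
n_features = 30  # hypothetical number of odor-glomerulus response columns
direction = rng.normal(size=n_features)
direction /= np.linalg.norm(direction)


def simulate(n_flies: int):
    """Responses share one latent direction that also weakly drives behavior."""
    latent = rng.normal(size=n_flies)
    responses = rng.normal(size=(n_flies, n_features)) + 3.0 * np.outer(latent, direction)
    behavior = 0.5 * latent + rng.normal(size=n_flies)
    return responses, behavior


train_x, train_y = simulate(47)  # training-set size quoted in the text
test_x, test_y = simulate(22)    # test-set size quoted in the text

mean, axes = fit_pca(train_x)
train_scores = project(train_x, mean, axes)
# In this synthetic data the planted signal dominates PC1, hence pc=0.
slope, intercept = fit_single_pc_model(train_scores, train_y, pc=0)

# Predict held-out flies without refitting anything on them.
test_pred = slope * project(test_x, mean, axes)[:, 0] + intercept
print("test r (predicted vs actual):", round(pearson_r(test_pred, test_y), 2))
```

Restricting the candidate predictors to a handful of PCs, rather than every odor-glomerulus response, is what keeps the number of tested hypotheses small; projecting test flies with the training mean and axes keeps the test set out of model fitting.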

      -No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure. We are reviewing these results to determine if they warrant including as a negative finding in the revision.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” in finding two glomeruli as predictors of behavior when we were only able to image a small portion of the glomeruli. In analyses we did not report, we explored this possibility using the AL computational model. We are likely to include this interpretation in the revised discussion.

      Reviewer #3 (Public Review):

      Churgin et al. seek to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH) preference. They find that the difference in density of fluorescently tagged brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-OCT and MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements…

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements. That is, we make no claim to explain the complex behavioral components of odor choice, such as locomotion or reversals at the odor boundary.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses (Note, we prefer not to call it “individual variability” as this could refer to variability within individuals, not variability across individuals.) It is a separate question of whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with an hours-to-day long gap, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped, though we acknowledge that the reviewer's detailed recommendations included a helpful suggestion to account for additional uncertainty when estimating confidence intervals around the point estimate of the correlation between latent behavioral and Ca++ response states. We are considering those suggestions and anticipate responding to them in the revision.

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we are working to make sure this is appropriately reflected in all word choice across the paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank you for your thoughtful review and constructive feedback on our manuscript. We have implemented numerous revisions throughout the manuscript to address your comments and suggestions. Below are our point-by-point responses to the reviewers' remarks. We hope that our revisions adequately address all the concerns raised.

      Reviewer #1

      One major drawback of the manuscript is the fact that the data were collected from male subjects only. One might expect similar behavioral outcomes from male and female rats receiving 2-shock and 10-shock training. However, increasing attention to sex as a biological variable has revealed an interesting truth, namely that males and females can engage distinct neural pathways to arrive at the same behavioral destination. It should not be taken for granted that retrieval of aversive contextual associations would reproduce the same networks in females, and, as such, the manuscript does not give a complete accounting of the phenomenon under study.

      We thank the reviewer for highlighting the importance of sex differences in fear memory and for encouraging us to discuss this issue. We agree that males and females can engage different behavioral and circuit mechanisms and that our findings may not be generalizable to female rats. We expanded the discussion section to state this limitation and to suggest future directions for research on sex differences in fear memory:

      “In addition, a growing body of evidence underscores the differences between males and females concerning fear memories (Fleischer and Frick, 2023). Given that our study was conducted only with male rats, future studies exploring sex differences will be instrumental in providing a more complete account of the network-level mechanisms underlying fear memory strength.”

      The aversive associative memories described by the authors are characterized as mild or strong. More analysis of the meaning of memory strength, and its relationship to conditioning parameters, is needed.

      In particular, the authors should discuss issues such as amount of training, US magnitude, and rate of shock delivery. If amount of training is important, would 2 vs 10 presentations of a milder shock produce the same networks at retrieval? Would a larger shock require fewer presentations to isolate amygdalar regions from the rest of the network? If the shocks were presented at the same rate during training, would you get the same result in both groups? More data examining these questions would be ideal, but, in the absence of that, a discussion of these issues is needed and missing from the manuscript in its current form.

      We appreciate the reviewer's feedback on the characterization of the fear memories in our study and agree that the labels "mild" and "strong" could oversimplify the complex nature of fear memories. Our study's main objective was not to delineate how varying conditioning protocols result in 'mild' or 'strong' fear memories, but to employ protocols of different intensities known to produce distinct behaviors, and then discern their brain differences. Our categorization was rooted in the resulting behavioral expressions, classifying 'mild' memories as those triggering sub-maximal fear responses with low generalization and a potential for extinction learning and reconsolidation. Conversely, 'strong' memories were defined by peak or near-peak fear responses, high generalization, and impeded extinction and reconsolidation processes. To isolate the number of foot shocks as the sole variable, we kept both shock intensity and session duration constant. While this decision allowed for a clear comparative analysis, we acknowledge its limitations in exploring other influential factors.

      An ideal approach would be to reverse this process, first experimenting with several different conditioning parameters and then observing the resulting behaviors and brain mechanisms, but given the additional workload that would entail, particularly when combined with the c-fos and network analyses, we opted for our current approach. Nevertheless, we hope our study will stimulate research that goes deeper into the nuances of fear conditioning protocols, fostering a better understanding of adaptive and maladaptive fear memories. This is now addressed in the discussion section:

      “To generate mild and strong fear memories, we based our conditioning parameters on methods that have shown distinct behavioral outcomes in prior studies (Haubrich et al., 2020, 2015; Holehonnur et al., 2016; Poulos et al., 2016; Wang et al., 2009). To ensure a focused comparative analysis, our conditioning protocols differed only in the number of foot shocks, and maintained consistent shock intensities and session durations. Yet, the number of shocks is not the only factor that can affect the strength of fear memories (Gazarini et al., 2023). Other conditioning parameters, such as shock intensity, its predictability, and inter-shock intervals, can also play crucial roles. Moreover, different fear measures like freezing behavior, fear-potentiated startle, and inhibitory avoidance might manifest differently following varying conditioning protocols, adding another layer of complexity. A comprehensive understanding of fear memory strength will benefit from further studies scrutinizing these parameters and memory attributes.”

      Reviewer #2

      One alternative account to the weak vs. strong memory distinction made in the paper is the opportunity for extinction in the 2S compared to the 10S group. In the 2S group, the last shock occurs in the 3rd minute, leaving 9 minutes of context exposure without reinforcement to follow. This is not the case for the 10S group. If context fear extinction is engaged during this time, then this would mean that two memories (acquisition and extinction) are taking place in the 2S group, weakening the fear memory in this group, setting up the ground for stronger effects of extinction, less generalization and of course potential greater connectivity required for representing and linking these memories. Indeed, the IL, a brain area linked to extinction, is more predominant in the connectivity map of the 2S compared to the 10S group. While testing this alternative is beyond the scope of this paper, it will need to be discussed.

      We thank the reviewer for raising this interesting point. We agree that the structure of the 2S protocol might inadvertently provide an opportunity for within-session extinction. However, we would like to clarify that we made a mistake in the description of the 2S training protocol. The timing of the shock deliveries was not at the second and third minutes as stated (a usual protocol in the literature), but at the sixth and seventh minutes. We apologize for this mistake and are thankful for your help in identifying this discrepancy which had unfortunately persisted despite multiple proofreading rounds. We have amended this detail in the methods section of our manuscript.

      Nevertheless, we recognize that the subsequent minutes post-shock in the 2S group may still provide a window for potential extinction. To address this possibility, we scored the freezing expression during the training session minute by minute. In the 2S group, two videos were corrupted, and it was only possible to score freezing in six out of eight animals (this is acknowledged in the methods section). As presented in Figure 1.A (middle plot), freezing behavior increased post-shocks and showed no decline towards the session's end. These findings suggest that within-session extinction did not occur during our conditioning session. This analysis is now integrated into the relevant results subsection.

      Methodological detail is lacking re the timeline of study, and connectivity analyses.

      Thank you for your feedback. The formula for the discrimination index is now explained in the methods section. The new plot showing freezing behavior during training shows the exact time bin when shocks were delivered. We have expanded the description of the connectivity analysis.

      Reviewer #3

      Major concerns)

      1) Previous studies, including those from Karim's lab, have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates the expression of genes such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or to technical problems.

      Thank you for this insightful observation. We believe that the absence of the expected increase in hippocampal c-fos activation is due to the unique experimental design employed for our control group. In our study, control rats were subjected to an equivalent duration of context exposure without receiving shocks. As a result, these animals formed and retrieved a neutral, rather than fearful, contextual memory. This likely elevated c-fos levels in the hippocampus relative to the more traditional home-cage condition frequently used in earlier studies. We used the NS (no shock) protocol for our control group specifically to elucidate the impact of the number of shock presentations on memory formation; therefore, context exposure was kept the same across groups. Importantly, this aspect did not affect our connectivity analysis, since that analysis depends on the relative variance across structures rather than on the absolute magnitude of c-fos expression. We now emphasize the nature of our control group in the discussion:

      “Importantly, our control animals were exposed to the conditioning chamber for an equivalent duration without being subjected to shocks, thus encoding and recalling a non-fearful contextual memory.”

      2) The author's computation analyses suggested differences in neural networks activated by the retrieval of mild and strong fear memories. The results of computer analysis are interesting. However, it is not clear whether such results are actually occurring in vivo. At this moment, the author's findings are not a conclusion, but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibition of specific brain regions on mild and strong fear memories retrieval using such as DREADD and other methods. These experiments seem hard, but would greatly improve the quality of the manuscript.

      We appreciate the reviewer's perspective and acknowledge the limitations of our current findings. While our c-fos expression data suggest functional connections reflective of neural activity during fear memory recall, we agree that it is not possible to deduce causality from this alone. Instead, our study aimed to uncover the network-level distinctions between mild and strong memories, laying the groundwork for subsequent, in-depth investigations of the causal relationships within these identified pathways. We agree that corroborating our findings with interventional experiments, such as those using DREADDs, is an important next step that would enhance our study, and we hope future research will address these points. These points are now included in the discussion section:

      “To further elucidate the underlying mechanisms of fear memory strength in vivo, understanding the specific roles of individual network elements in fear regulation becomes essential. Future research will be important to probe the causal interplay among distinct nodes and edges, both individually and in combination, in shaping diverse aspects of fear expression.”

      Reviewer #2 (Recommendations For The Authors):

      Methodological detail is lacking:

      How is the discrimination index calculated?

      We have included this information in the methods section: “The generalization index was calculated as Freezing in Test B / (Freezing in Test A + Freezing in Test B).”
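      The index quoted above is simple enough to state as code. This sketch assumes freezing is expressed as a score per animal (for example, percent time freezing) and uses Test A and Test B only as labeled in the formula; the guard against a zero denominator is an added assumption, since the formula is undefined when an animal does not freeze in either test.

```python
def generalization_index(freezing_a: float, freezing_b: float) -> float:
    """Freezing in Test B / (Freezing in Test A + Freezing in Test B).

    Inputs are freezing scores for the same animal (e.g. percent time freezing).
    """
    total = freezing_a + freezing_b
    if total == 0:
        # Assumption: treat "no freezing in either test" as undefined.
        raise ValueError("index is undefined when there is no freezing in either test")
    return freezing_b / total


print(generalization_index(60.0, 20.0))
print(generalization_index(40.0, 40.0))
```

An index of 0.5 indicates equal freezing in both tests; values below 0.5 indicate less freezing in Test B than in Test A, and values above 0.5 indicate more.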

      A distinction between complete spontaneous recovery (10S group) vs. partial spontaneous recovery (2S group) vs. extinction retention needs to be considered in discussing the extinction data.

      Thank you for this suggestion. To address this point, we now include Tukey’s post hoc comparisons between the first and last bins of extinction and the test session. The results show that in the 2S group, freezing during test remained consistent with the levels observed in the final extinction bin and was lower than the levels in the initial extinction bin. Conversely, in the 10S group, freezing levels increased from the final extinction bin to the test, reaching levels comparable to those observed in the initial extinction bin.

      Detail regarding the connectivity analyses is missing from the Methods. For example, the calculation of the r value distributions should be detailed in the Methods, not just in the Results, and more detail is needed on the calculations of degree centrality, betweenness centrality, nodal efficiency, small-world analyses, etc.

      We appreciate the reviewer’s feedback. We have expanded the description of the connectivity analysis.
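      For readers unfamiliar with these measures, the sketch below computes two of them, degree centrality and nodal efficiency, on a binary undirected graph using common textbook definitions: degree divided by n - 1, and the mean inverse shortest-path length from a node to every other node (unreachable nodes contribute 0). These definitions, the helper names, and the four-region toy network are assumptions and not necessarily the authors' exact implementation; betweenness centrality and small-world metrics are omitted.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source; unreachable nodes are omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def degree_centrality(adj):
    """Number of edges at each node, normalized by the n - 1 possible partners."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def nodal_efficiency(adj):
    """Mean inverse shortest-path length from each node to all other nodes."""
    n = len(adj)
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        inv = sum(1.0 / d for other, d in dist.items() if other != node)
        result[node] = inv / (n - 1)
    return result


# Hypothetical 4-region chain A-B-C-D; region names are placeholders.
adj = {r: set() for r in "ABCD"}
for u, v in [("A", "B"), ("B", "C"), ("C", "D")]:
    adj[u].add(v)
    adj[v].add(u)

print(degree_centrality(adj))
print(nodal_efficiency(adj))
```

      In this chain the inner regions B and C have both higher degree centrality and higher nodal efficiency than the end regions A and D, since they reach the rest of the network in fewer steps.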

      Justification for 'excluding edges with r values lower than the average plus one standard deviation of all 292 networks (Figure 4.B; r < 0.61)' is needed.

      Thank you for encouraging us to elaborate on the rationale behind our thresholding method. We acknowledge that there is no consensus in the literature on the optimal thresholding method for functional networks. Our primary objective with thresholding was to retain the most robust connections while minimizing potential noise from weakly correlated regions. Instead of opting for an arbitrary threshold, we determined our cut-off based on the average plus one standard deviation across all networks. Theoretically, for approximately normally distributed r values, this retains the top 16% of connections. Given our 12 regions of interest (66 possible pairwise connections), this translates to roughly 10 connections per network. This count is sufficient for a nuanced analysis of the network structures and for between-group comparisons. Importantly, our method inherently accounts for variations in interregional correlations across groups. Groups with a distribution skewed towards higher r values will naturally have more edges, highlighting the enhanced synchronized activity between certain regions. Conversely, networks with tendencies towards lower r values will exhibit fewer connections. Thus, our thresholding method is rooted in the data’s distribution and results in networks that reflect the differences across groups.

      We added the following sentence to the Methods section summarizing this rationale:

      “This thresholding approach was used to provide a cut-off based on the data’s inherent distribution, therefore retaining the top edges according to the data variance.”
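      The thresholding rule and the "top 16%, roughly 10 edges" arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: r values are pooled from the upper triangle of each network's correlation matrix, the population standard deviation is used, and the 16% figure assumes normally distributed r values. The 3-region matrix is hypothetical.

```python
import statistics
from itertools import combinations


def upper_triangle(r):
    """Off-diagonal r values of a symmetric correlation matrix (each pair once)."""
    return [r[i][j] for i, j in combinations(range(len(r)), 2)]


def pooled_cutoff(r_matrices):
    """Mean + 1 SD of all pairwise r values pooled across every network."""
    pooled = [v for r in r_matrices for v in upper_triangle(r)]
    return statistics.mean(pooled) + statistics.pstdev(pooled)


def kept_edges(r, cutoff):
    """Edges whose r reaches the cutoff; weaker edges are excluded."""
    return [(i, j) for i, j in combinations(range(len(r)), 2) if r[i][j] >= cutoff]


# Expected fraction kept if r values were normally distributed: P(Z > 1).
fraction_kept = 1 - statistics.NormalDist().cdf(1)
n_pairs = len(list(combinations(range(12), 2)))  # 12 regions -> 66 possible edges
print(round(fraction_kept, 3), n_pairs, round(fraction_kept * n_pairs, 1))

# Toy 3-region network with hypothetical r values.
toy = [[1.0, 0.1, 0.5],
       [0.1, 1.0, 0.9],
       [0.5, 0.9, 1.0]]
cut = pooled_cutoff([toy])
print(round(cut, 3), kept_edges(toy, cut))
```

      Because the cut-off is computed from the pooled distribution rather than per network, a group whose correlations are shifted upward keeps more edges, which is the property the response above relies on.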

      Line 81 - 'brain areas' is missing after '12'.

      Thank you, this is now fixed.

      The title for 2. is somewhat odd. I thought the following may be better, but obviously I leave this up to the authors' discretion: 'Commonalities and differences in brain activation induced by recall of mild and strong fear memories'

      Thank you for this suggestion. We agree with the title suggested by the reviewer, and it was replaced in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Previous studies, including those from Karim's lab, have shown that protein synthesis in the hippocampus is required for the reconsolidation of contextual fear memory and that the retrieval of contextual fear memory activates the expression of genes such as c-fos in the hippocampus. However, the authors failed to confirm this observation. This may be due to the small number of rats or to technical problems.

      Thank you for this suggestion. As explained above, we believe that this is due to the nature of our control group, which is now highlighted in the discussion section.

      2) The authors' computational analyses suggested differences in the neural networks activated by the retrieval of mild and strong fear memories. The results of the computational analysis are interesting. However, it is not clear whether such differences actually occur in vivo. At this moment, the authors' findings are not a conclusion but rather a suggestion or hypothesis. Therefore, it is also important to conduct interventional experiments to evaluate the validity of the authors' findings. Specifically, the authors' results could be validated by analyzing the effects of inhibiting specific brain regions on the retrieval of mild and strong fear memories, using DREADDs or other methods. These experiments seem difficult but would greatly improve the quality of the manuscript.

      Thank you for your valuable feedback. As explained above, these points are now included in the discussion section.

      Minor comments)

      1) cfos should be c-fos or c-Fos.

      Thank you for your correction. All instances of ‘cfos’ were replaced by ‘c-fos’.

      2) Line 275; "Compared to the to re-exposure to" should be "Compared to the to re-exposure to".

      Thank you for your correction. This is now fixed.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Comment. “The manuscript demonstrates that FGF4, FGF8, and FGF9 exhibit distinct binding modes towards FGFRs.”

      No, this paper is not about ligand binding, and there are NO binding data in the manuscript. This paper is about ligand-dependent functional bias. Previously, differential effects of ligands on the signaling of one FGFR have been attributed to differences in ligand binding, but that paradigm is incomplete, if not incorrect. This manuscript is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). The bias we report here cannot be the result of differences in ligand binding. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to a weakly binding ligand at high concentration. Our article thus changes the current paradigm about how FGF ligands activate FGFR signaling.
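      The binding-strength argument can be made concrete with a minimal sketch. It assumes simple one-site binding (occupancy = L / (L + Kd)) and that every downstream response depends only on receptor occupancy; the Kd values and concentrations are hypothetical. Under these assumptions, a weak binder at a proportionally higher concentration produces exactly the same occupancy as a strong binder, so binding strength alone can shift potency but cannot change which responses a ligand favors.

```python
import math


def occupancy(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of receptors bound for simple one-site binding: L / (L + Kd)."""
    return ligand_nM / (ligand_nM + kd_nM)


# Hypothetical ligands: a strong binder (Kd = 1 nM) and a weak binder (Kd = 10 nM).
strong = occupancy(ligand_nM=2.0, kd_nM=1.0)
weak = occupancy(ligand_nM=20.0, kd_nM=10.0)

# Same L/Kd ratio -> same occupancy -> any occupancy-driven response is identical.
print(strong, weak, math.isclose(strong, weak))
```

      Both values equal 2/3. Bias, by contrast, means that no such concentration rescaling can make the two ligands match across all measured responses at once.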

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics.

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). We calculate bias coefficients, and we analyze the results using statistical tools.
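      For orientation only, the sketch below shows one common way of quantifying bias between two pathways: the ΔΔlog(Emax/EC50) method, a simplification of the operational-model ΔΔlog(τ/KA) approach. It is not necessarily the calculation used in the manuscript. The pathway labels are placeholders (they could stand for, e.g., FRS2 phosphorylation and FGFR1 phosphorylation), all parameter values are hypothetical, and the statistical testing mentioned above is omitted.

```python
import math


def log_emax_ec50(emax: float, ec50_molar: float) -> float:
    """Simplified transduction coefficient: log10(Emax / EC50)."""
    return math.log10(emax / ec50_molar)


def delta_delta_log(test, ref, pathway_1, pathway_2):
    """ΔΔlog(Emax/EC50) of a test ligand vs a reference ligand.

    test/ref map pathway name -> (Emax, EC50 in M). Positive values mean the
    test ligand favors pathway_1 over pathway_2 relative to the reference.
    """
    d1 = log_emax_ec50(*test[pathway_1]) - log_emax_ec50(*ref[pathway_1])
    d2 = log_emax_ec50(*test[pathway_2]) - log_emax_ec50(*ref[pathway_2])
    return d1 - d2


# Hypothetical concentration-response parameters (Emax as fraction of max, EC50 in M).
reference = {"pathway_1": (1.0, 1e-9), "pathway_2": (1.0, 1e-9)}
test_ligand = {"pathway_1": (1.0, 1e-9), "pathway_2": (1.0, 1e-8)}

ddlog = delta_delta_log(test_ligand, reference, "pathway_1", "pathway_2")
bias_factor = 10 ** ddlog
print(round(ddlog, 6), round(bias_factor, 3))
```

      Here the test ligand is tenfold less potent than the reference in pathway 2 but equipotent in pathway 1, giving ΔΔlog = 1 and a bias factor of 10 towards pathway 1.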

      Comment. …“Unproven and speculative structural differences in the FGF-FGFR1 dimers”.

      This statement is not correct, as it is directly contradicted by the differences reported in Figure 6. This Figure presents the results of a quantitative FRET assay performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured differences in FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when FGF8 is bound to the extracellular domain of FGFR1, as compared to FGF4 and FGF9. The difference can be observed in the raw FRET data in Figure 6A. While these data do not reveal the exact molecular origin of the structural differences, they unequivocally prove that there are structural differences when different ligands are bound.
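      The logic of the high-ligand-concentration FRET measurement can be illustrated with an idealized sketch. It assumes that only receptors in dimers contribute FRET, so the apparent efficiency is the dimer fraction times an intrinsic dimer efficiency, and it ignores acceptor-fraction corrections, proximity FRET, and fluorophore orientation. The Förster radius and efficiencies are hypothetical. Converting efficiency to a single donor-acceptor distance is only illustrative; as stated above, the data establish that the dimer structures differ, not the exact molecular origin of the difference.

```python
def apparent_efficiency(dimer_fraction: float, dimer_efficiency: float) -> float:
    """Idealized: only receptors in dimers contribute FRET, so E_app = f_D * E_dimer."""
    return dimer_fraction * dimer_efficiency


def donor_acceptor_distance(efficiency: float, r0_nm: float) -> float:
    """Invert the Förster relation E = 1 / (1 + (r / R0)**6) for r."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0_NM = 5.0  # hypothetical Förster radius for an assumed fluorophore pair

# At saturating ligand the dimer fraction approaches 1, so E_app reports E_dimer directly.
for e_dimer in (0.30, 0.40):  # hypothetical intrinsic efficiencies of two dimer conformations
    e_app = apparent_efficiency(1.0, e_dimer)
    print(e_dimer, e_app, round(donor_acceptor_distance(e_app, R0_NM), 2))
```

      With the dimer fraction fixed at 1, any difference in apparent efficiency between ligands can no longer come from differences in dimerization and must instead reflect the dimer structure.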

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179:3651-3674.


      The following is the authors’ response to the previous reviews.

      eLife assessment. This manuscript describes useful data on the mechanisms underlying the activation of the receptor tyrosine kinase FGFR1 and stimulation of intracellular signaling pathways in response to FGF4, FGF8, or FGF9 binding to the extracellular domain of FGFR1. Solid quantitative binding experiments are presented to demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      No, this paper is not about binding, and there are NO binding data in the manuscript. This paper is about function. This is the first demonstration that three FGF ligands induce bias in FGFR1 signaling. Thus far, differential effects in the signaling of one FGFR have been attributed to differences in ligand binding, but this current paradigm is incomplete/incorrect. Our article changes the current paradigm of how FGFs activate downstream FGFR signaling.

      We have clarified this point by adding the following text in the Discussion.

      "Thus far, differential effects in the signaling of one FGFR in response to different FGF ligands have been attributed to differences in ligand binding. It can be reasoned, however, that differences in ligand binding strengths, alone, cannot explain differential signaling. Indeed, if the differences between ligands are only in the binding strength, then a strongly binding ligand at low concentration will act identically to weakly binding ligand at high concentration. Here we discovered, using tools that are novel for the RTK field, that there are qualitative differences in the actions of the ligands. FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and collagen loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and growth arrest). These effects occur in addition to previously measured differences in ligand binding coefficients (87).”

      We have also re-written the abstract.

      “Abstract

      “The mechanism of differential signaling of multiple FGF ligands through a single FGF receptor is poorly understood. Here, we use biophysical tools to quantify multiple aspects of FGFR1 signaling in response to FGF4, FGF8 and FGF9: potency, efficacy, bias, ligand-induced oligomerization and downregulation, and conformation of the active FGFR1 dimers. We find that the three ligands exhibit distinctly different potencies and efficacies for inducing responses in cells. We further discover qualitative differences in the actions of the three FGFs through FGFR1, as FGF8 preferentially activates some of the probed downstream responses (FRS2 phosphorylation and extracellular matrix loss), while FGF4 and FGF9 preferentially activate different probed responses (FGFR1 phosphorylation and cell growth arrest). Thus, FGF8 is a biased ligand, when compared to FGF4 and FGF9. Förster resonance energy transfer experiments reveal a correlation between biased signaling and the conformation of the FGFR1 transmembrane domain dimer. Our findings expand the mechanistic understanding of FGF signaling during development and bring the poorly understood concept of receptor tyrosine kinase ligand bias into the spotlight.”

      Reviewer #1 (Public Review):

      Comment. Quantitative binding experiments presented in the manuscript demonstrate that FGF4, FGF8, and FGF9 exhibit distinct binding affinities towards FGFRs.

      This paper is not about binding, and there are NO binding data in the manuscript. This paper is about function. Please see our response to the eLife assessment.

      Comment. It is also proposed that FGF8 exhibits "biased ligand" characteristics that are manifested via binding to and activation of FGFR1 mediated by "structural differences in the FGF-FGFR1 dimers, which impact the interactions of the FGFR1 transmembrane helices, leading to differential recruitment and activation of the downstream signaling adapter FRS2".

      We do not “propose” the existence of ligand bias, we demonstrate it in the manuscript by following the latest IUPHAR community guidelines on bias identification and quantification (Kolb et al, 2022). Specifically, we construct bias plots, we calculate bias coefficients, and we analyze the results using statistical tools.

      Also, please note that ligand bias has no direct connection to binding strength, so the statement that biased ligand characteristics “is manifested via binding” is not correct.

      Comment. In the absence of any experimental structural data on the different forms of FGFR dimers stimulated by FGF ligands, the model presented in the manuscript is speculative and misleading.

      Figure 6 presents the “structural experimental data”. A quantitative FRET assay is performed at high ligand concentration, which ensures that there are no monomeric receptors. Under these conditions, the measured FRET efficiency depends only on the dimer conformation. The measured FRET efficiencies reveal distinct differences in the FGFR1 TM domain dimer conformations when the ligand FGF8 is bound to the extracellular domain of FGFR1, as compared to the cases of FGF4 and FGF9.

      Because the Rosetta modeling of the kinase domains in the previous version of the paper is not based on experimental data, we have removed the modeling from the Results, and we have removed all references to it in the Discussion. Thus, all that is shown and discussed in the revised paper is based on experimental data.

      We have substituted two paragraphs in the discussion with the following two sentences:

      “The experimental data in Figure 6 hint at the possibility that ligand bias arises due to differences in FGFR1 dimer conformations. If this is so, then conformational differences in the signaling complex in the plasma membrane underlie biased signaling for both RTKs and GPCRs, the two largest receptor families in the human genome”.

      References

      Kolb P, Kenakin T, Alexander SPH, Bermudez M, et al. Community guidelines for GPCR ligand bias: IUPHAR review 32. Br J Pharmacol. 2022;179:3651-3674.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank all the reviewers for their comments and constructive feedback regarding our manuscript. We have made many changes to strengthen the manuscript, including addition of two new experiments (presented in Fig. S1) that help to clarify the nature and scope of activation of late response genes in striatal neurons. Our specific responses to individual reviewer comments are provided below.

      Reviewer #1

      Public review

      Weaknesses: The timing and the location of the accessibility changes are meaningfully different from other similar studies, which should be discussed. The authors provide good data for the function of a single enhancer near Pdyn, but could contextualize this with respect to other regulatory elements nearby.

      In the revised manuscript, we have expanded our discussion of the differences between chromatin accessibility changes observed in this study and those found in prior reports in different systems. These differences are also addressed in extended detail below. Unfortunately, limitations on resources and time prevented a deeper exploration of additional candidate enhancers near the Pdyn locus. However, we believe our efforts to characterize an activity-dependent enhancer in the Pdyn locus provide a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      Recommendations For The Authors

      1) At 1hr after stimulation in previous papers (Su et al. 2017, which is reference #8 of Fernandez-Albert et al., Nat Neurosci. 2019 Oct;22(10):1718-1730), there are large increases in accessibility directly over the IEGs, consistent with the concerted transcription of these genes following stimulation. It is surprising that the authors do not see this here, either at 1hr or at 4hr. This difference in results needs to be addressed.

      We thank the reviewer for bringing this discrepancy to our attention. Indeed, Su et al. 2017 and Fernandez-Albert et al. 2019 both describe increases in chromatin accessibility at IEG promoters. There are several experimental differences that could be contributing to differences between our study and previously published studies. Two major reasons include the developmental timepoint of the tissue/cells and the cell type/brain region that is being assayed. Su et al. assayed chromatin accessibility in ex vivo slices containing the dentate gyrus from adult mice, while Fernandez-Albert et al. assayed chromatin accessibility in forebrain principal neurons of adult mice following kainic acid injection. Bulk ATAC-Seq experiments described in the present manuscript were generated from cultured embryonic rat striatal neurons. Additionally, baseline chromatin accessibility seems to be significantly different between forebrain principal neurons studied in Fernandez-Albert et al. 2019 and the current study. For example, in Figure 3a of Fernandez-Albert et al. 2019, the Npas4 gene body is not accessible in a saline treated animal. In vehicle treated, cultured embryonic rat striatal neurons, the Fos gene body and associated enhancers are accessible at baseline (Fig. S3), and do not increase with KCl depolarization.

      We have expanded our discussion of this discrepancy in the discussion section of the revised manuscript, and included additional citations addressing this difference.

      2) It is also somewhat surprising that the authors see almost no regions that show changes in accessibility at 1hr and then a very large number of differentially accessible regions at 4hr. This is quite different from the more rapid changes shown for example in Figure 7f in the human GABA neurons even though these are also studies in culture with rapid calcium channel opening. Can the authors speculate on the reason for the difference?

      Many previously published studies that use cultured neurons include a pre-treatment in which spontaneous neuronal activity is inhibited with the sodium channel blocker tetrodotoxin (Sanchez-Priego et al. Cell Reports, 2022; Kim et al. Nature, 2010; Malik et al. Nature Neuroscience, 2014). The Sanchez-Priego et al. Cell Reports manuscript also blocked NMDA receptor activity with the competitive NMDAR antagonist D-AP5 for 12 hours prior to depolarization. Rapid changes in chromatin accessibility observed in other studies at <1 hour timepoints could be due to prior silencing of the cells and subsequent reduction in the accessibility and transcriptional activity of IEGs. Decreased baseline accessibility and transcriptional activity of IEGs can be observed in Figure 1a of Malik et al. 2014, which displays ChIP-Seq tracks for both RNA pol II and H3K27ac. At baseline, H3K27ac and RNA pol II enrichment is low throughout the Fos locus. Subsequent depolarization of silenced neurons drives accessibility and transcription of the Fos gene and associated enhancers. In contrast, we found accessible chromatin at Fos enhancer elements at baseline (without stimulation; Fig. S3).

      The experiments described in the current study do not include any pre-treatment with tetrodotoxin or D-AP5, and thus the neurons are expected to be spontaneously active. This baseline electrophysiological activity may result in increased accessibility and transcription at IEG loci, which ultimately makes it difficult to identify activity-dependent increases in IEG accessibility at timepoints <1 hour. Furthermore, a previously published manuscript from our lab (Carullo et al. Nucleic Acids Research, 2020) conducted ATAC-seq on cultured embryonic rat cortical, hippocampal, and striatal neurons and found that transcribed enhancers for IEG loci (including Fos) had decreased chromatin accessibility following depolarization when compared to vehicle treatment. These differences in experimental design (including cell type, model organism, developmental timepoint, and treatment paradigm) may all contribute to differences in the temporal dynamics of chromatin remodeling between the current manuscript and previously published studies.

      3) Experimentally it can be challenging to repress a single enhancer and show a significant effect on gene regulation which makes the repression in Fig 6c somewhat unexpected. There are several regions near Pdyn that show activity-dependent changes in accessibility in the human cells (Fig. 7e) and presumably in the rat neurons too (Fig. 5a shows a few but most of the intervening region is cut out). Did the authors target any of these other regions?

      We chose the identified regulatory element upstream of the Pdyn TSS because it met several criteria that we determined are important for characterizing LRG enhancers. These criteria are outlined in the Results: “1) located in non-coding regions of the genome, 2) inaccessible at baseline and accessible following depolarization, and 3) inaccessible when depolarization was paired with protein synthesis inhibition.” Indeed, ATAC-seq experiments presented in the current study demonstrate that thousands of genomic regions undergo reprogramming, and many of these regions meet these criteria (including additional loci near Pdyn). However, we lacked the time and resources to systematically investigate all other enhancers, and did not target any other regions within the Pdyn locus. While many enhancers may regulate a single gene, the identified enhancer seems to be particularly important for activity-dependent Pdyn gene expression. Importantly, CRISPRi-based repression of this enhancer (Fig. 6c) did not reduce basal Pdyn expression as compared to a non-targeting control, but completely blocked stimulus-dependent induction of Pdyn transcription. We believe this is a useful starting point, and future studies may seek to more completely define the contributions of nearby regulatory elements.

      4) The authors should clarify in the methods or figure legends the number of independent replicate libraries for each experiment and were the RNA and ATAC libraries made from the same or different experiments.

      We thank the reviewer for bringing this to our attention. We have clarified the number of replicates in the methods as outlined below. Additionally, RNA and ATAC libraries were generated from different experiments, and this information is also now included in the methods.

      Within the ATAC-Seq library preparation and analysis methods section: “ATAC-seq libraries were generated from experiments independent of the RNA-seq experiments. For the ATAC-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 3 replicates within each treatment group (3 Veh, 3 KCl). For the ATAC-seq experiment of neurons pre-treated with DMSO or Anisomycin, there were 4 replicates within each treatment group (4 DMSO + Veh, 4 DMSO + KCl, 4 Anisomycin + KCl).”

      Within the RNA-seq library preparation and analysis methods section: “RNA-seq libraries were generated from experiments independent of the ATAC-seq experiments. For the RNA-seq experiment of neurons treated with vehicle or KCl for 1 h, there were 3 replicates within the KCl group and 4 replicates within the vehicle group. For the RNA-seq experiment of neurons treated with vehicle or KCl for 4 h, there were 4 replicates within each group (4 Veh, 4 KCl).”

      Reviewer #2

      Public review

      First of all, at a conceptual level, most of the findings related to the induction of particular transcriptional programs upon neuronal activation, the changes in chromatin state, and the need for protein translation for proper induction of LRGs have been broadly characterized previously in the literature (Tyssowski et al., Neuron, 2018; Ibarra et al., Mol. Syst. Biol., 2022; and also reviewed by Yap and Greenberg, Neuron, 2018). In addition, it is not obvious why the authors focus on Pdyn gene regulatory regions among the thousands of genes that are upregulated and show a modified chromatin landscape after neuronal activation. The authors highlight three particular traits of this gene as the reason for choosing it, but those traits are probably shared by most of the genes in the LRG set.

      We thank the reviewer for these comments, and have included these important publications as citations in our manuscript. With over 5,000 differentially accessible chromatin regions following KCl stimulation, it was not possible to follow up on all regulatory regions or linked genes in a rigorous way. Therefore, we selected a target candidate enhancer near the Pdyn locus for mechanistic studies. In addition to the criteria highlighted in the manuscript, we chose this locus due to decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of more detailed exploration of this gene, and makes our results applicable to a broader pre-existing literature.

      At the methodological level, some attention should be paid to the time points chosen for generating the data. The authors claim that these time points (1h and 4hrs) identify the first (i.e., IEGs) and second (i.e., LRGs) waves of transcription. However, at 4hrs the most highly upregulated genes are still IEGs, as shown in the volcano plots of Figure 1B and 1C, showing a high overlap with up-regulated genes found at 1h (Figure 1D). This might suggest that the 4hrs time point lies somewhere between the first and second waves of transcription, probably missing some LRGs of the later wave that are yet to be induced.

      Given that the depolarization applied in RNA-seq and ATAC-seq experiments is continuous, it was not unexpected to find IEGs present at both 1 h and 4 h timepoints. The revised manuscript contains a new experiment (Fig. S1d-f) demonstrating that a shorter depolarization period (1 h KCl followed by a 3 h wash off period) also induces Fos mRNA, but to a much lower extent than 4 h continuous stimulation. In contrast, both short (1 h) and long (4 h) depolarization periods induce Pdyn to equivalent levels when measured at 4 h after the onset of the stimulus. These experiments support our conclusion that LRGs require a temporal delay, and not simply extended stimulation. Nevertheless, the reviewer is correct that a 4 h timepoint may potentially miss some LRGs that are induced even later. We plan to explore the full timecourse of LRG induction in future studies.

      Finally, while only proposed as a suggestion, the assumption that, from the data generated in this article, we can envision a mechanism by which the AP-1 family of transcription factors interacts with the SWI/SNF chromatin remodeling complex is going too far, as no evidence implicating SWI/SNF is provided in the data presented in the manuscript.

      While speculative in the current context, we felt that it was important to highlight this prior literature to identify potential mechanisms that may link IEGs (specifically, AP-1 members) to chromatin remodeling machinery. We have altered this section of the discussion to emphasize that this link is speculative in the context of neuronal chromatin remodeling.

      Recommendations For The Authors

      1) I couldn't find the number of replicates generated for each dataset, neither for RNA nor for ATAC-seq. It could be worth adding these data to the figure legends or in the material and methods.

      We thank the reviewer for bringing this to our attention. The number of replicates generated for each dataset are now included in the methods section (see response to Reviewer #1, comment #4 above).

      2) In Figure 1D, Gene Ontology terms appear significant only for each of the individual datasets. While this might be expected for the 1h time-point, the 4hrs time-point includes a large fraction of the genes up-regulated at 1h as well, and it is surprising that no term related to chromatin or transcription regulation appears as significant. Is this due to the fact that the analysis has been conducted with two separate lists of genes and only the top terms are shown without crossing the data? This could be misleading for the reader, and a comparative GO term analysis using clusterProfiler or similar tools might be better, as it would allow a real comparison of the terms enriched in each dataset.

      We thank the reviewer for pointing this out. For Figure 1d, GO term analysis was conducted with two separate gene lists, each consisting of timepoint-specific upregulated DEGs. Thus, 772 genes were included for the analysis of 4 h GO terms and 39 genes were included for the analysis of 1 h GO terms. Previously, comparisons of cellular component GO terms in the current study included only the top 10 GO terms. The revised manuscript contains an updated analysis that compares all enriched GO terms and identifies that three of the top 10 cellular component GO terms for the 1 h gene set are also identified as significantly enriched in the 4 h gene set. We have revised the graph in Fig. 1f to reflect this updated analysis. Overall, our conclusions (that 1 h and 4 h DEG sets fall into distinct functional categories) remain supported by this analysis.

      3) In Figure 3D, the graphs show the density of motifs within the DARs in units of "Motifs/Kb/peak" while the x-axis represents the peaks coordinates from -500bp to +500bp. It is not clear to me how this graph is generated and how within 1000bp the profiles can reach values of 18-20 Motifs/Kb/peak. Could this be clarified?

      The motif enrichment score was calculated by identifying the number of total motifs within defined 50bp genomic bins surrounding the center of the DAR regions. HOMER builds enrichment histograms that normalize motif presence to set size (e.g., number of peaks or DARs), and also to genomic space (base pairs). While HOMER’s default histogram represents motifs/bp/peak, we converted this to motifs/kb/peak for ease of interpretation. However, to avoid confusion we have returned the y axis labels to the default HOMER settings (motifs/bp/peak). The normalization and units for this graph have been clarified in the methods section.
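      The normalization described above can be sketched as follows. This is not HOMER's code; the function name and the toy input are hypothetical. Motif counts in each 50 bp bin around the DAR centers are divided by the bin width (genomic space) and by the number of peaks (set size), which yields motifs/bp/peak; multiplying by 1,000 gives motifs/kb/peak.

```python
def motif_density(motif_offsets, n_peaks, window=500, bin_size=50):
    """Motifs per bp per peak in fixed bins around peak centers.

    motif_offsets: integer motif positions (bp) relative to their peak's center,
    pooled over all peaks. Returns a list of (bin_start_bp, motifs/bp/peak).
    """
    n_bins = (2 * window) // bin_size
    counts = [0] * n_bins
    for pos in motif_offsets:
        if -window <= pos < window:
            counts[(pos + window) // bin_size] += 1
    # Normalize by genomic space (bin width) and by set size (number of peaks).
    return [(-window + i * bin_size, c / (bin_size * n_peaks))
            for i, c in enumerate(counts)]


# Hypothetical set: 100 DARs, each with exactly one motif at its center.
hist = dict(motif_density([0] * 100, n_peaks=100))
per_bp = hist[0]
print(per_bp, per_bp * 1000)  # motifs/bp/peak and the same value in motifs/kb/peak
```

      Under this normalization, 0.02 motifs/bp/peak (20 motifs/kb/peak) corresponds to one motif per peak, on average, within a single 50 bp bin, so values of that size can arise even though the whole window spans only 1 kb.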

      4) In Figure 4C, the newly generated ATAC-seq data are only analyzed in a "targeted" way, showing that global tendencies are maintained between the initially generated data and these new data. It could be interesting, however, to see the number of DARs obtained in these conditions, especially to see whether some DARs are observed in the Anisomycin condition that might be translation-independent.

      The experiment described in Figure 4 was designed to both validate the 5,312 DARs and understand the role of protein translation in activity-dependent chromatin remodeling. One way to begin identifying translation-independent DARs is to compare the DMSO + Vehicle group to the Anisomycin + KCl group. With this comparison, any 4 h DAR that has increased accessibility in the Anisomycin + KCl group may be translation-independent, because pretreatment with anisomycin did not prevent chromatin remodeling. After conducting this analysis, we identified a very small percentage (3.44%) of the 5,312 4 h DARs that still exhibited significantly increased accessibility when pre-treated with Anisomycin. This small number is consistent with the robust effects of anisomycin on KCl-dependent remodeling shown in Fig. 4c-d. However, to confirm that these were in fact translation-independent activity-regulated DARs, we would need to perform a direct comparison of chromatin accessibility between neurons pre-treated with Anisomycin and then treated with either vehicle or KCl. Since we did not include an anisomycin-only group in the experiments in Fig. 4, we cannot confidently claim whether these DARs (3.44%) are translation-independent. Nevertheless, we agree with the reviewer that this is an interesting avenue of future exploration.

      Reviewer #3

      Public review

      1) Throughout the paper, the authors emphasize a "temporal decoupling" of transcriptional and chromatin response to depolarization, based on a lack of significant chromatin changes at 1h, despite IEG transcription. However, previous publications show significant chromatin remodeling at 1h (e.g. Su et al., NN 2017 in adult dentate gyrus) or 2h (Kim et al., Nature 2010; Malik et al., NN 2014 in cultured embryonic cortical neurons). The discussion briefly mentions this contrast, but it remains difficult to conclude decisively whether there is temporal decoupling when such decoupling is not found consistently. If one is to make broad conclusions about basic neural chromatin response to depolarization, it would be ideal to know under which conditions there is temporal decoupling, or if this is a region-specific phenomenon.

      Indeed, prior studies referred to in our manuscript have identified chromatin remodeling at earlier timepoints than we identified here. As addressed above (Reviewer #1, comments 1 & 2), this discrepancy may arise from differences in the experimental model system, the type of stimulation applied, the pretreatment protocols used to silence neurons prior to activation, or even developmental stage. Differences in each of these parameters make it difficult to draw straightforward comparisons between those datasets and the results in this manuscript. It is possible that other cell types induce IEGs more quickly (or more robustly) in response to stimulation, which could lead to earlier chromatin remodeling. However, the common patterns of chromatin reorganization (e.g., the fact that changes are enriched at AP-1 motifs and are found in intergenic regions at putative enhancers) lend support to the idea that the transcriptional waves identified here can also be found in other cell types and in other contexts.

      2) The UMAP analysis is a novel way to probe transcription factor enrichment, but it's unclear what this is actually showing. The authors sought to ask whether "DARs could be separated based on transcription factor motifs in these regions." However, the motifs present in any genomic stretch are fixed based on genomic sequence, so it seems like this analysis might be asking whether certain motifs are more likely to be physically clustered together in the genome, in activity-regulated regions (rather than certain transcription factors acting in concert, as is implied in discussion). While still potentially interesting, this analysis does not seem to give much additional insight into activity-dependent chromatin remodeling beyond the motif enrichment analysis already performed. Nevertheless, to draw stronger conclusions, it would be necessary to compare clustering to a random set of genomic regions of the same length/size to interpret the clustering here. It would also be useful to know whether the ISL1 motif is also enriched in ubiquitously accessible genomic regions in the striatum (and not just DARs).

      We agree that additional analysis is needed to explore the enrichment and activity of various transcription factor motifs at differentially accessible regions of the genome. The motif enrichment analysis in Figure 3 demonstrated the types of motifs that were enriched in DARs (Fig. 3a-c), the overall degree of enrichment (Fig. 3c), and the distribution of those motifs across DAR sites (Fig. 3d). This analysis allowed us to understand whether motifs for cell-defining transcription factors like ISL1 are enriched uniquely in DARs, or are also found in other regions that are accessible at baseline (see direct comparisons between vehicle/baseline peaks and DARs in Fig. 3d). However, these approaches represent enrichment across all DARs as a group, and do not show TF presence or absence at any specific DAR. The UMAP analysis presented in Figure 3e allowed identification of DAR clusters based on the presence or absence of specific transcription factor motifs, and allowed us to represent individual DARs in a reduced two-dimensional space. Because this analysis identifies the co-occurrence of distinct motifs within single DARs, it allowed us to speculate about possible transcription factor cooperation within DARs, or about the meaning of DAR clusters that appear to be defined by specific motifs (e.g., KLF10 in Fig. 3f). Given the information that this adds to the initial analyses, we argue that its inclusion in the manuscript is useful and potentially informative for generating follow-up hypotheses.

      3) The authors identify late-response gene enhancers by 3 criteria. However, only Pdyn was highlighted thereafter. How many putative DARs met these three criteria in striatum? Only Pdyn?

      As illustrated in Figures 2 and 4, nearly all of the DARs in our dataset met these criteria, which included presence in non-coding genomic regions, increased accessibility following stimulation, and prevention of chromatin accessibility changes by protein synthesis inhibition. We did not mean to indicate that the Pdyn locus was unique in this way. In addition to the criteria highlighted in the manuscript, we chose this locus because of decades of literature establishing the importance of prodynorphin in the striatum, and the role of this gene in human neuropsychiatric diseases. We would argue that this increases the relevance of a more detailed exploration of the regulatory mechanisms that control expression of this gene, and makes our results applicable to a broader pre-existing literature. The revised manuscript includes additional experiments that examine Pdyn expression changes in response to different stimuli, which help to justify the focus on this gene from the beginning of the manuscript.

      Recommendations For The Authors

      1) Figure 1 volcano plots show a scatter primarily in the up-regulated portion at both the 1-h and 4-h time points. However, the Venn diagrams show largely similar numbers of up- and downregulated genes at the 4-h time point. Is the clustering of down-regulated genes tighter/more overlapping? If so, semi-translucent volcano dots or some acknowledgment of the visual discrepancy would be useful.

      We thank the reviewer for bringing this to our attention. Down-regulated genes cluster more tightly on the volcano plot because their fold changes are smaller. This visual discrepancy is acknowledged by the numeric indicators of up- and down-regulated genes in the upper left-hand corner of the volcano plot.

      2) Methods for RNA and ATAC seq analysis align to human genome Hg38, rather than rat?

      RNA- and ATAC-Seq analyses from rat neurons were aligned to the mRatBn7.2/Rn7 rat genome. RNA- and ATAC-Seq analyses from human neurons were aligned to the Hg38 human genome. We have updated the methods to make this clear.

      3) The introduction states that different classes of neurons induce distinct LRGs. Please add a citation. Citations are also needed for the last statement WRT consequences of chromatin remodeling near LRGs not being concretely linked to LRG transcription.

      We thank the reviewer for pointing this out. The revised manuscript now includes additional citations supporting each of these statements.

      4) Specify somewhere in Methods that DEGs were compared to vehicle for both 1-h and 4-h (and not 4 vs 1 h).

      We thank the reviewer for bringing this to our attention. We have updated the methods to include: “DEGs were calculated by comparing the KCl and Vehicle treatment groups at each respective timepoint.”

      5) In Figure 2E, why are the enrichments exactly opposite, especially given these are two different types of input (all baseline peaks vs DARs)?

      Odds ratios were calculated by comparing baseline peaks (i.e., ATAC-seq peaks identified in vehicle treated cells) to KCl-induced DARs. This allowed us to identify the enrichment of DARs in specific genomic annotations in comparison to the genomic features that are accessible at baseline, rather than making comparisons to random probe sets or genomic space dedicated to these distinct annotations. This analysis identified that relative to baseline peaks, DARs are significantly depleted in coding regions of the genome and enriched in non-coding regions of the genome. However, given this analysis we agree that it does not make sense to graph both the vehicle (baseline) and DARs on this graph, given that enrichment of each set is determined relative to the other (creating the reciprocal enrichment in this panel). We have updated Fig. 2e to only include points for 4 h DARs.

      6) Some references are off. One that I noted was "...chromatin remodeling in the mouse dentate gyrus following 1 h of electroconvulsive stimulation", which should cite Su et al. 2017, not Malik 2014. For the statement that IEGs are critical regulators of non-neuronal IEGs, the authors may want to add the Hrvatin 2017 reference.

      We thank the reviewer for bringing this to our attention. We have revised the manuscript to include the correct citation for this claim, and also to include the Hrvatin et al. reference.

      7) It would be helpful for the authors to write out the whole gene name for Pdyn somewhere.

      We have updated the text to include the gene name for Pdyn, both in the abstract and also in the introduction of the manuscript.

      8) Figure 5f: For ease, please include what is high vs low in the figure caption in addition to the main text.

      We thank the reviewer for bringing this to our attention. We have updated the figure caption and main text to include what is high vs low in Pseudotime estimates in Fig. 5f.

      9) How are the tracks ordered in Fig8c?

      Tracks within Fig. 8c demonstrate snATAC-seq signal at the Pdyn gene locus for transcriptionally distinct cell types within the NAc. The tracks are ordered by cluster size (nuclei number) in the snATAC-seq dataset.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes a structural analysis of the tripartite HipBST toxin-antitoxin (TA) system, which is related to the canonical two-component HipBA system composed of the HipA serine-threonine kinase toxin and the HipB antitoxin. The crystal structure of the kinase-inactive HipBST complex of the Enteropathogenic E. coli O127:H6 was solved and revealed that HipBST forms a hetero-hexameric complex composed of a dimer of HipBST heterotrimers that interact via the HipB subunit. The HipS antitoxin shows structural resemblance to the HipA N-terminal region and the HipT toxin corresponds to the core kinase domain of HipA, indicating that in HipBST the hipA toxin gene was likely split into two genes, namely hipS and hipT.

      The structure also reveals a conserved and essential Trp residue within the HipS antitoxin, which likely prevents the conserved "Gly-rich loop" of HipT from adopting an inward conformation needed for ATP binding. This work also shows that the regulating Gly-rich loop of the HipT toxin contains conserved phosphoserine residues essential for HipT toxicity that are key players within the HipT active site interacting network and which likely control antitoxin binding and/or activity.

      Strengths:

      The manuscript is well written and the experimental work well executed. It shows that major features of the classical two-component HipAB TA system have somehow been rerouted in the case of the tripartite HipBST. This includes the N-terminal domain of the HipA toxin, which now functions as bona fide antitoxin, and the partly relegated HipB antitoxin, which could only function as a transcription regulator. In addition, this work shows a new mode of inhibition of a kinase toxin and highlights the impact of the phosphorylation state of key toxin residues in controlling the activity of the antitoxin.

      Weaknesses:

      A major weakness of this work is the lack of data concerning the role of HipB, which likely does not act as an antitoxin. Does it act as a transcriptional regulator of the hipBST operon and to what extent both HipS and HipT contribute to such regulation? These are still open questions.

      We thank the reviewer for their feedback and have included a supplementary figure (Figure 1 supplement 2) and accompanying text that shows the transcriptional role of HipB, and how HipS and HipT influence this regulatory effect.

      In addition, there is no in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of HipBST from Legionella. This is also a major weakness of this work.

      A structural comparison to the recent structures from Legionella has now been included in the discussion, including Figure 6 supplement 1.

      Reviewer #2 (Public Review):

      The work by Bærentsen et al., entitled "Structural basis for regulation of a tripartite toxin-antitoxin system by dual phosphorylation" deals with the structural aspects of the control of the hipBST TA operon, the role of auto-phosphorylation in the activation and neutralisation of the enzyme and the direct effects of HipS and HipB in neutralisation. This is a follow-up to the Vang Nielsen et al., and Gerdes et al., papers from the same authors on this very unique TA module, that brings forth a thorough and well written dissection of an unusually complex regulatory system.

      This is a much improved manuscript, the paper is more focused and the message is now clear.

      Reviewer #1 (Recommendations For The Authors):

      My main recommendation would be to include an in-depth structural comparison between the structure of the HipBST solved in the work and the two recent structures of similar HipBST from Legionella.

      We thank the reviewer and have included a new supplementary figure (Figure 6 supplement 1) and expanded the comparison in the discussion to accommodate this.

      Reviewer #2 (Recommendations For The Authors):

      So I only have some minor comments.

      1) The authors should accompany Fig.1 (a supplementary panel is sufficient) with a surface electrostatic representation of the complex to better illustrate the potential role of the complex in transcription auto-regulation.

      We have included a new panel in Figure 1 supplement 3 to show the electrostatic surface of the DNA-binding domains of HipB of HipBST and HipBASo.

      2) When the Gly-rich loop is first introduced, please specify the residue range that the loop spans.

      Corrected for both the first mention of the Gly-rich loop of HipA and HipT.

      3) In Fig. 2, the authors try to show how the interaction of the main helix of HipS with HipT differs in HipBST compared to HipAB. I think it would be helpful if these two panels showed the surface of HipT and HipA coloured by electrostatics, so that not only the differences in HipS become apparent, but also the local differences between the two toxins.

      We thank the reviewer for this excellent idea; the electrostatics did in fact reveal that this region differs between the toxins. We have updated Figure 2b to show this difference.

      4) Fig. 4 Shows the experimental SAXS curves for the HipT D210Q variants SIS (blue), SID (red), and DIS (orange). In each case a black curve is fitted to the data (presumably the fitting of the model-derived scattering curve to the data). Could the authors clarify this in the figure?

      We agree that this information is missing in the legend. The black curves are the fits for the models based on the crystal structure after rigid-body refinements and inclusion of a structure factor to account for oligomerization of the complexes. This is now included in the figure caption.

      5) Also regarding the SAXS analysis, in the manuscript the authors state that all three models "gave good fits to the data" as assessed by the fitting χ2. These χ2 values should be explicit in the figure or the figure legend.

      We thank the reviewer for this suggestion. The chi squared values for the best fits are now given in the text.

      In addition, is the SAXS data (the parameters derived from the experimental scattering, including the MW) consistent with the lack of HipS from the complex? (it should be...).

      This is a good point; however, the partial oligomerization (dimerization) of the complexes (heterohexamers) and the variation in the degree of dimerization between samples prevent extraction of useful mass values from the I(0) determinations. Therefore, we decided not to give the values explicitly in the text but only state "…consistent with analysis of the forward scattering that revealed partial oligomerisation of the samples with an average mass corresponding to roughly a dimer of the HipBST heterohexamer."

      6) Please improve this sentence: "Moreover, since it has previously been shown that only the HipT Gly-rich loop never is observed in doubly phosphorylated form with both Ser57 and Ser59 modified simultaneously, it is unlikely that the effects are due to autophosphorylation of the remaining serine residue in either case (Vang Nielsen et al., 2019)."

      Done

    1. Author Response

      We are happy that the novelty and strengths of the study have been appreciated by the editor/s and reviewer/s. We thank the editor/s and reviewer/s for a considerably detailed and constructive review of the manuscript. Here are the responses and proposed revisions from the authors.

      • The weakness pointed out in the editorial comment, namely the absence of data on the role of Piezo1 in migrating T cells under varying physico-chemical conditions, was in the opinion of the authors beyond the scope of the present manuscript. Moreover, we intentionally avoided introducing external forces using invasive techniques followed by assessment of Piezo1 function. That was the reason for using a non-invasive microscopy technique such as IRM to assess membrane tension generation in migrating T cells.

      • With regard to the explanation sought for the statement 'these high tension edges are usually further emphasized at later time points': the edges are visible from 1 min onward (Supp. Fig. 2B) and are further emphasized at 30 min. In Fig. 2D, the 3 min time point shows increased tension at the edges together with a clear difference in median tension. Fig. 2C and Supp. Fig. 2C are averaged over all cells; hence, at a time point when a particular cell already shows higher tension at its edges, the median tension in Fig. 2C may not yet be significantly different. Also, if tension is enhanced only in a thin section of the cell edge, it may contribute a second peak without affecting the median much.

      • With regard to the query regarding experimental replicates, all data shown is derived from at least 3 experimental replicates for Jurkat cells or independent blood donors for primary CD4+ T lymphocytes as specified in the respective figure legends.

      • With regard to the comments on nonavailability of representative images/videos for Figures 1 A and B, in the revised manuscript we will add representative video of GFP (-) and GFP (+) tracks. The transwell experiments were assessed by collecting cells from the bottom chamber followed by flow cytometry. We did not take microscopic images of the bottom chambers before collecting the cells.

    1. Author Response

      We thank the two reviewers and the reviewing editor for their positive evaluation of our manuscript. Especially, we appreciate the useful comments and suggestions on how the manuscript can be improved and which directions would be promising for future work on this topic. We would like to point out that we did consider the possibility that the plant enzymes produce ethylene in the same manner as EFE, but so far we did not obtain any evidence for such an activity (Supplementary Figure 3). We also performed some preliminary experiments with plants subjected to biotic stress, but the results suggested that neither defence responses nor pipecolate and proline biosynthesis depend to a significant extent on the 2-ODD-C23 enzymes. We plan to address these questions in more detail in further experiments. Depending on the outcome, we will either incorporate the results into a revised version of the present manuscript, or present them as follow-up studies. Concerning the possibility of testing all types of pathogens that affect expression of the 2-ODD-C23 genes, it is beyond our capacity and beyond the scope of the present manuscript. We hope, however, that such experiments can be the subject of a future research project in collaboration with experts in plant-pathogen interactions.

    1. Author Response

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding the S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells expanded during the culture, including by scRNAseq, scTCRseq, and examination of reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with their HLA restrictions (which by itself represents a major achievement), together with their subsets, based on transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.

      We thank the reviewer for this comment. We will show the data in the revised manuscript.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We will describe the limitation and advantage of our strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      We apologize for not mentioning it clearly. As we have confirmed the unresponsiveness using synthetic HCoV peptides, we will include these data in the revised manuscript.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      We thank the reviewer for this helpful comment. We will add the discussion to the revised manuscript.

      Reviewer #3 (Public Review):

      Summary: The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S-protein-specific T cells in people who receive SARS-CoV2 mRNA vaccines. To do this, the authors recruited a cohort of Covid-19-naive individuals who received SARS-CoV2 mRNA vaccines and collected sera and PBMC samples at different timepoints. They then generated three main sets of data: 1) anti-S protein antibody titers at all timepoints; 2) a single-cell RNAseq/TCRseq dataset for T cells that divided after stimulation with S protein for 10 days; and 3) the corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: A) individuals with a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests that Tfh polarization of S-specific T cells can be a marker to predict the longevity of the anti-S antibody response; B) S-reactive T cells do exist before vaccination, but they seem unable to respond properly to Covid-19 vaccination.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationship between antibody titers, T cell phenotypes, TCR clonotypes and corresponding epitopes, and indeed it reports several interesting findings about the relationship between Tfh cells and sustained antibody levels, and about the S-reactive clones that exist before vaccination. However, the main weakness is that these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      1) The biggest claim of the paper, which is the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP as in Fig2 C, E, F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor #27. If separating each of the 4 decliners and sustainers and presenting their Tfh% in total CD4+ T cells respectively, will it statistically have a significant difference between those decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn based on proper sample size and statistical analysis.

      We will carefully describe the interpretation of the data with statistical analysis in the revised manuscript.

      2) The paper does not provide any information to justify its cell annotation as presented in Fig 2B and 4A. Moreover, in my opinion, it is strange to see two clusters of cells sitting on the left and right sides of the UMAP in Fig 2B that are both annotated as CD4 Tcm and Tem. Also, Tfh and Treg belong to the same cluster in Fig 2B, but they should have very distinct transcriptomes and should be separated clearly. Therefore, I believe the paper would be more convincing if it presented more information and discussion about the basis for its cell annotation.

      We apologize for the insufficient explanation and will describe how we performed cell annotation in the revised manuscript.

      3) In lines 103-104, the paper claims that the Tfh cluster likely derives from cTfh cells. However, considering that the cells have been cultured/stimulated for 10 days, cTfh cells might lose all Tfh features after such culture. To the best of my knowledge, there is no literature supporting the notion that cTfh cells stimulated in vitro for 10 days (in the presence of IL-2, IL-7 and IL-15) can still retain a Tfh phenotype. It is possible that, instead of there being more S-specific cTfh cells before the cell culture, the sustainers' PBMCs create an environment that favors Tfh cell differentiation (for example, by expressing more pro-Tfh cytokines or co-stimulatory molecules). Thus, after the 10-day culture, more Tfh-like cells are detected in the sustainers. The paper may need to include more evidence that cTfh cells can retain Tfh features after 10 days of culture.

      We thank the reviewer for raising this important point. We will describe the limitation of the strategy. In addition, we will include some data in accordance with the reviewer’s recommendation.

      4) It is in my opinion inaccurate to use cell number in Fig4B to determine whether such clone expands or not, given that the cell number can be affected by many factors like the input number, the stimulation quality and the PBMC sample quality. A more proper analysis should be considered by calculating the relative abundance of each TCR clone in total CD4 T cells in each timepoint.

      We will also show the proportion of clonotypes in the revised manuscript.

      5) It is well-appreciated to express each TCR in a cell line and determine the epitopes. However, the authors need to make very sure that this analysis is performed correctly, because a large body of the paper's conclusions is based on this epitope analysis. I notice something strange (maybe I am wrong): for example, in Table 4, donor #8 clonotypes post_6 and post_7 have exactly the same TRAV5 and TRAJ5 usage. Because the alpha chain does not have a D region, in theory these clonotypes, if they have the same VJ usage, should have the same alpha chain CDR3 sequences; however, in the table they have very different CDR3α aa sequences. I wish the authors could double-check their analysis, and I apologize in advance if I raise such questions based on wrong knowledge.

      We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotypes post_6 and post_7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α aa sequences due to random (non-templated) nucleotide addition at the V-J junction during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 had the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.

    1. Author Response

      Reviewer #1 (Public Review):

      Gambelli et al. provide a structural study of the SlaA/SlaB S-layer of the archaeon Sulfolobus acidocaldarius. S-layers form an essential component of most archaeal cell envelopes, where their self-assembling properties and activity as cell envelope support structures have raised substantial interest, both from researchers seeking to understand the fundamental biology of archaea, as well as researchers seeking to exploit the biomaterial properties of S-layers in biotechnological applications. Both interests are hampered by the paucity of structural information on archaeal S-layer assembly, structure, and function to date, in large part due to technical difficulties in their study.

      In this study, Gambelli and coworkers overcome these difficulties and report the high-resolution 3D cryoEM structures of the purified SlaA monomers at three different pH, as well as the medium resolution 3D cryoET structures of the SlaA/SlaB lattices determined from S-layer fragments isolated from the Sulfolobus cells.

      The structural work is generally well executed, although lacks in detail in places to allow a proper review, particularly in the cryoET. A further drawback of the current manuscript is that the structural work remains rather descriptive and speculative, with little validation of the proposed models.

      The authors run a plethora of representation, analyses, prediction, and simulation software on their structures resulting in an abundance of Figures that risk overloading the reader and in several cases bring little new insight beyond unsubstantiated speculation.

      We understand the reviewer’s concern about the number of figures presented in the manuscript. To avoid overloading the reader, we have further simplified the supplementary figures and provided additional context and explanations in the narrative of the manuscript to ensure that the reader can follow the data presented. We have also improved unclarities in legends, making sure that they provide clearer explanations of the data. Additionally, we have taken extra care to connect each figure to the main findings, emphasising how each piece of data contributes to the overall understanding of the structures.

      We find it difficult to agree with the assertion of unsubstantiated speculation. We carefully justified our interpretation of our data, referring to well-established principles and relevant literature. Nevertheless, we have attempted to provide further context and clarification in the revised manuscript. Where appropriate, we have acknowledged the limitations of our analyses and noted where further research is needed to confirm our findings.

      The structural description of the S. acidocaldarius S-layer will be of high general interest and the authors have made a substantial leap forward, but the current manuscript would benefit from a better validation and basic atomic description of the SlaA/SlaB S-layer.

      Specific points.

      • It is not possible to review the quality of the SlaA and SlaA/SlaB models in the cryoET reconstruction. No detailed fits of the map and model are shown, and no correlation statistics are given (the latter is also true for the higher resolution 3D reconstructions at pH4, 7, and 10). To be of use to the community, the S-layer model and cryoET maps should also be deposited in PDB and EMDB, and an autodep report and ideally the cryoET maps should be available.

      Maps and models for the SlaA single-particle structures at pH 4, 7 and 10 have now been released in the PDB under the accession codes PDB-7ZCX, PDB-8AN3 and PDB-8AN2, and all validation statistics can be accessed there. We have also provided a standard cryoEM statistics table with the manuscript.

      We have also changed main figures 4 and 5 to include more detail about the STA maps and models. We have deposited the sub-tomogram averaging map in the EMDB (EMD-18127) and the models of the hexameric and trimeric pores in the Protein Data Bank under accession codes PDB-8QP0 and PDB-8QOX, respectively (status: release upon publication). We have also attached the map and models as supporting files to this rebuttal.

      • The authors spend a great deal of effort on the MD simulation of the SlaA glycans and the description of the 'glycan shield' and its possible role in subunit electrostatics and intersubunit contacts. This does not result in testable hypotheses, however, and does not bring much more than vague speculation on the role of the glycans or the subunit contacts in S-layer assembly and stability.

      We propose that our glycan analysis does lead to a testable hypothesis, which could for example be tested by a future study involving the genetic or enzymatic ablation of glycosylation sites and the subsequent investigation of the structure and stability of the S-layer. We have included this statement in our manuscript to inspire future research in this direction.

      • For the primary description of the SlaA/B S-layer, more important would be a detailed atomic description and validation of the intermolecular contacts in the proposed lattice model. Given the low resolution of the cryoET, this would require MD simulation of the contacts. Lattice stability during MD simulation and/or the confirmation of lattice contacts by cross-linking mass spectrometry would go a great way in validating the proposed lattice model.

      We have improved our map and model by reprocessing our sub-tomogram averages (STA) using a different pipeline (Warp and M). We are now able to visualise more of SlaB, and the new map agrees with our Alphafold predictions of the SlaB trimer. The new map also clearly shows the interaction sites between SlaA and SlaB, as well as how SlaB integrates into the lipid bilayer. We have made new figures that now correlate the STA with the atomic model more clearly.

      Taking the reviewer’s suggestions on board, we have used Namdinator, a molecular dynamics-based flexible fitting tool, to refine our model. Due to RAM limitations, we had to split our model into two PDB files. The first contains 6 SlaA monomers delineating a hexameric pore, and the second contains 3 SlaB monomers and 5 SlaA monomers in the region of a trimeric pore. While the new models largely agree with the original, Namdinator did improve them. The IgG domains of SlaB now fill previously unoccupied areas of the map, and all clashes have been removed. Notably, the way that SlaA is modelled is the only way in which the subunits can be reconciled with the map. This is especially true for the surface glycans, which in our model are excluded from all intermolecular interfaces and thus remain free to move in the solvent. In any other SlaA configuration, there would be severe clashes between neighbouring polypeptide backbones, or between proteins and surface glycans, which would be sterically or entropically unfavourable.

      Unfortunately, full MD simulations of the entire S-layer array would require simulating at least 36 SlaA monomers, including glycans, in addition to 9 SlaB monomers integrated into a membrane and solvent environment, implying more than 8 million atoms. Such large-scale models would only allow very short simulation times (on the order of no more than 100 nanoseconds). Such time scales would preclude the observation of major changes, even if the model were sub-optimally configured.

      • The discussion of the subunit electrostatics and the role they could play in subunit assembly/disassembly remains superficial and speculative. No real model or hypothesis is put forward, let alone validated.

      We have rephrased the discussion to state our hypothesis regarding S-layer disassembly clearly. It should now be clearer that, from our data, we deduce that S-layer disassembly at high pH is likely not driven by protein unfolding or a pH-induced conformational change. We hypothesise that the pH-induced disassembly is instead likely caused by a weakening or loss of hydrogen bonds as the proton concentration is reduced.

      • The authors solve the cryoEM structure of SlaA released and purified from S. acidocaldarius S-layers by an alkaline pH shift. When shifted back to acidic pH, does this native material self-assemble in vitro? If not, do the authors have an explanation for this? Are components missing, or could the solved structures represent SlaA conformations that are no longer assembly competent?

      We have previously shown that S. acidocaldarius S-layers disassembled by a pH shift from acidic to alkaline reassemble when the pH is shifted back to acidic. We also demonstrated that this disassembly / reassembly works with both SlaB present and absent, showing that SlaA alone can assemble into an S-layer (Gambelli et al, PNAS, 2019). This means that the SlaA protein that we imaged in this manuscript is indeed reassembly competent. We have included a sentence clarifying this in the first paragraph of the Results section and have discussed our hypothesis for the mechanism underlying assembly and disassembly in detail.

      Reviewer #2 (Public Review):

      Gambelli et al. investigated the surface layer (S-layer) of Sulfolobus acidocaldarius by combining single-particle cryo-electron microscopy (cryoEM), cryo-electron tomography (cryoET), and Alphafold2 predictions to generate an atomic model of this outermost cell envelope structure. As known from previous studies, the two-dimensional lattice comprises two distinct S-layer glycoproteins (SLPs): SlaA, the outer component interacting with the harsh living environment of this archaeon, and SlaB, which comprises a dominant hydrophobic domain that anchors this SLP in the cytoplasmic membrane. The interwoven S-layer lattice of S. acidocaldarius shows hexagonal lattice symmetry with p3 topography. Its architecture is highly complex, as the unit cell consists of one SlaB trimer and three SlaA dimers (SlaB3/3SlaA2). Despite the complexity of this distinct proteinaceous S-layer lattice, the authors investigated not only the SLP structures but also the glycans in their structure predictions.

      A strength of this study is that it divides, for the first time, the Y-shaped SlaA SLP into six domains, D1 to D6, starting from the N-terminus. As previous studies revealed that SlaA assembly and disassembly are pH-sensitive processes, the structure of SlaA was investigated at different pH conditions. This approach led to the striking result that the cryoEM maps of SlaA D1 to D4 are virtually identical at the three pH conditions, demonstrating remarkable pH stability of these protein domains. For SlaA at low pH, however, domains D5 and D6 were too flexible to be resolved in the cryoEM maps. Nevertheless, the authors were able to hypothesize that jackknife-like conformational changes of a link between domains D4 and D5, as well as pH-induced alterations in the surface charge of SlaA, play important roles in S-layer assembly. This study also showed that the surface charge of SlaA shifts significantly from positive at acidic pH to negative at basic pH. A comparison of the surface charge between glycosylated and non-glycosylated SlaA showed that the glycans contribute considerably to the negative charge of the protein at higher pH values. This change in electrostatic surface potential may therefore be a key factor in disrupting protein-protein interactions within the S-layer and causing its disassembly, which is highly desirable for new practical applications in biomolecular nanotechnology and synthetic biology. An excellent approach was to use exosomes to determine the structure of the entire S-layer comprising SlaA and SlaB. This approach effectively distinguished two zones in the SlaA assembly: an outer zone formed by D1 to D4, and an inner zone formed by D5 and D6. Moreover, for the first time, deeper insights are provided into how SlaA forms the hexagonal and triangular pores within the S-layer lattice of S. acidocaldarius. Of particular interest are the SlaA dimers, which are suggested to be formed by two SlaA monomers through their D6 domains, with each SlaA dimer spanning two adjacent hexagonal pores.

      The weaknesses in this work are in the introduction, where the citation is incomplete. In the comparisons drawn between archaeal and bacterial S-layers, basic citations are missing for the latter. One gets the impression that there is a deliberate avoidance of citing individual prominent S-layer research groups here. The same is true for citations of glycosylation of archaeal S-layer proteins and Sulfolobus mutants lacking SlaB.

      We thank the reviewer for suggesting the inclusion of additional references. We would like to reassure the reviewer that we did not intend any deliberate omissions. Instead, we aimed to focus on archaeal S-layers and thus did not provide a detailed overview of bacterial S-layers. We have now incorporated more references on bacterial S-layers, hoping that this will provide a more balanced overview.

      The authors show many pictures and schematic drawings of high quality. In the main text, these illustrations should be briefly commented on if there is any ambiguity. For example, it is somewhat difficult to understand that in one schematic drawing the angle between the SlaA longitudinal axis and the membrane plane is 28 degrees and at the same time in another schema, the angle of the longitudinal axes in SlaA dimers is given as 160 degrees.

      We thank the reviewer for their appreciation of our figures. To clarify, the angles mentioned are two different ones. The 28 degree angle lies between the cytoplasmic membrane and the longitudinal axis of an SlaA monomer in the assembled S-layer. The 160 degree angle lies between the two SlaA monomers forming a dimer.

      The authors argue that upon a pH shift to 10, SlaA disassembles and exists exclusively as single molecules. The presence of exclusively single SlaA proteins and the purity of the fractions were assessed by SDS/PAGE analysis and cryoEM micrographs. However, given the strong denaturing effect of SDS and the resulting dissociation of protein complexes, it is doubtful that SlaA dimers or oligomers could have been detected by SDS/PAGE.

      To clarify, we did not assess the assembly state of the S-layer by SDS PAGE, as we are aware that assembled S-layers would not travel into the gel. Instead, we assessed the assembly state by negative stain electron microscopy. Class averages of purified SlaA did not reveal any dimers or higher oligomers.

      Moreover, the shown representative micrographs (supplementary figure 2, a-c) show a heterogeneous structure and thus, do not support the exclusive presence of disassembled SlaA monomers.

      We are not sure exactly what the reviewer is referring to; only single SlaA particles are visible in supplementary figure 2, a-c (new ). The larger, amorphous “blobs” in the panels are likely ethane contamination on the cryoEM grid.

      An interesting finding is SlaA dimerization. SlaA dimers evidently co-exist with SlaA-only S-layer, as shown in supplementary figure 15. A short discussion of whether dimers are an intermediate structure in the process of S-layer lattice formation from monomeric SlaA, or whether this structure was just a coincidental observation, would help the reader to better understand the meaning of these dimeric structures and at which stage they are formed.

      We thank the reviewer for their suggestion and added a brief statement to the discussion to clarify this point: “Their co-existence with assembled S-layer may indicate that SlaA dimers are an intermediate of S-layer assembly or disassembly.” The figure numbering was updated, so supplementary figure 15 has now become Figure 4-figure supplement 4.

    1. Author Response

      Reviewer #1 (Public Review):

      “In analyzing neural activity accompanying the behavioral persistence of the dominant sequence after a block change, the authors find that the ACC ensemble firing pattern is closer to the original dominant sequence pattern during reinforcement and less like this pattern during exploration… As time, and trials, progress the rat is approaching the point at which it explores another strategy. The authors find strengthened "prevalence" encoding with increasing sequence repetition, but whether this parameter is related to behavioral change/flexibility was not clear to me. Might there be something unique about the last trials in a tail "predicting" an upcoming switch? Can the authors please expand? Relatedly, if the prediction of upcoming behavioral change is not observed in the neural activity from sequence steps 2-6, it is notable that these are the steps 'within' the sequence, which leaves out the initiation (first center poke) and termination (reward/reward omission). Thus one could imagine this information is "missed" in the current analysis, given that neither the reward period nor the initiation of a trial at the center is analyzed. This does lead me to suggest softening some claims of identifying "unifying principles" of ACC function, as the authors state, based on the analyses included in the current report, since the neural activity related to the full unit of behavior is not considered. (I appreciate the motivation behind this focus on within-sequence behavior - the wish to compare time periods with similar movement parameters.)”

      We apologize for the confusion; while the sequence prevalence itself tends to be high for ‘dominant tails’, we do not claim that the fit of the prevalence model is better at those sequence instances. We share the Reviewer’s interest in linking prevalence encoding to behavioral adaptation, as well as the intuition that block transitions should be among the epochs where strategy prevalence is tracked particularly well. Indeed, we spent a considerable amount of time thinking about whether we could identify and interpret periods during the session where our prevalence model fits better or worse. Two arguments convinced us to abandon that direction: a technical one and a conceptual one. The technical argument is that when the explanatory power of a variable is limited, regression residuals are proportional to the variable itself. Thus, any meaningful comparison of the model’s fit would have had to be done for periods where strategy prevalence is within a similar range. The conceptual argument is even more disarming: imagine we do identify a putative session epoch where the model fits worse. While this could truly mean that the animal tracks less closely how much it has pursued this strategy in the recent past, it is equally possible that we were simply off in selecting the specific window over which the prevalence signal is estimated, the exact behavioral statistic tracked, or the exact form of the dependence between that statistic and neural activity. We certainly do see changes leading up to behavioral switches at block transitions, something we plan to elaborate on in a subsequent paper, but whether those are related to prevalence tracking is something we believe is hard to crack.

    1. Author Response

      Reviewer 1 (Public Review):

      Weakness: Although the cross-links stimulate ATP hydrolysis, further controls are needed to convince me that the TM1 conformations observed in the structures are physiologically relevant, since they have been trapped by "large" substrates covalently-tethered by crosslinks.

      Reviewer 1 raised concerns that the relatively large size of our covalently attached AAC substrate could potentially distort TM1 in Pgp. We would like to clarify that AAC has a molecular weight of 462 Da, which, in comparison to many known Pgp substrates ranging from 250 to over 1,000 Da, is not a large compound. For instance, the other Pgp substrates mentioned in our manuscript all have a comparable or larger size: verapamil, 455 Da; doxorubicin, 544 Da; FK506, 804 Da; valinomycin, 1,111 Da; cyclosporin A, 1,203 Da.

      Furthermore, AAC was strategically attached to a site distant from TM1 in the inward-facing Pgp conformation. Once the transporter has switched to the outward-facing state, several TM helices accommodate the compound. The observation that only TM1 exhibited significant conformational changes suggests its potential role in the transport mechanism. This hypothesis is supported by our findings that a conservative substitution (G72A) in TM1 resulted in a dramatic loss of transport function for various drug substrates and impaired verapamil-stimulated ATPase activity.

      Reviewer 1 (Recommendations for the Authors):

      I understand the need for an unconventional approach to understanding the translocation pathway. What would help to support this model is to cross-link a much smaller substrate, as the one used is quite large and could potentially distort TM1 in the outward-state when cross-linked.

      We thank the reviewer for this recommendation, and we have outlined plans for future experiments involving other substrates, including smaller ones, to further investigate our proposed model. However, it is important to acknowledge that conducting these studies will require a significant amount of effort and resources, which we believe extend beyond the scope of our current manuscript.

      In unbiased MD simulations starting from the IF state are there any simulations where the substrate follows the same path as proposed here?

      All our MD simulations were performed in the outward-facing state to focus on potential substrate release pathways. Starting MD simulations from the inward-facing state would introduce complexities in capturing the necessary domain motions and the nucleotide binding and hydrolysis required for substrate translocation. Therefore, we opted not to perform MD studies starting from the inward-facing state.

      Reviewer 2 (Public Review):

      Weakness: There is much to like about the experimental work here but I am less sanguine on the interpretation. The main idea is to covalently link via disulfide bonds a model tripeptide substrate under different conditions that mimic transport and then image the resulting conformations. The choice of the Pgp cysteine mutants here is critical but also poses questions regarding the interpretation. What seems to be missing, or not reported, is a series of control experiments for further cysteine mutations.

      Reviewer 2 raised concerns about the interpretation of our results and suggested the need for additional mutant designs to validate our proposed TM1 mechanism. Firstly, we believe that the observed TM1 conformational changes are valid in our cryoEM structures, despite the use of different conditions and several mutants to capture Pgp in the outward-facing state.

      Regarding the G72A mutant, we consider it conclusive that this single point mutation in the TM1 has a profound effect. Importantly, the G72A mutant was readily expressed and purifiable as a stable protein. We were able to resolve a high-resolution structure of the G72A mutant (without the substrate), confirming that the protein is not generally destabilized but properly folded.

      Above all, we appreciate the Reviewer’s suggestion to explore additional mutations and intend to do so in future studies.

      Reviewer 2 (Recommendations for the Authors):

      I am sold on the results regarding TM1 conformational changes as they are evident in the cryoEM structures. However, the set of states compared between mutants are not biochemically equivalent: for 335 and 978 they used an ATP-impaired Pgp whereas for 971 they used what appears to be WT, and the conformation was imaged presumably subsequent to ATP hydrolysis and Vanadate trapping. This is significant if the authors were unable to trap the OF in the impaired mutant background and should be highlighted. I have to believe that they tried that condition but I could be wrong.

      We acknowledge the point made by the Reviewer about the biochemical equivalence of mutant states and the potential significance of using an ATP-impaired mutant for trapping the outward-facing conformation of 971. We have not yet attempted to use the ATPase-deficient 971C mutant for crosslinking and intend to address this question in future studies.

      In our current approach, we used the ATPase-active 971C for two specific reasons:

      1) Our biochemistry data, as shown in Fig 1C, indicates that 971C only crosslinks in the presence of ATP hydrolysis conditions. Vanadate trapping was employed to stabilize the outward-facing conformation.

      2) Based on our experience, the conformations of the ATP-bound (mutant) and vanadate-trapped states of an ABC transporter are structurally equivalent at the resolution level of our study (see ref. 21: Hoffmann et al. Nature 2019).

      The authors propose a new model for substrate translocation. It is based on three mutants and a number of structures. If the authors were not challenging the current dogma I would not have written the next comment. Considering the impact of the findings, I would have designed a couple more cysteine mutants based on their model. For instance, this pathway has a number of stabilizing interactions, can't they make a mutant that preserves conformational switching but eliminates substrate translocation? I like the G97A mutant result but I am worried that the effect could just be a general destabilization or misfolding as part of the cryoEM particles seem to suggest. The authors advance one interpretation of the disorder observed in this mutant but it could easily be my interpretation.

      We thank the reviewer for the suggestion to design additional mutants to further validate our proposed model for substrate translocation. We agree that this would be highly valuable, considering the potential impact of our findings. However, given the time-intensive nature of our approach, we believe that presenting these additional designs in a future study is a reasonable course of action.

      Regarding the G72A mutation, we believe that our current data fully supports our model and the role of TM1 in regulating the Pgp activity. Importantly, we would like to emphasize that the G72A mutant was readily expressed and purifiable as a stable protein. Additionally, our cryoEM structural determination of the G72A mutant at high resolution confirmed that the protein is not generally destabilized but properly folded.

      There are a couple of troubling methodological questions that I want the authors to address or clarify:

      1- In the methods they report that the final sample for cryoEM was prepared on a SEC devoid of detergent. It is obvious that the sample was folded but I was wondering why the detergent was removed? Was that critical for observing these structures with multiple ligands? Did they observe any lipids in their cryoEM?

      We omit detergent from the buffer in the final SEC purification step. This step removes free detergent from the background, which helps during cryoEM imaging. Of course, this cannot be done with every detergent, but it is possible due to the very low CMC of LMNG. By now, we have verified this method for several other transporters with the same success. While this procedure helps us to obtain better images, it is not necessary to obtain specific conformations or ligand-bound states, nor does it affect these states or conformations.

      In our cryoEM structures, we did observe multiple cholesterol hemisuccinate (CHS) molecules on the outer transmembrane surface of Pgp.

      2- Can the authors comment on why labeling was carried out in the presence of ATP? Does it matter if the substrate was added prior to ATP and incubated for a few minutes?

      For every dataset, we first added the substrate to be cross-linked and afterwards added the ATP. In the cases of 335C and 978C, labeling was successful before ATP was added, as evidenced by the inward-facing structures with cross-linked substrate.

      However, for 971C, cross-linking only occurred after the addition of ATP. We interpret these data to suggest that the 971 site is inaccessible to the substrate in the inward-facing state, and that cross-linking can only occur after the transporter transitions to the outward-facing state. This is in line with our inward-facing structure, which does not show a cross-linked substrate, and with our biochemical data shown in Fig 1C, where 971C only crosslinked in the presence of ATP.

      3- I am not an expert on MD simulations and I understand that carrying out simulations at higher temperatures used to be a trick to accelerate the process. Is this still necessary? Why didn't the author use approaches such as WESTPA?

      Most so-called enhanced sampling methods, including WESTPA, explicitly define a reaction coordinate for the process of interest, usually based on intuition or prior studies. If this coordinate is chosen poorly, enhanced sampling usually fails, either because the sampling becomes inefficient or because the sampling biases the transition pathway (or both). Lacking reliable intuition or prior knowledge of which motions would result in substrate release, we chose temperature to speed up the process. High temperature largely avoids introducing any bias through the definition of a progress coordinate. By contrast, the weighted ensemble method underlying WESTPA is a great method to simulate unbiased dynamics of a process with a known progress coordinate, but unfortunately it requires choosing a progress coordinate prior to the simulation and will then mostly sample the process along this coordinate, because this is the only direction in which sampling is improved. High-temperature MD, on the other hand, accelerates all processes in the system under study. Indeed, we have now confirmed that the pathway found at high temperature is also feasible at near-ambient conditions.

      In new simulations, we have now observed a similar release pathway at T = 330 K. The only difference is that the substrate has not fully dissociated from the protein after 2.5 µs, with weak interactions persisting at the top part of TM1 on the extracellular side. Importantly, this configuration is also observed in higher-temperature simulations, but with a much shorter lifetime.

      In response, we will include these new findings in the revised manuscript.

      4- One way to show that the two substrates binding mode is biochemically relevant is to measure Vmax at different substrate concentrations. One would expect a cooperative transition if that interaction is mechanistically important.

      We have measured Vmax as a function of QZ-Ala concentration in a previous report (ref. 24), supporting positive cooperativity for binding to two sites.

      Reviewer 3 (Public Review and Recommendations for the Authors):

      We thank Reviewer 3 for recommending the acceptance of our manuscript as is. We will address all minor comments from Reviewer 3 in the revised manuscript.

    1. Author Response

      We thank the Editors and Reviewers for the thorough assessment of our work. We are pleased that you agree with us that our proof-of-concept study of the ATUM Tomo technology advances volume electron microscopy and has the potential to solve research questions in diverse biological areas. Based on your comments, we are planning to revise the manuscript to optimize readability, clarify the fields of applicability of our approach more, and add some data related to questions you raised. We plan the following revisions:

      Reviewer #1 The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.

      As some of the supplemental figures describe essential experimental details, we will move them into the main part of the manuscript.

      Reviewer #1 In general, apart from some necessary but important additions, the methods and techniques used here are described in sufficient detail.

      Reviewer #2 Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.

      Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).

      Reviewer #1 Concerning the results section: In my opinion, the results section is a bit unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is its major drawback. In my opinion, the authors should also give a more detailed description of their scientific results.

      Concerning the discussion: It would have been nice to give a perspective on the diverse biological questions that could be addressed and answered with the described methodology. For example, how could this method be used to address various questions about the normal and pathologically altered brain?

      In my opinion, the paper has one major drawback, which is that it is primarily methodological, although the authors included a scientific application of the method. The question here is how to balance the methodology against the scientific achievement of this paper, a hard decision to make. In other words, one could recommend this paper to more methodologically oriented journals, for example, Nature Methods.

      Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights exceeding the capabilities of ATUM-SEM and other volume electron microscopy techniques. However, the description of the methodological development far outweighs that of the biological details. We consider eLife’s Tools and Resources (which, in our view, is similar in scope to Nat Methods) an ideal format for this technically focused manuscript, while targeting eLife’s readership, whose diverse biological fields of interest offer potential applications of the method. We will add more suggestions for possible applications to the discussion to accommodate the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.

      Reviewer #2 Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?

      Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. Using the samples generated for this study, we will try to assess the importance of timing retrospectively.

      Reviewer #2 To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?

      We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We aim to test the general workflow with tissue embedded in other commonly used resin types.

      Reviewer #2 Minor corrections to the text and figures.

      Thank you for the detailed corrections. We will apply them accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Sun and co-authors have determined the crystal structures of EHEP with/without the phlorotannin analog TNA, and of akuBGL. Using the akuBGL apo structure, they also constructed model structures of akuBGL with phlorotannins (inhibitors) and laminarins (substrate) by docking calculation. They clearly showed the effects of TNA on akuBGL activity with/without EHEP and the resolubilization of the EHEP–phlorotannin (eckol) precipitate under alkaline conditions (pH > 8). Based on this knowledge, they propose the molecular mechanism of the akuBGL–phlorotannin/laminarin–EHEP system at the atomic level. Their proposed mechanism is useful for further understanding of the defensive–offensive association between algae and herbivores. However, there are several concerns, especially about structural information, that the authors should address.

      Thank you for reviewing our manuscript. We addressed all comments below.

      1) TNA binding to EHEP

      The electron densities could not show the exact conformations of the five gallic acids of TNA, as the authors mentioned in the manuscript. On the other hand, the authors describe and discuss the detailed interaction between EHEP and TNA based on structural information. This seems contradictory. In addition, the orientation of TNA, especially the core part, seems inconsistent between Fig. 4 and the PDB (8IN6) coordinates. The authors should redraw Fig. 4 and revise the description accordingly to be slightly more qualitative.

      We apologize for the mistake with the PDB file. We forgot to re-upload the final coordinate file of 8IN6, which had been modified according to the PDB instructions. We have now re-uploaded the correct PDB file. We carefully checked Fig. 4 (Fig. 3 in the revised version), which used the final coordinate file of 8IN6.

      2) Two domains of akuBGL

      The authors concluded that only the GH1D2 domain affects its catalytic activity from a detailed structural comparison and the activity of recombinant GH1D1. That conclusion is probably reasonable. However, the recombinant GH1D2 (or GH1D1+GH1D2) and inactive mutants are essential to reliably substantiate conclusions. The authors failed to overexpress recombinant GH1D2 using the E. coli expression system. Have the authors tried GH1D1+GH1D2 expression and/or other expression systems?

      Based on other BGLs (six of which were expressed in E. coli and one in Pichia), we tried overexpressing akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2 only in the E. coli expression system, using several different vectors. As the reviewer mentioned, inactive mutants are essential to reliably substantiate our conclusion, so we will try yeast or cell expression systems in future work to confirm it. We added this limitation as “Future assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion (Line 343-345).

      3) Inhibitor binding of akuBGL

      The authors constructed the docking structure of GH1D2 with TNA, phloroglucinol, and eckol because they could not determine complex structures by crystallography. The molecular weight of akuBGL would also allow structure determination by cryo-EM, but have the authors tried it? In addition, the authors describe and discuss the detailed interaction between GH1D2 and TNA/phloroglucinol/eckol based on docking structures. The authors should describe the accuracy of the docking structures in more detail, or in more qualitative terms if difficult.

      Yes, it is possible to try cryo-EM to obtain the structure of akuBGL in complex with the ligand. However, we did not try it, because the 110 kDa akuBGL consists of two 55 kDa GH1Ds linked by a long loop, and we were concerned that the ligand might not be visible by cryo-EM.

      Following the comment, we added the description of the accuracy of the docking structures as “Those docking scores corroborated well with the inhibition activity toward akuBGL, that TNA had a more robust inhibition activity than phloroglucinol, indicating that the docking results are reasonable.” (Line 322-324)

      Reviewer #2 (Public Review):

      In this study the authors try to understand the interaction of a 110 kDa ß-glucosidase from the mollusk Aplysia kurodai, named akuBGL, with its substrate, laminarin, the main storage polysaccharide in brown algae. On the other hand, brown algae produce phlorotannin, a secondary metabolite that inhibits akuBGL. The authors study the interaction of phlorotannin with the protein EHEP, which protects akuBGL from phlorotannin by sequestering it in an insoluble complex.

      The strongest aspect of this study is the outstanding crystallographic structures they obtained, including akuBGL (TNA soaked crystal) structure at 2.7 Å resolution, EHEP structure at 1.15 Å resolution, EHEP-TNA complex at 1.9 Å resolution, and phloroglucinol soaked EHEP structure at 1.4 Å resolution. EHEP structure is a new protein fold, constituting the major contribution of the study.

      We thank you for reviewing our manuscript.

      The drawback of the EHEP structure is that protein purification, crystallization, phasing, and initial model building were published elsewhere by the authors, so this structure is incremental research and not new.

      We have published the results of protein purification, crystallization, phasing, and initial model building, but did not report the structure itself, because further structural refinement was indispensable. The data published in [Acta F] served as groundwork for obtaining the structure.

      We believe that the structure of EHEP is of great importance, and it is published here for the first time.

      Most of the conclusions are derived from the analysis of the crystallographic structures. Some of them are supported by other experimental data, but remain incomplete. The inability to obtain recombinant samples, which means that no mutants can be tested, makes it difficult to confirm some of the claims, especially those about substrate binding and the function of the two GH1Ds of akuBGL.

      As mentioned by the reviewer, mutant analysis would be the best way to substantiate our conclusions. However, it is challenging to obtain recombinant samples, although we tried to overexpress them (akuBGL, GH1D1, GH1D2, and GH1D1+GH1D2). We therefore used structural comparison and docking simulation to propose the molecular mechanism. We added this limitation as “Further assay of GH1D2 and inactive mutants is the complement to validate the molecular mechanism of akuBGL” in the discussion (Line 343-345).

      The authors hypothesize from their structure that the interaction of EHEP with phlorotannins might be pH dependent. They then succeed in confirming their hypothesis, showing that they can recover EHEP from precipitates at alkaline pH and that the recovered EHEP can be reused.

      A weakness in the model is raised by the fact that the stoichiometry of the complex EHEP:TNA is proposed to be 1:1, but in Figure 1 they show that 4 µM of EHEP protects akuBGL from 40 µM TNA, meaning EHEP sequesters more TNA than expected, this should be addressed in the manuscript.

      The assay in Figure 1 does not directly provide the stoichiometric ratio of EHEP:TNA, because the activity assay system consists of the akuBGL substrate, akuBGL, TNA, and EHEP, and therefore involves multiple equilibria: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP⇋TNA. To avoid misunderstanding, we added the description “As this activity assay system involves multiple equilibration processes: akuBGL⇋substrate, akuBGL⇋TNA, and EHEP ⇋TNA.” (Line 120-121).

      The authors study the interaction of akuBGL with different ligands using docking. This technique is good for understanding the possible interaction between the two molecules but should not be used as evidence of binding affinity. This implies that the claims about the different binding affinities between laminarin and the inhibitors should be taken out of the preprint.

      Following the suggestion, we deleted the descriptions about the difference in binding affinity with docking scores at the last paragraph of [Inhibitor binding of akuBGL].

      In the discussion section there is a mistake in the text that contradicts the results. It is written "EHEP-TNA could not dissolve in the buffer of pH > 8.0" but the result obtained is the opposite, the precipitate dissolved at alkaline pH.

      We apologize for this mistake and corrected it to "EHEP–TNA could dissolve in the buffer of pH > 8.0." (Line 394).

      Solving a new protein fold, as the authors report for EHEP, is relevant to the community because it contributes to the understanding of protein folding. The study is also relevant due to the potential biotechnological application of the system in biofuel production. Understanding how an enzyme such as akuBGL discriminates between substrates is important for engineering such enzymes to improve their activity or change their specificity. The authors also provide preliminary data that others can use to produce the proteins described or to design a strategy to recover EHEP from precipitates with phlorotannin at industrial scale.

      In general, the methods are not carefully described; the section should be extended to improve the manuscript.

      Following the comment, we added the following method descriptions:

      1. Recombinant GH1D1 domain expression and purification in [EHEP and akuBGL preparation].

      2. Sections of [recomGH1D1 activity assay], and [N-terminal sequencing of akuBGL]

      3. More details of the resolubilization of EHEP and its activity in [Resolubilization of the EHEP–eckol precipitate].

      Reviewer #3 (Public Review):

      The manuscript by Sun et al. reveals several crystal structures that help underpin the offensive–defensive relationship between the sea slug Aplysia kurodai and algae. These centre on TNA (an algal glycosyl hydrolase inhibitor), EHEP (a slug protein that protects against TNA and similar compounds), and BGL (a glycosyl hydrolase that helps digest algae). The hypotheses generated from the crystal structures herein are supported by biochemical assays.

      The crystal structures of apo and TNA-bound EHEP reveal the binding (and thus protection) mechanism. The authors then demonstrate that the precipitated EHEP–TNA complex can be resolubilised at an alkaline pH, potentially highlighting a mechanism for EHEP recycling in the A. kurodai midgut. The authors also present the crystal structures of akuBGL, a beta-glucosidase used by Aplysia kurodai to digest laminarin in algae into glucose. The structure revealed that akuBGL is composed of two GH1 domains, with only one GH1 domain having the residue arrangement necessary for catalytic activity, which was confirmed via hydrolytic activity assays. Docking was used to assess binding of the substrate laminaritetraose and the inhibitors TNA, eckol, and phloroglucinol to akuBGL. The docking studies revealed that the inhibitors bound akuBGL at the glycone-binding site, suggesting a competitive inhibition mechanism. Overall, most of the claims made in this work are supported by the data presented.

      We thank you very much for reviewing our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • Fig. 3 should be moved to the Supplements because acetylation modification at the N-terminus is not essential for the function of EHEP.

      Following the recommendation, we moved Fig.3 to Supplements (Fig. S2).

      • EHEP2 was processed at 1.4 Å resolution; however, the statistics in the highest-resolution shell indicate that the data could be processed to higher resolution. Why 1.4 Å resolution?

      We tried processing this dataset to 1.35 Å, and the completeness and I/σ in the highest-resolution shell dropped to 88.9% and 2.16, respectively. The I/σ was acceptable, but the completeness decreased substantially, so we set the cutoff at 1.4 Å.

      • Fig. S1A should be revised to include the gallic acid numbers (1, 2, 3, 4, 6) and the 3.0 σ map.

      As presented in Fig. S1A, the omit map (Fo–Fc map) of the ligand TNA, contoured at 2.0 σ, showed that gallic acid 2 has poor density and gallic acid 4 has weak density. Moreover, TNA is relatively large compared with EHEP (7.5%), and the omit map contoured at 3.0 σ did not clearly show the gallic acids. We therefore kept the map at 2.0 σ in Fig. S3A.

      • The authors should provide more information on "co-cage-1 nucleant".

      Our lab is currently publishing a paper that describes the co-cage-1 nucleant in detail, including its components, synthesis, nucleation mechanism, and applications. Once that paper is published, we will cite it in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      • Is the word "offence" the appropriate word for referring to the activity of EHEP? Is this word used in the literature for this system? I find it confusing but might be because I am not in the specific topic.

      In prey–predator studies, the terms defense and offense are commonly used. According to Charles D. Amsler's book “Algal Chemical Ecology”, herbivore offense comprises the traits that allow herbivores to increase feeding rates on algae. Therefore, in our opinion, “offense” is appropriate.

      Although I am not an English-language expert, I find that the writing of the manuscript could be improved in general. Here are some lines as examples of where the grammar could be better:

      Line 193: "decrement of the loop part"

      Following the comment, we corrected it to "decrease of the loop part" (Line 197).

      Line 199: there is a typographical error.

      We apologize for our mistake and corrected it to “EHEP” (Line 202).

      Line 205-206: "only hydrophobically interacted with"

      Following the comment, we modified it to "only interacted hydrophobically with EHEP" (Line 209)

      Line 224: "phlorotannin–precipitate activity"

      Following the comment, we modified it to “phlorotannin-precipitate activity” (Line 227).

      Line 232: "without the N-terminal 25 residues"

      Following the comment, we modified it to "lacked the N-terminal 25 residues" (Line 236).

      Line 353: "bound" should be "bind"

      We apologize for our mistake and modified it (Line 356).

      Line 359: "predator mammals"

      We apologize for our mistake and modified it to "predatory mammals" (Line 363).

      Line 363: "at an alkaline pH of insect midgut"

      Following the comment, we modified it to "at the alkaline pH of the insect midgut" (Line 367).

      Line 370: does "nonstructural proteins" mean "unstructured proteins"?

      Yes, we meant unfolded proteins, and we modified it to "unfolding proteins with randomly coils" (Line 374).

      Line 374: "similar strategy with mammals"

      Following the comment, we modified it to "similar strategy to mammals" (Line 379).

      Line 403: "to forming"

      We apologize for our mistake and modified it to "to form" (Line 404).

      Line 404: "considered no binding"

      We apologize for our mistake and modified it to "considered not binding" (Line 405).

      Line 406: "activity pocket" means the active site?

      Yes, we modified it to "active site" (Line 407).

      Line 424: "step purification"

      Following the comment, we corrected it to "one step for purification" (Line 425).

      Line 431

      Following the comment, we corrected it to “To verify whether the chemical modifications which was indicated by previous study affects” (Line 432-433).

      Line 812: there is typographical error

      We apologize for our mistakes and changed all instances of “Tris–HCl” to “Tris-HCl” (Line 878 onward).

      Line 223: eckol is not mentioned in the text and appears for the first time in the figure caption.

      Following the comment, we added “eckol” in the first section of the [Result] (Line 117).

      The paragraph between lines 271 and 280 is disconnected from the previous one and it is not about results, it should be at the discussion section.

      Following the comment, we moved them to the discussion part (Line 335-343).

      Line 324: "the three inhibitors inhibited": this claim should be corrected to "the three inhibitors interacted", since the word inhibited would imply the authors measured activity experimentally.

      We modified it as suggested (Line 325).

      Line 392: "could not dissolve" is contradicting the result.

      We apologize for our mistake and corrected it to "could dissolve" (Line 394).

      They describe acetylation but tried overexpression in E. coli; could it be that they needed to express the construct in a system that would produce the acetylation? At least this should be discussed in the text.

      Because our acetylated EHEP was purified from its natural source, the digestive fluid of A. kurodai, we only needed to express EHEP without acetylation. Following the comment, we modified the descriptions in this section to clarify this point (Lines 170-173 and 177-179):

      “Consistent with the molecular weight results obtained using MALDI–TOF MS, the apo structure2 (1.4 Å resolution) clearly showed that the cleaved N-terminus of Ala21 underwent acetylation, demonstrating that EHEP is acetylated in A. kurodai digestive fluid.”

      "To explore whether acetylation affects the protective effects of EHEP on akuBGL, we used the E. coli expression system to obtain the unmodified recomEHEP (A21–K229)."

      From the text it is not clear in which biological context the brown algae meet the attack by the hydrolase, the information is spread all over the manuscript, it should be clearly described at the introduction.

      When brown algae are consumed as food by the sea hare A. kurodai, they are attacked by the hydrolase akuBGL. Following the comment, we clarified the description in the Introduction as follows (Line 42-45).

      “In brown algae Eisenia bicyclis, laminarin is a major storage carbohydrate, constituting 20%–30% of algae dry weight. The sea hare Aplysia kurodai, a marine gastropod, preferentially feeds on the E. bicyclis with its 110 and 210 kDa β-glucosidases (akuBGLs), hydrolyzing the laminarin and releasing large amounts of glucose.”

      Affinity ranking based on docking is not reliable, the differences in free energy are in the same order of magnitude. I would recommend erasing this claim since it is not fundamental to the study. Another option would be to determine affinities experimentally.

      We agree with the comment and removed the text about affinity ranking with docking scores.

      Figure 1: relative activity is not defined. HPLC data should be shown as supplementary material.

      Following the comment, we added the definition of relative activity and the HPLC data as Fig. S1 in the revised version.

      Figure 4: Sephacryl resin is mentioned here but not described in the methods.

      Following the comment, we added the description in the methods (Line 515).

      Protein N-terminal sequencing analysis should be described in the methods.

      Following the comment, we added the sequencing analysis in the methods (Line 476-483).

      Figure S1 C: it should be specified how the surface electrostatic potential at different pH was calculated.

      Following the comment, we added the descriptions of how the surface electrostatic potential at different pH was calculated in the figure legend of Fig. S2 of the revised version (Line 876-877).

      Since the authors are capable of producing good amounts of akuBGL and have already conducted glycosidase activity assays using ONPG, it would not be difficult for them to run some kinetics experiments for the enzyme in the presence of the different inhibitors to confirm their hypothesis derived from the docking calculations.

      As mentioned by the reviewer, kinetics experiments are the best way to confirm our hypothesis derived from the docking calculations. However, purifying akuBGL from the digestive fluid of the sea hare A. kurodai is quite difficult, and we could not obtain a sufficient amount of akuBGL to conduct the kinetic experiments. We therefore stopped at docking simulation in this study. We added this limitation as “Future kinetic experiments are required to validate quantitatively the competitive inhibition of phlorotannin against akuBGL” (Line 359-360).

      Some citations are missing in the discussion section, for example in lines 362, 364 and 396.

      Following the comment, we added the citations.

      Reviewer #3 (Recommendations For The Authors):

      Please see comments/suggestions below for revisions.

      Line 176-178 - Text explains that recombEHEP precipitated after incubation with TNA to a comparable level to natural EHEP. However, figure 3B shows no comparison between recombinant and natural EHEP.

      As the reviewer suggested, we repeated the binding assay of recomEHEP to confirm its precipitation with TNA and added a precipitation result of natural EHEP (Fig. S2B right) for comparison.

      Line 223 - The work presented in Figure S1E goes partway towards demonstrating the activity of resolubilised EHEP. This claim would be strengthened if resolubilised EHEP was used in the akuBGL galactoside hydrolytic activity assay and was then seen to rescue akuBGL activity in the presence of TNA.

      Yes, our claim would be strengthened by adding resolubilized EHEP to the akuBGL assay in the presence of TNA. Because we have already shown the relationship between EHEP precipitation with TNA and the rescue of akuBGL activity from TNA inhibition, we used only the precipitation assay to demonstrate the activity of resolubilized EHEP.

      Line 380-384 - Here it is discussed how TNA simultaneously binds to three EHEP molecules thus crosslinking them. It is then proposed that this could be the mechanism of precipitation. However, it is noted that TNA is soaked into crystals, therefore it is likely that this lattice exists whether TNA is present or not (this absolutely needs to be mentioned in the text). It would be possible to test this mechanism through mutagenesis. If the sites where TNA packs in between chains of EHEP were mutated to prevent crosslinking, it could then be determined whether crosslink-null EHEP can still precipitate TNA.

      As the reviewer mentioned, we do not have enough experimental evidence to propose that TNA crosslinking causes the EHEP–TNA precipitation. We therefore deleted the discussion of the TNA crosslink and the corresponding figure.

      All docked models need to be deposited (perhaps modelarchive.org) and this resource referred to in the text.

      The structures on the modelarchive.org site are either homology models or de novo models. We considered the docked models to fall outside the scope of this site, so we did not deposit them.

      The x-ray data table contains data previously published in the referenced Acta cryst publication. What is eLife policy on this "double use" of data?

      We apologize for our mistake, and deleted the SAD data in Table 1.

      Minor points

      Line 26 - use "apo akuBGL" so as not to imply a tannic-acid-bound form of this also

      Following the comment, we modified it to “apo akuBGL” (Line 26).

      Line 48 - The sentence currently reads as A. kurodai is being digested.

      Following the comment, we modified it to “by A. kurodai” (Line 48).

      Line 49-50 & Line 65-66 - Both these lines make the same point about the impact of phlorotannin inhibition on the use of brown algae as feedstocks for biofuel, please remove one.

      Following the comment, we deleted lines 49-50.

      Line 115 - This needs attention as it's an unusual opening sentence

      Following the comment, we modified it to “Phlorotannin, a type of tannin, is a chemical defense metabolite of brown algae.” (Line 114).

      Line 130 - Should the EHEP concentration be 3.96 µM not 3.36?

      We apologize for the mistake: 3.36 µM is correct, and we corrected the x-axis label in Fig. 1B.

      Line 133 - consider using "non-recombinant" rather than "natural"

      To distinguish between non-recombinant and recombinant samples, throughout this manuscript we use “EHEP” and “akuBGL” for the samples purified from the native source, and “recomEHEP” and “recomakuBGL” for the samples overexpressed in E. coli. We added this definition to the [Introduction] (Line 100-101).

      Line 134 - "The residues A21-V227 of A21-K229..." This sentence could be written more clearly.

      Following the comment, we re-wrote it to “The residues A21–V227 in purified EHEP (1–20 aa were cleaved during maturation) were built” (Line 135-136).

      Line 136 - switch "appropriately visualized" for "traceable"?

      Following the comment, we modified it to “built” (Line 136).

      Line 158 - use "70% of backbone in a loop conformation"

      We modified it as suggested (Line 159-160).

      Line 184 - reword "map showed an electron density blob". (Map showed positive electron density)

      Following the comment, we modified it to “map showed the electron density” (Line 188).

      Line 193-194 - Is EHEP really more stable when bound to TNA? It is not shown experimentally? It is difficult to see which loop changes. Is the difference a result of crystal packing? Please switch "decrement" for another term

      The regions with conformational changes between EHEP and EHEP–TNA are close to TNA but not at the intermolecular interface. As the reviewer mentioned, we could not establish whether EHEP stability depends on TNA binding, and we therefore deleted these descriptions from the second paragraph of [TNA binding to EHEP].

      Following the comment, we redrew Fig. S1B (Fig. S3B in the revised version) to show the conformational changes clearly. We also changed "decrement" to "decrease" (Line 197).

      Fig S1B - Can an extra figure be added to show the secondary differences more clearly?

      We redrew this figure (Fig. S3B) using a close-up view to show the differences.

      Line 212-213 - There is a slight discrepancy between the text and Figure 4B. Gallic acid 4 interacts with P201 and gallic acid 6 interacts with P77.

      We apologize for our mistake in the text and corrected it to “gallic acid4 and 6 showed alkyl–π interaction with P201 and P77, respectively” (Line 216).

      Figure 4D - Change x axis from tube number to elution volume. Both chromatograms could also be superimposed for interpretability.

      Since we used raw data from the experiment, we kept the x-axis in tube number with additional “2.7 ml/tube” information (Fig.3D).

      Line 229 - Please change "there was no blob of TNA in the electron density" to there was no electron density for TNA or something similar.

      Following the comment, we modified it to “there was no electron density of TNA or something similar in the 2Fo–Fc and Fo–Fc map” (Line 232).

      Line 231 - asymmetric unit is a more standard term (also in Fig S2 legend)

      We modified it as suggested (Lines 235 and 885).

      Line 234-235 - Reword "the residues L26-P978 of L26-N994" to make it more concise.

      Following the comment, we deleted “of L26-N994” (Line 239).

      Lines 296-299 could be written more carefully - pi stacking with what?

      We apologize for our mistake and corrected it to CH–π (Line 293).

      Line 349 - which putatively enables it to......

      We modified it as suggested (Line 353 in the revised manuscript).

      Line 370 - "nonstructural" is the wrong term because they remain structured - use something akin to non-classical secondary structure

      Following the comment, we modified it to “are unfolding proteins with randomly coils in solution” (Line 374).

      Throughout - use phenix autobuild, not autobuil

      We apologize for our mistakes and corrected them throughout the manuscript.

      Figure 1 - the graphs would be more interpretable with all data points shown overlaid

      The two graphs in Figure 1 show two experiments with different reaction conditions. Figure 1A presents various TNA concentrations, while Figure 1B keeps TNA constant at 40 μM with varying EHEP concentrations, so overlaying the graphs is not feasible. We would therefore like to keep them separate, and we added the reaction conditions to the figure legend.

      Figure 4 - in part D add an extra statement outlining what the S-100 analysis demonstrated

      The S-100 analysis uses a gel filtration column packed with Sephacryl S-100 medium. We added an extra statement to the methods and the legend (Fig. 3, Lines 515 and 879).

      Figure 5 (and elsewhere) - the structures referred to need a PDB code and reference given in legend

      Following the comment, we checked the manuscript carefully and added PDB code to the referred structures.

      Fig S1 - please add an additional panel showing part D but in proper structure form, not schematic shapes

      Since we do not have enough experiments to validate the TNA-crosslink, we deleted the discussion of the TNA crosslink and Fig. S1D.

      Figure S4 - The text contains in-depth information on side-chain hydrogen bonding and π-π interactions between akuBGL and laminaritetraose. However, the figure only shows a surface model. Consider adding a figure showing these interactions.

      Following the suggestion, we added a closeup view to show these detailed interactions (Fig. S6B).

    1. Author Response

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      DeKraker et al. propose a new method for hippocampal registration using a novel surface-based approach that preserves the topology of the curvature of the hippocampus and boundaries of hippocampal subfields. The surface-based registration method proved to be more precise and resulted in better alignment compared to traditional volumetric-based registration. Moreover, the authors demonstrated that this method can be performed across image modalities by testing the method with seven different histological samples. This work has the potential to be a powerful new registration technique that can enable precise hippocampal registration and alignment across subjects, datasets, and image modalities.

      We thank the Reviewer, and feel this is an accurate summary of our work.

      Reviewer #3 (Public Review):

      Summary:

      In the current manuscript, DeKraker and colleagues have demonstrated the ability to align hippocampal subfield parcellations across disparate 3D histology samples that differ in contrast, resolution, and processing/staining methods. In doing so, they validated the previously generated Big-Brain atlas by comparing across seven different ground-truth subfield definitions. This is an impressive effort that provides important groundwork for future in vivo multi-atlas methods.

      Strengths:

      DeKraker and colleagues have provided novel evidence for the tremendously complicated curvature/gyrification of the hippocampus. This work underscores the challenge that this complicated anatomy presents in our ability to co-register other types of hippocampal data (e.g. MRI data) to appropriately align and study a structure in which the curvature varies considerably across individuals.

      This paper is also important in that it highlights the utility of using post-mortem histological datasets, where ground truth histology is available, to inform our rigorous study of the in vivo brain.

      This work may encourage readers to consider the limitations of the methods they currently use to co-register and normalize their MRI data, and to question whether these methods are adequate for examining subfield activity, microstructure, or perfusion in the hippocampal head, for example. Thus, the implications of this work could have a broad impact on the study of hippocampal subfield function in humans.

      Weaknesses:

      As the authors are well aware, hippocampal subfield definitions vary considerably across laboratories. For example, some neuroanatomists (Ding, Palomero-Gallagher, Augustinack) recognize the prosubiculum as a region distinct from the subiculum and CA1, but others (e.g. Insausti, Duvernoy) do not include it as a distinct subregion. Readers should be aware that there is no universal consensus about the definition of certain subfields and that there is still disagreement about some of the boundaries even among the agreed-upon regions.

      We thank the Reviewer, and feel this is an accurate summary of our work that also provides useful scientific context.

      Reviewer #2 (Recommendations For The Authors):

      The authors have done a great job with the revisions and have addressed all my concerns. They have clarified aspects of the method and procedure and have included a helpful walk-through explanation of an example subject. The authors have also expanded the discussion and addressed the motivation and justification for certain steps of the procedure.

      We thank the Reviewer.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed my previous comments and I believe the impact and take home message of the paper is more clear.

      We thank the Reviewer.

      In Figure 1, is the proximal-distal label reversed for panel B? I think P (proximal) should be closer to CA4/DG and D (distal) should be closer to subiculum. Am I misreading the graph?

      We thank the Reviewer for this consideration, but the label is as intended. The terms proximal/distal in the hippocampal literature are sometimes relative to the dentate gyrus and sometimes relative to the rest of the cortex. In our case, we use the terms relative to the neocortex, following Ding and Van Hoesen (2015). We have now added the following to clarify this point at the first use of these terms (p.5):

      “The current work, however, defined this tessellation as a regular mesh grid in unfolded space consisting of 256×128 points across the anterior-posterior (A-P) and proximal-distal (P-D) (relative to the neocortex) axes of the unfolded hippocampus, respectively.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful assessment of our work and their valuable critiques, which we address in the “Recommendations for the authors” section below. In particular, we appreciate Reviewer #3 noting the value of the C. elegans model system and our efforts to bridge models with our study. We agree with the reviewer that the rationale, presentation and interpretation of our results needed clarification. We have substantially revised the text of our manuscript and figure legends to address this issue, and provided extensive new commentary and citations to lay out the logic behind our experiments. It was an oversight on our part not to be more thorough about this initially. We have further adjusted our conclusions to be less unequivocal. Finally, we added an RPM-1 signaling diagram (Fig. 8A) to more clearly annotate the players in the RPM-1/MYCBP2 signaling network that were evaluated genetically in Fig. 8. Importantly, we provide clearer commentary on how genetic enhancer effects with known RPM-1 binding proteins, and the absence of genetic suppression in vab-1/Eph receptor double mutants with components of the RPM-1/FSN-1 ubiquitin ligase complex, are consistent with the biochemical finding that MYCBP2 stabilizes but does not degrade EPHB2. Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      Following extensive discussions between the three reviewers, all three agree that the C. elegans data, as presented, does not add to, and in fact might harm, your bottom line. Our combined suggestion is to take this data out unless you plan to improve it substantially. All reviewers are perplexed by Figure 2F and the presumed interactions of cytosolic proteins with the extracellular domain of EPHB2. At the very least, please provide some suggestions/model/interpretation.

      We have adjusted our manuscript substantially to address this. Please see detailed comments in the individual Reviewer sections below.

      We would like to thank the reviewers for their thorough examination of our manuscript, constructive criticisms, and helpful suggestions.

      Reviewer #1 (Recommendations For The Authors):

      The work is extensive in my view, and mostly of high quality. See minor comments on some of the figures below.

      Thank you very much.

      Two more major comments :

      • I don't think the C. elegans work adds to - in fact I think it hurts - the statement that this regulatory mechanism is specific to EphB2. I would advise the authors to take it out.

      We agree that C. elegans has a sole Eph receptor called VAB-1 and is therefore not a specific model for EPHB2. However, testing MYCBP2 specificity for EPHB2 was neither the goal nor the perceived value of the C. elegans experiments. We now clarify this in the text of the Results section.

      Rather, we are providing evidence that the C. elegans Eph receptor interacts genetically with known MYCBP2/RPM-1 binding proteins. Moreover, we now provide an extensive array of citations noting that genetic enhancer interactions between different RPM-1/MYCBP2 binding proteins are well established. The reviewer has rightly highlighted that we handled the C. elegans genetics too cursorily in our original manuscript. We appreciate this being noted and have aimed to make this substantially clearer. We hope the reviewer agrees that our revised C. elegans section accomplishes this goal.

      Furthermore, we extensively revised the text of the Results to emphasize a key point: our observation that axon termination defects are not suppressed in vab-1; fsn-1 and vab-1; rpm-1 double mutants excludes the possibility that the VAB-1 Eph receptor is a substrate that is inhibited or degraded by the RPM-1/FSN-1 ubiquitin ligase complex. If the VAB-1 Eph receptor were ubiquitinated and degraded by the RPM-1/FSN-1 complex, we would have observed a suppression of phenotype in vab-1; rpm-1 double mutants. The precedent for this genetic relationship between the RPM-1 ubiquitin ligase and its substrates that are degraded has been established by several prior studies (PMID: 15707898; PMID: 31676756; PMID: 35421092). We now more clearly note that the absence of genetic suppression in vab-1; rpm-1 double mutants and vab-1; fsn-1 double mutants is consistent with the non-canonical stabilizing role of MYCBP2 on EPHB2 that was observed in our biochemical experiments with mammalian cells.

      We also adjusted the text of the manuscript to stress that we are testing genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This is a key point, as genetic enhancer interactions are consistent with the Eph receptor functioning in the RPM-1 signaling network. This concept has been well established for RPM-1 binding proteins as now noted in our revised text with an extensive number of additional citations to published work.

      Based on the above arguments, we respectfully disagree with the reviewer that our C. elegans data should be removed from the paper. To re-iterate, we are not trying to evaluate specificity for MYCBP2 and EPHB2 in C. elegans. Rather, our goals are twofold: 1) To ask whether there is an evolutionarily conserved functional genetic link between Eph receptors and known RPM-1 binding proteins. 2) To provide further in vivo genetic evidence invalidating the hypothesis that Ephrin receptors could be ubiquitination substrates that are inhibited/degraded by MYCBP2.

      Text edits reflecting these points are in the abstract, the C. elegans results section starting on line 411, and the discussion on lines 499, 502-504 and 541.

      • The cellular responses are not robust and the effects of MYCBP2 KO - although significant - are minor in most cases. But I don't think more experiments will help here.

      We interpret the comment about robustness to mean that the extent to which a given cellular response is affected by the loss of MYCBP2 is minor. First, the cellular responses themselves are typical of previous studies and depend on the cell biology underlying them. For example, a growth cone collapse of ~50-60% over a background of 10% (Fig. 7) is typical for these sorts of assays (PMID: 37369692; PMID: 33972524; PMID: 17785182). A decrease in cell area of ~25% (Fig. 3) is quite substantial if one considers how much of a cell’s volume is taken up by the nucleus and organelles. Second, as our biochemical experiments suggest, the phenotypes elicited by the loss of MYCBP2 are likely caused by a decrease in EphB2 protein levels rather than its complete absence. Given that even complete loss of EphB2 affects these cellular responses only to a limited extent, the minor effects are not a surprise (e.g. for GC collapse: PMID: 23143520). Nevertheless, subtle changes in the cellular phenotypes elicited by EPHB2 signaling are often sufficient to achieve proper cell positioning and cell responses to guidance cues. For instance, regulation of growth cone collapse in outgrowing axons requires delicate changes that are dynamic and temporal.

      Minor:

      Fig 1C - EPHA3 and EPHB2 seem to run in different sizes, is this the case? In 2A they run at the same size.

      We believe this size discrepancy is due to the different acrylamide percentages of the SDS-PAGE gels used to resolve the proteins. In Fig. 1C, we used a 6% gel for Western blot analysis of both EPHA3/-B2-FLAG (~130 kDa) and MYCBP2 (~510 kDa). In Fig. 2A, however, we used a 10% resolving gel to separate and detect EPHA3/-B2-FLAG along with MYC-FBXO45 (~30 kDa). We have reviewed the results obtained from additional biological replicates of this experiment and observed a similar pattern of EPHA3/-B2-FLAG gel migration across all replicates.

      Fig1F - I can't trust the MYCBP2 blot.

      Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We now repeated this experiment using rat cortical neurons, and the results replace the previous Fig. 1F panel as mentioned on line 158.

      In Fig2b the authors claim that there is enhancement in the binding of MYCBP2 and EPHB2 upon FBXO45 expression. For this type of statement quantification is required.

      The quantification is now included in Fig. 2C and its significance is mentioned on line 180. Our conclusion about the enhancement stands.

      Fig2G - it remained unclear to me where the binding site to MYCBP2 is, how long is the cytoplasmic tail in the DeltaICD protein?

      Based on our experimental observations from Fig. 2E-H, we concluded that the fragment encompassing the extracellular domain(s) and/or transmembrane (TM) domain of EPHB2 is necessary for complex formation with MYCBP2. We would like to emphasize that the EPHB2-MYCBP2 interaction might not be direct, and might involve other transmembrane protein(s) acting as a scaffold for EPHB2 and MYCBP2 binding. We did not pursue experiments to determine which region of the extracellular-TM portion of EPHB2 is required for the interaction with MYCBP2.

      The cytoplasmic tail in ΔICD protein consists of 25 aa of the N-terminal fragment of EPHB2 juxtamembrane (JM) region, which is adjacent to the TM helix, and followed by the 8 aa FLAG tag (EPHB2 ΔICD domain composition: extracellular domains – TM domain – 25 aa fragment of JM region – FLAG). We have determined the TM and JM sequences based on Hedger et al. (PMID: 25779975) and included the N-terminal portion of the JM region to facilitate proper ΔICD protein localization within the plasma membrane (PMID: 35793621). We modified the schematic in Fig. 2G to better visualise the EPHB2 truncations and now provide information on their size in the figure legend.

      Always good to have a model of how all these proteins work together.

      While we acknowledge that this would be helpful, we do not have a clear answer on how the EPHB2-MYCBP2 complex formation occurs. This requires further elucidation of the putative proteins involved in this ternary complex or testing the possibility that a MYCBP2 fragment is extruded extracellularly. Without these experiments there are too many possibilities to summarise into a clear model figure. We thus did not make any edits regarding these possibilities in the section starting on line 195.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the experiments are classical experiments of co-immunoprecipitations, swapping experiments, collapse assays, and stripe assays which all are well carried out and are convincing.

      Thank you for your encouraging comments.

      Controls for the stripe assay may include Fc / Fc stripe assays.

      We have performed these control experiments and now include their quantifications in the Results section concerning Fig. 3, starting on line 249, and in the section concerning Fig. 6 on line 381.

      It is not clear to me why SD and not SEM has been used here for presentations.

      Standard deviation (SD) measures the dispersion of a dataset relative to its mean. The standard error of the mean (SEM) measures how much discrepancy is likely in a sample’s mean compared with the population mean. Thus, SEM includes a statistical inference about the sampling distribution while SD is a less “processed” measurement that by definition is larger than SEM. SEM might make the data look less dispersed and many journals encourage the use of SD in bar graphs (PMID: 16223828).
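      The relationship between the two measures can be made concrete with a short calculation: SEM is the sample SD divided by the square root of the number of replicates (n), so for any n > 1 it is smaller than the SD. The sketch below uses Python's standard statistics module; the replicate values are made up purely for illustration and are not data from this study.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements.

    SD uses the n-1 (sample) denominator; SEM = SD / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    return sd, sem


# Hypothetical replicate values (e.g. % of cells on stripes); not data from the study.
replicates = [48.0, 52.0, 55.0, 45.0, 50.0]
sd, sem = sd_and_sem(replicates)
print(f"mean={statistics.mean(replicates):.2f}, SD={sd:.2f}, SEM={sem:.2f}")
```

      For these five hypothetical values the script prints mean=50.00, SD=3.81, SEM=1.70: the same data look less than half as dispersed when plotted with SEM error bars, which is why SD is the more conservative descriptive choice.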

      Fig 7A: it is rather difficult to see 'branches' in Fig. 7A, better pictures and close-ups should be provided. How are branches defined? This piece of work needs more attention.

      To remedy this shortcoming, we now provide inverted images with GFP signal in dark pixels overlaid on Fc (white) / eB2 (pink) stripes next to the original images.

      Reviewer #3 (Recommendations For The Authors):

      1) My most important suggestion to the authors would be to more carefully describe the results and their interpretation of the results. Sometimes, the distinction is not clear.

      We modified the text throughout the manuscript to address this.

      2) There are several cases, when the authors report on trends that are not statistically significant (1D, for example), or report no change, when it is clear that the addition of one more sample could have dramatically made a difference (4M - see point 12).

      We agree that some of the nonsignificant differences could become significant if we added more Ns. But we prefer not to move our experimental design towards N-chasing and p-hacking (PMID: 25768323). The number of biological replicates is normally pre-determined before the onset of the experiment. Of course, some replicates can be discarded if there is a valid reason, such as a technical issue with the experiment or a positive control not working but this is not relevant for the dataset we have provided.

      3) Data in 1F is very difficult to interpret.

      As in response to Reviewer #1: Indeed, the MYCBP2-EPHB2 co-IP with endogenous proteins was not convincing. We now repeated this experiment using rat cortical neurons, and the improved results are in revised Fig. 1F.

      4) Figure 2 puts Figure 1 in a strange perspective. If I understand correctly, fig 2 claims that EPHB2 interaction with MYCBP2 depends on FBXO45 - if that is the case then how does the binding in Figure 1 occur?

      Indeed, we propose that the EPHB2-MYCBP2 interaction depends on FBXO45. In Fig. 2, we reveal that FBXO45 enhances the formation of the EPHB2-MYCBP2 complex. Thus, we suspect that the endogenous FBXO45 present in HeLa cells and neurons mediates the interaction between EPHB2 and MYCBP2 in the Fig. 1 experiments. Although we were unable to show this by Western blotting due to the lack of reliable commercial antibodies against FBXO45, a complex containing endogenous FBXO45 and EPHB2 is also implied by our AP-MS data (Fig. 1B) and by published databases.

      5) I am still trying to wrap my mind around the results in 2G-H. So do MYCBP2 and FBXO45 bind the extracellular domain of EPHB2? What does that mean?

      (see also our response to Reviewer #1, end of their section) Based on our experimental observations from Fig. 2G-H, we conclude that the fragment encompassing the extracellular domain(s) and/or transmembrane domain of EPHB2 is necessary for the protein complex formation with MYCBP2 and FBXO45. Although there is a possibility that MYCBP2 directly binds the extracellular portion of EPHB2, we have not formally tested this hypothesis. MYCBP2 has been previously shown to interact with the extracellular portion of transmembrane N-cadherin (CDH2) via BioID proximity labeling and AP-MS proteomics approaches (PMID: 32341084).

      Considering the results in Fig. 2A-B, we suspect that the EPHB2-MYCBP2 interaction is indirect, as FBXO45 enhances this association. Secretion of FBXO45 and direct binding of FBXO45 to the extracellular cadherin (EC1-2) domains of N-cadherin have been documented (PMID: 25143387; PMID: 32341084). Although not tested, this is also a possible mode of EPHB2-FBXO45 interaction. Nevertheless, we cannot rule out the possibility that an unknown transmembrane protein binds EPHB2 extracellularly and MYCBP2/FBXO45 intracellularly. Resolving this model is beyond the scope of this study and will require extensive new lines of investigation.

      6) I don't understand the stable Hela cell line CRISPR - is this a stable MYCBP2 deletion? In which case why is there only a reduction, not complete elimination of the protein? Or, is this a stable integration of a plasmid generating gRNA against MYCBP2? In which case, I would expect a homozygous null to emerge at some point. In any case, this is not well explained.

      These lines are not derived from single cells infected with the CRISPR sgRNA-carrying viruses, therefore they are not clonal and probably contain some cells that express normal levels of MYCBP2, hence its detection on a Western. This is now clarified starting on line 221 and on line 608.

      7) In 3C - is this the right statistical analysis?? I would say you want to claim the different effect of the control +/- eB2 compared to the effect in the mutant +/- eB2. Still should be significant but I think a more correct analysis.

      We now include this comparison in Fig. 3C as well as in the Results section starting on line 234.

      8) The robustness of the assay in Figure 3D is underwhelming – how was the area measured?

      This is a live imaging experiment. Fig. 3D plots cell area at 60 minutes after ephrin-B2 addition as a fraction of the same cell’s area at 0 minutes (ephrin-B2 addition). For control cells that is a decrease of ~25%. If one considers that a cell’s nucleus and organelles like the Golgi Apparatus take up most of its volume, the magnitude is not that surprising.

      9) Figure 3F – did you try to plot the relative area of overlap divided by the total cellular area? You might get a more striking phenotype. Also – claiming that this confirms that MYCBP2 is REQUIRED for EPHB2 function is a bit overstated, especially given that we don’t know (do you?) the EPHB2 mutant phenotype in this assay.

      We preferred to stay with the original method of image quantification which we use for other assays. With respect to the requirement of MYCBP2 for EPHB2 function in the stripe assay, our logic is rooted in the observation that native HeLa cells do not respond to ephrin-B2 stripes (45.46 ± 7.62% of cells on eB2 stripes v. Fc; data not shown). When they are transfected with EPHB2 expression plasmids they do, therefore we assume that EPHB2 expression endows them with a sensitivity to eB2 stripes. A loss of MYCBP2 attenuates this sensitivity. We clarified this starting on line 246 and on line 251.

      10) I didn't quite get the difference between 4A and 4B.

      We apologize for the confusion. In Fig. 4A, we used a stable HeLa cell line with tetracycline-inducible expression of EPHB2-FLAG. From these cells, we subsequently generated CTRLCRISPR and MYCBP2CRISPR cells. We then induced EPHB2 expression with tetracycline and observed that deletion of MYCBP2 reduced EPHB2 protein levels. To confirm this observation and to rule out the possibility that the reduction in EPHB2 protein is an artifact of generating the CRISPR lines, we tested whether MYCBP2 deletion also reduces transiently overexpressed EPHB2 (Fig. 4B). We therefore conclude that loss of MYCBP2 decreases EPHB2 expressed either from a stable locus (Fig. 4A) or by transient transfection (Fig. 4B). We modified the Results section starting on line 262 to make this point clear.

      11) The entire link to lysosomal degradation should be strengthened. Perhaps I am confused, but if the reduced EPHB2 levels in MYCBP2 mutant cells result from impaired lysosomal degradation then inhibiting the lys-deg should bring the protein levels back to normal (i.e. CRISPR control) - no? As currently presented, I do not understand nor do I think the claim is strongly supported by the data.

      Before treatment with inhibitors, EPHB2 levels in MYCBP2CRISPR cells are already 40% lower than in CTRLCRISPR cells, and in all our attempts the inhibitors could only rescue/restore EPHB2 in MYCBP2CRISPR cells to a level lower than in CTRLCRISPR cells. However, this restoration is greater in MYCBP2CRISPR than in CTRLCRISPR cells (BafA1: 19% increase in CTRL cells and 40% in MYCBP2CRISPR cells; CoQ: 10% compared with 35%). This indicates that lysosomal degradation of EPHB2 is stronger in MYCBP2CRISPR cells, which explains why EPHB2 degradation is promoted in these cells and is compatible with their reduced EPHB2 levels and enhanced EPHB2 ubiquitination.

      12) 4M, O - reporting ns based on these data seems a bit strange to me... Add one point and it will be strongly significant.

      See our response to point (2), above. We prefer not to invoke potential p-hacking.

      13) 7d - so what are you claiming? That the cellular response to eB1 but not eB2 is affected by the addition of FBD1? this is almost the opposite of what you wrote in the text...

      We treated the cells with two different ephrin-B ligands to draw a stronger conclusion. With ephrin-B1, growth cone collapse in the FBD1 WT group is not significant compared with Fc treatment. With ephrin-B2, growth cone collapse in the FBD1 WT group is not as significant as in the FBD1 mut group (* versus ). We interpret this to mean that EPHB2-mediated growth cone collapse in response to both ligands is dampened when we disrupt the EPHB2-MYCBP2 association. The difference between the two ligands might be due to their different affinities for the receptor or different signalling kinetics.

      14) By far the weakest link in this paper is the worm part. I think it's a pity because strengthening this would affect the significance of the finding. First, the authors mention new genes without introducing their relationship to the signaling pathway tested. Second, the textual logics should be strengthened. Finally and most importantly, when the difference between the phenotypic severity is so strong (vab-1 and rpm-1) then I think it's impossible to say anything from the double mutant.

      We are grateful that the reviewer recognizes the value and importance of the C. elegans model. The goals of our C. elegans experiments were twofold:

      1) To evaluate genetic interactions between the VAB-1 Eph receptor and known RPM-1 binding proteins. This was not clearly explained in the original manuscript nor was the published precedent for these types of genetic enhancer experiments provided. We have now rectified this by substantially revising the text of the Results C. elegans section starting on line 431 and by adding several citations.

      2) Our C. elegans genetics confirmed that the VAB-1 Eph receptor is not inhibited/degraded by the RPM-1/MYCBP2 ubiquitin ligase complex. We have now revised the text to draw this point out more clearly.

      To further address the reviewer’s concerns, we have added a new schematic (Fig. 8A) to show the relationship between the RPM-1 and the RPM-1 binding proteins (FSN-1/FBXO45 and GLO-4/SERGEF) we are testing. We chose FSN-1 because it is part of the RPM-1 ubiquitin ligase complex and we chose GLO-4 because it functions outside the context of RPM-1 ubiquitin ligase signaling via the GLO-1 Rab GTPase to influence late endosomal/lysosomal biogenesis.

      Regarding the reviewer’s concern that the different penetrance/frequency of defects between rpm-1 and vab-1 mutants means that outcomes with vab-1; rpm-1 double mutants cannot be interpreted, we respectfully disagree. An extensive number of published studies have demonstrated that RPM-1 binding proteins have milder phenotypes than rpm-1 mutants and display genetic enhancer effects as double mutants with one another (PMID: 17698012, PMID: 22357847, PMID: 25010424, PMID: 24810406). We now make this point much more clearly. While the frequency of axon termination defects in rpm-1 mutants is high, it is not saturated, as the defect is not 100% penetrant. Moreover, a major point of the vab-1; rpm-1 double mutant analysis is that these animals show no significant reduction in phenotypic penetrance/frequency. Thus, our system is fully capable of resolving genetic suppression, which did not occur. We now make this point much more carefully and clearly.

      To further address the reviewer’s concern, we have softened the language throughout the manuscript about the VAB-1/Eph receptor functioning in the same pathway as RPM-1. We still think this is the case, because the frequency of axon termination defects is not fully saturated in rpm-1 mutants and the defects could potentially become more severe (i.e. the hook might occur closer to the head of the animal rather than in the midbody). Nonetheless, this is not a critical point, and we think it is more important to be clear about the two major goals of our C. elegans experiments. We hope the reviewer agrees that our rationale, logic and conclusions are more clearly and accurately drawn in the revised paper.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the main conclusions are well-evidenced, this paper would be further improved if the following concerns can be properly addressed.

      1) The key data demonstrating the role of condensin in telomere disjunction is the reduced number of telomere foci in cut14 mutants at the restrictive temperature (Fig 2A). However, this could be due to defective telomere declustering or to failed separation of sister telomeres, since the authors suggest that condensin functions in both processes. To distinguish between these, the authors could directly measure the separation of sister telomeres using FISH or tetO-labelled telomeres.

      We now provide strong evidence for the role of condensin in telomere disjunction by simultaneously visualizing the behavior of centromeres 3L (imr3-tdTomato), Gar1-CFP (nucleolus), and telomeres 1L (Tel1-GFP) during mitotic progression (Figure S2B). As previously reported (Tada et al. 2011), we visualized the centromere of chromosome 3 by simultaneously inserting tetO repeats into the imr3 region (1093757-1094520 and 1094521-1095451 of chromosome 3) and expressing td-tomato fused to tetR. The left arm of telomere 1 was visualized by inserting lacO repeats into this telomeric region (9282-9805 and 9806-10254 of chromosome 1) and expressing green fluorescent protein (GFP) fused to LacI. With these additional data, we confirm that a cut14-208 mutant grown at non-permissive temperature exhibits a striking defect in the disjunction of Tel1L.

      Note, however, that such an experimental approach is not without risk, as it has been reported that LacO repeats tightly bound by LacI proteins form a barrier to the recoiling activity of condensin (PMID: 31204167). This is discussed further below in our response to point 2).

      2) To prove the defective telomere disjunction in condensin mutant is not due to failed transmission of pulling force from centromeres, the authors showed that Top2 inactivation has no effect on telomere disjunction (Fig 2E). However, this result contradicts a previous study in budding yeast (MBC, 2002, 13:632-645). This needs careful discussion. Moreover, it is puzzling why Top2 inactivation would not cause defective decatenation of telomeres.

      We thank the reviewer for bringing this apparent discrepancy to our attention. A likely explanation is that we monitored telomere separation using the shelterin protein Taz1 tagged with GFP, whereas in the study mentioned by the reviewer, the authors used LacO arrays inserted in the vicinity of TELV and bound by LacI-GFP. It has been shown in budding yeast that such a construct constitutes a barrier for the recoiling activity of condensin in anaphase (PMID: 31204167). Thus, this insertion of LacO/LacI arrays at TELV most likely created an experimental condition in which condensin activity at TELV was reduced, thereby revealing the otherwise dispensable contribution of Topo II. This is now mentioned in the Discussion section as follows:

      Our results do not rule out the possibility that Topo II contributes to telomere disentanglement, but they nevertheless imply that Topo II catalytic activity is dispensable for telomere separation provided that condensin is active. The close proximity of DNA ends could explain why Topo II is dispensable. It has been reported in budding yeast that the segregation of LacO repeats inserted in the vicinity of TelV is impaired by the top2-4 mutation (Bhalla et al. 2002). At first sight, this appears at odds with our observations made using the telomere protein Taz1 tagged with GFP. However, since LacO arrays tightly bound by LacI proteins constitute a barrier to the recoiling activity of condensin in anaphase (Guérin et al. 2019), the insertion of such a construct might have created an experimental condition in which condensin activity was specifically impaired at TELV, hence revealing the contribution of Topo II.

      In addition, we would like to point out that telomere structure differs significantly between budding yeast and fission yeast. Budding yeast protects its telomeres via two independent factors, Rap1 and the Cdc13-Stn1-Ten1 complex, whereas in fission yeast Taz1 and Pot1 are bridged by a complex protein interaction network (Rap1-Poz1-Tpz1). This is a remarkably conserved structural feature shared by the S. pombe and human shelterin complexes. Recently, the group of M. Lei showed that some telomeric components of S. pombe can dimerize, leading to a higher-order organization of shelterin (Sun et al., 2022). It is likely that dimerization of Taz1, Poz1, and the Tpz1-Ccq1 subcomplex also contributes to the clustering of sister and non-sister chromatid telomeres. These architectural differences in telomere organization between budding and fission yeast may require different mechanisms to properly segregate telomeres during mitosis.

      3) The authors claimed that the reduced telomere disjunction in condensin mutants arises because compromising condensin function impairs the resolution of cohesin-mediated cohesion of sister telomeres. The evidence is that cohesin inactivation remedied the telomere disjunction defect in condensin mutants (Fig 6A). However, there could be an alternative explanation: abnormal telomere structure caused by defective condensin might lead to the entanglement of sister telomeres, which requires telomere cohesion. If cohesin is inactivated before G2 phase, which is likely the case in this experiment, the entanglement would not happen. To distinguish between these possibilities, the experiment in Fig 6 could be repeated using G2-synchronised cells.

      The hypothesis raised by the reviewer is certainly relevant. To test this possibility, we purified cut3-477 and cut3-477 rad21-K1 mutant cells in early G2 using a lactose gradient. After cell selection of the two mutants grown at permissive temperature, the entire cell population was in G2 (0% of cells in mitosis or cytokinesis). After releasing the cells to the non-permissive temperature of 36°C, we measured the number of telomeric foci as a function of spindle size as the cells entered the first mitosis. The results presented in Figure S6 confirm that cohesin inactivation in G2 cells is able to complement the telomere disjunction defects of a condensin mutant.

      4) The authors further revealed that compromising condensin function leads to over-accumulation of cohesin at telomeres (Fig 6B). They then proposed that condensin counteracts cohesin at telomeres. However, the over-accumulated telomeric cohesin was observed in G2 phase (t=0 min, Fig 6B) in the condensin mutant. At this stage, cells were grown at the permissive temperature, and condensin activity is expected to largely remain (Fig 2A). The subsequent inactivation of condensin did not further increase the telomeric association of cohesin (t=30 min, Fig 6B). Moreover, condensin does not bind telomeres in G2 phase (Fig 1B). It is difficult to reconcile this with a scenario in which condensin inhibits cohesin association with telomeres even though condensin is absent.

      We suspect that there was a misunderstanding because T=0 min in Figure 6B corresponds to cells arrested in G2 and shifted to 36°C while still arrested, as mentioned in the original text "Cells were arrested at the G2/M transition, shifted to the restrictive temperature and released into a synchronous mitosis (Figure 6B)".

      However, this experimental setup has been made clearer in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Further analysis of the telomere segregation foci data could provide additional support for the claim that condensin promotes the uncoupling of telomeres (as opposed to telomere disjunction), in addition to the Hi-C data presented in Fig 3. The observation that many data points in Figure 2 have fewer than six foci (often 2-4) suggests that these data show not only a defect in disjunction but also a defect in telomere uncoupling. If the two defects could somehow be disentangled in the dataset, that would strengthen the argument.

      We agree with the reviewer that our data show not only a defect in disjunction but also a defect in telomere uncoupling (confirmed by Hi-C). We now provide new microscopy data showing the role of condensin in telomere disjunction (as opposed to uncoupling) by simultaneously visualizing the behavior of centromere 3 (imr3-tdTomato), the nucleolus (Gar1-CFP), and telomere 1L (Tel1-GFP) during mitotic progression (Figure S2B). We confirm that the cut14-208 mutant grown at the non-permissive temperature has a striking defect in telomere disjunction as opposed to centromere disjunction.

      Reviewer #3 (Recommendations For The Authors):

      The experiments are robust, and the results are well described. However, it should be explicitly stated that the main finding that condensin is needed for chromosome end disjunction could have been anticipated from previous studies (as outlined below). Its novelty does not need to be overstated.

      1) Reyes et al. (2015) previously demonstrated that sister telomere disjunction requires the Aurora B kinase. They also showed that a phosphomimic condensin allele reinstates sister telomere disjunction in cells lacking Aurora B, indicating that condensin is likely the target activated by Aurora B and the primary driver of sister telomere disjunction.

      2) Berthezene et al. (2020) previously revealed the requirement of condensin for sister telomere disjunction during the first meiotic division (Meiosis I).

      3) The Tanaka group described in 2010 the role of condensin in promoting sister chromatid separation by antagonizing residual cohesin during anaphase (DOI 10.1016/j.devcel.2010.07.013). This study should be cited and discussed.

      The novelty of our study lies in the evidence that condensin contributes to TEL separation in cis, and not through the recoiling of chromosome arms, a question that was not addressed in our previous manuscripts (Reyes et al. 2015, Berthezene et al. 2020).

      We have now added and discussed the reference from Tanaka's group.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This paper provides valuable information regarding visuospatial working memory performance in patients with MS compared to healthy controls, using a relatively novel continuous measure of visual working memory. There are some weaknesses with the way the clinical groups were matched, but those limitations are acknowledged and the strength of evidence for the claims is otherwise convincing. The paper will be of interest to those working in the field of clinical neuroscience.

      We are grateful to the editors and reviewers for their careful review of our manuscript and their dedicated time and effort. Their valuable feedback has been instrumental in improving the quality of our work.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory (VWM) performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to disentangle different contributions to overall performance. The results identify a specific decrease in the precision of VWM recall in MS, although the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out.

      Although we tried to address this issue through clinical screening, as the reviewer mentioned, the possibility that visual and/or motor impairments contribute to performance decrements on the memory task cannot be ruled out. Future studies should therefore include a control condition matched to the experimental paradigm, with only the memory components removed, to resolve this issue.

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures. This represents an advance beyond prior work in this area.

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error including incorrect association of an object's identity with its location or serial order, were also examined. The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease-modifying drugs, for example. The present study does not compare the continuous-report testing with a discrete measure task so it is unclear whether the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer brought up an important point, but as they stated, it was not the focus of our current study. Nevertheless, it is a valuable suggestion for future research to compare continuous with discrete measure paradigms to assess their sensitivity and feasibility in the MS population.


      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their thorough reading of this manuscript and valuable suggestions. We appreciate the time and effort they have put into this manuscript to provide feedback for improving our work. Based on their comments, we carefully considered their suggestions and revised the manuscript to address their concerns. Our one-by-one response to reviewer comments is as follows.

      Reviewer #1 (Public Review):

      This study compares visuospatial working memory performance between patients with MS and healthy controls, assessed using analog report tasks that provide continuous measures of recall error. The aim is to advance on previous studies of VWM in MS that have used binary (correct/incorrect) measures of recall, such as from change detection tasks, that are not sensitive to the resolution with which features can be recalled, and to use mixture modelling to potentially disentangle different contributions to overall performance. This aim is met in part, but there are some problems with the authors' interpretation of their findings:

      1) How can the authors be confident the performance deficits in the patient groups are impairments of working memory and not visual or motor in nature? I appreciate there was some kind of clinical screening, but it seems like there should have been a control condition matched to the experimental tasks with only the memory components removed.

      We appreciate the reviewer’s concern regarding the potential confounding effects of visual or motor impairment on the outcomes of our study.

      While we acknowledge that a control condition with only the memory components removed could have further strengthened our results, we did not include one, which is a limitation of the current study.

      To address this limitation, we conducted clinical screening to minimize the chance that the observed deficit reflected visual or motor rather than working memory impairment. As part of the expanded disability status scale (EDSS) evaluation, we excluded individuals with impairments of visual acuity, visual field, or extraocular movement, as well as scotoma, nystagmus, or upper-extremity tremor, which could interfere with the study. Moreover, participants were screened using the 9-Hole Peg Test (9-HPT) before entering the study. These evaluations helped ensure that participants with impaired visual or motor performance, which could confound the study, were not included, and thus support the interpretability of the results. We have updated our inclusion/exclusion criteria accordingly and included this limitation in our discussion.

      2) The participant groups are large, which is definitely a strength, but not particularly well-matched in terms of demographics, with notable differences in age (mean and spread), years of education and gender. These could potentially contribute to differences in performance between groups and tasks.

      We appreciate the reviewer's comment and agree that a matched control group would be ideal. However, we addressed this issue using hierarchical regression analysis.

      Our study assessed visual working memory (VWM) resolution using two analog recall paradigms: the sequential paradigm with bar stimuli and memory-guided localization (MGL). While the demographic data of gender, age, and education in the MGL paradigm were matched between patients and control group, there was a significant difference in these factors between groups in the sequential paradigm.

      To address this issue, we performed hierarchical regression analyses comparing recall parameters between groups in the sequential paradigm with 3-bar and 1-bar stimuli. We assessed the confounding effects of gender, age, and education; the results are presented in supplementary tables 3 and 5, respectively.

      In the sequential paradigm with 3-bar stimuli (high memory load condition), we found that all recall parameters were significantly different between groups. However, after adjusting for age and education, the result became insignificant for uniform response proportion. In the 1-bar paradigm (low memory load condition), while the results were significantly different between groups, after adjusting for gender, age, and education, target and uniform response proportions became insignificant (uniform proportion = 1 – target proportion, since there was no swap error in the 1-bar condition).
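      For reference, the relationship between these parameters can be written out assuming the standard three-component mixture model used in the continuous-report framework of Bays, Husain and colleagues (an assumption; the exact parameterization fitted in the manuscript is not restated in this response). Here $\hat{\theta}$ is the reported orientation, $\theta$ the target orientation, $\psi_1,\dots,\psi_m$ the $m$ non-target orientations (all mapped onto the circle), $\phi_\kappa$ a zero-mean von Mises density with concentration $\kappa$, and $\alpha$, $\beta$, $\gamma$ the target, non-target (swap) and uniform proportions; the von Mises SD is the circular SD implied by $\kappa$:

$$
p(\hat{\theta}) = \alpha\,\phi_{\kappa}(\hat{\theta}-\theta) + \beta\,\frac{1}{m}\sum_{i=1}^{m}\phi_{\kappa}(\hat{\theta}-\psi_i) + \frac{\gamma}{2\pi}, \qquad \alpha+\beta+\gamma = 1, \qquad \mathrm{SD}_{\mathrm{vM}} = \sqrt{-2\ln\frac{I_1(\kappa)}{I_0(\kappa)}}
$$

      In the 1-bar condition there are no non-targets ($m = 0$), so the swap term drops out ($\beta = 0$) and the constraint gives $\gamma = 1 - \alpha$, which is why the uniform proportion is simply one minus the target proportion.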

      3) The authors interpret the mixture model parameter described as "misbinding error" as reflecting failures of feature binding, and propose a link to hippocampus on that basis, however there is now quite strong evidence that these errors (often called swaps) are explained mostly or entirely by imprecision in memory for the cue feature (bar color in this case), e.g. McMaster et al. (2022), already cited in the ms.

      We thank the reviewer for this valuable comment regarding interpreting the mixture model parameter, described as a “misbinding error” in our study.

      Swap error has been attributed to different mechanisms, including variability in the cue-feature dimension, cue-independent sources, and strategic guessing. As the reviewer mentioned, a recent study by McMaster et al. comprehensively evaluated these hypotheses and concluded that variability in the cue-feature dimension alone could explain the occurrence of swap errors.

      In response to this comment, we have added a discussion of this matter, the neural correlates of swap error, and possible explanations for this phenomenon in the multiple sclerosis (MS) population to the seventh paragraph of the discussion. Additionally, since our study did not include neuroimaging, we have discussed the results from a neuroanatomical point of view to further explain the structures possibly involved in the occurrence of swap errors in MS. The seventh and eighth paragraphs of the discussion have been revised for further clarification.

      4) The methodology of the ROC analyses should be described in more detail: it is not clear what measures are being used to classify participants or how.

      This matter is clarified in the results and the last paragraph of materials and methods. In both paradigms, recall error was used for classification purposes.
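      As an illustration of how recall error can serve as a classifier score, the sketch below computes an ROC curve and its area under the curve (AUC) for separating patients from controls. It assumes (our assumption, not a restatement of the manuscript's pipeline) one mean absolute recall error per participant, with larger errors taken as evidence for the patient group; all function names are hypothetical. The AUC uses its equivalence to the Mann-Whitney U statistic, and the threshold is chosen by Youden's J (TPR minus FPR), which is one common choice among several.

```python
from typing import List, Sequence, Tuple


def roc_auc(patient_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC = P(random patient score > random control score), ties counted as 0.5.

    This equals the Mann-Whitney U statistic divided by n_patients * n_controls.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def roc_curve(
    patient_scores: Sequence[float], control_scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (FPR, TPR, threshold) points, calling 'patient' when score >= threshold."""
    points = [(0.0, 0.0, float("inf"))]
    for t in sorted(set(patient_scores) | set(control_scores), reverse=True):
        tpr = sum(s >= t for s in patient_scores) / len(patient_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr, t))
    return points


def youden_threshold(points: List[Tuple[float, float, float]]) -> float:
    """Threshold maximizing Youden's J = TPR - FPR."""
    return max(points, key=lambda pt: pt[1] - pt[0])[2]


if __name__ == "__main__":
    # Made-up per-participant mean absolute errors (degrees), for illustration only.
    patients = [14.0, 18.5, 22.0, 12.0]
    controls = [9.0, 11.5, 13.0, 10.0]
    print(roc_auc(patients, controls))
    print(youden_threshold(roc_curve(patients, controls)))
```

      The pairwise AUC is quadratic in group size, which is fine for cohorts of this scale; a rank-based U computation would be preferable for much larger samples.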

      5) There are a number of unusual choices of terminology that could potentially confuse or mislead the reader: The tasks are not "n-Back" tasks by the usual meaning: they are analog report tasks with sequential presentation. The terms recall "error", "variability", "precision" and "fidelity" are used idiosyncratically. Variability and precision usually refer to the same thing: they describe the dispersion or spread of errors. The measure described as recall error in the sequential tasks is presumably absolute (or unsigned) error. For the mixture model parameters I suggest describing them more explicitly in terms of the mixture attributes, e.g. "Von Mises SD", "Target proportion", "Non-target proportion" "Uniform proportion".

      We thank the reviewer for this suggestion. We have made revisions to clarify the terminology used in our study.

      The term "n-back" has been changed to "analog recall paradigm with sequential presentation". Additionally, as mentioned in the materials and methods, the recall error in the MGL paradigm is the Euclidean distance between the target's location and the subject's response, in degrees of visual angle. In the sequential paradigms, this value is the angular difference between the response and the target value; both are absolute errors. To avoid confusion, we have added the term "absolute error" alongside the term "recall error" to make this measurement clear. Moreover, as the reviewer suggested, we changed "recall variability" to "von Mises SD" for a more precise description. We also changed the remaining terms to "target proportion", "swap error (non-target proportion)", and "uniform proportion".
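      A minimal sketch of the two absolute-error definitions described above. The function names are hypothetical, and the 180° period for bar orientations is an assumption (a bar looks identical after a 180° rotation); pass `period=360.0` for a feature defined over the full circle.

```python
import math
from typing import Tuple


def mgl_error(target_xy: Tuple[float, float], response_xy: Tuple[float, float]) -> float:
    """MGL recall error: Euclidean distance between target and response locations,
    both expressed in degrees of visual angle."""
    return math.hypot(response_xy[0] - target_xy[0], response_xy[1] - target_xy[1])


def angular_error(target_deg: float, response_deg: float, period: float = 180.0) -> float:
    """Sequential-paradigm recall error: absolute angular difference between response
    and target, wrapped so that it lies in [0, period / 2]."""
    diff = (response_deg - target_deg) % period  # Python's % is non-negative for period > 0
    return min(diff, period - diff)


if __name__ == "__main__":
    print(mgl_error((0.0, 0.0), (3.0, 4.0)))   # -> 5.0 (3-4-5 triangle)
    print(angular_error(10.0, 170.0))          # -> 20.0 (bars 20 degrees apart across the 0/180 wrap)
```

      Wrapping matters: without it, a 10° target and a 170° response would be scored as a 160° error even though the two bars differ by only 20°.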

      Reviewer #2 (Public Review):

      The authors applied two visual working memory tasks, a memory-guided localization (MGL), examining short-term memory of the location of an item over a brief interval, and an N-back task, examining orientation of a centrally presented item, in order to test working memory performance in patients with multiple sclerosis (including a subgroup with relapsing-remitting and one with secondary progressive MS), compared with healthy control subjects. The authors used an approach in testing and statistically modelling visual working memory paradigm previously developed by Paul Bays, Masud Husain and colleagues. Such continuous measure approaches make it possible to quantify the precision, or resolution, of working memory, as opposed to measuring working memory using discretised, all-or-none measures.

      The authors of the present study found that both MS subgroups performed worse than controls on the N-back task and that only the secondary progressive MS subgroup was significantly impaired on the MGL task. The underlying sources of error including incorrect association of an object's identity with its location or serial order, were also examined.

      The application of more precise psychophysiological methods to test visual working memory in multiple sclerosis should be applauded. It has the potential to lead to more sensitive and specific tests which could potentially be used as useful outcome measures in clinical trials of disease-modifying drugs, for example.

      However, there are some significant limitations which severely affect the scientific validity and interpretability of the study:

      1) There is a striking lack of key clinical information:

      1.1) The inclusion and exclusion criteria are unclear and a recruitment flowchart has not been provided. Therefore, it is unclear what proportion of MS patients were ineligible due to, for example, visual impairment.

      We thank the reviewer for raising this matter. To address this issue, we revised the first section of materials and methods to include detailed inclusion/exclusion criteria information. However, it is important to note that we recruited the patients in a full-census manner, where only the patients who fulfilled the inclusion criteria participated. Unfortunately, we did not record the number of patients who did not meet the inclusion criteria.

      1.2) Basic clinical data such as EDSS scores, disease duration, treatment history, and performance on standard cognitive testing were not provided. Basic clinical and demographic data for each subgroup were not provided in a clear format. This severely limits the interpretability of the study and its significance for this clinical population. For example, might it be that the SPMS patients performed worse on the MGL task because they were more cognitively impaired than RRMS patients? That question might be easily answered, but the answer is unclear based on the data provided.

      We thank the reviewer for raising this important concern. To further clarify the basic clinical and demographic data, we have revised tables 1 and 2 to include detailed information on gender, age, education, cognitive ability, disease duration, EDSS score, and disease-modifying therapy (DMT) for each group. The information is reported as mean ± standard deviation, except for categorical data.

      Regarding the participants' cognitive ability, we added Montreal Cognitive Assessment (MoCA) results for both paradigms. The MoCA is a standard cognitive screening tool scored from 0 to 30. Score ranges correspond to different levels of cognitive function: a score ≥ 26 is considered normal cognitive ability, 18-25 denotes mild cognitive impairment, 10-17 moderate cognitive impairment, and a score < 10 severe impairment.

      First, we classified the participants based on their MoCA scores and compared the groups with each other. Although the primary results showed that the patient groups were more impaired than healthy controls, our results remained significant after adjusting for MoCA using hierarchical regression analysis. This suggests that the observed differences were not solely due to greater cognitive impairment in the patient population.

      Moreover, information on the patients' treatment history has been added as follows. DMTs are classified into two groups, i.e., platform and non-platform treatments. In our study, the platform treatments include interferon beta-1a and glatiramer acetate, and the non-platform treatments include rituximab, ocrelizumab, fingolimod, dimethyl fumarate, and natalizumab. In both paradigms, the patients did not differ significantly by the therapy received. The MoCA and treatment history information has been added to tables 1 and 2 and supplementary tables 1, 3, and 5. Moreover, the second paragraph of the materials and methods, the second paragraph of the statistical analysis in the materials and methods, and the relevant sections of the results have been revised.

      2) The study is completely agnostic to the underlying pathophysiology. There is no neuroimaging available, therefore it is unclear how the specific working memory impairments observed might relate to lesioned underlying brain networks which are crucial for specific aspects of working memory. This severely limits the scientific impact of the results. This limitation is acknowledged by the authors, but the authors did not put forward any hypotheses on how their results may be underpinned by the underlying disease processes.

      We thank the reviewer for this valuable suggestion. To further strengthen the connection between our findings and the possible underlying mechanisms of WM dysfunction in MS, we have added a discussion from the neuroanatomical perspective in the eighth paragraph of the discussion section.

      3) The present study does not compare the continuous-report testing with a discrete measure task so it is unclear if the former is more sensitive, or more feasible in this patient group, although this may not have been the purpose of the study.

      The reviewer pointed out an interesting matter. However, this was not the focus of the current study. Nonetheless, it is a valuable suggestion for future work to compare continuous vs. discrete measure paradigms to determine their sensitivity and feasibility in the MS population.

    1. Author Response

      Below we outline the reviewer/editor queries and our responses. We thank the reviewers for their suggestions, which we address below and with minor edits that do not appreciably change the content (such as figure lettering and methods information).

      Reviewer #1 (Public Review):

      The paper by Dongsheng Xiao, Yuhao Yan and Timothy H Murphy presents a timely approach to record neuronal activity at multiple temporal and spatial scales. Such approaches are at the forefront of systems neuroscience; examples include, among others, fMRI alongside electrophysiology (Logothetis et al, 2021. Nature) or widefield calcium imaging (Lake et al, 2020. Nat Meth), or functional ultrasound imaging and multi-unit recording (Claron et al, 2023 Cell Reports). The method presented here combines "low resolution" (i.e. cortical regions) widefield calcium imaging across most of the dorsal portions of the murine cortex with electrical recording of single neurons in specific cortical and subcortical locations (in fact, this latter component can be used anywhere in the murine brain).

      The method presented here is straightforward to implement and very well documented. Examples of the novel insights this approach can generate are well presented and demonstrate its strength; however, some aspects of the analysis require clarification.

      For example, the authors reveal spike-triggered average cortical activation maps (STMs) linked to the activity of single neurons (Figs 4 and 5). This allows direct assessment of the functional connectivity between cortical and subcortical areas. It is nevertheless unclear how stable the established relationships are. The nature of the "recordings" in Fig 4 is unclear: they appear to be imaging sessions on the same day, but neither the length of these recordings nor the interval between them is stated. It will be fundamental to build a metric to compare STM variability across sessions/recordings/days; a root-mean-square deviation from an average map across all recordings could provide a starting point.
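      The reviewer's suggested stability metric can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' pipeline: each STM is taken to be the mean of the imaging frames at the spike times (zero lag, no baseline normalization), maps are flattened to lists of pixel values, and all helper names are hypothetical. The RMS deviation of each session's map from the across-session average map then gives one number per session; values near zero indicate a stable map.

```python
import math
from typing import List, Sequence

Map = List[float]  # one flattened frame or map (pixel values)


def spike_triggered_map(frames: Sequence[Map], spike_frames: Sequence[int]) -> Map:
    """Average the imaging frames at which spikes occurred (zero-lag STM)."""
    n = len(spike_frames)
    return [sum(frames[i][px] for i in spike_frames) / n for px in range(len(frames[0]))]


def mean_map(maps: Sequence[Map]) -> Map:
    """Pixel-wise average across sessions."""
    return [sum(values) / len(maps) for values in zip(*maps)]


def rms_from_mean(maps: Sequence[Map]) -> List[float]:
    """RMS deviation of each session's map from the across-session mean map."""
    avg = mean_map(maps)
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(m, avg)) / len(avg)) for m in maps
    ]


if __name__ == "__main__":
    frames = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
    print(spike_triggered_map(frames, [1, 2]))
    print(rms_from_mean([[1.0, 1.0], [3.0, 3.0]]))
```

      Because RMS scales with overall map amplitude, normalizing each map (for example, z-scoring pixels) before comparison would help separate changes in spatial pattern from changes in response strength across days.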

      Our goal was to present a well-documented protocol for implanting electrodes (tetrodes and peripheral nerve) that do not impede cortical mesoscale imaging and that support chronic investigation of spike trains. We do provide examples of repeated spiking measurements across days from the same electrodes and animals. Unfortunately, because the pandemic interrupted data collection, among other factors, this dataset does not contain a thorough analysis of response longevity with these electrodes, but we do show examples in the figures. In Figure 1F, G, we showed that single-unit activity was relatively stable during one week, two weeks, and two months of recordings after implantation. In Figure 4B, we showed that spiking activity in the hippocampus was stable across days 8 and 9. We also showed that the STM of the hippocampal neuron was consistently associated with the RSP, BCS, and M2 regions over 10 recording sessions across days. In Figure 4D, we showed that the STMs of a midbrain neuron were relatively stable over 2 months: the spiking activity of the neuron on different days was consistently correlated with the lower limb, upper limb, and trunk sensorimotor areas in both hemispheres of the cortex.

      Also with respect to the STM analysis, the data-driven choice of 10 clusters might need a bit more exploration. While the silhouette clustering accuracy peaks at 10 (Fig 5A), this metric comes without confidence intervals, making it difficult to know whether a difference of less than 10% (i.e. 11 or 13 clusters) should be deemed different. A bootstrapping approach could perhaps be used to build such confidence intervals. Another way to choose the number of clusters could be based on "consensus" between different partitioning algorithms (e.g. Strehl, A. & Ghosh, J. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583-617 (2001)). A much stronger argument should be provided for using the 0.3 correlation cutoff value, which seems arbitrarily low. The main point here is that the authors should show that their conclusions hold within a range of parameter values (number of clusters and correlation threshold).

      Thank you for the interesting suggestions regarding cluster numbers. We agree that the number of clusters (10) could be taken as an arbitrary value. However, in previous work examining cortical connectivity maps (Mohajerani et al. 2013 Nature Neurosci.), we found that cortical mesoscale activity has a number of degrees of freedom (number of unique elements) in the range of 10-15. This number is also supported by the major structural networks found in the Allen Brain Connectivity Atlas and in functional imaging data. In other work using unsupervised methods (Xiao et al. 2021 Nature Comm), a similar number of clusters was identified, so these numbers are not without basis.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed very much reading the manuscript!

      Minor comments (aesthetics and typos)

      Please clarify how the hemodynamic correction was performed. The text refers to "subtracted". This usually involves the computation of a global or per-pixel weight. Is this correction constant across the longitudinal imaging sessions (i.e. over weeks)?

      The hemodynamic correction was calculated based on the results of each daily session. Typically these corrections have minimal impact on overall values and are not expected to appreciably change over time.
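      The response above does not specify the form of the correction beyond subtraction. For readers weighing the reviewer's question, the sketch below shows one common per-pixel variant, assumed here for illustration only: for each pixel, a least-squares weight w is fitted between the GCaMP ΔF/F trace and the reflectance (hemodynamic) ΔF/F trace, and w times the reflectance trace is subtracted. Fixing w = 1 reduces it to plain subtraction, and because w is refitted per session, comparing the fitted weights across days is one way to check whether the correction drifts. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def regress_out_hemodynamics(
    gcamp: Sequence[float], reflect: Sequence[float]
) -> Tuple[List[float], float]:
    """Per-pixel correction: fit w minimizing sum((g - w * r)^2) with no intercept
    (traces assumed to be dF/F), then return (g - w * r, w)."""
    denom = sum(r * r for r in reflect)
    w = sum(g * r for g, r in zip(gcamp, reflect)) / denom if denom else 0.0
    return [g - w * r for g, r in zip(gcamp, reflect)], w


def correct_session(
    gcamp_pixels: Sequence[Sequence[float]], reflect_pixels: Sequence[Sequence[float]]
) -> Tuple[List[List[float]], List[float]]:
    """Apply the correction pixel by pixel for one session; also return the weights."""
    corrected, weights = [], []
    for g, r in zip(gcamp_pixels, reflect_pixels):
        trace, w = regress_out_hemodynamics(g, r)
        corrected.append(trace)
        weights.append(w)
    return corrected, weights
```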

      In Figure 3, authors might reconsider scaling down the size of panel A and enlarging the data presented in D. Also, with respect to panel D, what does the gray band represent, confidence intervals, standard dev? Please clarify.

      The gray bands correspond to the standard deviation of random trigger average traces.

      Lines in 4E could be made thicker.

      In the caption of fig6, panel D is mentioned twice (should be E).

      Thanks for catching this mistake; we have corrected the caption in the online version.

      Reviewer #2 (Public Review):

      The article presents 'Mesotrode,' a technique that integrates chronic widefield calcium imaging and electrophysiology recordings using tetrodes in head-fixed mice. This approach allows recording the activity of a few single neurons in multiple cortical/subcortical structures, in which the tetrodes are implanted, in combination with widefield imaging of dorsal cortex activity on the mesoscale level, albeit without cellular resolution. The authors claim that Mesotrode can be used to sample different combinations of cortico-subcortical networks over prolonged periods of time, up to 60 days post-implantation. The results demonstrate that the activity of neurons recorded from distinct cortical and subcortical structures are coupled to diverse but segregated cortical functional maps, suggesting that neurons of different origins participate in distinct cortico-subcortical pathways. The study also extends the capability of Mesotrode by conducting electrophysiological recordings from the facial motor nerve. It demonstrates that facial nerve spiking is functionally associated with several cortical areas (PTA, RSP, and M2), and optogenetic inhibition of the PTA area significantly reduced the facial movement of the mice.

      Studying the relationship between widefield cortical activity patterns and the activity of individual neurons in cortical and subcortical areas is very important, and Murphy's lab has been a pioneer in the field. However, the choice of low-yield recording methods (tetrodes) instead of higher-yield recording techniques, such as silicon probes, makes the approach presented in this study somewhat less appealing. Also, the authors claim that a tetrode-based approach can allow chronic recordings of single-neuron activity over days, a topic that is very controversial. In terms of results, I was under the impression that most of the conclusions presented in the bulk of the paper (Figures 1-5) are very similar to what previous work from Murphy's lab and other labs has shown using acute preparations. In this respect, the paper could benefit from a more in-depth analysis of the heterogeneity of single-neuron functional coupling. The last part, on facial nerve recording, is interesting (Figure 6), but I think it could be integrated better into the rest of the paper.

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      1) The methodology described in the paper is based on chronic tetrode recordings combined with widefield calcium imaging. The authors emphasize the advantages of using tetrodes in that they 1) are easy to implant, 2) have a small footprint, and 3) allow recording the same neurons over days.

      I agree regarding the first advantage, however, the ability to reliably record the activity of the same neurons over days using electrophysiological recordings is controversial. The authors claim that:

      'We found that the single unit activity was relatively stable, during one week, two weeks, and two months of recordings after implantation (Figure 1F, G)',

      The only 'proof' the authors show for recording stability are waveforms of one neuron on one channel (out of presumably four channels), which seem to differ in amplitude over days. Two-dimensional plots of the neuron waveform for all channel combinations could be a more convincing way to make this claim. But, as I already mentioned, the ability to record from the same neurons chronically with electrophysiological methods is rather controversial, especially with tetrodes, which do not allow laminar profiling of neuronal responses to account for potential drift over time.

      We now indicate more clearly in the figures where examples of Mesotrode stability are shown. Furthermore, we acknowledge the caveat that the spike sorting required to identify single neurons more conclusively would be improved with larger-format silicon probes. Our work employs compact tetrode electrodes that permit simultaneous resolution of single units and mesoscale GCaMP activity. It is conceivable that spike sorting fidelity could be improved by switching to more densely spaced silicon probes. While this is an obvious advantage, these probes do not have a compact footprint and would interfere with regional imaging.

      2) The authors present little analysis of their data justifying the advantage of conducting chronic electrophysiological recordings instead of acute recordings. In fact, throughout the paper, the authors mention that the results were consistent with their previous work with acute recordings. The only longitudinal analysis in this paper is qualitative and suggests that cortical maps were stable over days. I believe this has also been shown in the past. A more in-depth analysis of across-day dynamics, or an experiment centered on across-day dynamics, would strengthen the appeal of this approach. Generally speaking, there is very little quantitative analysis of longitudinal maps or of the functional coupling of single neurons over days. The paper would benefit from at least some quantification of this part.

      To our knowledge, data showing the persistence of spike-associated maps beyond the duration of an acute experiment are novel. However, due to the low yield of recorded single neurons, we have not been able to follow these maps over a longer period in a population large enough to permit group statistics. We suggest that future experiments could use silicon probes with larger yields, which would help to better align electrophysiological features with mesoscale GCaMP maps.

      3) Recording with tetrodes gives very low yields compared to silicon probe recordings. While silicon probes have a larger footprint and may occlude widefield imaging on the side of the implant, it is unclear why the authors did not use denser electrode arrays in one hemisphere and image the other hemisphere, given that the maps are highly correlated across hemispheres.

      Taking advantage of mirrored activity in the opposite hemisphere is a great idea. Future studies could include experiments that would take advantage of bilateral symmetry by placing high-resolution silicon probes in one hemisphere and then reading out mesoscale maps in the other.

      4) The advantage of electrophysiological recordings is that they provide access to single-neuron activity at high temporal resolution. The authors could add more quantification of the diversity of individual neurons' functional coupling. For instance, in the per-area distributions in Figure 5D, did all neurons from a given area participate in the same functional maps, or did different neurons show diverse functional coupling? Did neurons recorded simultaneously on the same tetrode show more similar maps than other neurons from the same area recorded on different days or in different animals? Did the maps differ when the neurons were bursting, or at specific phases of the LFP, etc.?

      Unfortunately, the yield of neurons was not sufficient to investigate some of the interesting state-dependent phenomena the reviewer describes. In previous acute work (Xiao et al., 2017), we examined the heterogeneity of single-neuron responses in more detail.

      5) Facial nerve stimulation. This part feels detached from the rest of the paper and is not explained/discussed in sufficient detail. For example, there is no description of the surgical procedure or the electrode used for facial nerve recordings in the Methods (in the Results section, the authors mention 'micro-wires', but the Method section only contains information about tetrodes).

      Thank you for raising this issue. Surgical details for the facial nerve experiments are now included in the Methods; they are also available from the authors on request and are reproduced below.

      For facial nerve recordings, peripheral nerve activity was measured by fine-wire recording directly from the nerves subserving the whiskers. During surgery, mice were anesthetized and positioned on a warming pad connected to a rectal probe, and body temperature was maintained at 37 °C. A skin incision was made, exposing a small part of the buccal branch of the left facial nerve. Magnification of the surgical field with a dissecting microscope allowed careful dissection of a nerve branch with minimal disruption of the tissues and blood supply surrounding the nerve. The appropriate site of exposure was determined using two projection lines: a vertical line running downward, posterior from the outer corner of the eye, and a horizontal line running caudally, starting at the whisker E-row. Two insulated fine wires (tips of about 25 µm) were then hooked around the nerve, about 2 mm apart. The insulation at the ends of the wires was removed, and a knot was made on each wire to prevent it from slipping. The opposite end of each wire was soldered to a mini connector attached to the skull with dental cement. Finally, the skin incisions were closed with 6-0 silk sutures.

      The functional maps associated with facial nerve spiking show different patterns from the optogenetic stimulation maps that led to significant facial nerve responses. Specifically, the STM maps show responses in the posterior parts of the cortex, whereas the photostimulation map showed almost the opposite pattern, with effects observed in the anterior parts. The authors do not discuss this mismatch in sufficient detail. Furthermore, the authors refer to area PTA but use partitions based on the Allen Institute atlas, which does not include this area.

      The location of the posterior parietal area (PTA) is based on our previous work (Mohajerani et al., 2013), with the Allen Institute Brain Atlas used for guidance.

      Minor comments

      6) The authors mention that "on average, we obtained 3-5 neurons per tetrode implanted, and this yield was consistent across regions (Figure 2C). " -- for how long, on average, could the authors record single-neuron activity from each tetrode?

      The 3-5 neurons obtained per tetrode were recorded 1 week after tetrode implantation.

      7) Figure 4B - it is unclear what the labels "recording 1, ...5, " correspond to. Are these different recording sessions within the same day "day 8"?

      The labels "recording 1, ...5, " correspond to different recording sessions within the same day.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) In general, given that several of the "equivalence groups" were distinguished from each other in Packer et al.'s annotation, can the authors comment more on why they are not able to distinguish them? Are the markers listed for those cell states in Packer et al. not expressed appropriately in these data? Or are they expressed, but the states are not different enough to form discrete clusters? One possibility is that the analysis choices of 20 "initial dimensions" or the 1000 most variable genes filtered out some of these differences, which may be encoded in later principal components, or that t-SNE projection is not sufficient to resolve these distinct states.

      2) I was a bit confused by the spatial gene expression analysis. Several distinct ideas appear to be posed in the text. These ideas aren't really supported by any quantitative analysis, just the visual patterns in Figure 4B/C which I'm not sure I always agree with.

      For example, ceh-43 expression is mentioned as having "physically proximate" expression. But it is well established that different lineages form specific spatial territories (e.g. Schnabel et al 1997). Thus it seems logical that genes with specific lineage patterns will have specific spatial patterns as well. If the claim is that the observed patterns are more clustered along the A-P axis than expected by chance given their lineal complexity, then I'm not sure this is shown. Perhaps a comparison with control lineage patterns of similar complexity, drawn from non-TFs or non-HD TFs, could show whether these genes specifically are more spatially patterned? Visually, it looks to me like some patterns are more like "blobs", or even lateral or D-V specific patterns, than "stripes."

      In addition there is a long history in the literature discussing the origin of position-specific patterns in C. elegans - most I'm aware of support the idea that positional information arises primarily from intrinsic lineage mechanisms (e.g. Cowing and Kenyon 1996). Perhaps the authors are making this same argument here, but if so this isn't clear from the text.

      Or maybe the authors are trying to make the argument that combinations of TFs encode position more precisely than individual TFs? This seems more likely to me from the images presented, but it is still not well supported without quantitative or statistical analyses.

      3) The comparison with Drosophila is interesting but also under-developed. I think all I would feel comfortable claiming from the data as shown is that genes that are spatially patterned in early fly development are also usually patterned in the C. elegans lineage. But to even say this is an enrichment over expectation would require more analysis.

      Minor comments:

      Methods: A statement about temperature control during cell isolation would be useful. In other words, did embryos continue to develop, or were they kept at low temperature (such as in a cold room) to prevent temporal differences between the first and last cells collected from a given embryo?

      Current links to data at GEO are incorrect and link to Levin et al 2016 instead. I was not able to access the raw single cell data, just the processed data in Table S6.

      The standardization of expression in embryos isn't well explained; it would be good to expand a little on the types of batch effects being addressed and how this approach was chosen, or to give a relevant citation.

      Page 2: Including P0 and cell deaths there are 1,341 branches in the hermaphrodite lineage (2n-1 for 671 terminal cells including deaths).

      -"as their each have" (grammar error)

      -"very large nuclear hormone receptor domain" (add "family")
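      For readers checking the Page 2 branch count, a short worked count, assuming (as the comment does) that the lineage is a full binary tree in which every division yields exactly two daughters and each of the 671 terminal cells, including cell deaths, is a leaf:

```latex
% Assumption: full binary tree; every dividing cell (from P_0 onward) has exactly two daughters.
% With n leaves (terminal cells, including deaths) there are n - 1 dividing cells.
N_{\text{branches}} = \underbrace{n}_{\text{terminal cells}} + \underbrace{(n - 1)}_{\text{dividing cells}} = 2n - 1,
\qquad n = 671 \;\Rightarrow\; N_{\text{branches}} = 2(671) - 1 = 1341.
```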

      Page 3: As described, Packer et al. largely missed cells prior to the 50-cell stage, but the reason for this is likely that the use of 10-micron filters or centrifugation to remove undissociated embryos also removes early-stage cells.

      -"few new expressions occur" (grammar). Also, in both the Tintori and Hashimshony datasets there are well over 1000 newly expressed genes detectable (see, for example, Sivaramakrishnan et al. 2021, bioRxiv).

      Figure S1 would be easier to interpret with a legend explaining what fates are represented by each color

      Some genes listed as markers in Figure S2 are not included in the marker table such as flh-3, oma-2, sma-9.

      "New markers were required" - this is plural but only F19F10.1 is mentioned. Were other markers examined this way or should it be singular?

      In Figure S2 the lower ("robustness") plots are nice but could be explained more clearly. What is the nature of the "cell similarity score"? How many (if any) cells were excluded due to not being most similar to their own cluster?

      "transcriptomically very similar shortly after division" - can the authors comment on any information they have about how long after division the cells were collected?

      GFP reporter lineaging - the methods are minimally described (what brand of microscope, which strains/transgene/CRISPR configurations etc). And data are not presented. If these embryos are all incorporated into Ma et al 2021, that is fine, but should be clearly cited. Otherwise it is important in my view to include some way to access the quantitative values from the lineaging and understand these details.

      "as illustrated for ceh-43, dmd-4 and unc-30" - were there other examples as suggested from this wording? I'd also note that similar fluorescent reporter imaging data have been published previously for all three genes listed (Walton et al 2015 for UNC-30, Ma et al 2021 for DMD-4 and CEH-43 protein reporters, Murray et al 2012 for dmd-4 and ceh-43 promoter reporters).

      Zacharias and Murray are cited as promoting "continuous symmetry breaking", but that review actually argued for a "non-monophyletic" architecture similar to that supported by the data.

      The text and figure don't always agree. For example mec-3 expression is listed in the text as part of one of the stripes, but mec-3 is not labeled on the figures.

      The stage of each embryo in figure 4B/C should be explicitly labeled (and maybe also given specific figure panel designations to clarify what statements in the text correspond to which figures).

      In the discussion, it is unclear what the numbers "97 to 104" refer to.

      The scRNA-seq reads were mapped to a relatively old genome build and annotation set (WS230) - thus current users may find discrepancies with current gene names in WormBase. Also, since the CEL-seq data are 3' biased, it is worth noting that Packer et al found that a substantial number of genes (~1000) in a slightly later annotation set (WS260) were undercounted (sometimes dramatically) with the similarly biased 10x data due to incomplete 3'UTR annotations. While I would be reluctant to ask for a requantification for the purposes of the manuscript given the challenges of repeating the various analyses, it is worth explicitly mentioning whether this was dealt with.

      Reviewer #2 (Recommendations For The Authors):

      The writing was otherwise good, at least to my eye, and the data was presented very well and made freely available to other researchers. I am not as well-versed in the statistical methods and will leave comments on these to a better-equipped reviewer(s).

      Fig. 1 legend 'P' should be P4 (subscript 4).

      p. 9: 'ceh-51' should be italicized. Only one factor, F19F10.1, seems to have been confirmed by smFISH. Reporters are available; did they show a similar pattern? From the CGC website: RW12347 F19F10.1(st12347[F19F10.1::TY1::EGFP::3xFLAG]) V endogenous tagged reporter; RW11620 unc-119(tm4063) III; stIs11620 [F19F10.1::H1-wCherry + unc-119(+)] array reporter.

      Reviewer #3 (Recommendations For The Authors):

      Typo: on page 11, where it says nanog it should read nanos.

      Reviewer #4 (Recommendations For The Authors):

      I found some sentences and paragraphs to be a bit unclear. There are no page or line numbers in the manuscript, so I point in the general direction, and hope the authors find what I am referring to.

      • 2nd paragraph of the Introduction - "their" should be "they", but the sentence as a whole is not clear.

      • 3rd para. of the Intro. - The last sentence of this paragraph doesn't make sense. Please rephrase and/or break up into shorter sentences.

      • 1st Para. of Results - "the maternal deposit" is not clear. Perhaps "maternally deposited transcripts" or something similar.

      • 1st Para. after Figure 3. The last sentence "Thus, continuous symmetry breaking..." is unclear. What is "continuous symmetry breaking"? Please define and expand.

      • Fig. 4 - the genes seem to be listed from posterior to anterior. The common way of presenting Hox gene lists and other regionally expressed genes is from anterior to posterior.

      • For the benefit of the non-C. elegans crowd, please give names of Drosophila homologs where relevant (e.g., when comparing to Drosophila expression patterns)

      In a few places there are citations of popular science books or general textbooks (e.g., Carroll et al., 2004; Wolpert et al., 2019). I think it would be better to cite review papers from the scientific literature or relevant primary papers.

      We are very happy to submit the revised manuscript and were very pleased to have received reports from four reviewers!

      We have decided not to prepare a separate response to the public comments of the reviewers, as we did not undertake any further major revisions.

      We did address most of the minor editorial suggestions.

    1. Author Response

      eLife assessment

      This paper presents a series of experiments investigating the role of cadherin-11 mediated interactions between cancer cells and fibroblasts in metastasis using updated 3D cell co-invasion assays. The primarily descriptive data are a valuable contribution to our understanding of the nature of cross cell-type interactions in metastasis, but are incomplete with respect to the far-reaching conclusions about the central role of cadherin-11, especially given the complex nature of the phenotype and the need to better contextualize these observations in a complete picture of metastasis.

      We extend our gratitude to eLife for affording us the opportunity to publish our manuscript as a peer-reviewed preprint. We acknowledge that our exploration of the novel cell hijacking mechanism underlying cancer metastasis remains an evolving endeavor. As the first study to introduce this phenotype, substantiated by comprehensive in vivo investigations that underscore its real-world significance, we eagerly anticipate forthcoming research in this domain. The concept of cancer metastasis dates back to the 18th century. Over a long history marked by millions of publications in this field, our work introduces a transformative and disruptive dimension with the unveiling of this cell hijacking mechanism. Simultaneously, it initiates a deeper exploration of the intricacies of the metastatic process. We sincerely value the meticulous assessment of our work and look forward to subsequent investigations that will elucidate these findings within the broader context of metastasis.

      Joint Public Review:

      The authors of this manuscript studied cell-cell interactions between fibroblasts and cancer cells as an intermediary model of tumor cell migration/invasion. The work focused on the mesenchymal cadherin-11 (CDH11), which is expressed in the later stages of the epithelial-mesenchymal transition (EMT) in tumor cellular models and whose expression is correlated with tumor progression in vivo. The authors employed a 3-D matrix and live-cell imaging to visualize the nutrient-dependent co-migration of fibroblasts and cancer cells. Upon siRNA-based suppression of CDH11 expression in tumor cell lines and/or fibroblasts, the authors observed decreased co-movement and attenuated growth of mixed xenografts. Accordingly, the authors conclude that post-EMT cancer cells are capable of migrating/invading through CDH11-mediated cell-cell contact.

      While the data point to the involvement of CDH11 in fibroblast-mediated co-invasion, as it stands it is difficult to fully contextualize these observations within the broader context of the molecular mechanisms underlying metastasis, and in particular the data do not firmly establish a primary role for CDH11 at this time. The reviewers were specifically concerned about indirect effects of CDH11 manipulation on the physiology and cell biology of the tumor cells, and the possibility that several of the results could be consequences of these changes rather than due specifically to CDH11 mediated interactions.

      The reviewers acknowledge the difficulty of fully controlling for these phenomena and believe this work will be of interest to the large number of researchers investigating the molecular basis of metastasis, and specifically trans cell-type interactions. However, until experiments specifically establishing the formation of CDH11-mediated interactions in co-invasion are carried out, the authors' conclusions about the prominent role of CDH11 should be treated as intriguing but speculative.

      We extend our sincere gratitude to the peer reviewers for their invaluable and constructive feedback. We also wish to express our appreciation for the concise summary of our study and the recognition of the challenges posed by the current technological landscape in fully elucidating the phenotype.

      In response to the reviewers' concerns regarding the indirect effects of CDH11 manipulation on the physiology and cell biology of tumor cells, we encourage readers to revisit Figure 3. In this figure, we silenced CDH11 not only in cancer cells but also in fibroblasts. The outcomes of this intricate experiment are discussed comprehensively in the main text and visually summarized in Supplemental Figure S2.

      Furthermore, we draw attention to our in vivo studies presented in Figure 6, in which we silenced CDH11 exclusively in fibroblasts without any manipulation of the cancer cells. These findings underscore the molecular role of CDH11 as the mediator of cell hijacking. Consequently, we are confident that the reviewers' concerns regarding potential side effects of CDH11 manipulation on tumor cells, which could weaken the manuscript's conclusions, can be addressed.

      In conclusion, we wish to emphasize that we shared the same initial concerns as our reviewers when designing these studies. We have diligently endeavored to alleviate these concerns through a series of comprehensive in vitro, ex vivo, and in vivo experiments. Once again, we strongly encourage readers to explore our supplemental data for a more in-depth understanding. Thank you.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their remarks, which significantly improved the paper. Following these remarks, we completed the analysis and validation of our cryo-EM data and performed several biochemical tests to support our conclusions, lending credibility to the paper. Please find our detailed answers below each recommendation of the reviewers.

      Major recommendations

      1) Errors and omissions in the presentation make the manuscript difficult to access.

      a) The text should be edited for grammatical errors more carefully

      • We corrected the grammatical errors.

      b) Figures should be labeled to allow the reader to follow the logic of the presentation and identify the features being discussed. Identification through the color coding (the identity of the histones, the location of zinc fingers, the active site, and so on) would be helpful.

      • We labeled the Rossmann fold and Zn-finger domains in Figure 1 and described the histone color codes. The active site of SIRT6 is depicted in Figure 4.

      2) The recent publications from the Farnung/Cole and Peterson/Tan/Armache labs need to be cited and the results from Smirnova et al. compared and contrasted with those publications explicitly.

      • We added the following paragraph to the discussion section: “While this manuscript was under review, two studies describing the structure of SIRT6-NCP appeared in press (Wang et al., 2023; Chio et al., 2023). The conclusions of these papers regarding the position of SIRT6 on the nucleosome and the unwinding of DNA by the enzyme are similar to our findings. In addition, however, we dissected the movements of SIRT6 on the nucleosome and analyzed, via molecular dynamics, the conformations of the H3 tail with respect to the SIRT6 active site. Our results point to the importance of the flexibility between the globular domains of SIRT6 and also explain how SIRT6 can access lysines that are much closer to the histone core than H3K9.”

      a) Notably, the Peterson/Tan/Armache labs suggest that H3K27 cannot be deacetylated by SIRT6 whereas the Farnung/Cole labs show deacetylation of H3K27 by SIRT6. Do the results of the Smirnova et al. structure help to resolve this situation?

      • We performed deacetylation tests on H3K27Ac nucleosomes and show that SIRT6 deacetylates H3K27Ac, albeit at somewhat lower efficiency than H3K9Ac. Our molecular dynamics simulations explain how H3K27, which is close to the histone core, can still be reached by the SIRT6 active site. We added the following text to the paper: “To lend support to this claim, we tested whether SIRT6 can deacetylate residue H3K27 that was first acetylated by SAGA (Supplemental Fig. 7c). We find that SIRT6 indeed efficiently deacetylates H3K27Ac, although at a somewhat slower rate than H3K9Ac. We conclude that partial DNA unwrapping by SIRT6 allows H3-tail conformations that make lysines close to the core of H3 accessible to the enzyme.”

      b) The Farnung/Cole labs have visualized an intermediate state of deacetylation. How does this compare to the structure presented in this manuscript? Addressing these points would facilitate further research and discussion in the community.

      • We believe the resolution of the SIRT6 Rossmann fold precludes addressing these points.

      c) Can the authors exclude the possibility that the additional density observed in Supplemental Figure 6 comes from the H3 tail, as observed in the two other structures?

      • One density is the continuation of the H2A histone tail. We strongly believe that this density corresponds to this tail. The other density indeed can originate from the H3 tail. Therefore, we didn’t model anything inside it.

      d) It would be useful to comment on how much flexibility has been observed in the other structures for the SIRT6 interaction with the acidic patch, and also how other acidic-patch binding proteins compare with the results here.

      • We refrain from estimating the flexibility observed in the other structures as no such analysis is provided by these papers. Regarding the interaction with the acidic patch we mention that R175 packs against H2B L103 and serves as a classical “arginine anchor motif” and refer the reader to a review on the topic.

      e) Does the presence or absence of NAD+ affect the comparisons among the structures?

      • NAD+ binding might affect the fine structure of the active site, although NAD+ was not observed in crystal structures of SIRT6 obtained in its presence. The resolution of this part precludes further addressing this issue.

      3) The lack of biochemical validation of conclusions should be acknowledged and the reasoning behind this choice discussed.

      • We added experiments to validate our conclusions with biochemical tests. We produced nucleosomes with acetylated histone H3 using the purified SAGA acetyltransferase complex. We isolated a SIRT6 variant in which the four residues implicated in interactions with the acidic patch are mutated to alanines (SIRT6-4A). We show that this mutant interacts only weakly with the nucleosome and has much lower H3K9Ac deacetylation activity than WT. Similarly, SIRT6-3A, with mutations in the residues we suggest are involved in binding nucleosomal DNA, also shows weak activity and weak binding to the nucleosome. We added Supplemental Figure 7, which depicts the results of these experiments, and embedded references to these results in the appropriate sections of the text. Furthermore, we show that SIRT6 is active in deacetylating H3K27Ac. This supports our molecular dynamics simulations showing that when SIRT6 binds the nucleosome, the H3 tail can assume conformations in which H3K27 is accessible to the enzyme's active site. These results also appear in Supplemental Figure 7.

      4) The authors nicely analyze and discuss the conformational flexibility of SIRT6 binding. This is an interesting finding, but Fig. 2 does not adequately convey this flexibility.

      • We have considerably improved Figure 2, adding panels c and f, which clearly depict the movements we observe.

      5) The authors need to explain why two cryo-EM datasets were collected but were not merged, and the labeling of the datasets in the Supplemental Table appear to be switched.

      • The two datasets were collected with very different pixel sizes; therefore, merging them was possible only in Relion. This process, however, did not improve the resolution of the SIRT6 Rossmann fold domain. We thank the reviewer for noticing the discrepancy between the text and Supplemental Table 1; it has been corrected.

      6) Supplemental Figure 4 should be expanded to show additional representative densities with the respective fit of the model. This will allow the reader to better judge the quality of the data. At least the acidic patch interaction, the DNA-SIRT6 interactions, and the H2A should be shown in this context.

      • To illustrate the high-resolution features of the structure as well as the key regions we added Supplemental Figure 4.

      7) Standard elements of data analysis and validation should be included (angular distribution plots for cryo-EM reconstructions, a 3D FSC sphericity plot, a Q-score and EMRinger score for the cryo-EM data and atomic model, a model-to-map FSC curve). In general, model building is poorly described as it is unclear which maps (or to what degree different maps) were used for this process. This should be clarified in the methods section and in the Supplemental Table 1.

      • The model validation and data analysis details were added to Supplemental Figures 2 and 3 as well as in Supplemental Table 1.

      8) The provided maps also do not fully recapitulate the path of the H2A tail. The various density maps and PDB provided for this review do not support the final modeled residues of H2A between residues #118/119-123. This affects the validity of figure 3E and the discussion of the proximity of the potential substrates to the active site. The authors should clarify how they inferred that this is the H2A tail rather than the loosely bound SIRT6 N-terminal loop (whose stability could be altered by the presence or absence of NAD+), as suggested by overlaying the relevant crystal structures.

      • We added a panel to Supplemental Figure 4 (d) depicting the density where the H2A tail was modelled.

      9) The authors should explain how the data produced an asymmetrically oriented complex with a single SIRT6 molecule bound to one face. Were complexes with two SIRT6 molecules excluded? Is supplementary figure 4A the basis for the orientation and is this sufficient for this purpose?

      • Complexes with two SIRT6 molecules were present but only at around 1.5 percent of the whole dataset. These images were excluded from the refinement (shown in Supplementary Figure 2). The DNA orientation is depicted in Supplementary Figure 5A. The resolution obtained at the dyad (~2.5Å) allowed us to distinguish purine and pyrimidine bases. The Widom 601 sequence is asymmetric and the densities clearly show that there is only one orientation of the DNA observed with respect to SIRT6.

      10) The authors should clarify how supplemental figure 4B supports the conclusion that DNA is unwrapped. The density is not readily visible and docking of a simple DNA model in the ZN-focused map does not clearly rule out the possibility that this density comes from the H3 N-terminal tail.

      • We added to this figure the cryo-EM densities used to model the DNA path and the orientation of SIRT6. This image is now Supplemental Figure 5c.

      Minor recommendations

      1) The scale bar is missing for the 2D classes shown in Supplemental Figure 2.

      • We added the scale bar to the image depicting the 2D classes in Supplemental Figure 2.

      2) Masked classifications should be shown in the classification tree (Supplemental Figure 2 +3) with the masks shown as a transparent volume.

      • We now show the mask used for the 3D classifications of the SIRT6 Rossmann fold domain in Supplemental Figure 2.

      3) Supplemental Figure 3 should show the indicated 3D classifications in the classification tree.

      • We added the 3D classifications in Supplemental Figure 3.

      4) The authors should consider applying local CTF refinement and particle polishing to improve their resolution.

      • We did local and global CTF refinements. Polishing didn’t improve the resolution as movie frame alignment was done outside of Relion.

      5) The descriptions of the Widom 601 sequence orientation should be less ambiguous, perhaps mentioning the AT-rich and AT-poor arms instead of left and right arms.

      • We corrected the text as required.

      6) The statement "Such a large change in DNA trajectory is reminiscent of the chromatin-remodeler ATPases or pioneer transcription factors binding to nucleosome but was not observed in other histone modifiers" requires a citation.

      • We added appropriate references.

      7) The authors should provide a supplemental figure of the nucleosome-SIRT6 and PRC1-nucleosome structure comparison to complement the discussion section.

      • We refer the reader to the paper describing the PRC1-nucleosome structure.
    1. Author Response

      We would like to express our gratitude to the Editors and Reviewers for their thoughtful and helpful comments. We sincerely appreciate the opportunity to submit our revised manuscript titled “Predicting Ventricular Tachycardia Circuits in Patients with Arrhythmogenic Right Ventricular Cardiomyopathy using Genotype-specific Heart Digital Twins” to eLife. We are delighted that our research in ARVC has garnered the interest of the three reviewers. Below, we provide our point-by-point responses to the reviewers’ comments. We have also incorporated the suggestions provided by the reviewers in our revised manuscript.

      Comments from Reviewer 1

      We thank Reviewer 1 for their positive assessment and thoughtful suggestions. Here are the responses to the comments of reviewer 1:

      Comment 1: One addition that could add more insight is to predict the effect of structural remodeling alone as well, considering only normal electrophysiological models.

      We thank the reviewer for this thoughtful suggestion regarding our experimental design. We would like to highlight that this suggestion was indeed taken into consideration in our study, as all the patients’ hearts were modeled using the gene-elusive cell model before the structural-EP mismatch was implemented. The gene-elusive cell model is a baseline ten Tusscher (TT2) human ventricular model described in the “Cell-level modeling” section of our Methods. Therefore, we have already examined the impact of structural remodeling alone in the study.

      Comment 2: Another interesting approach would be a sensitivity analysis, to determine how sensitive the VT circuits are to the specific geometry of the patient and remodeling that occurs during the disease, such an approach could also be used to determine how sensitive the outputs are to electrophysiological model inputs.

      We think this suggestion is of great value and could benefit our future ARVC studies. The reviewer pointed out the importance of investigating how sensitive the VT circuits are to the specific geometry/remodeling of the patient during disease progression. To achieve this, for each patient, a sequence of LGE-CMR images at different stages of this disease is required for model reconstruction; unfortunately, our cohort for this study does not incorporate such data.

      Comments from Reviewer 2

      We thank Reviewer 2 for the positive assessment, and here are the responses to the comments:

      Comment 1: I appreciate that the types of computational models detailed in this paper take enormous time to develop. However, to identify bottlenecks in the clinical workflow (and thus targets for future research), it may be nice for the authors to discuss the time taken to generate and run the models for each patient?

      We sincerely appreciate the valuable feedback from the reviewer. We recognize the importance of considering model generation and run time. In the introduction, we highlighted the clinical challenge in managing ARVC ablation procedures: the inability to capture all VTs due to an incomplete understanding of VT mechanisms. We acknowledge the reviewer’s concern regarding the time the model may take to predict VT circuits and whether this could hinder its integration into the current ablation procedure. However, it is important to clarify that our model is primarily based on clinical images obtained in advance of the procedure. As a result, there is sufficient time available to generate the results required for ablation planning.

      Comment 2: In the Materials and Methods section, some references are underlined? Is this a typo or meant to convey some particular information?

      We thank the reviewer for pointing this typo out and we have removed the underlining of references in our revised manuscript.

      Comment 3: The authors state that the cellular models are available from the CellML model repository. This is an excellent practice. However, the URL that is given points to the entire CellML website. It will be more useful for URLs that point to the specific models used in the study so that readers can be sure they are looking at the correct model.

      We appreciate the reviewer for this suggestion, and we have edited the URL in Data Availability to link to a specific cell model on the CellML website.

      Comment 4: In the abstract, the authors report the sensitivity, specificity, and accuracy of their computer models but fail to comment in the abstract that they are comparing against recordings from the patient during a previous EPS study. To assist further readers who are scanning the abstract, the authors may wish to add a sentence or two to detail what they are comparing their model results to.

      We thank the reviewer for the suggestion. This is a retrospective study. We recognize the importance of wording clarity in the abstract; in response, we have added a sentence in the abstract to clarify that we compared VT locations of Geno-DT with the ones recorded during clinical EPS to obtain sensitivity, specificity, and accuracy.

      Comment 5: In Table 1 some of the data is discrete e.g., the number of patients on a beta-blocker. The authors give a p-value for comparing the GE and PKP2 data and state in the caption that a Student's t-test has been used. Strictly speaking, a t-test is not really appropriate for the population proportion with non-parametric data. That said, the size (n) of the data here makes the p-values from any statistic very unreliable. Perhaps the authors might like to reconsider if p-values add anything to such data? If so, then the statistical test should be reconsidered.

      We thank the reviewer for pointing out this typo in the caption of Table 1. For the non-parametric discrete data, we used a z-test, a common statistical method for comparing percentages, to obtain the p-values, but we mistakenly mentioned only the t-test in our caption. We acknowledge the limitation of our sample size, and we have corrected this typo in our revision.
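      For readers unfamiliar with comparing percentages between two groups, the sketch below shows the pooled two-proportion z-test, one common form of the z-test for proportions. The response does not state which variant was used, so the function name and the pooled-variance choice are assumptions for illustration only. As the reviewer notes, the normal approximation behind this test is unreliable when group counts are small; Fisher's exact test is a common alternative in that situation.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided, normal approximation).

    x1/n1 and x2/n2 are event counts and group sizes, e.g. patients on a
    beta-blocker in each genotype group (hypothetical usage).
    Returns (z, p_value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-0 or all-1: the statistic is undefined.
        return float("nan"), float("nan")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    print(two_proportion_z_test(10, 20, 5, 20))
```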

      Comment 6: I found Table 1 and its caption a little confusing. The authors put the range in [] brackets and then abbreviated standard deviation with () brackets. On initial reading, I incorrectly assumed that the numbers in the table in () brackets were standard deviations when, in fact, they are percentages. Perhaps the authors could consider changing the caption so that the percentage is in, say, {} brackets and make the caption say that values are given as n {%} etc.

      We thank the reviewer for pointing this out, and we recognize that certain expressions in the Table 1 caption are confusing. In our revised manuscript, we used n {%} to replace n (%) and deleted the standard deviation abbreviation, which was not used.

      Comment 7: In the caption for Figure 2 the authors present action potentials "at steady state". Adding the pacing frequency (or cycle length) for the steady state would be useful.

      We thank the reviewer for pointing this out. We agree that showing pacing frequency is important and we have made the edit in our revision.

      Comment 8: In Table 2 the VT locations are compared between the EPS and the Geno-DT model. The comparison metrics listed in the table should be better described in the table caption. It is unclear if the authors compare VT locations in the AHA segments or if the specific geometric location is used. If it is a geometric location, then I would have expected to see information on the mean error distance or similar information? If it is a comparison of AHA segments, there could be a problem if a VT location was very close to the border between segments. The predicted VT location might be very close to the measured VT location but may end up in a different segment? The authors may like to clarify the methodology and/or discuss these issues.

      We thank the reviewer for this comment. We recognize the need for clarification on the comparison metrics of Table 2. In the text related to Table 2, we used the wording “anatomical location” to avoid excessive repetition of mentioning AHA segments. However, we agree that reverting it back to the “AHA segment” will reduce confusion. Regarding the point of comparing exact locations the reviewer mentioned, in clinical settings, clinicians primarily rely on AHA segments to describe the VT locations during ablation and descriptions in the EP report, rather than using exact coordinates. As such, a match between our predicted AHA segments and clinical AHA segments is a direct comparison. This alignment provides a meaningful comparison and is sufficient for assisting ablation procedures.

      Comment 9: In Figure 7, activation maps are shown, and the row is labelled as Induced VTs/Geno-DT. Are the colour maps from the model or the EPS measurements? The last sentence of the caption indicates they are from the measurements, but such detailed full-wall maps seem to be from a model. The authors may like to clarify what the figure shows.

      We thank the reviewer for this comment. We understand the reviewer’s concern regarding the clarity of Figure 7’s caption. While we believe that the first bold sentence in the caption adequately clarifies that the results in Figure 7 are derived from the Geno-DT model, we agree with the reviewer that it is needed to further enhance the wording clarity. In response, we have made the necessary edits to the caption in our revised manuscript.

      Comments from Reviewer 3

      We thank Reviewer 3 for giving the positive assessment. Here are the responses to the comments.

      Comment 1: The small sample size is a limitation but has already been acknowledged and documented by the authors.

      We thank the reviewer for this comment, and we acknowledged the small sample size as a limitation in our manuscript.

      Comment 2: Another limitation is the consideration of only two of the possible genotypes in developing the cell membrane kinetics, but again has been acknowledged by the authors.

      We thank the reviewer for this comment, and we acknowledged the consideration of only two genotypes as a limitation in our manuscript. We hope to enlarge the genotype groups in our future ARVC studies.

    1. Author Response

      We thank the reviewers for their helpful comments and thorough assessment of our manuscript, which will allow us to improve the work in a subsequent revision. Many suggestions, such as mutating residues to help validate the proposed site, will be included in a future revision. Below we clarify three aspects that led to confusion in the initial review.

      The comment of reviewer 2 that “... the main interaction site of PIPs with Nav1.4 is the VSD-DIV and DIII-DIV linker, an interaction that is expected to delay fast inactivation if it happens at the resting state." is true. However, as explained in our manuscript (Fig. 7), we don’t expect binding at this position to happen in the resting state as the C-terminal domain is bound to this region, impeding PIP binding.

      Reviewer 2 also suggests that we produce a resting state model of Nav1.4 to replace/supplement the results we obtained using our resting Nav1.7 model. We chose to model Nav1.7 due to the availability of structures with different VSDs in the deactivated conformation, something that is not true for Nav1.4. While we plan to explore a Nav1.4 resting state based on the reviewer's suggestion, we note that this introduces an extra layer of uncertainty. However, due to sequence conservation of the gating charges and proposed binding site residues between Nav subtypes, we propose very similar modes of PIP binding among the Nav subtypes across the different conformations.

      Finally, we strongly disagree with the reviewer’s assessment that ‘There are a lot of incorrect statements in many areas’; this may have come from a misreading of the sentence in question, which reads "These diseases are associated with accelerated rates of channel recovery from inactivation, consistent with our observations that an interaction between PI(4,5)P2 and the residue corresponding to R1469 in other Nav subtypes could be important for prolonging the fast-inactivated state." To this, Reviewer 2 responds ‘Prolonging the fast inactivated state would actually reduce recovery from inactivation and not accelerate it.’ The quoted statement is not incorrect: from the original experiments we know that the presence of PIP prolongs the time spent in the fast-inactivated state. Mutations at the PIP binding site are likely to reduce PIP binding, and with less PIP present the channel will recover from inactivation more quickly. We appreciate that this sentence could be reworded for clarity and will address this in our revision to prevent such misreading.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your recent editorial decision on our manuscript. I have included a revised version of our manuscript in which we have addressed all of the required editorial and referees’ comments as requested. In summary, we have added substantial amounts of new data and analysis (new Fig. 5D; Supplementary Figures S1E, S3C, S3E, S3I, S4C), amended several figures (Figures 2 and 3), added a new supplementary Table (Table S2) and we have changed the text and figure labelling/presentation in appropriate places to clarify or correct the issues raised by the reviewers.

      In summary, we firmly believe that we have addressed all the outstanding issues in a positive manner and that the manuscript is now suitable for publication in eLife. I look forward to receiving your final editorial decision on this manuscript.

      eLife assessment:

      ZMYM2 is a transcriptional corepressor but little was known about how it is recruited to chromatin. This study reveals that ZMYM2 homes to distinct classes of retrotransposons bound by the TRIM28 and ChAHP complexes in human cells, an important finding in the field of transcriptional regulation. The evidence supporting the claims of the authors is solid, although inclusion of more functional data would have strengthened the original model proposed.

      We have taken all the comments on board and provided additional new experimental data where requested and more data analysis to substantiate our claims.

      Reviewer #1 (Public Review):

      Owen D et al. investigated the protein partners and molecular functions of ZMYM2, a transcriptional repressor with key roles in cell identity and mutated in several human diseases, in human U2OS cells using mass spectrometry, siRNA knockdown, ChIP-seq and RNA-seq. They tried to identify chromatin bound complexes containing ZMYM2 and identified known and novel protein partners, including ADNP and the newly described partner TRIM28. Focusing mainly on these two proteins, they show that ZMYM2 physically interacts with ADNP or TRIM28, and co-occupies an overlapping set of genomic regions with ADNP and TRIM28. By generating a large set of knockdown and RNA-seq experiments, they show that ZMYM2 co-regulates a large number of genes with ADNP and TRIM28 in U2OS cells. Interestingly, ZMYM2-TRIM28 do not appear to repress genes directly at promoters, but the authors find that ZMYM2/TRIM28 repress LTR elements and suggest that this leads to gene deregulation at distance by affecting the chromatin environment within TADs.

      A strength of the study is that, compared to previous studies of ZMYM2 protein partners, it investigates binding partners of ZMYM2 using the RIME method on chromatin. The RIME method makes it possible to identify low-affinity protein-protein interactions and proteins interactions occurring at chromatin, therefore revealing partners most relevant for gene regulation at chromatin. This allowed the identification of novel ZMYM2 partners not identified before, such as TRIM28. The authors present solid interaction data with appropriate controls and generated an impressive amount of datasets (ChIP-seq for TRIM28 and ADNP, RNA-seq in ZMYM2, ADNP and TRIM28 knockdown cells) that are important to understand the molecular functions of ZMYM2. These datasets were generated with replicates and will be very useful for the scientific community. This study provides important novel insights into the molecular roles of ZMYM2 in human U2OS cells.

      The authors could have been more precise in the manuscript title and abstract to emphasize that these findings apply to human cells, as indeed there is no demonstration yet that the findings presented here can be transposed to mouse cells.

      We have slightly changed the title and abstract to emphasise that the findings are in human cells.

      The manuscript's main conceptual advance is that the authors propose a novel model of gene regulation whereby transcriptional repressors of transposable elements could regulate genes at distance by modulating the local chromatin environment within TADs. Additional experiments would be needed to strengthen this model. For example the authors could have performed TRIM28 ChIP in ZMYM2-kd cells to test if ZMYM2 favors the recruitment of TRIM28 to its genomic targets, as well as ChIP-seq of repressive chromatin marks (such as H3K9me3) in ZMYM2-kd cells to investigate if the loss of ZMYM2 leads to reduced H3K9me3 in ERVs and over large regions surrounding the ERVs.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Public Review):

      In this study the authors investigate functional associations made by transcription factor ZMYM2 with chromatin regulators, and the impact of perturbing these complexes on the transcriptome of the U2OS cell line. They focus on validating two novel chromatin-templated interactions: with TRIM28/KAP1 and with ADNP, concluding that via these distinct chromatin regulators, ZMYM2 contributes to transcriptional control of LTR and SINE retrotransposons, respectively.

      Strengths and weaknesses of the study:

      • The co-localization of ZMYM2 with ADNP and TRIM28 is validated through RIME, ChIP-seq and co-IP. (Notably, since both RIME and ChIP-seq rely on crosslinking, and the co-IP with TRIM28 required crosslinking due to being SUMO-dependent, only the ZMYM2-ADNP co-IP experiment demonstrates an interaction in the absence of crosslinking).

      This is not correct, as the co-IP experiments between endogenous ZMYM2 and TRIM28 were not performed in the presence of crosslinkers. NEM was added, but this was to inactivate SUMO proteases rather than to crosslink proteins.

      • It is good that uniquely-mapped reads are used in the ChIP-seq analysis given the interest in repetitive elements. Likewise, though the RT-qPCR data in Fig5 should be complemented by analysis of the RNA-seq data that the authors already have, it seems that the primers are carefully designed such that a single retrotransposon copy is amplified.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      • The top-scoring interactors are highly abundant nuclear proteins: for example, data from the contaminant repository for affinity purification mass-spec data (https://reprint-apms.org/) show that TRIM28 is identified in 466 / 716 AP-MS experiments with a mean spectral count of 16. While this does not indicate that the ZMYM2-TRIM28 interaction is not 'true', it would have been helpful to further dissect the interaction to strengthen this conclusion. For example, it would be nice to see the co-IP (fig 3A) repeated from the cells expressing the ZMYM2 mutant that is no longer competent to bind SUMO (used in the ChIP-seq data of Fig 2). Alternatively, if the model is that ZMYM2 recruits SUMOylated TRIM28, the co-IP could be repeated with well-characterized TRIM28 mutants that lack SUMOylation.

      We are aware that TRIM28 is often present as an apparent contaminant in many mass spec studies. However we have provided co-IP, PLA and ChIP-seq data to support their co-association on chromatin. We also convincingly show that ZMYM2 and TRIM28 functionally converge on regulating the same gene expression programmes. As requested by the referee, we have added further data showing that the ZMYM2 protein that is defective in SUMO binding (ZMYM2(SIM2mut); new Supplementary Fig. S3C) shows reduced binding to TRIM28 in co-IP assays. This further strengthens the (SUMO-dependent) association between ZMYM2 and TRIM28.

      • The transcriptional response using bulk RNA-seq in ZMYM2-depleted cells is rather gene-centric despite the title of the paper being about TE transcription. In fact the only panels about TE transcription are the RT-qPCR data in Fig 5D,F. I may be missing something (and there aren't many details given about the RNA-seq experiments) but why not look at TE transcription in an unbiased way with the transcriptomic data at hand? I appreciate potential hazards of multi-mapping etc but it would be interesting to see at least some subfamily analysis (e.g. using the TEtranscripts tool). On a similar point, why not show some RNA-seq in the genome browser snapshots of the epigenomics - together with a RepeatMasker annotation track of TEs...

      See response to the same point above.

      While the results broadly support the authors' conclusions, I have the overall impression that the central claim of TE transcriptional regulation by ZMYM2 could be strengthened a lot with some fairly straightforward additional experiments and analyses.

      Reviewer #3 (Public Review):

      ZMYM2 is a transcriptional repressor known to bind to the post-translational modification SUMO2/3. It has been implicated in the silencing of genes and transposons in a variety of contexts, but lacking sequence-specific DNA binding, little is known about how it is targeted to specific regions. At least two reports indicate association with TRIM28 targets (Tsusaka 2020 Epigenetics & Chromatin, Graham-Paquin 2022 bioRxiv) but no physical association with TRIM28 targets had been observed. Tsusaka 2020 theorizes an indirect, potentially SUMO-independent, interaction via ATF7IP and SETDB1.

      Here, Owen and colleagues show that a subset of ZMYM2-binding sites in U2OS cells are clearly TRIM28 sites, and further find that hundreds of genes are silenced by both ZMYM2 and TRIM28. They next demonstrate that ZMYM2 homes to chromatin, and interacts with TRIM28, in a SUMOylation-dependent manner, suggesting that ZMYM2 is recognizing SUMOylation on TRIM28 itself. ZMYM2 separately homes to SINE elements bound by the ChAHP complex, in an apparently SUMOylation independent manner. Although this is not the first report to show physical interaction between ZMYM2 and ChAHP, it is the first to show that ZMYM2 homes to ChAHP-binding sites and functions as a corepressor at these sites.

      The mode by which ZMYM2 and TRIM28 coregulate genic targets remains somewhat unclear. TRIM28/ZMYM2 bind to LTR elements, and loss of these proteins results in upregulation of genes distal to (but in the same TAD as) these binding sites.

      Overall, the manuscript is well-written, convincing, and fills a significant hole in our understanding of ZMYM2's mechanistic function.

      We thank the referee for his/her positive evaluation of the mechanistic insights we provide. We have further added to these through addressing the specific issues raised in their “recommendations for authors”.

      Recommendations for the authors:

      The reviewers appreciated the novelty of the findings, and in particular, the use of the RIME method to identify the protein partners of ZMYM2 while bound on chromatin, and multiple validation steps of these novel ZMYM2 interactors. However, they also felt that the model presented at the end of the manuscript seems preliminary and would deserve additional experiments to be really supported, the essential ones being listed below:

      1 - Despite the claimed scope of the manuscript on TE regulation, their expression analysis is limited to RT-qPCR and targeted to a few families or copies. Please use the RNA-seq data generated in U2OS cells depleted for ZMYM2 to assess retrotransposon expression genome-wide, performing both family-level and copy-level analyses, and compare with TRIM28-depleted U2OS cells.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However very few TEs were expressed at high enough levels to get any statistically significant additional data beyond a few additional transposable elements. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added additional text to the results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      2 - Clarify the relationship between dysregulated genes and TAD boundaries, as this seems important to support the model of distant gene regulation by the action of ZMYM2 on local chromatin environment within TADs (see comment of Reviewer #1 and #3).

      We have now provided further support for the idea that ZMYM2 functions within TADs, as detailed below in response to the reviewers’ comments. New bioinformatics analysis has been done, which is incorporated into the paper in Fig. 4D and Supplementary Fig. S4C.

      3 - Perform TRIM28 ChIP-seq in ZMYM2-kd cells, to prove that ZMYM2 indeed participates in TRIM28 recruitment to TE loci. This could be complemented by H3K9me3 ChIP-seq, to see if ZMYM2 depletion reduces H3K9me3 at retrotransposons and over the regions surrounding ERVs. This last experiment also seems important for reinforcing the distant regulation model of nearby genes through ZMYM2-mediated repression of retrotransposons.

      As suggested by the referees below, we have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #1 (Recommendations For The Authors):

      • Figure S1D is not clear. The authors want to investigate if ADNP and ZMYM2 regulate gene expression in the same directionality. They compare the genes down in siADNP and up in siZMYM2 (or vice versa) and show very small overlaps. If I understand correctly, this shows that very few genes are regulated in opposite directions by ADNP and ZMYM2 and consequently that they tend to regulate genes in the same directionality. This is not what is said in the text page 19 ("with no clear common roles as either an activator or repressor") and should be clarified. Furthermore, to compare if ADNP and ZMYM2 regulate genes in the same directionality, there are better ways to represent this, for example scatter plots of log2 FC in ADNP kd vs ZMYM2 kd. Similar criticisms apply to Fig S3F.

      We agree that the text could be clearer and have rewritten it as “….although the large numbers of genes directionally co-regulated by these two proteins (i.e. either positively or negatively) indicates no clear common role as either an activator or repressor”. We have also added a scatter plot to the supplementary data (Fig. S1E) to further emphasise the common directionality of effect, as suggested by the reviewer. Similarly, we changed the text and have added a scatter plot to support the conclusions on ZMYM2 and TRIM28 functional interactions (new Fig. S3I).

      • The authors suggest an indirect control of genes by ZMYM2 within TADs (Fig 4C). Yet Fig 4C does not seem to address this point. Fig 4C shows that TADs with a ZMYM2/cluster 1 peak contain more upregulated than downregulated genes, but the key question should be: are upregulated genes significantly enriched in TADs containing a ZMYM2/cluster 1 peak compared to other TADs or other genomic regions?

      We have taken this suggestion on board and determined the frequency distribution of the number of TADs containing a gene upregulated (fold change >1.6; Padj <0.01) following ZMYM2 depletion. We performed 10,000 iterations, each randomly selecting 216 of the 3,062 TADs. The observed number of TADs containing an upregulated gene (42) among the 216 TADs containing a cluster 1 ZMYM2 peak is a clear outlier in this distribution (P-value = 0.0002) (see Supplementary Fig. S4C).
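
      The resampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the input `has_up_gene` (one boolean per TAD marking whether it contains an upregulated gene), the function name and the add-one correction for the empirical p-value are all hypothetical choices made for the example.

```python
import random


def tad_permutation_test(has_up_gene, n_sampled, observed, n_iter=10_000, seed=0):
    """Empirical one-sided p-value for seeing >= `observed` TADs with an
    upregulated gene among `n_sampled` TADs drawn at random (no replacement).

    has_up_gene: list of booleans, one per TAD (hypothetical input format).
    """
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_iter):
        # Draw a random set of TADs the same size as the cluster 1 ZMYM2 set
        sample = rng.sample(has_up_gene, n_sampled)
        null_counts.append(sum(sample))  # each True counts as 1
    n_extreme = sum(1 for c in null_counts if c >= observed)
    # Add-one correction so the empirical p-value is never exactly zero
    p_value = (n_extreme + 1) / (n_iter + 1)
    return p_value, null_counts


# Hypothetical input: 3062 TADs, of which an assumed 300 contain an upregulated gene
tads = [True] * 300 + [False] * (3062 - 300)
p, null = tad_permutation_test(tads, n_sampled=216, observed=42)
print(p)
```

      With this design the smallest reportable p-value is 1/(n_iter + 1), so the number of iterations bounds how extreme an outlier can be shown to be.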

      • A key question not addressed in the manuscript is whether ZMYM2 participates in the recruitment of TRIM28 to ERVs. I recommend performing TRIM28 ChIP in ZMYM2-kd cells.

      We have tested whether ZMYM2 is required for TRIM28 binding at several loci and find no evidence for this (new Supplementary Fig. S3E). We now discuss this in the results text and discussion where we already suggested that TRIM28 is likely recruited by KRAB-zinc finger proteins and ZMYM2 is subsequently recruited to this complex. Future extensive work is required to understand the mechanistic functions of ZMYM2 in these regions.

      Reviewer #2 (Recommendations For The Authors):

      Please give more details of RNA-seq analyses in the experimental section (this will be particularly important if the comment about analysing TE transcription genome-wide is acted on).

      We have now expanded on the description of the RNA-seq analysis, including adding the mapping statistics to a new Supplementary table. We followed the referee’s useful suggestion of looking at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield any statistically significant additional data. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs).

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      • The relationship of TRIM28/ZMYM2 repression of LTRs and silencing within/between TADs is interesting but underdeveloped. Upon ZMYM2 depletion, the authors observe simultaneous upregulation of genes within TADs more often than would be expected by chance, but this analysis does not distinguish "proximal to" from "in the same TAD". If a ZMYM2 binding site is X bases from a gene TSS, is it more likely to regulate that gene if it is in the same TAD? This can and should be tested bioinformatically.

      The basic question the referee is asking is whether ZMYM2 affects gene expression at a certain distance irrespective of whether the TSS of the gene is in the same TAD. We have now tested this and added text to the Results section. We took all ZMYM2 peaks associated with genes that were upregulated by ZMYM2 depletion and resided in the same TAD, and calculated the peak-to-TSS distance. We then searched in the opposite direction for the TSSs of genes at a similar distance (±25%) that resided in an adjacent TAD, and asked whether these genes were upregulated by ZMYM2 depletion. In total, 102 ZMYM2 peaks had at least one gene in an adjacent TAD within these distance constraints (716 genes in total). Of these genes, only 11 were upregulated following ZMYM2 depletion. There is therefore no general, distance-dependent spreading of deregulation around ZMYM2 peaks.
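
      The distance-matched control can be sketched for a single peak as below. This is an illustrative sketch, not the authors' implementation: it assumes all coordinates lie on one chromosome, TADs are given as a sorted list of non-overlapping half-open (start, end) intervals, and "opposite direction" means the other side of the peak from the same-TAD gene. The function names are hypothetical; only the ±25% tolerance comes from the text.

```python
from bisect import bisect_right


def tad_index(tads, pos):
    """Index of the TAD containing `pos`, or None if `pos` falls in a gap.

    tads: sorted list of non-overlapping half-open (start, end) intervals.
    """
    starts = [start for start, _ in tads]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None


def distance_matched_genes(peak, same_tad_tss, genes, tads, tol=0.25):
    """Genes whose TSS lies on the opposite side of `peak` from `same_tad_tss`,
    within +/- tol of the same-TAD distance, and inside the adjacent TAD.

    genes: list of (name, tss) tuples (hypothetical input format).
    """
    d = abs(same_tad_tss - peak)
    peak_tad = tad_index(tads, peak)
    if peak_tad is None:
        return []
    # Search in the direction opposite to the same-TAD gene
    direction = -1 if same_tad_tss > peak else 1
    lo, hi = d * (1 - tol), d * (1 + tol)
    matches = []
    for name, tss in genes:
        offset = (tss - peak) * direction  # positive = on the searched side
        if lo <= offset <= hi and tad_index(tads, tss) == peak_tad + direction:
            matches.append(name)
    return matches
```

      For example, with TADs `[(0, 1000), (1000, 2000), (2000, 3000)]`, a peak at 1100 and a same-TAD TSS at 1400, only TSSs between 725 and 875 (upstream, in the first TAD) qualify; each returned gene would then be checked for upregulation after ZMYM2 depletion.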

      Furthermore, the authors note in the text and discussion that LTRs can demarcate TAD boundaries, but this is a distinct concept from the idea that they regulate genes within a TAD. Is there evidence that ZMYM2 binding sites are found at TAD boundaries?

      We have provided more evidence to support the association of ZMYM2 peaks with TADs and now show that they are closer to TAD boundaries than expected by chance (Fig. 4D). However, they are clearly not all located very close to the boundaries.

      • The analysis of transposon expression was limited to qPCR of a handful of elements. Since the authors have conducted RNA-seq of U2OS cells depleted for both TRIM28 and ZMYM2, they can determine if certain classes of transposons are globally upregulated.

      We re-analysed our RNA-seq data using the TEtranscripts tool and looked at TE transcription genome-wide. However, very few TEs were expressed at high enough levels to yield any statistically significant additional data. This likely results from the relatively low read depth we used and the lack of specific protocols being followed to preserve TE transcripts. We will return to the genome-wide effects in future studies where we plan to switch cell types and will generate more bespoke datasets (the current ones were designed for analysing effects on protein coding gene expression before we made the connection to TEs). We added text to the Results section to indicate that we could not see widespread deregulation of subclasses of TEs but that this needs further work.

      Minor Comments:

      • Typo: "human HEK393 cells". They are HEK293 cells.

      We have corrected this error.

      • "These ADNP peaks showed enrichment of binding motifs for several transcription factors with the top two motifs for HBP1 and IRF both found in over 35% of target regions (Figure 1D)." According to Ostapcuk 2018, ADNP has its own motif (CGCCCYCTNSTG). It is intriguing that this does not appear enriched in ADNP sites in U2OS cells; this seems worthy of comment.

      This is a good point, so we did an additional search using the motif reported by Ostapcuk 2018 and found it in 15% of ADNP binding regions. This value is substantially lower than the 63% seen previously, so the motif is present but is not the dominant one. These new data and their implications for chromatin targeting mechanisms are now discussed in the Results section around Fig. 1D.
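
      A simple way to count how many binding regions contain a degenerate consensus such as CGCCCYCTNSTG is to expand its IUPAC codes into a regular expression and scan both strands. The sketch below is illustrative only: the function names are hypothetical, and motif-discovery tools usually score position weight matrices rather than exact consensus matches, so percentages from this approach need not match those reported.

```python
import re

# IUPAC nucleotide codes used to expand a degenerate consensus into a regex
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fraction_with_motif(regions, motif):
    """Fraction of region sequences containing the motif on either strand."""
    pattern = motif_regex(motif)
    hits = sum(1 for seq in regions
               if pattern.search(seq) or pattern.search(revcomp(seq)))
    return hits / len(regions)


regions = ["AAACGCCCTCTAGTGAAA", "TTTTTTTT", revcomp("CGCCCCCTACTG"), "GGGG"]
print(fraction_with_motif(regions, "CGCCCYCTNSTG"))  # 2 of 4 regions match
```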

      • Figures S2F and S2G are central to the paper and belong in the main text.

      We have now added these to the main figures as requested. Because Fig. 2 had become too large for a single figure, it has been split into two separate figures (Figs. 2 and 3).

      • A supplementary table including libraries generated and mapping statistics should be included.

      We have now added this (new Supplementary Table S2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for submitting your article "Microhomology-Mediated Circular DNA Formation from Oligonucleosomal Fragments During Spermatogenesis" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the assessment has been overseen by a Reviewing Editor and Diane Harper as the Senior Editor.

      eLife assessment

      This study provides valuable information on the biogenesis of eccDNAs during spermatogenesis, i.e., eccDNAs in spermatogenic cells are not derived from meiotic recombination hotspots but represent oligonucleosomal DNA fragments from apoptotic male germ cells, whose ends are ligated through microhomology-mediated end-joining. The study is currently incomplete because the bioinformatics methods need more detail and data interpretation should take the amplification bias into consideration.

      We highly appreciate the positive assessment of our manuscript. Following the insightful suggestions by the editors and two reviewers, we have fully addressed two major concerns, i.e., the missing methodological detail and the potentially biased data interpretation.

      First, to provide the details of our bioinformatics methods: i) we have illustrated the principle and steps of our eccDNA detection method in Figure 4C and Figure 4-figure supplement 2B, and deposited our source code on GitHub (website); ii) we compared the performance of our method with four established bioinformatics tools on both simulated and real datasets, and found that it has comparable sensitivity and specificity (Figure 4-figure supplement 2C and E) and much higher accuracy in assigning eccDNA boundaries (Figure 4-figure supplement 2A, D and F); and iii) we have added more description to help readers better understand our method (see Methods – eccDNA Detection).

      Second, amplification bias is indeed a problem for Circle-seq. Following the editors’ and Reviewer #1’s insightful suggestions, we analyzed other datasets generated by amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021) and long-read sequencing (Henriksen et al., Mol Cell, 2022). We identified homologous sequences surrounding eccDNA breakpoints in both datasets (Figure 5-figure supplement 1E and F), suggesting that MMEJ-mediated ligation is also involved in generating the eccDNA size populations not captured by Circle-seq. We have discussed this point and added a section reminding readers of the limitations of rolling-circle amplification-based Circle-seq (the 2nd paragraph of the Discussion section).

      For your and reviewers’ convenience, all changes in the revised manuscript have been marked in red. We hope the modified manuscript addresses your and the reviewers’ concerns satisfactorily and is suitable for publication in eLife now.

      Reviewer #1 (Public Review):

      This study aims to address the mechanism of eccDNA generation during spermatogenesis in mice. Previous efforts for cataloging eccDNA in mammalian germ cells have provided inconclusive results, particularly in the correlation between meiotic recombination and the generation of eccDNA. The authors employed an established approach (Circle-seq) to enrich and amplify eccDNA for sequencing analyses and reported that sperm eccDNA is not associated with meiotic recombination hotspots. Rather, the authors reported that eccDNAs are widespread, and oligonucleosomal DNA fragments from sperm undergoing apoptosis, with the ligation of DNA ends by microhomology-mediated end-joining, would be a major source of eccDNA.

      The strength of the study includes evaluating the eccDNA contents not only in sperm but also from earlier stages of cells in spermatogenesis. The differences in eccDNA size peaks between sperm and other progenitors, in particular, the unique peak in sperm around 360 bp, are intriguing. Results from sequencing data analysis were presented elegantly.

      We are grateful to Reviewer #1 for his or her recognition of the strength of this study.

      I also have critiques. First, the lack of an eccDNA quality control step is a concern. Previous studies employed electron microscopy to ensure that DNA species are mostly circular before rolling-circle amplification. Phi29 polymerase is widely used for DNA amplification, including whole genome amplification of linear chromosomal DNA. Phi29 polymerase has a high processivity and strand displacement activity. When those activities occur within a molecule, it creates circular DNA from linear DNA in vitro. In vitro-created eccDNA from linear DNA would be randomly distributed in the genome, which may explain the low incidence of common eccDNA between replicates. Therefore, it will be crucial to show that DNA prior to amplification is dominantly circular. Electron microscopy would be challenging for the study because relatively few cells were processed to enrich eccDNA. An alternative method for quality controls includes spiking samples with linear and circular exogenous DNA and measuring the ratios of circular/linear control DNA before and after column purification/exonuclease digestion. eccDNA isolation procedures can be validated by a very high circular/linear control DNA ratio.

      We greatly appreciate Reviewer #1's valuable suggestions. We have introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures according to Reviewer #1's suggestion. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (new Figure 1-figure supplement 2A). These results affirm the high selectivity of our protocol in enriching eccDNAs.
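
      The response does not state how relative abundance was quantified. If it was measured by qPCR, one common way to express the change in the circular/linear ratio is the 2^-ΔΔCt method, sketched below under the assumption that both amplicons amplify with roughly 100% efficiency. The function name and the Ct values in the example are hypothetical.

```python
def relative_enrichment(ct_circ_before, ct_lin_before, ct_circ_after, ct_lin_after):
    """Fold change in the circular/linear DNA ratio after eccDNA isolation,
    using 2^-ddCt (assumes ~100% amplification efficiency for both amplicons).

    ct_circ_*: Ct of the circular spike-in (e.g. pUC19)
    ct_lin_*:  Ct of the linear reference locus (e.g. H19)
    """
    d_ct_before = ct_circ_before - ct_lin_before
    d_ct_after = ct_circ_after - ct_lin_after
    return 2 ** -(d_ct_after - d_ct_before)


# Hypothetical Ct values: the linear locus drops out far more than the circle
print(relative_enrichment(20, 22, 21, 33))  # 2**10 = 1024-fold enrichment
```

      A large fold change means that linear DNA was depleted much more strongly than circular DNA, which is the property the spike-in control is designed to demonstrate.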

      Another critique concerns the limitations of the study. It is important to remind the readers of the limitations of the study. As the authors mentioned, rolling circle amplification preferentially increases the copy numbers of smaller eccDNA. Therefore, the native composition of eccDNA is skewed. In addition, the candidate eccDNAs are identified by split reads or discordant read pairs. The details of the mapping process are unclear from the methods, but such a method would require reads with high mapping quality; the identification of eccDNA is expected to require sequencing reads that are mapped to genomic locations uniquely with high confidence, and reads mapped to more than one genomic location, such as highly similar repeat sequences or duplications, are eliminated. Such identification criteria would favor eccDNA formed by little or no homology at the junction sequences, and eliminate eccDNA formed by long homologies at the ends, such as eccDNA formed exclusively by satellite DNA. Therefore, it is not surprising that the authors found the dominance of microhomology-mediated eccDNA. It remains to be determined whether small eccDNA with microhomologies are the dominant species of eccDNA in the native composition. In this regard, it is noted that similar procedures of eccDNA enrichment (column purification, exonuclease digestion, and rolling circle amplification) revealed variable sizes and characteristics of eccDNA in sperm (human from Henriksen et al. or mice from this study), dependent on the methods of sequencing (long-read or short-read sequencing). Considering these limitations, the last sentence of the introduction, "We conclude that germline eccDNAs are formed largely by microhomology mediated ligation of nucleosome protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots" needs to be revised.

      We thank Reviewer #1 for bringing attention to the limitations of the study. Since rolling circle amplification preferentially increases the copy numbers of smaller eccDNA, the exact size distribution of eccDNA in native composition is yet to be determined. As pointed out by Reviewer #1, our mapping and eccDNA detection processes might indeed introduce some biases since we only focused on uniquely-mapped reads. We have addressed and incorporated Reviewer #1’s perspectives in our revised manuscript, as detailed in the 2nd paragraph of Discussion section.

      Despite these limitations, microhomology-mediated ligation of DNA fragments nonetheless appears to be the major mechanism of eccDNA biogenesis. We analyzed eccDNA datasets generated through long-read sequencing (Henriksen et al., Mol Cell, 2022) or amplification-free strategies (Mouakkad-Montoya et al., PNAS, 2021). Although these eccDNAs represented size populations that were largely missed by this study, our sequence feature analyses also revealed the presence of homologous sequences surrounding eccDNA breakpoints, as depicted in the newly added Figure 5-figure supplement 1E and F. Considering that we could not totally overcome these biases in this study, we have toned down some statements and revised the last sentence of the introduction as follows: “We conclude that germline eccDNAs are likely formed by microhomology mediated ligation of nucleosome-protected fragments, and barely contribute to de novo genomic deletions at meiotic recombination hotspots.”

      Small eccDNA (microDNA) data from various mouse tissues are available from the study by Dillon et al. (Cell Reports, 2015). Authors are encouraged to examine whether the notable findings in this study (oligonucleosomal-sized eccDNA peaks and the association with apoptotic cell death) are unique to sperm or common in the eccDNA from other tissues.

      We are thankful to Reviewer #1 for this suggestion. We analyzed eccDNA data from various mouse tissues (Dillon et al., Cell Rep, 2015) to see whether our findings are unique to sperm or common to other tissues. Sequence-based prediction revealed significantly higher nucleosome occupancy probability for ~180 bp and ~360 bp eccDNA regions, suggesting their origin from oligonucleosomal fragments (Figure 5-figure supplement 1A). In contrast to simulated controls (~20%), more than 1/3 of eccDNAs had microhomologous sequences, most of which were shorter than 5 bp (Figure 5-figure supplement 1B). The remaining 2/3 of eccDNAs had the same sequence motifs between eccDNA starts and sequences following eccDNA ends, and between eccDNA ends and sequences in front of eccDNA starts (Figure 5-figure supplement 1C). The genomic distribution of eccDNAs closely matched that of eccDNAs whose generation was dependent on apoptotic DNA fragmentation (new Figure 5-figure supplement 1D). Altogether, these results indicate that microhomology-directed ligation of oligonucleosomal fragments in apoptotic cells contributes significantly to eccDNA biogenesis in different mouse tissues. We have described this part in the revised manuscript (see the second-to-last paragraph of the Results section).
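
      The two kinds of shared motif described above (bases at the eccDNA start that also follow its end, and bases before the start that also end the eccDNA) can be measured with a direct string comparison. This is a minimal sketch, not the authors' pipeline: it assumes a 0-based, end-exclusive eccDNA interval on a plain genome string, and the function name and the 20 bp cap are hypothetical.

```python
def junction_microhomology(genome, start, end, max_len=20):
    """Direct-repeat lengths at an eccDNA spanning genome[start:end]
    (0-based, end-exclusive).

    right: bases at the eccDNA start that also follow its end
    left:  bases just before the start that also close the eccDNA
    """
    right = 0
    while (right < max_len and start + right < end and end + right < len(genome)
           and genome[start + right] == genome[end + right]):
        right += 1
    left = 0
    while (left < max_len and end - 1 - left >= start and start - 1 - left >= 0
           and genome[start - 1 - left] == genome[end - 1 - left]):
        left += 1
    return left, right


# "ACG" opens the eccDNA and also follows its end: a 3 bp microhomology
genome = "GGGG" + "ACGTTTTTCC" + "ACGAAAA"
print(junction_microhomology(genome, 4, 14))  # (0, 3)
```

      A junction flanked by a k bp repeat produces the same junction sequence for several candidate breakpoint positions, which is why microhomology makes precise breakpoint assignment from short reads ambiguous.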

      Reviewer #2 (Public Review):

      This study presents a useful investigation of eccDNAs in spermatogenesis of mouse. It provides evidence about the biogenesis of eccDNAs and suggests that eccDNAs are derived from oligonucleosomal DNA fragmentation during apoptosis by MMEJ and may not be the direct products of germline deletions. However, the data analysis methods were not fully described and data analysis is incomplete. It provides additional observations about the eccDNA biogenesis and can be used as a starting point for functional studies of eccDNA in sperms. However, many aspects about data analyses and data interpretations need to be improved.

      We thank Reviewer #2 for his or her critical reading. We have provided more method details, performed additional analyses and made some clarifications in our revised manuscript (see below).

      • Most of the conclusions made by the work are only based on the bioinformatics analyses; validation of these findings using other methods (biochemistry/molecular biology methods) is missing. For example, no QC results are presented for the eccDNA purification, which may show whether contaminants such as linear DNA or mitochondrial DNA have been fully removed. Additionally, it is also helpful to use simple PCR to test the existence of identified eccDNAs in sperm or other samples to validate the specificity of the Circle-seq method.

      Following both this Reviewer’s and Reviewer #1’s suggestions, we performed quality control of eccDNA purification. First, we introduced an exogenous circular DNA (pUC19) into our samples and measured its abundance relative to a linear DNA locus (H19 gene) before and after eccDNA isolation procedures. As anticipated, we observed significant enrichment of pUC19 following eccDNA isolation (Figure 1-figure supplement 2A). Second, mitochondrial DNA is expected to be cleaved into linear DNA by PacI and degraded by exonuclease. As expected, the abundance of mitochondrial DNA significantly decreased after eccDNA isolation procedures (Figure 1-figure supplement 2B). Third, we performed PCR using outward primers and validated three randomly selected eccDNAs (Figure 1-figure supplement 2C).

      • The reliability of the data analysis methods is uncertain, as the authors constructed and utilized their own pipeline to identify eccDNAs, despite the availability of established bioinformatics tools such as ECCsplorer, eccFinder, and Amplicon Architect. Moreover, the lack of validation of the pipeline using either ground truth datasets or simulation data raises concerns about its accuracy. Additionally, the methodology employed for identifying eccDNA that encompasses multiple gene loci remains unclear.

      We thank Reviewer #2 for pointing out this problem. In the original version of our manuscript, focusing on one eccDNA dataset generated in this study, we compared the performance of our method with established methods for identification of eccDNA regions, such as Circle_finder, Circle_Map and ecc_finder. Our method has comparable sensitivity and specificity to existing methods, especially Circle_finder and Circle_Map (original Figure 4-figure supplement 2C). We also used one specific genomic region to show that existing methods identified the same eccDNA regions but misassigned the eccDNA boundaries (original Figure 4-figure supplement 2A). In the revised manuscript, we have further included ECCsplorer for comparison. Since Amplicon Architect is designed more specifically for detection of ecDNAs, it was not included in our comparison. Following Reviewer #2’s suggestions, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all methods for eccDNA identification. In total, 97.9%, 97.9%, 97.4%, 95.3% and 91.1% of eccDNA regions could be detected by our method, Circle_Map, Circle_finder, ecc_finder and ECCsplorer, respectively (Figure 4-figure supplement 2C). This result suggests that our method has comparable performance in detecting eccDNA regions. However, only our method could faithfully assign breakpoints, with 97.4% accuracy, in contrast to no more than 15% by other methods (Figure 4-figure supplement 2D).

      As pointed out by Reviewer #2, similar to ECCsplorer, Circle_finder, Circle_Map and ecc_finder, our method fails to identify eccDNAs that encompass multiple gene loci. We have reminded readers of this limitation in our revised manuscript. Besides the schematic workflow (Figure 4-figure supplement 2B), we have included more method details to help readers better understand how our method works (see Methods – eccDNA Detection).

      • Although the author stated that previous studies utilizing short-read sequencing technologies may have incorrectly annotated eccDNA breakpoints, this claim requires careful scrutiny and supporting evidence, which was not provided in the manuscript.

      Following this Reviewer’s suggestions, we conducted a systematic evaluation of the performance of various existing methods, namely Circle_finder, Circle_Map, ECCsplorer and ecc_finder, for eccDNA breakpoint annotation.

      First, we simulated paired-end reads derived from a set of eccDNAs with homologous sequences around breakpoints and employed all the different methods for eccDNA identification. As expected, our method correctly assigned breakpoints for 97.4% of eccDNAs, in contrast to no more than 15% for the other methods (Figure 4-figure supplement 2D).

      Second, we examined the performance of all methods on one dataset generated in this study. Our method detected 59,680, 54,898, 32,993 and 22,019 eccDNAs with homologous sequences that were also detected by Circle_finder, Circle_Map, ECCsplorer and ecc_finder, respectively. Remarkably, we observed that at least 60% of breakpoints were misannotated by the existing methods (Figure 4—figure supplement 2F).

      We have included an example in Figure 4—figure supplement 2A, where all existing methods incorrectly annotated the eccDNA breakpoints when homologous sequences were present. These results highlight the advantage of our method over existing methods in accurately annotating eccDNA breakpoints in the presence of homologous sequences.
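
      The distinction between detecting an eccDNA region and assigning its breakpoints exactly can be made concrete with a small scoring sketch for simulated data. This is illustrative only: the 90% reciprocal-overlap criterion for "detected" and the function names are assumptions, since the response does not state the matching rules used in the comparison.

```python
def reciprocal_overlap(a, b, min_frac=0.9):
    """True if half-open intervals a and b each overlap the other by >= min_frac."""
    (s1, e1), (s2, e2) = a, b
    overlap = min(e1, e2) - max(s1, s2)
    if overlap <= 0:
        return False
    return overlap >= min_frac * (e1 - s1) and overlap >= min_frac * (e2 - s2)


def evaluate_calls(truth, calls, min_frac=0.9):
    """Return (fraction of simulated eccDNAs detected by reciprocal overlap,
    fraction whose breakpoints were called exactly). Items are (chrom, start, end)."""
    by_chrom = {}
    for chrom, s, e in calls:
        by_chrom.setdefault(chrom, []).append((s, e))
    detected = exact = 0
    for chrom, s, e in truth:
        candidates = by_chrom.get(chrom, [])
        if any(reciprocal_overlap((s, e), c, min_frac) for c in candidates):
            detected += 1
        if (s, e) in candidates:
            exact += 1
    return detected / len(truth), exact / len(truth)


truth = [("chr1", 1000, 1360), ("chr1", 5000, 5180)]
calls = [("chr1", 1003, 1363), ("chr1", 5000, 5180)]  # first call shifted by 3 bp
print(evaluate_calls(truth, calls))  # (1.0, 0.5)
```

      A call shifted by a few bases, as can happen when a junction is flanked by microhomology, still counts as detected but not as exact, which is how a tool can score highly on detection while scoring poorly on breakpoint accuracy.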

      • The similarity between the eccDNA profiles of human and mouse sperm remains uncertain, and therefore, analyses of human eccDNA data and comparisons between the two are necessary if the authors claim that their findings of widespread eccDNA formation in mouse spermatogenesis extend to human sperms.

      Our Fig. 5 shows that human sperm eccDNAs originate from oligonucleosomal fragmentation (Fig. 5A-C), are not associated with meiotic recombination hotspots (Fig. 5D and E), and are formed by microhomology-directed ligation (Fig. 5F and G). These findings are consistent with what we observed in mouse sperm eccDNAs. To further substantiate our findings, we analyzed an additional eccDNA dataset from human sperm generated by long-read sequencing (Henriksen et al., Mol Cell, 2022). Although predominantly composed of large-sized eccDNAs, the analysis of sequence features also indicated their association with microhomology-directed ligation (Figure 5-figure supplement 1E). Overall, the eccDNA profiles in human and mouse sperm exhibit notable similarities.

      Reviewer #1 (Recommendations For The Authors):

      In the last sentence of the abstract, the authors stated, "provide a potential new way for quality assessment of sperms." There is no basis for the claim in the abstract. The authors need to mention the association of eccDNA with apoptosis somewhere to claim it.

      We have revised the Abstract as suggested.

      Some of the references need to be clarified. For example, Coquelle et al., 2002 described the BFB cycles and common fragile sites, but the report does not seem to be relevant to eccDNA. Mouakkad-Montoya et al., 2021 enriched eccDNA without rolling-circle amplification.

      Thanks for pointing this out. We cited Coquelle et al., 2002 to list known biogenesis mechanisms for ecDNAs but not eccDNAs. We have deleted Mouakkad-Montoya et al., 2021 in our revised manuscript, as it did not involve rolling-circle amplification.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear why the authors took 3000 bp as the cutoff to divide eccDNAs into short and long categories. How many long eccDNAs are there in these samples?

      Henriksen et al. reported a size range of ~3–50 kb for sperm eccDNAs. We therefore used 3 kb as an arbitrary cutoff to better compare the two eccDNA populations with those reported by Henriksen et al. SPA, RST, EST and sperm cells have 278, 609, 373 and 691 eccDNAs, respectively, that are longer than 3000 bp. We have clarified this in the revised manuscript.

      • In figure 2D,2E, what is the zero point in the heatmaps? The 5', 3' end or center of eccDNA? Please make it clear in figure and main text.

      The zero point represents the center of eccDNA regions. We have clarified this in the revised manuscript.

      • In line 245, the author mentioned that "periodic distribution of nucleosomes was observed for ~360bp eccDNAs but not for ~180bp ones, indicating that eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays (Figure 2E)". Please explain how to make the conclusion from the Figure 2E?

      Taking the H3K27me3-marked nucleosome as an example, vertical stripes were distributed every ~180 bp for ~360 bp eccDNAs, as shown by the heatmap (more evident in an enlarged view), and a periodic signal distribution was apparent for ~360 bp eccDNAs (Figure 2E), as shown by the meta-gene analysis on top of the heatmap (Figure 2B). However, no such pattern was observed for ~180 bp eccDNAs. Similar results were also observed for nucleosomes marked with other histone variants and histone modifications (H3, H3K27ac, H3K4me1, H3K9ac, H3K36me3, H3K9me3 in Figure 2E). Thus, eccDNAs from di-nucleosomes but not mono-nucleosomes preferentially originate from well-positioned nucleosome arrays in sperm.

      • In line 261, the author mentioned: "the large-sized sperm eccDNAs detected in this study also displayed weak but apparent negative correlation with gene density and Alu elements (Figure 3C and D)". However, the data didn't show the "apparent negative correlation", as only one or two data points may support this conclusion and the p-values are not even close to 0.05.

      Many thanks for pointing this out. We have toned down this statement as “the large-sized sperm eccDNAs detected in this study displayed a weak negative correlation with gene density or Alu elements (Figure 3C and D)”.

      • The enrichment of both active (H3K27ac, H3K9ac) and repressive (H3K9me3) histone markers in the original loci of eccDNA poses an intriguing question: how can this seemingly contradictory pattern be explained? In the H3K9me3 heatmap, the average level of H3K9me3 in eccDNA is lower than control's, how to interpret the result?

      We found that small-sized eccDNAs were more enriched at H3K27ac-marked euchromatin regions (Figure 2C-E and 3A), while large-sized ones were more enriched at H3K9me3-marked heterochromatin regions (Figure 3A). This is probably because heterochromatin regions are too condensed to be fragmented into smaller pieces for small-sized eccDNA formation, in comparison with euchromatin regions. We have included this information in our revised manuscript.

      H3K9me3 histone marks are enriched at repeat sequences that are widely distributed within the mouse genome. Moreover, the H3K9me3 ChIP-seq dataset we analyzed in this study had the highest number of ChIP-seq peaks, compared to ChIP-seq datasets of other histone modifications. Thus, even random control would probably have stronger ChIP-seq signals than small-sized eccDNAs (e.g., ~180bp or ~360bp eccDNAs) that were preferentially generated from active regions.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Developing vaccination capable of inducing persistent antibody responses capable of broadly neutralizing HIV strains is of high importance. However, our ability to design vaccines to achieve this is limited by our relative lack of understanding of the role of T-follicular helper (Tfh) subtypes in the responses. In this report Verma et al investigate the effects of different prime and boost vaccination strategies to induce skewed Tfh responses and its relationship to antibody levels. They initially find that live-attenuated measles vaccine, known to be effective at inducing prolonged antibody responses has a significant minority of germinal center Tfh (GC-Tfh) with a Th1 phenotype (GC-Tfh1) and then explore whether a prime and boost vaccination strategy designed to induce GC-Tfh1 is effective in the context of anti-HIV vaccination. They conclude that a vaccine formulation referred to as MPLA before concluding that this is the case.

      Clarification: MPLA serves as the adjuvant, and the vaccine formulation is characterized as a Th1 formulation based on the properties of the adjuvant.

      Strengths: While there is a lot of literature on Tfh subtypes in blood, how this relates to the germinal centers is not always clear. The strength of this paper is that they use a relevant model to allow some longitudinal insight into the detailed events of the germinal center Tfh (GC-Tfh) compartment across time and how this related to antibody production.

      Weaknesses: The authors focus strongly on the numbers of GC-Tfh1 as a proportion of memory cells and their comparison to GC-Tfh17. There seems to be little consideration of the large proportion of GC-Tfh which express neither CCR6 nor CXCR3 and currently no clear reasoning for excluding the majority of GC-Tfh from most analysis. There seems to be an assumption that since the MPLA vaccine has a higher number of GC-Tfh1 that this explains the higher levels of antibodies. There is not sufficient information to make it clear if the primary difference in vaccine efficacy is due to a greater proportion of GC-Tfh1 or an overall increase in GC-Tfh of which the percentage of GC-Tfh1 is relatively fixed.

      We appreciate the reviewer's comment. Indeed, while there is substantial literature on Tfh subtypes in blood, the strength of our study lies in utilizing a relevant model to provide longitudinal insights into the dynamics of the germinal center Tfh (GC-Tfh) compartment over time and its relationship to antibody production. Regarding the concern about the comprehensive analysis of GC Tfh subsets, including GC-Tfh1, GC-Tfh17, and others not expressing CCR6 and/or CXCR3, we fully acknowledge its importance. To address this, we will conduct a detailed analysis of GC Tfh and GC Tfh1 frequencies, encompassing subsets without CCR6 and CXCR3 expression, to provide a more comprehensive view of the GC-Tfh population in our analysis.

      Reviewer #2 (Public Review):

      Summary:

      Anil Verma et al. have performed prime-boost HIV vaccination to enhance HIV-1 Env antibodies in the rhesus macaque model. The authors used two different adjuvants, a cationic liposome-based adjuvant (CAF01) and a monophosphoryl lipid A (MPLA)+QS-21 adjuvant. They demonstrated that these two adjuvants induce different transcriptomes in the GC-TFH subsets. The MPLA+QS-21 adjuvant induces abundant GC TFH1 cells expressing CXCR3 at the first priming, while the CAF01 adjuvant predominantly induces GC TFH1/17 cells co-expressing CXCR3 and CCR6. Both adjuvants initiate comparable Env antibody responses. However, MPLA+QS-21 induces significantly higher levels of IgG1 antibodies binding to gp140, even after 30 weeks.

      The enhancement of memory responses by MPLA+QS-21 consistently associates with the emergence of GC TFH1 cells that preferentially produce IFN-γ.

      Strengths:

      The strength of this manuscript is that all experiments have been done in the rhesus macaque model with great care. This manuscript beautifully shows that MPLA+QS-21 would be a promising adjuvant for inducing memory B cell responses to an HIV vaccine.

      Weaknesses:

      The authors did not provide clear evidence to indicate the functional relevance of GC TFH1 in IgG1 class-switch and B cell memory responses.

      We appreciate the recognition of our meticulous work in the rhesus macaque model and the potential of MPLA+QS-21 as an adjuvant for HIV vaccine-induced humoral immunity. We acknowledge the need to provide clearer evidence of the functional relevance of GC Tfh1 in IgG1 class-switching and B cell memory responses. We will attempt to address this concern in our revisions.

    1. Author Response:

      We thank the editors and reviewers for their thoughtful and constructive assessment of our manuscript. In the upcoming revision, we plan to address the key concerns highlighted by the reviewers. Because the bulk of our data involved the use of chemical SOD1 inhibitors, we intend to assess their on-target efficacy by measuring SOD activity after treatment. Additionally, we plan to perform key experiments to measure oxidative stress and DNA damage in SOD1-deletion cell lines to compare against the effects of chemical SOD1 inhibition. We acknowledge the lack of consideration of SOD2 and plan to explore changes in mitochondrial SOD2 expression and function in PPM1D-mutant cells at baseline and after SOD1 deletion. We will refine the text to clarify the data interpretation and elaborate on the limitations of our study in the discussion. Altogether, we thank the reviewers for their suggestions to improve our study, and we hope that these additional experiments will provide further evidence that SOD1 is a dependency in PPM1D-mutant leukemia cells.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript by Liu et al entitled "Discovery and biological evaluation of a potent small molecule CRM1 inhibitor for its selective ablation of extranodal NK/T cell lymphoma" reports the identification of a novel CRM1 inhibitor and shows its efficiency against extranodal natural killer/T cell lymphoma cells (ENKTL).

      This is a very timely and very original study with potential impact in a variety of pathologies not only in ENKTL. However, the main conclusions of the work are not supported by experimental evidence.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is original, with considerable translational impact for the field. We are grateful for the valuable time and effort you have spent providing your very insightful comments, which are of great help for our revision.

      The study claims that LFS-1107 reversibly inhibits the nuclear export receptor CRM1 but the authors only show that the compound binds to CRM1 and that the CRM1 substrate IκBα accumulates in the cell nucleus upon LFS-1107 treatment. The evidence is indirect and alternative scenarios are certainly possible.

      Many thanks for this critical comment. We have conducted additional experiments to demonstrate that LFS-1107 reversibly inhibits CRM1-mediated nuclear export. Namely, culturing the cells for two hours after LFS-1107 treatment restored the transport of IκBα from the nucleus to the cytoplasm. Please see Figure 2-Figure Supplement 3 for more details.

      On the other hand, the manuscript is not always well-written and insufficiently referenced.

      Thanks for this critical comment. This has been fixed. We have carefully revised the manuscript with extensive language editing. Moreover, we have added more references to the manuscript.

      The nuclear translocation in figure 2G is not convincing. The western blot in figure 2G shows that LFS-1107 treatment induces IκBα expression, and both cytoplasmic and nuclear amounts increase in a dose-dependent manner. Together, these data do not support nuclear IκBα accumulation upon LFS-1107 treatment.

      Thanks for this critical comment. This has been fixed. We have repeated the western blot experiments, and our results revealed that only the amount of nuclear IκBα increased upon LFS-1107 treatment. In contrast, the amount of cytoplasmic IκBα decreased after LFS-1107 treatment. Please see Figure 2J for more details.

      Reviewer #2 (Public Review):

      Indeed, ENKTL is a rather deadly tumor with unmet medical needs. The work is novel in the sense that the authors designed and identified a very potent inhibitor targeting CRM1 via a deep-reinforcement learning model to suppress the overactivation of NF-κB signaling, an underlying mechanism of ENKTL pathogenesis. The authors demonstrated that LFS-1107 binds CRM1 approximately 40-fold more strongly than KPT-330, an existing CRM1 inhibitor. Another merit of the small-molecule inhibitor is that LFS-1107 can selectively eliminate ENKTL cells while sparing normal blood cells. Their animal results clearly demonstrated that the small-molecule inhibitor was able to extend mouse survival and eliminate tumor cells considerably. Overall, the manuscript may provide a possible therapeutic strategy to treat ENKTL with a good safety profile. The manuscript is also well-written. The weakness of the manuscript is that some details of the design and evaluation of the small-molecule inhibitor are missing.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is relatively novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      “The authors use hM4Di to "silence" Fos-tagged neurons in the basal forebrain, but they have not validated the efficiency or the possible various effects of this reagent.

      It is possible that hM4Di actually has a relatively small effect on suppressing the AP activity of neurons. Nevertheless, hM4Di might still be an effective manipulation, because it was shown to additionally reduce transmitter release at the nerve terminal (see e.g. Stachniak et al. (Sternson) 2014, Neuron). Thus, the authors should evaluate in control experiments whether hM4Di expression plus CNO actually electrically silences the AP firing of ChAT neurons in the BF, as they seem to suggest, and/or whether it reduces ACh release at the terminals. For example, one experiment to test the latter would be to perfuse CNO locally in the BLA after expressing hM4Di in the cholinergic neurons of the BF. At the very least, the assumed action of hM4Di, and the possible caveats in the interpretation of these results, should be discussed in the paper.”

      We find that activation of hM4Di with clozapine in basal forebrain cholinergic neurons results in clear alterations to neuronal activation in projection targets and in behavior (Figure 3, Figure 3-Supplement 1, Figure 5, Figure 5-Supplement 1, Figure 5-Supplement 2, Figure 6-Supplement 1 and Figure 8). Previous studies demonstrated that activation of hM3Dq or hM4Di in cholinergic neurons results in changes to electrical activity and behavioral responses (Zhang et al. 2017 & Jin et al. 2019). Although we are unable to distinguish whether the behavioral effects in our experiments result from decreased ACh release at terminals, inhibition of action potential firing, or both, our behavioral findings are consistent with demonstrations that inhibition of basal forebrain cholinergic neurons can alter behavior. See Page 17 Lines 488-493 for a discussion.

      “The names of brain areas like "NBM/SIp" and "VP-SIa" need to be better introduced, and somehow contextualized (in the Introduction, and also at first reading in the Results).”

      We agree that our prior presentation of these regions was confusing, and in general the boundaries of these regions are not well-defined in the field. We have included a description of anatomical landmarks and bregma coordinates to clarify our definitions of the regions NBM/SIp (Page 4 Lines 103-104) and VP/SIa (Page 4 Lines 107-108).

      “Figure 3C: Application of CNO on the memory recall day leads to a strong reduction in CS-driven freezing. However, in this experiment, and also in Fig. S7, the pre-tone value of freezing is also strongly reduced. This would indicate that the activity of NBM/SIp cells (or else, ACh-release from these cells - see also Major point 1), also influences contextual learning. The authors should, first, statistically, test these effects (I am not sure this was done). If these differences are significant, a possible role of ACh in contextual fear learning should be discussed. Has it been shown before whether ACh is involved in contextual fear learning? Does this indicate the involvement of another target area of ACh neurons (e.g., the hippocampus?).”

      We statistically compared the pre-tone freezing response between Sham and hM4Di groups across our experiments and found no significant differences in pre-tone freezing between the groups (Figure 3D, Sham vs. ADCD-hM4Di, pre-tone p=0.3544; Figure 5B, Sham vs. hM4Di, pre-tone p=0.0679; Figure 5C, Sham vs. hM4Di, pre-tone p=0.0966; Figure 5-Supplement 2A, Sham vs. hM4Di, pre-tone p>0.99). These comparisons can also be reviewed in the statistical reporting table uploaded along with the manuscript.

      “The discussion could be improved by better comparing what they found to the wider literature. For example, previous papers studying other neuromodulatory systems found evidence for a modulation of neuromodulator release after learning, e.g. see Martins and Froemke 2015 Nat. Neuroscience for the noradrenergic system, Tang et al. (Schneggenburger lab) 2020 J. Neuroscience for the dopaminergic system and fear learning, and Uematsu et al., 2017, Nat. Neuroscience for the noradrenergic system and fear learning. Maybe the authors could include these and similar references when revising their discussion, to take into account a broader view of previous findings related to other neuromodulatory systems.”

      Our study joins the growing body of literature demonstrating stimulus-encoding and rapid stimulus-contingent responses in various neuromodulatory systems in learning and memory recall. We have now added a substantial discussion, detailing both the similarities and differences between our findings and those found in the dopaminergic, serotonergic, noradrenergic, and oxytocinergic systems in fear learning. See Pages 20-21 Lines 575-605.

      Reviewer #2 (Public Review):

      “Throughout the paper, the authors use comparisons of cell activity between groups to address questions about projection-specific and cue-specific cell activation and reactivation. However, statistical comparisons are sometimes done between biological replicates (e.g. Fig. 5A), whereas a lot of them are done between technical replicates (e.g. Fig. 2B, 5B, 7B). Adding statistics that compare biological replicates would help increase confidence in the results.”

      We have replotted our data as comparisons of biological replicates (by individual animal) in new versions of Figures 1-8, Figure 1-Supplements 1-3, Figure 5-Supplements 1 & 2, Figure 6-Supplements 1 & 2, Figure 7-Supplement 1, and Figure 8-Supplement 1. Correspondingly, all statistical analyses have been conducted comparing biological replicates. Notably, these changes have not altered the overall conclusions of any figure. The sample size, statistical test and p-values for our comparisons are included in the figure legends and in the newly included statistical reporting table.

      “To demonstrate engram-like specificity, in figure 4C the authors show fold change in cholinergic reactivation in low and high responders (animals that show low and high defensive freezing upon cue presentation) as normalized by cell activity while sitting in the home cage. However, the authors also collected a better control for this comparison, which is shown in figure S4, where the animals were exposed to an unconditioned tone cue. Comparing fold change to this tone-alone condition would provide stronger evidence for the authors' point, as this would directly compare the specificity of cholinergic reactivation to a conditioned vs an unconditioned cue. A discussion of the same comparison is relevant for figure 2 (and is shown in figure S4) but is not mentioned in the text.”

      We have evaluated the cholinergic response to the tone using GRABACh3.0 as a readout of ACh release in the BLA, and using IEG expression as a readout of cholinergic neuron activation. We find no significant increase in ACh release in the BLA in response to tone presentation (Figure 1C-left, 1D-left) and no significant increase in tone-associated reactivation of cholinergic neurons (using IEG as a readout, 2C/D, Figure 1-Supplement 2, Figure 1-Supplement 3, Figure 6-Supplement 1A) unless the tone has previously been paired with a foot shock (see Figure 1C-right, 2C, 3D). In addition, we find no statistically significant differences between home cage and tone-alone conditions (Figure 2C, home cage-home cage condition vs. tone-tone condition, p=0.5012). Based on these analyses, we use the home cage group as our control group for comparison.

      “The significant correlation between cue-evoked percent change in defensive freezing from pre-tone and fold change in cholinergic cell activity relative to the home cage that is shown in figure 4D is somewhat confusing. Is the correlation considering all the points shown (high and low responders as depicted by black and grey points)? It's first reported as one correlation but then is discussed as two populations that have different results. Further, is the average amount of reactivation for the home-cage controls used here the same denominator for each reported animal? Similarly to the point above, a correlation looking at fold change from tone-alone would also be helpful to determine the degree to which cholinergic reactivation is specific to threat-association learning versus the more general attentional component that this system is known for.”

      We have substantially modified this figure, now new Figure 6, to clarify our point. Along with this revision, we have removed the correlation plots and corresponding analyses from the revised version of the manuscript and figures.

      Figure 6 now begins with behavioral data from a distinct cohort of mice, which underwent only behavioral testing, outlining our criteria for high vs. low responders (Figure 6A/B). In Figure 6C, we note via schematic that ADCD labeling was carried out during the recall session (unlike Figure 2). In panel D, we show the fold change of activated cholinergic neurons stratified by High vs. Low responder status. This fold change is normalized to the average activation from the home cage control animals in each experimental cohort. Taken together, we find that animals with an ~2-fold increase in activation of cholinergic neurons display significant, distinguishable freezing in response to the tone compared with pre-tone freezing. We find that this cluster of activated neurons is segregated to the anterior NBM/SIp (Figure 6E).

      Regarding whether cholinergic reactivation reflects a general response to the tone (attention) rather than learning: in Figure 1-Supplement 3, we evaluate ACh release and behavioral response in mice that were exposed to three shocks alone (no tone) on day 1 and then exposed to a single (novel) tone on day 2. In these mice we find no significant change in ACh release in the BLA in response to the tone, and no significant increase in freezing behavior in response to the tone. In Figure 2D, we evaluate reactivation of cholinergic neurons in a similar context and find that this group does not significantly differ from the home cage → home cage group. Further, we show that this home cage group does not significantly differ from Low Responders. As such, we find significant reactivation of cholinergic neurons only in animals with increased responsiveness to the CS tone during the recall session (High Responders).

      “The compelling argument of this paper is that the authors are separating out the general attention role typically attributed to the cholinergic system from a more specific, engram-based role. Given the importance of untangling this, it would be useful to see the recorded traces and behavioral scoring for the data shown in figure S2B. For example, was the higher slope in the recorded cholinergic response during unconditioned tone 1 also accompanied by an increase in freezing, which later went away with additional non-reinforced tones? Given that the animals were not habituated to tones (according to the Methods), this activity could be related to a habituation/general attention response, which may then be weaker than the learned response.”

      We include individual traces of GRABACh3.0 release in the BLA in response to the unconditioned tone from a protocol with 3x tone presentation on Day 1 and tone presentation on Day 2 (Figure 1-Supplement 2C). We have also included average ± SEM traces for the entire duration of the tone presentation for the three unconditioned tones in this paradigm, along with an inset showing 1 s before and after tone onset (Figure 1-Supplement 2D). Finally, we include individual traces of GRABACh3.0 release in the BLA in response to the first (naïve) tone from mice that underwent the training (tone + shock) followed by recall (tone) paradigm in Figure 1-Supplement 4C, left. None of the unconditioned tone responses were statistically significantly different from the preceding baseline. In contrast, we find that the learned response is significantly higher than baseline (Figure 1D).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used MD simulations to investigate the role of N-terminal myristoylation and the presence of two SH domains on the allosteric regulation of c-Abl kinase. Standard established MD simulation methods and analyses were applied, including the force distribution analysis (FDA) method developed by Grater et al. some time ago.

      The system is large and the conformational changes are complicated. In light of this, and aggravated by the fact that direct comparison with - and critical testing against - experimental data is not possible in the present case, I consider the overall simulation times to be rather short (several repeats, but only 500 ns). So there might be statistical convergence issues. Especially also because at least some of the starting structures were generated from available experimental structures after some modifications/modelling, and they might thus be out of equilibrium and need some time to fully relax during the MD simulations.

      Unfortunately, I cannot find any convergence tests concerning the length of the simulations, which are usually considered to be standard analyses (Appendix Fig. 5 shows the effect of different thermostats and capping of the peptide chain, but no tests concerning simulation time). This could be critical in the present case, where the authors acknowledge themselves (e.g., on p. 4) that there are only subtle differences between the different simulation systems and the variations within a given system are larger than the relevant (putative) differences between systems (Fig. 1 C, D, E).

      We thank the reviewer for taking the time to critically assess our manuscript. We appreciate the concerns raised and have addressed them as follows. We have quadrupled the simulation time to 2 µs for 20 out of the 30 replicates and show the updated results for these. We refer the reviewer to the modified Fig. 2 and 3 (former Fig. 1 and 2) with the updated data. Our main conclusions remained unchanged, namely that Myr unbinding shifts the overall kinase domain dynamics towards an active state. We furthermore still observe allosteric signal propagation from the Myr binding site to the active site along the alpha_F helix and a collaborative effect of Myr and the SH domains. Only some minor points were not confirmed after analyzing the longer simulations, for example the force differences transmitted to the A-loop upon SH domain binding/unbinding (former Fig. 2D), and changes in the amplitude of N- and C-lobe opening upon Myr unbinding (former Fig. 1E). Furthermore, to demonstrate convergence, we added block and autocorrelation analyses for Fig. 1 (now Fig. 2) to Fig. 2 – fig supplement 3, and observed good convergence across all systems. Finally, we also increased the simulation time of the umbrella sampling from 50 ns to 200 ns, again without any change to the quantitative trends or our conclusions (see also next point).
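      As a generic illustration of the convergence checks mentioned above, block averaging splits a per-frame observable into contiguous blocks and estimates the standard error of the mean from the spread of the block means, while the autocorrelation function shows how quickly successive frames decorrelate. The sketch below is a minimal, hypothetical Python version using only the standard library; the function names `block_average` and `autocorrelation` and the toy observable are illustrative assumptions, not the authors' analysis pipeline.

```python
import math
import statistics


def block_average(series, n_blocks):
    """Return (mean, standard error) estimated from n_blocks contiguous block means.

    Samples left over after dividing the series into equal blocks are discarded.
    """
    data = list(series)
    block_len = len(data) // n_blocks
    if n_blocks < 2 or block_len == 0:
        raise ValueError("need at least 2 blocks with at least 1 sample each")
    block_means = [
        statistics.fmean(data[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    # The spread of block means approximates the error of the overall mean
    # once blocks are longer than the correlation time of the observable.
    sem = statistics.stdev(block_means) / math.sqrt(n_blocks)
    return statistics.fmean(block_means), sem


def autocorrelation(series, max_lag):
    """Normalized autocorrelation function for lags 0..max_lag (acf[0] == 1)."""
    data = list(series)
    n = len(data)
    mu = statistics.fmean(data)
    dev = [x - mu for x in data]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("series has zero variance")
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / (n * c0)
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy observable standing in for, e.g., a per-frame distance from a trajectory.
    values = [math.sin(0.05 * t) + 0.1 * (t % 7) for t in range(2000)]
    for blocks in (40, 20, 10, 5):
        mean, err = block_average(values, blocks)
        print(f"{blocks:3d} blocks: mean={mean:.4f} sem={err:.4f}")
    print("acf lags 0-3:", [round(a, 3) for a in autocorrelation(values, 3)])
```

      In practice the standard error is recomputed for increasing block lengths (fewer blocks); a plateau suggests the blocks are longer than the correlation time of the observable, which is the usual criterion for calling a quantity converged on the sampled time scale.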

      Issues with statistical convergence are expected not only for the standard MD simulations but also for the umbrella sampling simulations, as 50 ns sampling per window is nowadays not considered state of the art and is likely insufficient for quantitative binding free energy calculation, especially for membranes (see, e.g., DOI 10.1021/ct200316w). However, worrying about this latter aspect might neither be useful nor needed, because in our view the statement that myristoyl groups can bind to the membrane and that they can compete with binding in the hydrophobic protein pocket can hardly be considered a surprise and would not have required any simulation at all in my view because the experimental K_D values are available (Table 1). The very unfavourable K_d values for unbinding of Myr from both the hydrophobic protein pocket as well as from the membrane in fact show that this is not how it is expected to work in reality. The fully solvated state will be avoided due to its high free energy. Instead, isn't the myristoyl expected to directly transition from the pocket into the membrane, after membrane binding of the kinase in a proper orientation?

      The experimental values were determined with different methods: they were estimated from zeta potential measurements in the case of the membrane, and from calorimetry in the case of Abl, which only considered the kinase domain rather than the SH3-SH2-kinase complex. We thus found it appropriate to perform Umbrella Sampling simulations to ensure comparability. Additionally, these allowed us to study the effects of different alpha_I helix conformations, which had a significant impact on the free energy of Myr unbinding; specifically, Abl with a partially unfolded helix reproduced the experimental free energy better than the crystal structure with a kinked helix. We highlight this more explicitly in the corresponding Discussion section. Regarding the simulation time per sampling window, we performed a block analysis (Fig. 5 – fig supplement 1) as suggested in the cited reference and also extended the time of each sampling window from 50 ns to 200 ns. This did not significantly alter the results and, importantly, the relative differences between Abl and the membrane stayed the same and are in good agreement with the experimental values.

      Concerning the metadynamics simulations, these are usually done to obtain a free energy landscape. Why was this not attempted here? In the present case, the authors seemed to have used metadynamics only for generating starting structures, with different degrees of helicity of the alpha_I part, for subsequent standard MD simulations. Not surprisingly, nothing much happened during the latter, and conformers with kinked/partially unfolded alpha_I as well as conformers with straight alpha_I were both found to be "stable", at least on the short simulation time scale. It could also not be expected that the SH domain would spontaneously detach in response to helix straightening - again, this would require much longer simulation times than 500 ns. Nevertheless, alpha_I straightening might very well reduce the binding affinity towards SH - this can only be explicitly studied with free energy simulations, however.

      Our main goal was indeed to obtain different alpha_I helix conformations for subsequent Umbrella Sampling simulations, and we found that helix formation is in principle possible without SH2 domain unbinding. We would like to emphasize the impact of the different helix conformations on the free energy of Myr unbinding, which further highlights the need to investigate these structures. We chose Metadynamics to obtain them because it only facilitates the transition away from the kinked conformation without biasing towards certain end structures or transition pathways, which we found advantageous compared to alternative methods such as targeted MD. The reason for not reporting a free energy surface is that we considered the helicity of all seven residues making up the kink within a single CV, which smeared the energy landscape to the point that it was almost completely flat. Furthermore, orthogonal CVs, such as new interactions between the alpha_I helix and the SH2 domain or positional adjustments of the SH2 domain, would have to be considered for a reliable quantitative result. We nevertheless observed transient SH2 domain unbinding on the simulated time scale and added histograms to Fig. 4 – fig supplement 1 (former appendix Fig. 4) to make this more obvious.

      Reviewer #2 (Public Review):

      The manuscript aims at understanding how the fatty acid ligand MYR inhibits the activity of Abl kinase. Despite a wealth of structural and biochemical data, a key mechanistic understanding of how MYR binding could inactivate Abl was missing.

      The authors used equilibrium and enhanced molecular dynamics (MD) simulations to masterfully answer open questions left by extensive experimental data in the mechanistic understanding of this system. The authors took advantage of several state-of-the-art simulation techniques and carefully planned simulations to extract a coherent understanding from a wealth of experimental facts.

      The manuscript convincingly identifies an allosteric regulation by MYR. Allostery is often a source of confusion and is sometimes used as a magic catch-all explanation for poorly understood phenomena. Here, the authors show very compelling evidence of the existence of an allosteric mechanism. Also, they identify the physical origin of the allosteric pathway, providing a clear mechanistic understanding at residue-level resolution. This is an impressive achievement.

      We thank the reviewer for appreciating our work and its significance for understanding Abl regulation.

      By leaving a pocket in the protein, MYR enables the protein's activation. But MYR is a highly hydrophobic molecule surrounded by water. Where could it go other than quickly binding back to the protein pocket? By asking this reasonable question, the authors propose an exciting mechanistic hypothesis. The physical proximity of Abl kinase to a cellular membrane could lead to a competition between the protein and the membrane for MYR, leading to a novel layer of regulation for this kinase. Free energy calculations performed by the authors show that this hypothesis is reasonable from the thermodynamic point of view.

      From a broader perspective, this manuscript is an important contribution to the discussion of four outstanding topics. 1) Myristoylation is an example of lipidation, a post-translational modification where an acyl chain is covalently linked to a protein. The role of post-translational modifications has been greatly underappreciated and under-investigated in the MD community. However, as all the work on SARS-CoV-2 and this contribution show, post-translational modifications can be crucial to understanding function. Ignoring them could lead to severely biased results. 2) The debate on the nature of allostery is still raging. Some authors claim that looking for a residue-level mechanistic chain of events that explains the allosteric action does not make sense and that the only way of thinking about allostery is as a sudden global change of the conformational landscape. Here, the authors show that, instead, such a search is possible and leads to an essential understanding. 3) The authors hypothesize a novel crosstalk between Abl and cellular membranes mediated by MYR. This exciting and far-reaching hypothesis opens the door to new complex layers of regulation. I suspect that such crosstalk between cytosolic proteins, or the soluble domains of membrane-tethered proteins, and membranes is much more ubiquitous than has been appreciated so far. 4) From a methodological point of view, this manuscript represents a masterful use of simulations to put existing experimental data into a coherent picture. It is an example of MD simulations at their best, where the simulations make sense of experiments, integrate existing data into a unified picture, and lead to new hypotheses that can be tested in future experiments.

      We thoroughly appreciate the reviewer's positive feedback and the valuable suggestions for improvement below.

      It would be superb if the authors could propose precise predictions that could inspire future experiments. Now that they present a residue-resolution allosteric pathway, can they suggest point mutations that would interrupt it?

      We have added a short segment to the end of the discussion proposing possible experiments.

    1. Author Response

      Thank you for providing us with the reviewer comments. We will provide the revised manuscript at a later stage as recommended.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a machine learning algorithm to analyze published exosome datasets to find biomarkers that differentiate exosomes of different origins.

      Strengths:

      The performance of the algorithm is generally good.

      Weaknesses:

      The source datasets are heterogeneous, as described in Figures 1 and 2 and Lines 72-75, and are therefore questionable.

      We thank the reviewer for this assessment. The commonly used exosome biomarkers exhibit heterogeneous presence and abundance in exosomes derived from different cell lines, tissues, and biological fluids. The primary goal of this study was to identify universal exosomal biomarkers that remain consistent across different sources of exosomes, unaffected by potential isolation and quantification bias. This objective was achieved by integrating datasets from different sources, which allowed the subsequent identification of common proteins associated with exosomes. Notably, the 18 protein markers identified are universally abundant in all cell lines and their exosomes. We believe that, despite the heterogeneity of the datasets used here, the identification of 18 universal protein markers in exosomes from diverse sources is a strength of this analysis.

      Reviewer #2 (Public Review):

      Summary:

      This is fine work on developing computational approaches to detect cancer through exosomes. Exosomes are an emerging biomarker resource and have attracted considerable interest in the biomedical field. Kalluri and co-workers collected a large sample pool and used random forests to identify a group of protein markers that are universal to exosomes and to cancer exosomes. The results are very exciting: they not only add new knowledge to cancer research but also provide a new, advanced method to detect cancer. The data were presented very nicely and the manuscript was well written.

      Strengths:

      Identified new biomarkers for cancer diagnosis via exosomes.

      Developed a new method to detect cancer non-invasively.

      The results were presented nicely and the manuscript was well written.

      Weaknesses:

      N/A.

      We appreciate the reviewer's enthusiastic assessment of our study.

      Reviewer #3 (Public Review):

      In the current study, Li et al. address the difficulty in early non-invasive cancer diagnosis due to the limitations of current diagnostic methods in terms of sensitivity and specificity. The study brings attention to exosomes - membrane-bound nanovesicles secreted by cells, containing DNA, RNA, and proteins reflective of their originating cells. Given the prevalence of exosomes in various biological fluids, they offer potential as reliable biomarkers. Notably, the manuscript introduces a new computational approach, rooted in machine learning, to differentiate cancers by analyzing a set of proteins associated with exosomes. Utilizing exosome protein datasets from diverse sources, including cell lines, tissues, and various biological fluids, the study spotlights five proteins as predominant universal exosome biomarkers. Furthermore, it delineates three distinct panels of proteins that can discern cancer exosomes from non-cancerous ones and assist in cancer subtype classification using random forest models. Impressively, the models based on proteins from plasma, serum, or urine exosomes achieve AUROC scores above 0.91, outperforming other algorithms such as Support Vector Machine, K Nearest Neighbor Classifier, and Gaussian Naive Bayes. Overall, the study presents a promising protein biomarker signature tied to cancer exosomes and proposes a machine learning-driven diagnostic method that could potentially revolutionize non-invasive cancer diagnosis.

      We appreciate this positive assessment of our work.
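      Reviewer #3 summarises model performance as AUROC scores above 0.91. For readers less familiar with the metric, the sketch below computes AUROC as the Mann-Whitney probability that a randomly chosen positive (cancer) sample receives a higher score than a randomly chosen negative (non-cancer) sample, with ties counted as one half. The function name, scores, and labels are hypothetical; this is an illustration of the metric, not the authors' evaluation pipeline.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample (label 1) scores higher than the negative sample (label 0),
    counting tied scores as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: three cancer (1) and three non-cancer (0) samples.
example_scores = [0.9, 0.8, 0.3, 0.35, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
print(auroc(example_scores, example_labels))
```

      In this toy example, eight of the nine positive/negative pairs are ranked correctly; an AUROC of 0.5 corresponds to chance-level ranking.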

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by O'Reilly and Delis provides a valuable data-driven framework for extracting task-related muscle synergies, a step towards the understanding and practical use of synergies in real scenarios (e.g., evaluation of patients in a clinical environment). The approach is incomplete, since the authors did not compare their method with classical, physiologically grounded approaches for assessing muscle synergies. Such comparisons would clarify whether physiological assemblies were preserved or were altered to incorporate task-space variables. Despite these limitations, the proposed framework would interest motor control and neural engineering researchers.

      We thank the editors for the positive assessment of our work and appreciate their constructive feedback. In our revised manuscript, we believe we have sufficiently addressed the identified limitations by a) comparing our approach to existing physiologically-based methods, providing thorough comparisons of their respective outputs, b) applying it to a dataset of post-stroke participants to demonstrate that it can identify physiologically-interpretable markers of motor recovery and c) providing examples to demonstrate how readers can interpret the novel perspective introduced.

      Reviewer #1 (Public Review):

      The proposed study provides an innovative framework for identifying muscle synergies that takes their task relevance into account. State-of-the-art techniques for extracting muscle interactions apply unsupervised machine-learning algorithms to the envelopes of the electromyographic signals without considering information about the task being performed. In this work, the authors suggest including the task parameters when extracting muscle synergies, using a previously proposed network-information framework. This allows the identification of muscle interactions that are relevant, irrelevant, or redundant to the parameters of the executed task.

      The proposed framework is a powerful tool to understand and identify muscle interactions for specific task parameters and it may be used to improve man-machine interfaces for the control of prostheses and robotic exoskeletons.

      Relative to the recently published network-information framework, this work adds an important component that estimates the relevance of specific muscle interactions to the parameters of the executed task. However, the authors should better explain the added value of this contribution relative to the previous one, including in terms of computational methods.

      We thank the reviewer for their constructive comments. We have adjusted the introduction section of the manuscript to better explain the added value of this framework over previous work. Specifically, we draw the reviewer’s attention to the following updated section of the introduction:

      “In [11], we considered key limitations of current approaches to muscle synergy analysis in extracting functionally relevant and interpretable patterns of muscle activity [12]. We proposed a combinatorial approach based on information- and network-theory and dimensionality reduction (the network-information framework (NIF)) that significantly improved the generalisability of the extraction process by, among other things, removing restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics [12]. By determining the pairwise mutual information between muscles, this innovation paved the way for the appropriate mapping of muscular interactions to the task space. To elaborate on the significance of this development, extracting motor patterns in isolation from the task space comes at the expense of both functional and physiological relevance [12,13]. Furthermore, effective methods for mapping large-scale physiological dynamics to behaviour are a current gap across the neurosciences [14]. Thus, here we build on this work by, for the first time, directly including task space parameters during muscle synergy extraction. In doing so, we address these current research gaps, progressing muscle synergy research and successful engineering applications in a fruitful direction [12,15,16]. This enables us, in a novel way, to dissect the concept of the muscle synergy and therefore quantify interactions between muscle activations with shared or complementary functional roles.”
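      The excerpt above refers to determining the pairwise mutual information between muscles. As a rough illustration only, the sketch below estimates mutual information between two EMG envelopes by equal-width binning followed by a plug-in (histogram) estimate in bits. The binning scheme, bin count, function names, and envelope values are assumptions for this sketch; the NIF toolbox may use a different estimator.

```python
import math
from collections import Counter
from typing import Sequence


def discretize(x: Sequence[float], n_bins: int) -> list[int]:
    """Map each sample to one of n_bins equal-width bins over [min(x), max(x)]."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), n_bins - 1) for v in x]


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Plug-in estimate of I(A;B) in bits from paired discrete samples."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[i] / n) * (pb[j] / n)))
        for (i, j), c in pab.items()
    )


# Hypothetical, time-aligned envelopes of two muscles.
env_a = [0.1, 0.4, 0.9, 0.3, 0.8, 0.2]
env_b = [0.2, 0.5, 1.0, 0.3, 0.7, 0.1]
mi = mutual_information(discretize(env_a, 2), discretize(env_b, 2))
print(f"I(A;B) = {mi:.3f} bits")
```

      Plug-in estimates like this are biased upwards for short recordings and many bins, which is one reason dedicated estimators are usually preferred on real EMG data.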

      In general, the method proposed relies on several hyperparameters and cost functions that have been optimized for the specific datasets. A sensitivity analysis should be performed, varying these parameters and reporting the performance of the framework.

      We thank the reviewer for this comment which enabled us to clarify a potential misunderstanding. Our proposed framework does not require setting or varying hyperparameters to optimise cost functions.

      For model-rank specification, a modularity-maximising cost function is used, which determines the partitioning of the networks that yields maximal modularity. We offer two alternative approaches using this cost function, and they consistently converge on the same solution. To further ensure the representativeness of this solution, we also offer a consensus-based approach in which we apply these alternative approaches to individual participant or task data, then group the collective partitions together and re-apply the approaches. One of these approaches (Equation 2.2) requires two hyperparameters, γ and ω, which adjust the intra- and inter-layer network resolutions. As stated in the manuscript, we set both parameters to 1, thus nullifying their presence in the cost function and aligning our work with the classical notion of modularity. Across the two alternative approaches to model-rank specification, the solution is unique and data-driven and has demonstrable generalisability across datasets.

      The only other cost function in the framework is used during dimensionality reduction, and it is a standard loss function used across the muscle synergy analysis literature. Thus, the approach is essentially parameter-free, and we now mention this more explicitly in the manuscript:

      “To empirically determine the number of components to extract in a parameter-free way, we then concatenated these adjacency matrices into a multiplex network and employed network community-detection protocols to identify modules across spatial and temporal scales (fig.3(D)) [29–32,44].”

      “In its generalised multilayer form, the Q-statistic is given an additional term to consider couplings between layers l and r with intra- and inter-layer resolution parameters γ and ω (Equation 2.2). Here, μ is the total edge weight across the network and γ and ω were set to 1 in the current study for classical modularity [30], thus removing the need for any hyperparameter tuning.”
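      To make the role of the resolution parameter concrete, the sketch below evaluates the classical single-layer modularity Q = (1/2m) Σ_ij [A_ij − γ k_i k_j / 2m] δ(c_i, c_j) for a given partition of a weighted, symmetric adjacency matrix. It is a simplified, hypothetical illustration: it omits the inter-layer coupling ω of the multilayer Equation 2.2 and only scores a partition, whereas the framework's community-detection protocols search for the partition that maximises Q. With γ = 1 it reduces to the classical definition referred to above.

```python
from typing import Sequence


def modularity(
    adj: Sequence[Sequence[float]],
    labels: Sequence[int],
    gamma: float = 1.0,
) -> float:
    """Single-layer modularity Q of a partition of a symmetric weighted graph.

    adj[i][j] is the edge weight between nodes i and j, labels[i] is the
    community of node i, and gamma is the resolution parameter (1.0 gives
    classical modularity).
    """
    n = len(adj)
    k = [sum(row) for row in adj]  # weighted degree (strength) of each node
    two_m = sum(k)                 # 2m: total edge weight, each edge counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected under a null model.
                q += adj[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m


# Hypothetical network: two muscle pairs with no coupling between the pairs.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(adj, [0, 0, 1, 1]))  # each pair as its own module
print(modularity(adj, [0, 0, 0, 0]))  # everything in a single module
```

      Splitting the toy network into its two pairs gives Q = 0.5, while lumping all nodes together gives Q = 0, which is why maximising Q favours the two-module solution here.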

      It is not clear how the well-known phenomenon of cross-talk during the recording of electromyographic muscle activity may affect the performance of the proposed technique and how it may bias the overall outcomes of the framework.

      Indeed, artifacts such as crosstalk are a standard issue across the EMG literature and may affect the performance of subsequent analyses when prevalent in the dataset. Crosstalk is expected to be present irrespective of the task and so should not affect the redundant and synergistic muscle representations; however, it could be present in the task-irrelevant muscle interactions extracted. Given the prominence of long-range functional connections within the extracted task-irrelevant representations, we suggest that such artifacts are unlikely to have played a prominent role in the extracted patterns. Nonetheless, we have acknowledged this possibility in the following updated sentence in the Discussion section:

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [65], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [20,50].”

      Reviewer #2 (Public Review):

      This paper is an attempt to extend or augment muscle synergy and motor primitive ideas with task measures. The authors' idea is to use information metrics (mutual information, co-information) in 'synergy' creation, including task information directly. My reading of the paper is that the proposed framework moves radically away from attempts to be analytic in terms of physiology and compositionality with physiological bases, and instead towards more descriptive ML frameworks that may not easily support physiological work.

      We thank the reviewer for taking the time to provide a thorough commentary on this manuscript. An overall aim in developing this framework is to build on other recent developments in providing a more fine-grained functional architecture underlying movement control [1,2]. It is a requirement for the successful communication and introduction of this toolbox to the field to provide readers with an understanding of how to use the framework and an intuition on how to interpret the results. Thus, we agree with the reviewer that functional interpretations are of crucial use.

      We also agree with the reviewer that maintaining a physiological underpinning is a desirable direction for the field and should not be made secondary to functional descriptions. In our updated manuscript, we have therefore included direct comparisons with the gold standard in the field for muscle synergy extraction, namely non-negative matrix factorisation (NMF) (see ‘Building on current approaches to muscle synergy analysis’ and fig.5-6 of the revised manuscript) [3,4]. In these comparisons, we show how our framework goes beyond this current approach in terms of functional insight while still maintaining physiological relevance. Indeed, in the revised manuscript we also include a fourth dataset comprising post-stroke participants and healthy controls (Fig.6). We demonstrate, through a simple example application to this dataset, how our proposed framework can produce more predictive representations of motor impairment than the gold-standard approach. The representations we identified were discriminative of motor impairment, as measured by the Fugl-Meyer assessment, using just one trial per participant. This is a considerable improvement in sensitivity to altered motor patterns over the current approach, which has predominantly required many trials and participants to reach significance [5,6]. Thus, the patterns we extract are a more comprehensive representation of the actual underlying physiological state of the participants.

      This approach is very different from the notions of physiological compositional elements as muscle synergies and motor primitives, and to me seems to really be striving to identify task relevant coordinative couplings. This is a meta problem for more classical analyses. Classical analyses seek compositional elements stable across tasks. These elements may then be explored in causal experiments and generative simulations of coupling and control strategies. The present work does not convince me that the joint 'meta' analysis proposed with task information added is not unmoored from physiology and causal modeling in some important ways. It also neglects publications and methods that might be inconvenient to the new framework.

      We would be very interested in receiving the reviewer’s suggestions of existing approaches that we have not incorporated here and would be happy to discuss these in the revised manuscript.

      Information based separation has been used in muscle synergy analyses using infomax ICA, which is information not variance based at core. Though linear mixing of sources is assumed, minimized mutual information is the basis.

      We agree with the reviewer that ICA relies on information measures, however it does not incorporate task-space information. The novelty of our approach lies in the characterisation of muscle interactions with respect to the task at hand. If the reviewer could provide references to this statement, we would be able to consider this further.

      Physiological causal testing of synergy ideas is neglected in the literature reviews in the paper. Although these are in animal work, the clear connection of muscle synergy choices and analyses to physiology is important and needs to be managed in the new methods proposed. Is any correspondence assumed? Possible?

      We agree with the reviewer that this is a crucial element of muscle synergy research and will aim to address it in our future work. However, we would like to point out that the current manuscript is a “tools and resources” article aiming to introduce a new framework. In our revised manuscript, we have incorporated an application of the framework to a dataset from post-stroke patients to demonstrate its use in clinical settings for identifying biomarkers and using them to predict motor recovery (see Fig.6 of the updated manuscript).

      Questions and concerns with the framework as an overall tool:

      First, muscle based motor information sources have influences on different time scales in the task mechanics. Analyses of synergies in the methods proposed will be very much dependent on the number and quality of task variables included and how these are managed. Standardizing and comparing among labs, tasks sets and instrumentation differences is not well enough considered as a problem in this new proposed method toolset, at least in my reading. Will replication, and testing across groups ever be truly feasible in this framework?

      We agree with the reviewer that this important point can be a limitation of the applicability of the framework. For this reason, we chose a “holistic” approach, applying the framework to several datasets collected in different settings, and selecting different kinds of task variables to extract muscle networks from. Crucially, we used a leave-one-task-out and leave-one-participant-out cross validation procedure to specifically address this point. Our results showed that the extracted couplings are robust irrespective of the task variable and/or participant excluded and this lends credit to the generalisability of the framework.

      Muscle based motor information sources have influences on different time scales in the task mechanics. Kinematic analyses, dynamic analyses and force plate analyses of the same task may provide task variables that alter the results in the proposed framework it seems.

      As we have mentioned above, here we used all the above types of task variables together to illustrate the range of measures that can be included in the proposed framework and showed that the outputs are robust to the exclusion of any task/participant. This point is especially evident for dataset 3 results, where high levels of generalisability were found despite the inclusion of kinematic, dynamic and IMU data (see Table 1. of original submission and updated manuscript). We believe that this is an advantage of the approach as it allows researchers to apply the method to different kinds of measurements they may have collected and gain insights into the relationships of muscle couplings with kinematic/dynamic/force parameters. This will also enable scientists to attribute different functional roles to the identified couplings and it is something we plan to do in future applications of the framework.

      Second, there is a sampling problem in all synergy analyses. We cannot record all muscles or all task parameters. Examining synergies across multiple tasks seeks 'stationary' compositionality. Including task specific elements may or may not reinforce or give increased coordinative precision to the stationary compositionality.

      We fully agree that this is a limitation of all synergy analyses, and we consider this study a step towards addressing it by providing the research community with a toolbox for quantifying muscle couplings with different levels of task specificity.

      To me, the new methods proposed seem partly orthogonal to the ideas of stable compositionality. The 'synergies' obtained will likely differ, and are more likely to be coordinative control groupings of recurrent task and muscle motifs (based on instrumentation), which may or may not relate to core compositionality in physiology. Is there any expectation that the framework should relate to core compositionality and physiology? This is not clear in the paper as written.

      In our new analysis, we have compared the proposed approach to existing physiologically based methodologies and shown that the new framework can capture several salient physiological features of movement that the current NMF-based approach cannot. For example, because we have moved away from optimising variance-accounted-for metrics, our framework can identify subtle muscle couplings that have important functional roles. These subtle couplings are often missed by current muscle synergy analyses, in which higher-amplitude muscles tend to dominate regardless of their physiological relevance. Further, by directly including task parameters during extraction, we can determine which muscles have a functional role with respect to the included task parameter, rather than inferring this relationship indirectly from knowledge of the executed task. In our updated manuscript, by applying the framework to post-stroke participants (see Fig.6), we were also able to demonstrate that the extracted couplings are associated with functional parameters of motor recovery and have a clear link with the physiological state of individual participants.

      It would be useful to explore the approach with a range of neuromechanical models and controllers and simulated data to explore the issues I am raising and convince readers that this analysis framework adds clarity rather than dissolving the generalizability and interpretability of analyses in terms of underlying causal mechanisms.

      The authors need to better frame their work in relation to causal analyses if they are claiming links to muscle synergies analyses and claim extension/refinement. Alternatively, these may not be linked, and instead parallel approaches exploring different hypotheses and goals using different organizational data descriptors.

      To address the reviewer's concerns, we have included in the updated manuscript a toy example simulating situations in which pairs of muscles have a redundant or synergistic functional relationship (see Fig.2). This simulation gives clear intuition on situations where two muscles (e.g. an agonist-antagonist pair) may share functionally similar or complementary information about task direction (left vs right). In particular, in the main text describing this figure, we explain how current NMF-based approaches consider muscles functionally equivalent when they share similar activation magnitudes, whereas our framework groups muscles that carry the same task information. Thus, our work is an extension of current approaches towards understanding causal mechanisms. The suggestion to use neuromechanical models is valuable; however, we consider it beyond the scope of this work. This “Tools and Resources” paper aims to introduce the computational framework for the analysis of large-scale muscle couplings in task space. Our future work will use this framework to address unanswered questions in the field, and we hope that it will help other scientists test their hypotheses.
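      The manuscript's Fig.2 simulation is not reproduced here. As a loosely analogous, hypothetical illustration, the classic "copy" and "XOR" examples show how the sign of co-information, I(X;Y;Z) = I(X;Y) − I(X;Y|Z), can separate a redundant muscle pair from a synergistic one with respect to a binary task variable (left = 0, right = 1). The toy data, variable names, and this particular estimator are assumptions for the sketch; the framework's own redundancy and synergy measures may be computed differently.

```python
import math
from collections import Counter
from itertools import product
from typing import Hashable, Sequence


def entropy(*variables: Sequence[Hashable]) -> float:
    """Joint Shannon entropy (bits) of one or more paired discrete variables."""
    samples = list(zip(*variables))
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def co_information(x, y, z) -> float:
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), written as an inclusion-exclusion of entropies.

    Under this sign convention, > 0 indicates redundancy and < 0 indicates synergy.
    """
    return (
        entropy(x) + entropy(y) + entropy(z)
        - entropy(x, y) - entropy(x, z) - entropy(y, z)
        + entropy(x, y, z)
    )


# Redundant pair (hypothetical): both muscles carry the same copy of the direction.
task_r = [0, 1]
m1_r, m2_r = list(task_r), list(task_r)

# Synergistic pair (hypothetical): each muscle alone says nothing about the
# direction, but together they determine it (direction = m1 XOR m2).
pairs = list(product([0, 1], repeat=2))
task_s = [t for t, _ in pairs]
m2_s = [b for _, b in pairs]
m1_s = [t ^ b for t, b in pairs]

print(co_information(m1_r, m2_r, task_r))  # expected: 1.0 (redundant)
print(co_information(m1_s, m2_s, task_s))  # expected: -1.0 (synergistic)
```

      The redundant pair yields +1 bit because either muscle alone already identifies the direction, whereas the XOR pair yields −1 bit because the direction is only recoverable from both muscles jointly.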

      To me this appears a data science tool that may not help any reductionist efforts and leads into less interpretable descriptions of motor control. Not invalid, but sufficiently different that common term use muddies the water.

      We believe that the new evidence we have provided on both simulated and real data improves the interpretability of the approach's outcomes. Specifically, we have introduced examples showing the functional roles of the different types of interactions as well as the predictive power of the outputs. Regarding the term synergy, we have provided a clear description throughout the manuscript of how synergy and redundancy should be interpreted in the novel perspective we propose. For example, in the Discussion section:

      “We thus sought to provide greater nuance to the notion of ‘working together’ by defining motor redundancy and synergy in information-theoretic terms [6,56]. In our framework, redundancy and synergy are terms describing functionally similar and complementary motor signals respectively, introducing a new perspective that is conceptually distinct from the traditional view of muscle synergies as a solution to the motor redundancy problem [3,6,7]. In this new definition of muscle interactions in the task space, a group of muscles can ‘work together’ either synergistically or redundantly towards the same task. In doing so, the perspective instantiated by our approach provides novel coverage to the partitioning of task-relevant and -irrelevant variability implemented by the motor system along with an improved specificity regarding the functional roles of muscle couplings [20–22]. Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      Reviewer #3 (Public Review):

      In this study, the authors developed and tested a novel framework for extracting muscle synergies. The approach aims to remove some limitations and constraints typical of previous approaches used in the field. In particular, the authors propose a mathematical formulation that removes linearity constraints and couples the synergies to their motor outcome, supporting the concept of functional synergies and distinguishing the task-related performance associated with each synergy. While some concepts behind this work were already introduced in recent work in the field, the methodology provided here encapsulates all these features in an original formulation, a step forward with respect to currently available algorithms. The authors also successfully demonstrated the applicability of their method to previously available datasets of multi-joint movements.

      Preliminary results positively support the scientific soundness of the presented approach and its potential. The added values of the method should be documented more in future work to understand how the presented formulation relates to previous approaches and what novel insights can be achieved in practical scenarios and confirm/exploit the potential of the theoretical findings.

      Strengths:

      This work proposes a novel framework that addresses physiologically unverified hypotheses of standard muscle synergy methods: it removes restrictive model assumptions (e.g. linearity, same mixing coefficients) and the reliance on variance-accounted-for (VAF) metrics.

      The method is solid and achieves the prescribed objectives at a computational level and in preliminary laboratory data.

      A toolbox is available for testing the methods on a larger scale.

      The paper is well written and shows a high level of innovation, original content and analysis.

      Weaknesses:

      Task performance variables could be defined more quantitatively in future work (e.g. joint angles rather than a generic start point and end point).

      We agree with this point and will incorporate it in future work. Our aim here was to show that the framework would work with any task variable and that scientists can use it to identify the relevance of muscle interactions to different types of task parameters.

      The paper does not show a comparison with previous approaches (e.g.: NMF) or recently developed approaches (such as MMF).

      We have now illustrated such a comparison on two datasets and explained more how the new framework can dissect the different types of muscle groupings (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript).

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      In our revised manuscript, we have introduced 2 new applications of the framework to real data to exemplify its use for a) functional interpretability and b) identification of biomarkers (see ‘Building on current approaches to muscle synergy analysis’ section and Fig.5-6 of revised manuscript). We also point towards its use in movement restoration and augmentation devices and in the clinical setting in the discussion section:

      “The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      In this work, the authors' effort to develop the field is clear. It is fundamental to develop novel frameworks for synergy extraction that make synergies more interpretable and applicable to real scenarios, as well as more consistent with recent findings in motor control and neuroscience that are not reflected in the standard models. At the same time, muscle synergies are increasingly used in research, but their impact in practical scenarios is still limited, probably because synergies have rarely been analyzed in a functional context. This paper presents a very in-depth analysis and a novel framework for interpreting data that links to the task space from a functional perspective. I also found that the results on the datasets are very well discussed, but the discussion could go further in showing why using this framework is advantageous.

      There are some key points for discussion that follow from this paper which can be described more, maybe in future work, and that might contribute to major developments in the field, including:

      Understanding how the separation between relevant (redundant and synergistic) and irrelevant synergies impacts synergy analysis in practical work;

      We have now introduced new figures (Fig. 5 and 6) to the revised manuscript, demonstrating simple applications of the framework and providing intuition regarding the outputs. We have also added points to the Discussion commenting on the differences between types of couplings and how they can be interpreted in future works:

      “Our framework emphasises not only the role of functionally redundant muscle couplings that result from the underlying degeneracy of the motor system, but also of complementary, synergistic dependencies that are important for communication and integration across specialised neural circuitry [57,58]. Thus, the present study aligns the muscle synergy concept with the current mechanistic understanding of the nervous system whilst offering an analytical approach amenable to the continued advances in large-scale data capture [14,59].”

      “Although distinguishing task-irrelevant muscle couplings may capture artifacts such as EMG crosstalk, our results convey several physiological objectives of muscles including gross motor functions [64], the maintenance of internal joint mechanics and reciprocal inhibition of contralateral limbs [19,49]. Thus, task-irrelevant muscle interactions reflect both biomechanical- and task-level constraints that provide a structural foundation for task-specific couplings. The separate quantification of these muscle interaction types opens up novel opportunities in the practical application of muscle synergy analysis, as demonstrated in the current study through the identification of a significant predictor of motor impairment post-stroke from single-trials [5,12,65]. For instance, these distinct representations may encapsulate different neural substrates that can be specifically assessed at the muscle-level for the purpose of bodily restoration and augmentation [66]. Uncovering their neural underpinnings is an interesting topic for future research.”

      Interpreting how the different synergistic organizations described in this work allow data from real scenarios (e.g. motor recovery of patients after neurological diseases) to be better described;

      We have now added an example application of the framework to a dataset of stroke patients (Fig.6) and identified a redundant muscle patterns that are predictive of functional measures.

      Discussing in detail how the presented findings compare with standard algorithms such as NMF, to determine the added value provided by this approach;

      As indicated above, we have now shown such a comparison on two new datasets (see Figs. 5 and 6 of the revised manuscript).

      Describing how redundant synergies reflect real neural organization and, if their "existence" is confirmed, how they contribute to redesigning the concept of muscle synergies and of modular/synergistic control in general.

      This is an important point that we now address in more depth in our Discussion, by relating redundant muscle couplings to degeneracy in the motor system and synergistic couplings to the integrative dynamics of higher-level processes. We have also added a simple simulation illustrating how synergistic and redundant interactions co-exist and make different contributions to task performance (see Fig. 2 of the revised manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Summary of changes

      I thank the reviewers for their thorough feedback on this paper and providing me with such a detailed list of recommendations. I have been able to incorporate many of their suggestions, which I believe has greatly improved this paper.

      The most important changes:

      • I added comparisons to the lexicon- and rule-based sentiment algorithms TextBlob and VADER to Supplementary Fig. 4. This shows the superiority of ChatGPT in scoring the sentiment of scientific texts compared to existing and already-validated tools for sentiment analysis based on natural language processing. [Suggestion Reviewer 2]

      • I added the measure intra-class correlation to Fig. 3b, emphasizing the inconsistency in sentiment scores across different reviews of the same paper. [Suggestion Reviewer 3]

      • I added Supplementary Fig. 6, in which I directly propose different experiments to test the causes of the observed gender effects on peer review. [Suggestion Reviewer 3]

      • I further studied the issue of variability in ChatGPT's responses and found that it has greatly improved in the latest version of ChatGPT (for the Aug 3, 2023 version, R² values of 0.99 for sentiment and 0.86 for politeness were reached). I show these findings in Supplementary Fig. 2. [Suggestions Reviewers 1 and 3]

      • Throughout the manuscript (most notably in the Abstract and Discussion), I emphasize that this is a proof-of-concept study, and make suggestions on how to scale this up across journals and fields. I also toned down certain claims given the relatively small sample size of this study, including in the abstract. I also more prominently and elaborately discuss the limitations of the study in the Discussion section. [Suggestions Reviewers 1, 2 and 3]

      • I made many smaller changes to text, figures and references on the basis of the reviewers’ comments. [Suggestions Reviewers 1, 2 and 3]

      Notably, Reviewer 3 has provided me with a very detailed list of recommendations for follow-up experiments. I appreciate their ideas, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in Supplementary Fig. 6 of the new paper, to study whether I can find evidence of bias across rejected and accepted papers. As suggested by this reviewer, I am also looking into ways to automate data collection using APIs and the rapidly expanding databases for transparent peer review.

      Based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      Reviewer #1 (Public review)

      Strengths:

      The innovative method is the biggest strength of this article. Moreover, the method can be implemented across fields and disciplines. I myself would like to see this method implemented on a grander scale. The author invested a lot of effort in data collection, and I especially commend that ChatGPT assessed the reviews twice, to ensure greater objectivity.

      I want to thank this reviewer for commending the innovative methodology of this study. I appreciate that this reviewer would like to see this methodology implemented at a grander scale, which is a view that I share. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores).

      The reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work. Specifically, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across the rejected and accepted manuscripts of a journal. In addition, as suggested by Reviewer #3, I am looking into ways to automate data collection using APIs and the rapidly expanding databases for transparent peer review. Importantly, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Weaknesses:

      I have several concerns regarding the methodology of the article. The first relates to the fact that the sample is not random. The selection of journal and inclusion and exclusion criteria do not contribute well to the strength of the evidence.

      Indeed, the inclusion of only accepted manuscripts from a single journal is the biggest caveat of this paper. I have rewritten much of the Abstract to emphasize that this is a proof-of-concept paper, in the hope that other researchers will concurrently expand this method to larger and more diverse datasets.

      An important methodological fact is that the correlation between the two assessments of peer reviews was actually lower than we would expect (around 0.72 and 0.3 for the different linguistic characteristics). If ChatGPT gave such different scores across two assessments, would it not be sound to perform even more assessments and then take the average?

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #3. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations).

      Interestingly, I observed that ChatGPT has become substantially more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.
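      A rough guide to Reviewer #1's question about averaging more assessments is the Spearman-Brown prophecy formula, a standard psychometric result that was not part of the manuscript: if two single scorings correlate at r, the mean of k scorings is predicted to have reliability rho_k = k*r / (1 + (k - 1)*r). The Python sketch below applies it to the approximate correlations quoted by the reviewer (0.72 and 0.3). It assumes that repeated ChatGPT scorings behave like parallel measurements with independent errors, which has not been verified here; the function names are illustrative.

```python
def spearman_brown(r: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel scorings, given the
    correlation r between two single scorings (Spearman-Brown formula)."""
    if not 0.0 < r <= 1.0 or k < 1:
        raise ValueError("need 0 < r <= 1 and k >= 1")
    return k * r / (1 + (k - 1) * r)


def scorings_needed(r: float, target: float, max_k: int = 1000) -> int:
    """Smallest number of averaged scorings whose predicted reliability
    reaches `target` (found by stepping k upward)."""
    if not 0.0 < target < 1.0:
        raise ValueError("need 0 < target < 1")
    for k in range(1, max_k + 1):
        if spearman_brown(r, k) >= target:
            return k
    raise ValueError("target not reached within max_k scorings")


# Approximate single-scoring correlations quoted by Reviewer #1.
for r in (0.72, 0.3):
    predicted = [round(spearman_brown(r, k), 2) for k in (1, 2, 5, 10)]
    print(r, predicted, "scorings for 0.8:", scorings_needed(r, 0.8))
```

      Under these assumptions, the weaker characteristic (r around 0.3) would need about ten averaged scorings to reach a predicted reliability of 0.8.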

      Reviewer #1 (Recommendations to author)

      I had some difficulties reading the article, so it might help to give it more structure (e.g., the Introduction states three aims, so the Statistical Analysis section could be divided into three sections, and instead of linking to figures, the author could state which variables were analysed in which manner) to make the details easier to comprehend. Also, in one place I found that the sample consisted of 572 reviews, and in another that it was 558.

      These are very good points. I rewrote the statistical analysis for clarity (Page 7 of the manuscript). The figure of 558 reviews was a mistake on my part: I forgot to include the fourth review for the 14 papers that received four reviews in the histograms of Fig. 2b and the accompanying text. This has been updated.

      For Figs. 1a and 1b, a table could be considered instead of several figures.

      I thank the reviewer for pointing this out. I tried this suggestion, but I found it to reduce the readability of the paper. As an alternative, I now provide an Excel spreadsheet with all the raw data, so people can find all the characteristics of the included papers.

      99.8% of the reviews analysed were assessed as polite. This is, in my opinion, an extremely important finding, which shows that reviewers still hold to a certain standard of communication, and it could be mentioned in the abstract.

      I very much agree with this reviewer; this has now been added to the Abstract.

      In the Results section you state that the QS World Ranking is an "imperfect" measure. Stating that in the Results raises the question of why it is used in the study, so this remark may be more suitable for the Discussion.

      This point is well taken. Even though the QS World Ranking score is imperfect, I still think it can be useful, as a rough proxy of perceived prestige of an institution. I now removed this “imperfect measure” statement from the Results section, and moved it to the Discussion (Page 5).

      In the Results section, instead of using only p values, please add measures of effect (correlations, mean differences), to make it easier to place in the context.

      For the significant effects of Fig. 4, I have added these to the figure legends. Please note that the statistical tests used are non-parametric, so I reported the Hodges-Lehmann differences (the median of all possible pairwise differences between observations from the two groups).

      I think the interpretation of the results should be softened a bit, or the limitations of the study should be placed in the second paragraph of the Discussion, since this study covered only a single journal and a single subfield.
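      The two-sample Hodges-Lehmann difference described above can be computed directly from its definition. The Python sketch below is a from-scratch illustration of the textbook estimator, not necessarily the software used for the manuscript, and the scores in it are made up.

```python
from itertools import product
from statistics import median


def hodges_lehmann(x, y):
    """Two-sample Hodges-Lehmann shift estimate: the median of all
    pairwise differences x_i - y_j."""
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    return median([xi - yj for xi, yj in product(x, y)])


# Hypothetical sentiment scores for two groups of reviews (illustrative only).
group_a = [40, 55, 60, 70]
group_b = [20, 35, 50]
print(hodges_lehmann(group_a, group_b))  # prints 20.0 (median of 12 differences)
```

      Because it uses every cross-group pair, the estimate is robust to outliers in a way that a difference of group means is not, which suits the non-parametric tests used.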

      I agree with this reviewer that the relatively small sample size of this paper demands more careful wording. Throughout the manuscript, I have toned down claims, and emphasized the “proof of concept” nature of this study (for example in the Abstract). I also moved the limitations section to the second paragraph of the Discussion, and elaborate more on the study’s caveats.

      Methods:

      The measure Review time was assessed from submission to acceptance, but this is not necessarily the review time, since finding reviewers can sometimes take a long time. That needs to be stated as a limitation.

      This point is well taken. I changed this to “Paper acceptance time” in Fig. 3 and the accompanying text.

      Gender name determination methods differed between the assessment of the first authors and the last authors, and that needs stronger explanation.

      I appreciate this reviewer raising this point, which has also been raised by Reviewer #3. For this paper, I have carefully weighed the pros and cons of automated versus manual gender determination. Initially, my intention was to rely only on a programmatic method to identify authors' names. However, I came to realize that there were inaccuracies in senior author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to perform this task manually, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first authors as well, this was far more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that reviewers, while often familiar with senior authors, may not be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically know the identities of senior authors, for example through conferences, whereas the same is not true for first authors. Yet, this may be different for other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process.

      I also realize that my rationale for the different methods of gender determination was not explained well enough in the original submission; I now explain my reasoning in more detail on Page 7 of the manuscript.

      For sentiment analysis: please state on what basis ChatGPT made its decision. Which program did it use? (e.g., for gender it used genderize.io)

      This has been added to Page 7.

      Finally, your entire analysis can be made reproducible (since everything is publicly available). You could share the ChatGPT chats as online materials, together with the analysed dataset (including the entered variables) and the code. This would increase the credibility of the findings.

      I will make the entire raw dataset available through the eLife website, including all reviews and their scores.

      Reviewer #2 (Public review)

      Strengths include:

      1) Given the variability in responses from ChatGPT, the author pooled two scores for each review and demonstrated significant correlation between these two iterations. He confirmed also reasonable scoring by manipulating reviews. Finally, he compared a small subset (7 papers) to human scorers and again demonstrated correlation with sentiment and politeness.

      2) The figures are consistently well presented and informative. Figure 2C nicely plots the scores with example reviews. The supplementary data are also thoughtful and include combination of first/last author genders. It is interesting that first author female last author male has the lowest score.

      3) A series of detailed analyses, including breaking down reviews by subfield (interesting to see the wide range of reviewer sentiment/politeness scores in computational papers), institution, and authors' names and inferred gender using Genderize. The author suggests that blinding reviewers to the authors' gender during peer review may help mitigate the impoliteness observed.

      Thank you.

      Weaknesses include:

      1) This study does not utilize any of the wide range of Natural Language Processing (NLP) sentiment analysis tools. While the author did have a small subset reviewed by human scorers, the paper would be strengthened by examining all the reviews systematically using some of the freely available tools (for example, many resources are available through Hugging Face [https://huggingface.co/blog/sentiment-analysis-python]). These methods have been used in previous examinations of review text analysis (Luo et al. 2022. Quantitative Science Studies 2:1271-1295). Why use ChatGPT rather than these older validated methods? How does ChatGPT compare to these established methods? See also: colab.research.google.com/drive/1ZzEe1lqsZIwhiSv1IkMZdOtjPTSTlKwB?usp=sharing

      This was a great recommendation by this reviewer, and I have tested ChatGPT against TextBlob and VADER, the two algorithms also used by the Luo et al. study (see Supplementary Fig. 4). Perhaps unsurprisingly, these algorithms performed very poorly at scoring the sentiment of the reviews. Please note that I also tested these two algorithms at scoring individual sentences, Tweets and Amazon reviews, which they did very well (i.e., the software package was working correctly). Thus, ChatGPT is better at scoring scientific texts than TextBlob and VADER, likely because these algorithms struggle to find where in the review the sentiment is conveyed. I now discuss this on Pages 1, 3 and 4 of the manuscript.

      2) The author's claim in the last paragraph that his study is proof of concept for NLP to analyze peer review fails to take into account the array of literature already done in this domain. The statement in the introduction that past reports (only three citations) have been limited to small dataset sizes is untrue (Ghosal et al. 2022. PLoS One 17:e0259238 contains over 1000 peer review documents, including sentiment analysis) and reflects a lack of review on the topic before examining this question.

      I thank this reviewer for pointing me to this very useful study. I regret missing it in my initial submission; I now discuss this paper on Pages 1 and 5 of the manuscript.

      3) The author acknowledges the limitation that only papers under neuroscience were evaluated. Why not scale this method up to other fields within Nature Communications? Cross-field analysis of the features of interest would examine if these biases are present in other domains.

      I share this reviewer’s opinion that it would be very interesting to expand this analysis to different subfields. I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across the rejected and accepted papers of a journal. I am also looking into ways to automate data collection using APIs and the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals.

      The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract, to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Reviewer #3 (Public review)

      Strengths:

      On the positive side, I thought the use of ChatGPT to score the sentiment of text was novel and interesting, and I was largely convinced by the parts of the methods which illustrate that the AI provides broadly similar sentiment and politeness scores to humans who were asked to rank a sub-set of the reviews. The paper is mostly clear and well-written, and tackles a question of importance and broad interest (i.e. the potential for bias in the peer review process, and the objectivity of peer review).

      Thank you.

      Weaknesses:

      The sample size and scope of the paper are a bit limited, and I have written a long list of recommendations/critiques covering diverse aspects including statistical/inferential issues, missing references, and suggestions for other material that could be included that would greatly increase the usefulness of the paper. A major limitation is that the paper focuses on published papers, and thus is a biased sample of all the reviews that were written, which prevents the paper properly answering the questions that it sets out to answer (e.g. is peer review repeatable, fair and objective).

      I very much appreciate this reviewer taking the time to provide me with such a detailed list of recommendations. Below, I will respond to this list in a point-by-point manner.

      Reviewer #3 (Recommendations to author)

      My main issues with the paper are that it is not very ambitious, and gave me the impression the aim was to write the first paper using ChatGPT to address this question, rather than to conduct the most thorough and informative investigation that would have been feasible (many obvious questions that could be addressed are not tackled, since the sample size is small and restricted). There are also issues with selection bias, and the statistical analysis, that have possibly led to erroneous inferences and greatly limit what conclusions can be drawn from the analysis. I hope my comments are of use in further improving the paper.

      The repeatability of ChatGPT when calculating the two linguistic characteristics is low. Taking the average of multiple assessments is one way to deal with this. To verify that taking the average of, say, 5 scores gives a repeatable score, the author could consider calculating 10 scores for a set of 20-30 reviews, calculating two scores for each review using the first 5 and second 5 ChatGPT ratings, and then calculating repeatability across the 20-30 reviews. It is important to demonstrate that ChatGPT is sufficiently repeatable for this new method to be useful.

      Also, it might be possible to automate this process a bit to save time, e.g. the author could change the ChatGPT prompt, like "please rate the politeness of this review from -100 to +100, do it 10 times independently, and print your 10 ratings as well as their average". Hopefully the AI is smart enough to provide 10 independently-computed ratings this way, saving the need to copy-paste the prompt into the chat box 10 times per review.

      This was a great recommendation by this reviewer, and a point also raised by Reviewer #1. Based on their suggestion, I looked into how each additional iteration of scoring would reduce the variability of scoring for a subset of papers (thus being able to advise users on an optimal number of iterations). I also tested this reviewer's suggestion to ask ChatGPT to score many times and give separate scores for each iteration; this worked very well.

      Interestingly, I observed that ChatGPT has become substantially more reliable in providing sentiment and politeness scores in recent versions. For the latest version (ChatGPT Aug 3, 2023), R² = 0.992 for sentiment and R² = 0.859 for politeness were reached for two subsequent iterations of scoring. Unfortunately, OpenAI does not allow access to previous versions of ChatGPT, so the current dataset could not be re-scored. Yet, based on these data, there may no longer be a need for people to perform repeated scoring. I show these data in Supplementary Fig. 2, as I believe this is very useful information for people who are interested in using this tool.

      To my mind, the main reason to use an AI instead of one or more human readers to rank the sentiment/politeness of peer reviews is to save time, and thereby allow this study to have a larger sample size than would be feasible using human readers. With this in mind, why did you choose to download only 200 papers, all from the discipline of Neuroscience, and only from Nature Communications? It seems like it would be relatively easy to download papers from many more journals, fields of research, or time periods if using AI-based methods, and in fact it would have been feasible (though fairly laborious) for one person to read and classify the sentiment of the reviews for 200 papers.

      As well as providing more precise estimates of the parameters you are interested in (e.g. the consistency of reviews, and the size of the difference in reviewer sentiment between author genders), expanding the sample beyond this small set of papers would allow you to address other interesting questions. For example, you could ask whether the patterns observed for neuroscience are similar to those in other research disciplines, whether Nature Comms is representative of all journals (given there are other journals with public reviews), and you could test whether the male-female differences have become greater or smaller over time (e.g. by comparing the male-female differences observed in the past to the effect size observed in 2022-23). Additionally, the main analyses in this paper would have higher statistical power; for example, you only include 53 papers with a female senior author, giving you quite low power/precision to estimate the gender difference in the average sentiment of reviews (given the high variance in sentiment between papers).

      I want to thank this reviewer for taking the time to think about possible ways to increase the impact of this work. I agree, these are all great suggestions, and there are many possibilities to apply ChatGPT-based natural language processing to scientific peer review. Respectfully, I chose to continue with publishing this work in the form of a proof-of-concept paper, because I currently do not have the resources to perform this (quite labor-intensive) study. Below I explain my reasoning, which I also shared with Reviewers #1 and #2.

      I initially only included Neuroscience papers, because I was uncertain whether I would be able to properly assess the reviews from different scientific disciplines (and thus judge whether ChatGPT was able to provide plausible scores). The different reviewers have provided me with a list of potential follow-up experiments, and I am currently considering different options for future work, including expanding into different fields within Nature Communications. Additionally, I am looking to team up with a journal to perform the experiments laid out in the new Supplementary Fig. 6, to study whether I can find evidence of bias across the rejected and accepted papers of a journal. I am also looking into ways to automate data collection using APIs and the rapidly expanding databases for transparent peer review. Yet, based on this preprint, I have received messages from academics who are interested in using generative AI to study scientific texts. By revising this manuscript now, I hope to provide them with the tools to concurrently expand the analysis of peer review into different scientific disciplines and journals. The comments I received from the different reviewers made me realize that I did not describe the intent of this paper well enough in the original submission. I rewrote much of the Abstract to emphasize the proof-of-concept nature of this study, and rewrote the Discussion to focus more on the limitations of the study.

      Also, if you could include some reviews of papers that were reviewed double-blind, you could test whether the gender-related differences in peer reviews are ameliorated by double-blind reviewing. Nature Comms (and many other journals with open review) do have some double-blinded papers, and there is evidence that double-blinding is preferentially selected by authors who think they will experience discrimination in the peer review process (DOI: 10.1186/s41073-018-0049-z), and also that double-blinding does ameliorate bias (DOI: 10.1111/1365-2435.14259), so this seems very relevant to the ideas under study here.

      I note that the PLOS journals allow open peer review, and there is an API for PLOS which one can use to download the reviews for a given paper (e.g. try this query to get to the XML file of a paper which has open peer review: http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0239518&type=manuscript). Using an API could allow this project to be scaled up, because you can programmatically search for the papers with open reviews, download those reviews using the API and some code, and then score them using the same ChatGPT-based methods used for Nature Comms. Also, Publons recently merged with Web of Science (Clarivate), and you can now read all the open peer reviews on Web of Science for papers which had open review (e.g. for this paper: https://www-webofscience-com.napier.idm.oclc.org/wos/woscc/fullrecord/WOS:000615934800001). It would be possible to write to Web of Science, request access to their data or search engine, and programmatically download many thousands of papers and their associated reviews, and then use ChatGPT or a similar AI to score them all (especially if you can pass the reviews to ChatGPT for scoring programmatically, instead of manually copy-pasting the reviews into the chat box one at a time as it appears was done in the present study).
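      As a minimal sketch of the programmatic route described above, the reviewer's example query URL can be built from a DOI with Python's standard library. The URL pattern is copied from the reviewer's example; whether every PLOS ONE article (and its peer review history) is returned at this address is an assumption that would need checking against PLOS's own documentation, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Pattern taken from the reviewer's example query (PLOS ONE only; assumption).
PLOS_ONE_FILE_URL = "http://journals.plos.org/plosone/article/file"


def plos_manuscript_url(doi: str) -> str:
    """Build the article-file URL for a PLOS ONE DOI, keeping '/' unescaped."""
    query = urlencode({"id": doi, "type": "manuscript"}, safe="/")
    return f"{PLOS_ONE_FILE_URL}?{query}"


def fetch_manuscript_file(doi: str, timeout: float = 30.0) -> bytes:
    """Download the raw file for one DOI (requires network access)."""
    with urlopen(plos_manuscript_url(doi), timeout=timeout) as response:
        return response.read()


print(plos_manuscript_url("10.1371/journal.pone.0239518"))
```

      Looping `fetch_manuscript_file` over a list of DOIs, with a polite delay between requests, would be the natural next step before passing the texts to a scoring model.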

      These are great suggestions, and I have different plans for follow-up studies, including the use of APIs to download large batches of peer reviews. The analyses in this paper were performed in February of this year, even before the ChatGPT API had been released, which did not allow me to automate the process at that time. As a result, these analyses were performed manually. I realize that the field is moving rapidly, and that there are now different options to scale this up quickly.

      I plan on using this reviewer's suggestions for follow-up experiments in a future paper, and on publishing this revision as a proof-of-concept paper. In this way, other researchers can make optimal use of ChatGPT-based sentiment analyses in similar studies without delay.

      As you acknowledge, there is a selection bias in this study, since you only include papers that were ultimately published in Nature Comms (missing reviews of papers that were rejected). This is a really big limitation on the usefulness of some of your analyses. For example, you found no relationship between author institutional prestige and reviewer sentiment. This could be evidence of a fair and impartial review process (which seems unlikely!), or it could be a direct result of selection bias (specifically a "collider bias", like the famous example involving height and skill among professional basketball players). Because the likelihood that a paper is published is positively related both to its quality and to the prestige of its authors, we might expect a flatter (or even negative) correlation between prestige and reviewer sentiment among papers that were published than among the whole set of papers (just as the correlation between height and speed/skill is less positive among NBA players than among the general population, since both height and speed/skill provide advantages in basketball).

      I agree with this reviewer that the selection bias is a major limitation of this study. I rewrote much of the Abstract and Discussion to tone down claims, and more prominently discuss the limitations of this study. I also made several suggestions for follow-up experiments.

      In the section "Consistency across reviewers", you write that there was little similarity between review sentiment scores from different reviewers from the same paper, and then write "This surprising result indicates high levels of disagreement between the reviewers' favorability of a paper, suggesting that the peer review process is subjective." However I disagree with this conclusion for three reasons:

      • Firstly, your dataset only includes papers that were published, and thus there is a selection bias against manuscripts where both/all reviewers disliked the paper - the removal of this (probably large) set of reviews will add a (potentially very strong) downward bias to your estimate of how consistent the review process is (since you are missing all those papers where the reviewers agreed). I think that one cannot properly answer the question "are reviewers consistent in their appraisals" without having access to papers that were rejected as well as those that were accepted.

      I agree with this reviewer that there is a selection bias in this study, which I acknowledged throughout the initial submission of this manuscript. Indeed, access to reviews of rejected papers would greatly increase my confidence in this finding. However, if there were consistency across reviewers in the entire pool of (post-review rejected + accepted) manuscripts, some of it would have to trickle down into the pool of accepted papers. The correlation between the sentiment scores of different reviewers is so strikingly low (or even absent) that I simply cannot envision a way in which there is consistency across reviewers at the pre-editorial-decision stage. Yet, I realize that this point is debatable. Therefore, I changed the phrasing of the Discussion section, including the following sentence:

      That being said, the extremely low (or even absent) relation between how different reviewers scored the same paper was striking, at least to this author.

      • Secondly, the method used to assess whether the reviews for each paper tend to be similar (shown in Figure 3b) does not fully utilize the information contained in the data and could be replaced with another method. (In the paper, three univariate regressions compare the sentiment scores for R1 vs R2, R1 vs R3, and R2 vs R3, which needlessly splits up the data in the case of papers with more than two reviewers, reducing power.) You could instead calculate the intraclass correlation coefficient (aka 'repeatability') to determine what proportion of the variance in sentiment scores is between vs within papers (I suggest using the excellent R package rptR for this). Note that the sentiment scores are not normally distributed, so regular regression (as you used) or one-way ANOVA (which you might be tempted to use for the ICC calculation) is not ideal; consider using a GLM or a transformation (the rptR package automates the tricky calculation of repeatability for generalized models).

      I thank this reviewer for pointing me towards this option. I added this analysis to Fig. 3b, which confirmed the inconsistency in sentiment scores for reviews of the same paper (ICC = 0.055). As suggested by this reviewer, I decided to perform the ICC on log-transformed data, as ICC calculation is very sensitive to non-normally distributed data.
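      To make concrete what the ICC measures, the sketch below computes the classic one-way ANOVA estimator, ICC(1), with the usual adjustment for papers that received different numbers of reviews. It is a simplified illustration, not necessarily how the ICC in Fig. 3b was computed: rptR fits (generalized) mixed models instead, and because sentiment scores run from -100 to +100, a log transformation requires some positive shift whose value is not stated here, so any preprocessing is left to the caller. An ICC near 0 means that almost all variance lies within papers (reviewers disagree); an ICC near 1 means it lies between papers.

```python
def icc_oneway(groups):
    """One-way random-effects ICC(1) from ANOVA mean squares.

    groups: list of lists, one inner list of (already transformed) scores
    per paper; unequal numbers of reviews per paper are allowed.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-paper and within-paper sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of reviews per paper for an unbalanced design
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


if __name__ == "__main__":
    # Perfect agreement within each paper gives ICC = 1.0
    print(icc_oneway([[1, 1], [5, 5], [9, 9]]))
```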

      • Thirdly, an alternative and very plausible hypothesis for this lack of similarity (besides peer review being highly subjective) is that ChatGPT is estimating the "true sentiment" of a review (i.e. what the reviewer intended to say) with some amount of error (e.g. due to limitations/biases in the AI, or reviewers struggling to make themselves understood due to issues such as writing in a second language, typos, or writing under time pressure), which dilutes the similarity in the estimated sentiment of the reviews. In other words, if the true sentiment values are strongly correlated, but there is random error in how those values are estimated by ChatGPT, then the correlation between reviewer scores for each paper will tend to zero as the error tends to infinity. Furthermore, a nebulous quality like "sentiment" cannot be fully summarised in a single variable running from -100 to +100, and if you had used a more multi-dimensional classification system for the reviews (or qualitative assessment by human readers) you might have found that there is a bit more correspondence (I'm speculating here, but I think you cannot really exclude this and the paper doesn't mention this limitation).
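      The measurement-error argument can be made precise with the classical attenuation result. As a simplifying assumption, let each ChatGPT score be the reviewer's true sentiment plus independent random error, $X_r = T_r + E_r$ for reviewers $r = 1, 2$, with $\operatorname{Var}(T_r) = \sigma_T^2$, $\operatorname{Var}(E_r) = \sigma_E^2$, and true-score correlation $\rho_T$. Because the errors are independent of each other and of the true scores, $\operatorname{Cov}(X_1, X_2) = \operatorname{Cov}(T_1, T_2) = \rho_T \sigma_T^2$, so

$$
\rho_X = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\operatorname{Var}(X_2)}}
       = \rho_T \, \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
       \;\longrightarrow\; 0 \quad \text{as } \sigma_E^2 \to \infty .
$$

      The factor $\sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$ is the reliability of the scores; for example, a reliability of 0.5 would halve an underlying correlation between reviewers.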

      This point is well taken. I added caveats to the Discussion section on Page 5. Altogether, after taking these caveats into account, I do believe that this analysis convincingly demonstrates subjectivity in the peer review of this subset of papers. That said, I hope that my re-written discussion and additional analysis have added the necessary nuance to this point.

      In Figure 3C, you write "Contribution of paper scores to review time". This strongly implies to the reader that the sentiment scores inferred for the reviews have a causal effect on the review time. This is imprecise writing (since the scores were calculated by you after the papers were published, and thus cannot be causal - you mean that the actual reviews affected the review time, not the scores), but more importantly you cannot infer any causality here since your dataset is observational/correlational. You could fix this by re-phrasing to emphasise this, e.g. "Statistical associations between paper scores and review time".

      This is a very good point raised by this reviewer. I have corrected the phrasing so it no longer implies causality.

      For the analysis shown in Figure 4d and Figure 4e, I am not certain what you mean by "data split per lowest/median/highest sentiment score". This is ambiguous, and I am also not sure what the purpose of this analysis is or what it shows. I suggest re-writing for greater clarity (and ideally providing the code used in all your analyses) and perhaps revising the analysis. Additionally, an important missing piece of information from this analysis (and most analyses in the paper) is the effect size. For example, you don't report the difference in politeness score and sentiment score between male and female authors, or the SE and 95% CIs for this difference. From eyeballing the figure, it looks like the difference in politeness is about 4 points on your 200-point scale; this is small in absolute terms, but might be quite large in relative terms given that the politeness score usually hovered around a small part of the full 200-point scale. What is this as a standardised effect size (i.e. in terms of standard deviations, as captured by effect sizes like Cohen's d and Hedges' g)? Calculating this (and its 95% CIs) would allow you to say whether the difference between genders is a "big effect", and give an idea of your confidence in your effect size estimate and any inferences drawn from it. You even discuss the effect size in your discussion, so it would help to calculate the standardised effect size. If you're not familiar with effect size and why it's useful, I found this paper very instructive: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-185X.2007.00027.x
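      As an illustration of the standardised effect size described above, here is a minimal sketch of Hedges' g (Cohen's d with a small-sample bias correction) with an approximate 95% confidence interval. The interval uses one common large-sample approximation to the sampling variance of g; other variants exist, and the inputs below are hypothetical, not data from this study.

```python
import math
import statistics


def hedges_g(a, b, z=1.96):
    """Return (g, ci_low, ci_high) for the standardised difference a - b.

    Uses the pooled SD, the bias correction J = 1 - 3 / (4 * df - 1),
    and the approximate variance (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(
        ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    )
    d = (statistics.mean(a) - statistics.mean(b)) / pooled_sd  # Cohen's d
    g = (1 - 3 / (4 * df - 1)) * d  # small-sample correction
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se


if __name__ == "__main__":
    # Hypothetical politeness scores for two groups of papers
    print(hedges_g([1, 2, 3], [2, 3, 4]))
```

      Reporting g together with its interval, rather than a p-value alone, makes it possible to judge whether a gap of a few points is small or large relative to the spread of the scores.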

      I agree with this reviewer that this phrasing was ambiguous. I now rephrased this on Page 4 of the manuscript:

      To study whether these more impolite reviews for female first authors were due to an overall lower politeness score, or due to one or some of the reviewers being more impolite, I split the reviews for each paper by its lowest/median/highest politeness score. I observed that the lower politeness scores for first authors with a female name were driven by significantly lower lowest and median scores (Fig. 4d, bottom panel). Thus, the least polite reviews a paper received were even more impolite for papers with a female first author.
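      One reading of this per-paper split is sketched below, under stated assumptions: each paper's review scores are reduced to their minimum, median, and maximum, and each of the three resulting columns can then be compared between groups (e.g. female vs male first-author names). The data layout (a dict mapping a hypothetical paper ID to its list of politeness scores) is illustrative, and the text does not say how papers with only two reviews were handled; here the median of two scores is simply their mean.

```python
import statistics


def split_by_rank(scores_per_paper):
    """Map each paper ID to (lowest, median, highest) of its review scores."""
    return {
        paper: (min(scores), statistics.median(scores), max(scores))
        for paper, scores in scores_per_paper.items()
    }


if __name__ == "__main__":
    # Hypothetical politeness scores per paper
    reviews = {"paper_A": [12, -8, 30], "paper_B": [5, 20]}
    print(split_by_rank(reviews))
    # {'paper_A': (-8, 12, 30), 'paper_B': (5, 12.5, 20)}
```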

      I also added effect sizes of the significant effects from Fig. 4 to its figure legend. Please note that the used statistical tests are non-parametric, so I reported the Hodges-Lehmann differences (which is the median of all possible pairwise differences between observations from the two groups).
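      The Hodges-Lehmann estimate described above is simple to compute directly. A minimal sketch follows; the input lists are hypothetical, and a full analysis would usually also report a confidence interval (for example one derived from the Mann-Whitney distribution), which is omitted here.

```python
import statistics
from itertools import product


def hodges_lehmann(x, y):
    """Median of all pairwise differences x_i - y_j between two samples.

    A positive value means values in x tend to be larger than those in y.
    """
    return statistics.median([xi - yj for xi, yj in product(x, y)])


if __name__ == "__main__":
    print(hodges_lehmann([1, 2, 3], [0, 0]))  # pairwise differences 1,1,2,2,3,3
```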

      "Double-blind peer review has been debated before, but has come under scrutiny for various reasons" - this is vague and unhelpful. I think it's worthwhile to properly engage with the debate and the substantial body of evidence in your paper, given your main focus is on potential bias in the review process based on authors' identities (e.g. gender, institutional prestige).

      I thank the reviewer for pointing this out. I rephrased this sentence to indicate that there is evidence that it helps to remove certain forms of bias (Page 5):

      To address this issue, double-blind peer review, where the authors' names are anonymized, could be implemented. Evidence suggests that this is useful in removing certain forms of bias from reviewing [8, 9], but it has thus far not been widely implemented, perhaps because some studies have cast doubt on its merits [21, 22].

      I have also added a Supplementary Fig. 6 to this paper, in which I lay out how my tool can be used to study bias by applying it to single- and double-blinded reviews (see also my answer to the other question about this topic below).

      On a related note, in the first paragraph, when discussing the potential of single-blind review to allow reviewers to essentially discriminate against papers by women, there is a key missing citation. This year, the first truly experimental test of this hypothesis was published (DOI: 10.1111/1365-2435.14259); a journal conducted a randomised controlled trial in which submitted manuscripts were reviewed either single- or double-blind. They found no effect of author gender on reviewer ratings or editorial decisions (though there was an effect of review type on success rate of authors from different countries). It would be better to cite this instead of reference 6, which as you acknowledge is methodologically flawed. This paper is also worth a read given your focus on Nature journals: DOI: 10.1186/s41073-018-0049-z.

      This point is well taken. I now cite this paper (citation #8) and rephrased this part of the Introduction (Page 1).

      "Another - arguably more simple - solution [compared to double-blind peer review] could be for reviewers to be more mindful of their language use." Here, you seem to be saying that we don't need to blind author names during peer review, because it would be simpler if all reviewers were simply nicer! I object to this because A) double-blind review is easy to implement, and greatly reduces the opportunity to tune the review to the author's identity (and there is some experimental evidence that it works in this regard), and B) it seems like wishful thinking to say that we don't need to implement measures that reduce the scope for bias, because all reviewers could instead stop using impolite language.

      This is a very valuable comment. I rephrased this to emphasize that this is an additional measure.

      "reviewers may want to use ChatGPT to extract a politeness score for their review before submitting" Yes, that's an interesting idea, and I can imagine that some (probably small) proportion of reviewers will be interested in doing this. But I think you should think bigger about wholesale changes to the review system that are possible because of AI like ChatGPT. For example, the submission platforms where reviewers submit their reviews (e.g. ScholarOne, Manuscript Central) could be updated to use AI to pre-screen draft reviews, and issue a warning to reviewers, like "Our AI assistant has indicated that the writing in this review might be impolite (example phrases here) - would you like to edit your review before you submit it?" Also, review-credit platforms like Publons could display not only the number of reviews that someone wrote, but an AI-generated assessment of how constructive, detailed, and polite their reviews are (this would help nudge people into writing better reviews, and also give credit where it's due to careful reviewers, which is part of the aim of Publons and similar platforms). This is just off the top of my head; there are many other good ideas about how AI could transform the peer review process. Indeed, AI is already good enough to generate quite useful peer reviews and constructive criticism of draft papers, and will surely get better at this, which has many implications for science publishing over the coming decades.

      These are great suggestions for implementation of this tool. I now end the first paragraph of the Discussion (Page 4) with the following sentence:

      Such an automated language analysis of peer reviews can be used in different ways, such as after-the-fact analyses (as has been done here), providing writing support for reviewers (for example by implementation in the journal submission portal), or helping editors pick the best papers or most constructive reviewers.

      "Further research is required to investigate the reasons behind this effect and to identify in what level of the academic system these differences emerge." Here you could mention what this research would be - I think you'd need the full sample of reviewed papers, not just those that were accepted. Spell out what analyses would be required to test and falsify the various (very plausible and interesting) competing hypotheses that you mention for the male-female difference in sentiment scores.

      Great point. I added a Supplementary Fig. 6, in which I show a visual depiction of the experiments that can be performed to answer these questions.

      "areas of concern were discovered within the academic publishing system that require immediate attention. One such area is the inconsistency between the reviews of the same paper, highlighting the need for greater standardization in the peer review process." I disagree here. I think it is natural for there to sometimes be differences in how two or more reviewers rate the quality of a paper, even if the peer review process were carefully standardised (e.g. via the use of a detailed "peer review form", which helps guide reviewers to comment on all important aspects of the paper - some journals use these). This is because reviewers differ in their experience, expertise, or interests, and so some reviewers will catch mistakes that others miss, or request stylistic changes that others would not. More broadly, it's often not possible to write a version of the paper that satisfies all possible reviewers.

      I re-phrased part of the Discussion on Page 5 to indicate other sources of inter-reviewer variability. Specifically, I mention that some variability in sentiment can be expected based on the different backgrounds of the reviewers:

      Notably, some level of variability may be expected, for example due to different backgrounds, experiences, and biases of the reviewers. In addition, ChatGPT may not always reliably assess a review's sentiment, adding some spurious inter-reviewer variability.

      Yet, as also mentioned in my response to one of the previous questions, I still find the extremely low levels of consistency striking, even after taking these possible sources of inter-reviewer variability into account.

      "the maximum score an institution could receive was 100 (in 2023 this was Massachusetts Institute of Technology)" - this seems unnecessary information (just mention the score runs from 0-100).

      I agree with this reviewer that this was unnecessary information. This has been removed.

      "reviewers are generally familiar with the senior author of papers they review and thus are likely aware of their gender identity." This seems like a strong assumption, and you don't provide any evidence for it. Speaking personally, as a reviewer and journal editor I am often not familiar with the senior author, or I am familiar with the first author; I am not sure how often I know the senior author but not the first author or vice versa. It's also not always the case that the first author is a junior scientist and the last author a senior, famous one, as you imply. I suggest that you use the same approach to score the gender of both author positions, namely inferring their gender programmatically from their name (I agree that generally the important thing for the purposes of this study is the gender that reviewers will infer from the name, not the author's actual gender, and so gender estimation from first names is the correct approach).

      I appreciate this reviewer raising this point, and I have carefully weighed the pros and cons of both approaches. Initially, my intention was to rely only on a programmatic method to infer authors' genders. However, I came to realize that there were inaccuracies in the senior-author gender predictions made by ChatGPT/Genderize. This was evident to me due to my personal familiarity with some of these authors, either because they are famous or through personal interactions. It seemed problematic to me to proceed with this analysis knowing that these misclassifications would introduce unnecessary variability to the dataset.

      The advantage of the relatively small sample size in this study was the opportunity to perform this task manually, rather than being fully dependent on algorithms. While I attempted manual gender identification for the first authors as well, this was much more challenging due to their limited online presence. The discrepancy in gender identification accuracy between first and senior authors did not go unnoticed, and I acknowledge the issue it presents. I also recognize that, unlike senior authors, reviewers may not necessarily be familiar with the first authors of the papers they evaluate, as indicated in the original submission of this paper. In light of this, I sought input from several PIs who often serve as reviewers. Their feedback confirmed that they typically know the identities of senior authors, for example through conferences, whereas the same is not true for first authors. Yet, this may be different in other scientific disciplines, where the pool of reviewers might be bigger.

      Notably, for future studies I may make a different decision, especially when I use larger datasets that require me to automate the process. I now more elaborately explain why I made this decision on Page 7 of the manuscript.

      In the Abstract, you write "suggesting a gender disparity in academic publishing". This part of the sentence contains no information about what you think is the cause of the male/female difference, and no further interpretation of its ramifications, so I think you can just remove it (because "disparity" just means a difference, so you are effectively saying something redundant like "there was a difference between papers with male and female senior authors, suggesting there is a difference")

      I thank the reviewer for pointing this out. I replaced the latter part of this sentence with “(…) for which I discuss potential causes.”, which I think is better than a short summary of potential causes which may lack the nuance that such a topic deserves.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First of all, we would like to again thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article. With those comments in mind, we have now revised our manuscript. Please see below for a point-by-point response (our responses in green) to all comments.

      Reviewer #1 (Recommendations For The Authors):

      Sun and colleagues outline structural and mechanistic studies of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. The manuscript includes a crystal structure of the Ig-like domains of PrgB, cryo-EM structures of the majority of the intact polypeptide in DNA-bound and free forms, and an assessment of the phenotypes of E. faecalis strains expressing various PrgB mutants.

      Generally, the study has been conducted with a good level of rigor, and there is consistency in the findings. However, I do have some specific technical concerns relating to the study that necessitate the undertaking of additional experiments. These are summarized as follows:

      1) Recombinant PrgB188-1233 produced in the study purifies as a mixture of monomeric and dimeric species separable by SEC. There is very limited discussion in the text re. the significance and/or implications of this. Is it feasible that the dimeric form is biologically relevant in the context of the in vivo situation? Or alternatively, is this simply an artifact of protein production?

      Experimental data that we published in 2018 indeed indicates that the dimer is relevant in the in vivo situation. We did not discuss this here since this was discussed in detail in the previous paper: Schmitt et al, 2018. We have now added a bit more information on this in the results section, highlighting this, so that it is clearer to the reader (lines 114-116).

      2) The authors see no evidence of the adhesive domain of PrgB in their PX structure highlighting that this must have been cleaved during crystallisation. Is this claim supported by an inspection of the crystal packing? It could be that this region of the protein is dynamic within the context of the crystal and is thus not observed. This should be clarified in the text either way.

      The crystal packing does not provide any space for the PAD. We have added a sentence describing this to the Results section (lines 122-124).

      3) The Cryo-EM structures reported are both at ~10-angstrom resolution. Are the authors truly confident in the placement of their crystal structures on these maps? Visual inspection indicates that their positioning of the PrgB domains into the EM envelopes is somewhat questionable. The authors need to provide some quantitative measures of the quality of their domain fitting. The narrative of the manuscript very much hinges on this being correct.

      This is something that the other reviewer also commented on. The fit of the crystal structures into the maps is indeed not optimal, but it was the best we could do with the available data. In line with point #6, we have now constructed new protein variants of the stalk domain (the four Ig-like domains) alone, and have assayed its interaction with the PAD in vitro using native gels and size exclusion chromatography. The outcome of these experiments is that the two domains do not interact in any substantial way on their own. Thus, the added experiments do not support the hypothesis that the PAD interacts with the Ig-like domains, at least not without the local high concentration provided by the linker region in the in vivo situation.

      To account for these new experiments, we have moved the cryo-EM structure to the supplement, and rewritten this part of the manuscript to say that the cryo-EM data indicated that there might be an interaction, but that we have not been able to verify this in vitro, indicating that if the interaction at all exists it must have a low affinity and is likely not physiologically relevant. In line with this, we have also further modified the text throughout the manuscript to account for this.

      4) The manuscript would be significantly strengthened if the authors could include confirmatory hydrodynamic data in support of the observed conformational reorganization of PrgB in the presence of DNA. SAXS analysis of the DNA-free and bound complexes would be ideal for this and would also help address the issues raised above in pt 3.

      To analyze the PrgB radius with and without DNA, we tried both SEC-MALS and DLS experiments. It proved difficult to obtain precise and reproducible values, but the initial data indicated that no large changes occurred upon DNA binding. As we also could not measure a specific interaction between the PAD and the stalk in vitro, we did not perform SAXS experiments. As mentioned in the response to point #3, we have modified the results and discussion regarding the potential interaction of the PAD and stalk domains.

      5) The authors present binding studies of various PrgB mutant-expressing strains. A number of the mutations generated delete significant portions of the polypeptide. Can the authors confirm that these mutant proteins are correctly folded despite the introduced mutations? It could be that loss of function is simply a consequence of mutation-induced misfolding. I would like to see some confirmatory data (CD, SEC, etc.) in support of the foldedness of the mutant proteins.

      We cannot completely rule out that the folding of some of the variants is affected in E. faecalis. However, CD or SEC experiments would only give indications of the contrary if the overall fold had been majorly affected in an in vitro situation where the protein is not anchored to the E. faecalis cell wall.

      To alleviate this valid concern, we probed whether all variants are correctly exported and linked to the cell wall. We therefore extracted the cell wall of E. faecalis producing wild-type or variant PrgB and performed Western blots. The results of the Western blot with cell-wall extract largely match the whole-cell experiments that were in the initial manuscript. If a protein variant were largely misfolded, it would likely not be targeted to and linked to the cell wall, nor would it be stable in vivo. We have added these new data as a new Fig. 3 – figure supplement 1 and on lines 201-214.

      6) The authors suggest a direct interaction between the PAD and the stalk domains in PrgB. The discussion of this is very generic and no evidence to support this is provided other than the 10-angstrom resolution EM map. If they believe this to be the case, then additional evidence should be provided.

      Answer: As mentioned previously, we have now performed additional in vitro experiments to probe this potential interaction, but conclude that this indication from the EM data is likely not a real high-affinity interaction. In line with this, we have modified the results and discussion regarding this point; see also our responses to points #3 and #4.


      Reviewer #2 (Recommendations For The Authors):

      As currently presented, I don't feel that the cryoEM data support the authors' proposed model, largely because the fit of the crystal structures to the EM volumes does not seem entirely reasonable for the apo- dataset and because the EM volume for the ssDNA bound dataset is not even contiguous. For me to believe the model as it is currently built, I would want to see a dataset with the PAD deleted, showing that its proposed density disappears, or a dataset with a PAD-specific antibody as a fiducial marker. It would be nice to see some goodness of fit metric with a comparison to other crystal structures fit such low-resolution data as well. At the very least, the authors must include the standard cryoEM workflow supplementary figure showing representative micrographs, 2Ds, and 3Ds along with particle numbers.

      In line with the comments raised by reviewer #1, we have now added more experiments in which we analyzed the potential interaction between the PAD and the stalk domain. From these new data, it appears that they do not interact with any substantial affinity, at least not on their own without a linker region holding them together, and that this interaction, if it exists at all, is likely not physiologically relevant. The cryo-EM data have been moved to the supplement, as we agree with both reviewers that the resolution, and the fitted model, are not good enough to draw any firm conclusions. The standard table for the cryo-EM workflow was present as Supplementary Table 2, where e.g. particle numbers are described, but we have now also added a new Fig. 2 – figure supplement 2 that shows the EM processing workflow, including representative micrographs and 2D and 3D classes. We debated whether to remove the EM data, but decided against it in the interest of transparency and to explain why the interaction studies with the PAD and stalk domains were performed.

      The X-ray crystallographic structure is very nice, but I was a bit surprised by the R factors in Table 1. After downloading the structure factors and coordinates from the PDB (thank you for depositing before submission!) I was able to see quite a few positive peaks in the difference map that could probably use some cleaning up. I realize I may just be a bit of a masochist when it comes to adding/deleting waters and moving around side chains to get things just right, but for such lovely data, I would have liked to see the model polished up a bit more. I was going to say that the isopeptide bond should be modelled, but I can see from a cursory Google that the authors did in fact try to find a way to model this and that it is indeed a bit of a pain.

      The model refinement proved surprisingly recalcitrant with regard to the remaining difference density, so we decided to model only what was solidly there (which leads to slightly higher R factors). We did indeed try to model the isopeptide bond, but did not find a good way to do so (despite trying quite extensively), and ended up defining the bonds as links in the PDB file, so that they show up when one opens the structure in, e.g., PyMOL.

      For protein production/purification in general I would have liked to see actual traces for the gel filtration and pure protein on a gel in a supplementary figure. I strongly believe that this type of information is so critical for future researchers looking to replicate or build upon published work so that they have some sense that what they are doing is working in the way it should be.

      We have now added a supplementary figure (as new Fig. 1 – figure supplement 1) that shows SEC and SDS-PAGE for the purification of PrgB188-1233.

      Finally, I think for the in vivo data it only makes sense to show the reader whether any or all the differences measured across your different mutants are statistically significant. Having done the graphing and analysis in GraphPad this should be a simple thing to achieve.

      We have now added statistical tests (one-way ANOVA) showing the statistical significance of the differences between the mutants in Fig. 3 and Fig. 4.

      Overall, I think it's a very nice paper and while I feel that the cryoEM data in its current form doesn't support the model of occlusion from PrgA, I also don't think that removing the cryoEM data and that specific mechanistic idea from the paper detracts from its overall message and impact.

      Thank you for those comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      p. 5, l. 87-90: The control of flgM by OmrA/B (PMID 32133913) and the antisense RNA to flhD (PMID 36000733) are other examples of known regulatory RNAs that impact the flagellar regulon.

      We thank the reviewer for pointing out these references and have added citations to them (page 5, lines 87-91).

      p.11/Fig. 3: it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA. I realize that it is outside of the scope of this study, but have the authors considered the possibility that ArcZ or McaS could have a role in the previously reported repression of rpoS by LrhA (PMID 16621809)?

      We agree that it is intriguing that ArcZ and RprA, two of the rpoS-activating sRNAs, repress lrhA, and added mention of this regulatory connection (page 12, lines 247-250).

      p. 13/l. 272: I do not understand why the authors say that "r-proteins were almost exclusively found in chimeras with MotR and FliX and no other sRNAs...", given that several other chimeras between r-prot and other sRNAs are found.

      While some r-protein-encoding genes were found with other sRNAs in RIL-seq datasets, MotR and FliX generally had the highest numbers of chimeras. The text was revised to better describe the RIL-seq data for r-protein interaction partners (page 14, lines 291-295), and a new panel showing the S10 operon with all the interacting sRNAs was added to Figure 3-figure supplement 1B.

      Fig. 4 and 5: One possible improvement would be to more systematically assess the effect of base-pairing mutants of the sRNAs, such as MotRM1 or FliXM1 on fliC and rps/rpl genes in vivo. This is especially important for the mutants that affected the sRNA effects in the in vitro probing assays, such as UhpU-M2, MotR-M1 and FliX-S-M1 on fliC (Fig. S7)

      As suggested, we examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays, now shown in Figure 8—figure supplement 1, are consistent with our model as we observed delayed expression of fliC mRNA in motR-M1 background and premature expression in fliX-M1 background (page 21, lines 444-446, 449-453).

      Fig. 5: it may be worth including a schematic of the whole S10 operon to highlight its length and its organization?

      As suggested, a schematic representation of the S10 operon was added to Figure 3—figure supplement 1 with a summary of the RIL-seq data for this operon.

      Probing data (Fig. 5, S7 and S9): in general, it is difficult to differentiate the thin and thick brackets, and what is indicated by the dashed brackets is not always clear. Maybe using a color-code instead could help? Highlighting the predicted pairing regions on the different gels could be useful as well.

      We thank the reviewer for this suggestion and have color-coded the brackets (Figure 5, Figure 4-figure supplement 2, and Figure 5-figure supplement 2). The correspondence to regions of predicted pairing is described in the figure legends.

      Fig. S10: The experimental evidence used to support FliX-dependent degradation of the rpsS mRNA is indirect (primer extension to observe higher levels of cleavage intermediates). It would be nice to be able to observe a decrease in the mRNA levels as well, either by Northern, or primer extension from a region more distant to the FliX pairing site.

      The S10 operon is long (~5 kb). We have tried multiple probes for this mRNA and detect many bands with each, likely due to extensive regulation of this operon. We think that determining the origin of the different bands, which is needed to interpret changes in their patterns appropriately, will require a significant amount of work.

      legend of Fig. S10: from the gel, it seems that only the plasmids differ in the samples, and it is not clear where the data corresponding to the WT strain mentioned in the legend is shown

      The samples shown in this figure are all for the indicated plasmids in the WT strain. We corrected the figure legend.

      Table S1: please define the NOR (normalized odds ratio?)

      The definition of Normalized Odds Ratio was added to the legend of Supplementary file 1.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Figure 1B. Please add a negative control (which could be in the supplementary section) from a large section showing transcripts that are not directly influenced by Hfq.

      We think the flgKLO browser view in this figure serves as a negative control; flgK and flgL clearly are not enriched on Hfq, in contrast to FlgO. Figure 1B was generated using published datasets that are easily accessible to readers in a genome browser and show many other examples of transcripts that are not influenced by Hfq: https://genome.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/hub.hub.txt&hgS_loadUrlName=https://hpc.nih.gov/~NICHD-core0/storz/trackhubs/ecoli_rilseq/session.txt&hgS_doLoadUrl=submit

      Line 158. MotR* is a more abundant version of [the constitutively overexpressed] MotR. Is there a Northern or qPCR to confirm this? While I understand the relevance of these mutated constructs, their high expression can lead to artefactual effects.

      This is a valuable point and therefore we provided a northern blot to document the relative levels of MotR and MotR* (Figure 2—figure supplement 1A).

      Figure 2. The overexpression of MotR/MotR* from a plasmid is increasing the number of flagella. However, when the MotR gene is deleted, is there a reduction of the number of flagella? Same question with FliX: what happens when the fliX gene is deleted? According to the model described in the manuscript, we should expect fewer flagella in ΔmotR background and an increased number of flagella in ΔfliX background. Both Figure 2 and Figure 8 would benefit from additional experiments with deleted motR and fliX genes.

      We agree that experiments examining the effects of the endogenous sRNAs are important. We provided such data for MotR and FliX in Figure 8 and Figure 8—figure supplement 1 using a variety of assays: flagella numbers by electron microscopy, motility and competition assays, and expression of flagellar genes by RT-qPCR and western analysis. The chromosomally expressed MotR-M1 and FliX-M1 base-pairing mutants did show the expected phenotypes of reduced and increased numbers of flagella, respectively (Figure 8A-B). As suggested by reviewer 1, we added northern analysis that examined fliC mRNA levels across growth in motR-M1 and fliX-M1 chromosomal mutants. The results of these northern assays are consistent with our model, as we observed delayed expression of fliC mRNA in the motR-M1 background and premature expression in the fliX-M1 background. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes rather than deletions to avoid interfering with the expression of motA and fliC, given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Figure 3 is key to demonstrating the sRNAs pairing with their specific targets and potential effect on bacterial swimming. However, these results would be more relevant with endogenous expression of the sRNAs and demonstration of their effects on the same targets. A Northern blot showing the overproduced sRNA level compared to endogenous sRNA level could help us appreciate the expression ratio.

      The levels of the UhpU, MotR and FliX sRNAs expressed from the overexpression plasmids are at least 100-fold higher than the endogenous levels. Thus, we agree that assays of chromosomal deletion/point mutants are important experiments. We did construct chromosomal uhpU-M1 and uhpU∆seed sequence mutants. However, under the conditions assayed, the uhpU chromosomal mutations did not result in observable effects on motility or FlhD-SPA protein levels. It is possible we would be able to detect differences between the wild-type and uhpU chromosomal mutant strains under different growth conditions or in different assays, but this would require a significant amount of work. For many other sRNAs, chromosomal mutations have no or only subtle effects, suggesting redundancy between sRNAs or roles for sRNAs in fine-tuning gene expression.

      Figure 4. In panel B, the empty plasmid pZE alone seems to positively affect the flagellin expression when compared to the WT background. This can also be seen in Figure 4C. There is no fliC signal with empty plasmid pBR* but a strong fliC signal with empty plasmid pZE. Maybe the authors can explain this in the manuscript.

      With respect to panel B and Figure 4—figure supplement 1A, we agree that there is some variation between the levels of flagellin in the WT and pZE control samples, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4— figure supplement 1 to better document the changes in flagellin levels.

      With respect to panel C, the pBR samples were collected in crl+ background while the pZE samples were collected in crl- background, which explains the lack of fliC signal in the pBR control sample. This is now noted in the figure legend.

      In lines 154-157, the justification for using two plasmids is described. An IPTG-inducible Plac promoter, the pBR*, is used because the constitutive overexpression of UhpU is resulting in mutated UhpU clones. These observations suggest a toxic expression level of UhpU that the cell can only tolerate when the UhpU RNA is somewhat deactivated by mutations. This does not seem like a detail and could be discussed further.

      We agree with the reviewer that this observation is important and now mention that it suggests a critical UhpU role (page 8, lines 160-163).

      Figure 5E and I. While the bindings of MotR on rpsJ and Flix-S on rpsS are clear, the resolution of both gels in the areas of binding (upper part of both gels) could be improved.

      We found it tricky to choose the mRNA fragments for the in vitro structure probing for the regions of predicted pairing internal to CDSs. Given that we hoped to retain native RNA folding, we chose long fragments; for rpsJ, we started with the +1 of S10 leader and for rpsS, we started 147 nt into the CDS, a region that overlaps the region that was cloned to the rpsS-rplV-gfp fusion. Consequently, the region of base pairing is in the upper part of both gels. The gels were already run for an unusually long time. Thus, we do not think the resolution could be improved further. Nevertheless, we think the region of protection is evident for both mRNAs.

      Minor comments:

      Fig 1B. The promoter symbols are extremely small, please increase the size.

      As suggested, we have enlarged the promoter symbols in Figure 1B as well as in Figure 3A.

      Line 211. "the lrhA mRNA has an unusually long 5´ UTR". How long exactly?

      The 5’ UTR of the lrhA mRNA is 371 nt long. This is now mentioned in the text (page 11, line 224)

      Line 320. Should "Fig 9C" be "Fig S9C" instead?

      We thank the reviewer for noticing this typo. Callouts to supplementary figures have now been renumbered per eLife format.

      Line 384. Something seems to be missing in the sentence "a representative combined class 2 and 3 promoter".

      The sentence has been modified to clarify the designation (page 19, lines 409-411).

      Reviewer #3 (Recommendations For The Authors):

      Recommendation to clarify/strengthen the presentation of science in the paper:

      Lines 102-103: Can the authors provide some more information on how the sRNAs were initially discovered to be potentially sigma-28 dependent and selected?

      As suggested, we expanded the section discussing the discovery and the selection of these sRNAs (page 6, lines 104-109).

      Lines 192-193: It would be helpful to provide a bit more information in the main text about what are the different RIL-seq data sets (18 in total).

      As suggested, we now provide more details about the different RIL-seq datasets we used in the analysis (page 10, lines 202-205).

      It would be helpful to specify the criteria for "top" interactions in targets retrieved from RIL-seq data (Table S1 and text, e.g., line 273): e.g. number of conditions, number of chimeras, etc.

      As suggested, we now more explicitly specify the criteria for selecting targets to characterize (page 10, lines 205-206).

      Fig. 4B/ S6 and line 242: The flagellin amount in the empty vector control (pZE) looks higher than in WT, and the stated effect of MotR/MotR* OE on flagellin is not very clear from the blot. The "cross-reacting band" above flagellin also seems to vary among strains. Could the authors include a quantification of flagellin protein amount and normalize relative to a housekeeping protein (e.g., GroEL), instead of Ponceau S as loading control?

      We agree that there is some variation between the levels of flagellin in the WT and pZE control sample, possibly due to the addition of antibiotic to the pZE culture. We added quantification of the bands in Figure 4—figure supplement 1 to better document the changes in flagellin levels.

      Figure legends: It would be helpful to have a bit more information about the method used/displayed image rather than stating results in the legends.

      As suggested, we now provide a bit more information about the methods used/displayed image in the figure legends to allow for easier comprehension of the data presented in the figures (while trying to balance this with the length of the legends).

      Fig. 2: Please include a scale for all electron microscopy images or, if it is the same for all panels, state it in the figure legend. Moreover, the same image is used for the pZE control in panel C, E and Figure S4A/C. It would be better to show different fields of bacteria for the pZE sample.

      As is now mentioned in the legends to Figure 2, Figure 2—figure supplement 2, and Figure 8, the same scale was used for all panels. We thought it was better to show the same image for the pZE control in the different panels to emphasize that these samples were all analyzed on the same day.

      Fig. 2: The sRNA OE strains seem to show some heterogeneity in cell length (pZE-MotR) or width (pZE-FliX). The authors could, e.g., check whether this is a phenotype correlated to sRNA OE by quantifying these parameters for different fields and comparing to WT or comment on this in the text if this is not consistently seen.

      We also were intrigued by the slightly different sizes and widths of cells in the EM images. However, our statistical analysis did not reveal significant differences between the different samples. We now comment on this (page 53, lines 1178-1179).

      As a follow-up to this study, it would be interesting to assess the impact of MotR and FliX regulation of ribosomal protein synthesis on overall ribosome activity (e.g., via Ribo-seq), also considering that antitermination regulates rRNA transcription. In the case of MotR, the authors suggest that MotR upregulation of S10 protein might not only impact antitermination, but also lead to the formation of more active ribosomes that would increase flagellar protein synthesis (lines 359-362). However, in the RNA-seq performed in OE MotR* several transcripts encoding rRNA and ribosomal proteins are significantly downregulated compared to EVC (Supplementary Table S2). Could the authors comment on this?

      We share the reviewer’s enthusiasm for follow-up work and thank the reviewer for the suggested experiments. We hope we will be able to decipher the full mechanism of MotR and FliX action on ribosomal protein synthesis in future experiments. The observation that some ribosomal protein-coding gene levels are reduced in the RNA-seq experiment with overexpression of MotR* is interesting, but we do not have an explanation other than the fact that the samples were collected early in exponential growth. We now mention the observation in the text (page 19, lines 404-407).

      Considering that OE of the WT MotR appears to increase fliC mRNA abundance but has no strong impact on flagellin protein levels, can the authors speculate what is the physiological relevance of MotR* for flagellin production?

      We agree that while we do see significant increases in flagella number and fliC mRNA abundance with MotR and MotR* overexpression, the western analysis did not reveal a striking increase in flagellin levels. We also wonder how MotR strongly increases the flagella number, which requires flagellin subunits, but has only a weak effect on the intracellular levels of flagellin. One possible explanation is that it is more difficult to detect significant increases for a protein whose levels are high to begin with. These points are now discussed (page 13, lines 264-269).

      Fig. 4C: The pZE samples seem to show variable expression of fliC mRNA although the samples are collected at the same timepoints. Try to clarify in the text.

      The northern membrane on the bottom was exposed for a longer time due to the lower fliC mRNA levels in the samples with FliX overexpression. We now note these differences in the legends to Figure 4 and Figure 4—figure supplement 1.

      Fig. 7/S13: While a volcano plot for MotR is shown in Fig. 7A, quantification of GFP reporter fusion regulation is shown for MotR. Quantifications of MotR are shown in Fig. S13. Maybe swap the figures.

      Given that the data for MotR are in the supplement figures for all other figures we would also like to retain this distribution for Figure 7 (aside from the volcano plot since this experiment was only carried out for MotR).

      Lines 135-136 (Fig. S1B): on the northern blots, only sRNA levels of MotR are comparable between rich and minimal media (excluding M63 G6P and M63 gal). Most other sRNA seem to be more abundantly expressed in minimal media conditions compared to LB. Maybe rephrase.

      As suggested, the text was revised to point out the differences in the sRNA levels for cells grown in different growth media (page 7, lines 140-144).

      Lines 229-234: this paragraph seems not directly connected to the aims of the study (i.e., no effect on motility tested of these other sRNAs) and could be removed (or moved to discussion).

      We appreciate the reviewer’s suggestion but, considering Reviewer 1’s comments, think that showing the regulation of lrhA by other sRNAs has value in highlighting the complexity of the regulatory circuit. We have revised the text to incorporate Reviewer 1’s suggestions and better explain why these results are intriguing (page 12, lines 247-250).

      Line 200 and Fig. S5: For FlgO sRNA only one target was identified in RIL-seq. This gene could be specified and labeled in Fig. S5 and the text. Does FlgO also bind ProQ?

      We now mention the single FlgO target (gatC) detected in four datasets (page 10, lines 213-215). In Figure 3—figure supplement 1, we labeled only targets that we followed up with in the current study. Therefore, to be consistent, we prefer not to label gatC in the FlgO plot. FlgO was found to co-immunoprecipitate with ProQ but at much lower levels than with Hfq, and to have very few RNA partners (Melamed et al., 2020).

      Lines 493-498: It is mentioned that the four sRNAs were also detected in recent RIL-seq experiments of Salmonella and EPEC. Are any of the here identified targets also found in other species or was none detected as analyses were carried out under conditions that do not favor flagella expression?

      The targets identified in this study were not detected in the Salmonella and EPEC RIL-seq datasets. However, the Salmonella and EPEC experiments were carried out under different growth conditions. Based on the sequence conservation of the Sigma 28-dependent sRNAs across several bacterial species (Figure 8—figure supplement 2), we do think overlapping targets will be found in other bacterial species under the appropriate growth conditions.

      The strongest evidence of MotR-dependent target regulation is the one on rpsJ, which does not necessarily require the additional experiments with MotR. Since the authors were able to show upregulation of the rpsJ-gfp reporter upon OE of MotR WT, it would have strengthened the results if they performed the experiments in Fig. S8C with MotR WT. Similarly, as an increase of flagella number was seen with OE of MotR WT in Fig. 2A, the effect of the OE S10∆loop could be compared to OE MotR instead of OE MotR* (Fig. 6A). At least it would be helpful to briefly comment on why MotR* was used instead of MotR WT for these experiments.

      As suggested, we state that MotR* was used in some assays given its stronger effects on some phenotypes (page 10, lines 196-197). Given that we established that MotR and MotR* cause the same effects, with increased intensity for the latter, we think it is reasonable to use MotR* in some of the experiments.

      Lines 482-491 and 508-511: The authors discuss that both UhpU sRNAs and RsaG sRNA from S. aureus are derived from the 3'UTR of uhpT, but conclude there is no overlap regarding flagella regulation, suggesting independent evolution of these sRNAs. However, the authors also mention that UhpU sRNA has many additional targets beyond LrhA involved in carbon and nutrient metabolism. Thus, maybe regulation of metabolic traits could be a conserved theme and function for UhpU and RsaG? Maybe try to comment on or better connect these two parts in the discussion.

      As suggested, we now comment on the possibility of the regulation of metabolic traits being a conserved theme and function for UhpU and RsaG (page 24, lines 520-527).

      Check the text for consistency regarding the use of italics for gene names (e.g., legend of Figs. 7 and 8)

      The text was corrected.

      Please introduce abbreviations, e.g., G6P (line 139), REP (line 150), ARN (line 258), NOR/U (Table S1 legend)

      As suggested, we now introduce the abbreviations for G6P (page 7, line 142), REP (page 8, lines 155-156), and NOR (Supplementary file 1 legend). Regarding ARN, these sequences are already written in parentheses in the same sentence. However, we revised this to “ARN motif sequences” (page 13, line 278).

      Fig. S1A: Highlight REP sequence mentioned in text (line 150).

      REP sequences are now highlighted in gray in Figure 1—figure supplement 1A.

      Fig. S1C: It would be helpful to list the nt positions on the sRNAs based on the full-length transcripts.

      The corresponding positions based on the full-length transcripts have also been added to this figure.

      Fig. S2: Adjust the position of UhpU-S label.

      UhpU-S label position was adjusted.

      Fig. S6: Include UhpU in the figure title.

      UhpU was added to the title.

      Fig. S10: It would be helpful to indicate on the figure (or state more clearly in the legend) which RNA was extracted from WT or ΔfliCX background.

      The samples shown in the Figure are all in a WT strain. We corrected the figure legend accordingly.

      Line 290: the effect is on flagella number, not motility.

      This typo is now corrected (page 15, line 312).

      Fig. S8: One-way ANOVA (panel A legend)

      This typo is now corrected (page 64, line 1433).

      Line 320: Fig. S9C instead of 9C

      We thank the reviewer for noticing the typo. The numbering of the supplementary figures has now been changed to the eLife format.

      It would be helpful to add reference for statement in line 57.

      A reference to (Fitzgerald et al., 2014) was added as suggested.

      Add PMID:32133913 as reference for post-transcriptional regulation of the flagellar regulon in the introduction (lines 87-91)

      The indicated reference was added as suggested (page 5, lines 87-91).

      Legend Fig. S6: expand view -> expanded view

      This typo is now corrected (page 63, line 1406).

      line 513: sRNA -> sRNAs

      This typo is now corrected (page 25, line 549).

      Fig. 8G: Maybe include lrhA as target of UhpU sRNA at top of the cascade.

      As suggested lrhA has been added as a target of UhpU at the top of the cascade.

  4. Sep 2023
    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • The improvement of the gene annotations of the ferret genome was an important part of this study, and so I would recommend that the authors have a results section and figure dedicated to documenting this.

      Thank you so much for appreciating our efforts on improving gene models, which was indeed a critical part of this study. Following the reviewer’s suggestion, we added a new section to the main text, “Improvement of the gene model for scRNA-seq of ferrets”, with a figure (Fig. 1C, D, E).

      • Are the references to figure S8A, B alright (line 306)? In fact, that entire figure was not well described or out of place. In general, unlike the rest of the manuscript, the section dealing with the human-ferret comparison was a little bit confusing, and the figure legends were not extremely helpful. Could the authors please revisit the main text and figure legends of this section for clarity?

      We agree with the reviewer’s recommendation. We removed the references to Figure S8A, B. In their place, we explained our rationale more carefully: “We chose a recently published human dataset (Bhaduri et al, 2021) for comparison, because this study contains a GW25 dataset, which includes more tRG cells than previous studies that did not contain GW25 data. Furthermore, we used only data at GW25.”

      We also revised several parts of this section, as well as the legends of Fig. 7 and Fig. S8, adding explanations to make them easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      I have a few very minor comments on the manuscript.

      • I would caution the authors against claiming that they have demonstrated bona fide generation of ependymal cells from tRG cells. While the expression of FOXJ1 is a very good indication, they have not demonstrated the morphological transformation of a tRG cell into an ependymal cell.

      We agree with the reviewer. We never claimed to have proved that tRG cells differentiate into ependymal cells, but we consider this highly likely (we use the term “suggest” in the abstract). To prove this genetically, we extensively tried to knock the EGFP gene into the CRYAB gene using CRISPR/Cas9, which would allow us to show the lineage relationship between tRG and ependymal cells. However, despite a year of attempts, we have so far not succeeded. We also tried simply labeling tRG cells with EGFP and following them in slice culture.

      However, we could not keep the slices in culture long enough to observe the transition from the tRG shape to the ependymal shape; it seems to be a slow process. What we could observe was the transition from a single cilium to multiple cilia, which is part of the morphological transition from epithelial neural stem cells such as radial glia to an ependymal-like sheet. Proving the transition from tRG to ependymal cells (and also astrocytes) is one of the most important open issues and will require new ideas, techniques or strategies.

      • There are several typos throughout the manuscript that I would recommend fixing for example, page 5 line 123 says "OLIGO2" instead of "OLIG2"

      Thank you very much. We carefully reread the manuscript and corrected typos. We hope we have corrected all of them.

      Besides these two points, the manuscript is already prepared to a high standard.

      We really appreciate the reviewers’ efforts to finish their reviews in a short time in response to our request related to the first author’s thesis application.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable investigation of the chromatin dynamics throughout the cell cycle by using fluorescence signals and patterns of GFP-PCNA and CY3-dUTP, which labels newly synthesized DNA. The authors report reduced chromatin mobility in S relative to G1 phase. The technology and methods used are solid, but the significance of the work is reduced by the model system employed, the HeLa cell line, which has a greatly abnormal genome.

      We have obtained data from a diploid human cell that validates the reduction of S-phase chromatin mobility.

      Public Review:

      The manuscript presented by Pabba et al. studied chromatin dynamics throughout the cell cycle. The authors used fluorescence signals and patterns of GFP-PCNA (GFP tagged proliferating cell nuclear antigen) and CY3-dUTP (which labels newly synthesized DNA but not the DNA template) to determine cell cycle stages in asynchronized HeLa (Kyoto) cells and track movements of chromatin domains. PCNA binds to replication forks and form replication foci during the S phase. The major conclusions are: (1) Labeled chromatin domains were more mobile in G1/G2 relative to the S-phase. (2) Restricted chromatin motion occurred at sites in proximity to DNA replication sites. (3) Chromatin motion was restricted by the loading of replisomes, independent of DNA synthesis. This work is based on previous work published in 2015, entitled "4D Visualization of replication foci in mammalian cells corresponding to individual replicons," in which the labeling method was demonstrated to be sound. Although interesting, reduced chromatin mobility in S relative to G1 phase is not new to the field.

      It was first shown in yeast (Heun et al. 2001; DOI:10.1126/science.1065366) that the S-phase mobility is reduced compared to the G1 phase. This was followed by other papers showing the same in yeast [(Gasser 2002; DOI: 10.1126/science.1067703), (Smith et al. 2019; DOI: 10.1091/mbc.E19-08-0469)]. The relation between chromatin motion and cell cycle progression in the mammalian genome is less studied. Over recent years there have been a few studies that addressed chromatin mobility and cell cycle progression but from a different perspective. In the publication Nozaki et al. (2017; DOI:10.1016/j.molcel.2017.06.018) chromatin motion analysis was performed on single histones. The study did not find a significant change of histone/nucleosome mobility measured during cell cycle progression. Using CRISPR/dCas9 to label random DNA loci, Ma et al. (2019; DOI:10.1083/jcb.201807162) found that chromatin motion in S-phase was significantly lower than in the G1 phase. However, most of the studies measure the chromatin motion using either insertion of ectopic loci or proteins marking the loci (dCas9) or histones. Using either ectopic loci addition or CRISPR/dCas9 might have an effect on the chromatin mobility itself and measuring single histone motion is not equivalent to measuring the motion of DNA segments. We, therefore, opted to label the DNA directly using the replication of the DNA. In this manner we preserve the native chromatin structure and, thus, motion.

      Importantly, in addition to measuring decreased DNA motion in S-phase, our study indicates that it is not the DNA synthesis per se but the loading of replisomes onto chromatin that slows down its motion. This allowed us to propose a mechanism on how chromatin motion is affected by DNA replication in S-phase.

      The genome in HeLa cells is greatly abnormal with heterogeneous aneuploidy, which makes quantification complicated and weakens the conclusions.

      We agree that the HeLa cells are aneuploid and we have addressed the heterogeneity of HeLa Kyoto within our detection methods (for clarification see point 3). To validate our conclusions in normal diploid human cells, we performed the chromatin mobility analysis using human fibroblasts (IMR90 cells in figures 2, 3 and S2) and plotted the MSD curves for different cell cycle stages. The outcome of this analysis showed that the mobility of chromatin in diploid fibroblasts in S-phase is lower than in G1 and G2. In fact, this effect is stronger in IMR90 cells than in HeLa Kyoto cells. Hence, this is not an aneuploid tumor cell phenomenon.

      The manuscript is difficult to follow in places due to insufficient clarity. The manuscript should be written in a way that can be understood without referencing previous articles. Overall, the work is moderately impactful to the field.

      Major recommendations:

      1) In Figure 1B, the illustration and images for S phase are confusing. The author should specify which is early S and which is late S. Do the yellow circles represent GFP-PCNA foci? How did the authors distinguish mid S from early S and late S (in Figure 2)? Are all images in Figure 1 scaled to the same contrast threshold?

      The yellow circles correspond to the colocalized signal of GFP-PCNA and Cy3-dUTP that overlap and represent the labeled chromatin sites that are replicated in the next cell cycle.

      We clarified all the points mentioned above and updated figure 1 and figure 2 accordingly.

      2) In Figure 2B, the y-axis is marked as "Frequency of cells" but the equation listed below is counting DNA (per focus). How to convert DNA (per focus) to DNA (per cell)? The x-axis is marked as "Genome size" without any unit (e.g., kb? Mb?) The x-axis seems to be the C factor, not the genome size.

      To determine the amount of DNA present in each labeled DNA focus, we first segmented the whole nucleus and measured its total DAPI intensity (DNA amount), called I_DNA total. The labeled replication foci were then segmented, and the DNA intensity within each segmented focus (I_RFi) was measured. Throughout S-phase progression, the amount of DNA increases twofold from early to late S-phase. The cell cycle stage of each cell was determined from the PCNA pattern. By plotting the frequency of cells (number of cells) against their genome content normalized to the G1 stage, we calculated the relative genome size, also called the cell cycle correction factor, for each stage from G1 to G2. The ratio I_RFi / I_DNA total gives the fraction of nuclear DNA contained in each focus. This ratio was then multiplied by the genome size (kbp) of HeLa Kyoto cells, as measured and published in Chagin et al. (2016; DOI:10.1038/ncomms11231), which gives the approximate amount of DNA in each labeled replication focus in kbp. Because the genome duplicates during the cell cycle, this value was corrected for the cell cycle stage (determined by PCNA) by multiplying it by the cell cycle correction factor, i.e., DNA per focus (kbp) = (I_RFi / I_DNA total) × GS × correction factor.
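      As an illustration only (not the authors' analysis code), the per-focus calculation can be sketched in Python. Here the correction factor is assumed to be the mean total DAPI intensity of cells in a given stage divided by the mean for G1 cells, which is one simple way to express "genome content normalized to G1". The function names and all numbers, including the genome size, are hypothetical placeholders; the actual HeLa Kyoto genome size is the value reported in Chagin et al. (2016).

```python
from statistics import mean

def correction_factor(stage_totals, g1_totals):
    """Relative genome size of a cell cycle stage (assumed definition):
    mean total DAPI intensity of cells in that stage divided by the
    mean for G1 cells (about 1 in G1, about 2 in G2)."""
    return mean(stage_totals) / mean(g1_totals)

def dna_per_focus_kbp(i_rf, i_dna_total, genome_size_kbp, cf):
    """Approximate DNA content (kbp) of one replication focus.

    i_rf            -- DAPI intensity inside the segmented focus (I_RFi)
    i_dna_total     -- total DAPI intensity of the nucleus (I_DNA total)
    genome_size_kbp -- genome size used for scaling (placeholder here)
    cf              -- cell cycle correction factor of the cell's stage
    """
    return (i_rf / i_dna_total) * genome_size_kbp * cf

# Hypothetical intensities and genome size, for illustration only.
g1_cells = [1000.0, 1020.0, 980.0]
mid_s_cells = [1500.0, 1480.0, 1520.0]
cf = correction_factor(mid_s_cells, g1_cells)  # 1.5
print(dna_per_focus_kbp(50.0, 50000.0, 1_000_000.0, cf))  # 1500.0
```

      Keeping the correction factor as a separate step mirrors the description above: the intensity ratio is stage-independent, and the stage-dependent doubling of the genome enters only through the multiplication.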

      3) HeLa cells are known to be highly heterogeneous and heavily aneuploid. Cells in one sample have different numbers of chromosomes, ranging from 50 to 80. Therefore, the GS (genome size) of each cell should not be the same, and using one constant GS in the equation for every cell introduces errors. Has the cell-to-cell variation been considered and corrected in the data? If not, the authors should provide information regarding cell-to-cell variation, such as the intensity variation of nuclear DAPI signals in synchronized cells.

      It is true that the HeLa genome is aneuploid. However, genomic heterogeneity has mainly been documented between different HeLa strains, as in Frattini et al. (2015; DOI:10.1038/srep15377), who showed variability in genome and RNA expression profiles and small genomic rearrangements among different HeLa strains. To our knowledge, whether this heterogeneity and aneuploidy also manifest as cell-to-cell variation within a strain has not been studied extensively. We therefore performed a control experiment to assess the variability between HeLa Kyoto cells: we stained synchronized and unsynchronized cells with DAPI and plotted the DNA content profiles of all cells as histograms (supplementary figure 1B). Synchronized cells in G1 have similar DNA content, showing that cell-to-cell variability is negligible for our detection methods. In addition, we obtained data using normal diploid human fibroblasts, which validated our outcome.


      4) The chromatin foci are in a variety of sizes and intensities. How were boundaries of foci determined? Weak foci were picked up in one image but not in another. This is a concern because the size of the chromatin domain could influence mobility measurement. The authors should provide control experiments or better explanations for detecting and selecting chromatin foci.

      The method for detecting chromatin foci is described in the “Materials and Methods” section “Automated tracking of chromatin structures in time-lapse videos”: “Chromatin structures are detected by the spot-enhancing filter (SEF) (Sage et al., 2005; doi:10.1109/TIP.2005.852787) which consists of a Laplacian-of-Gaussian (LoG) filter followed by thresholding the filtered image and determination of local maxima. The threshold is automatically determined by the mean of the absolute values of the filtered image plus a factor times the standard deviation.” For consistency, we used the same threshold factor for all images of an image sequence. Therefore, depending on the intensity distribution in an image, weak foci may go undetected in some images. Alternatively, one could manually adapt the threshold factor for each individual image, which, however, would be subjective. We have now added the information that the same threshold factor was used for all images of an image sequence.
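      To make the quoted detection procedure concrete, below is a simplified pure-Python sketch: a Laplacian-of-Gaussian filter (negated so that bright spots give positive responses), a threshold equal to the mean absolute filtered value plus a factor times the standard deviation of the filtered image, and selection of local maxima above that threshold. It is not the SEF implementation of Sage et al. (2005) or the authors' tracking software. The kernel radius (3 sigma), the zero-mean normalization, taking the standard deviation of the signed filtered values, the 8-neighbour maximum test, and the synthetic test image are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def neg_log_kernel(sigma):
    """Zero-mean, negated Laplacian-of-Gaussian kernel (radius 3*sigma);
    bright spots of width ~sigma give a positive response."""
    radius = int(math.ceil(3 * sigma))
    rows = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            q = (x * x + y * y) / (sigma * sigma)
            row.append((2.0 - q) * math.exp(-q / 2.0))
        rows.append(row)
    offset = sum(map(sum, rows)) / (len(rows) ** 2)
    return [[v - offset for v in row] for row in rows]

def correlate(img, kernel):
    """2D correlation with zero padding (equal to convolution here,
    because the kernel is symmetric)."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                if 0 <= i + di < h:
                    for dj in range(-r, r + 1):
                        if 0 <= j + dj < w:
                            acc += kernel[di + r][dj + r] * img[i + di][j + dj]
            out[i][j] = acc
    return out

def detect_spots(img, sigma=1.5, factor=3.0):
    """Return (row, col) of local maxima of the filtered image that
    exceed mean(|f|) + factor * std(f)."""
    f = correlate(img, neg_log_kernel(sigma))
    flat = [v for row in f for v in row]
    threshold = mean([abs(v) for v in flat]) + factor * pstdev(flat)
    h, w = len(f), len(f[0])
    spots = []
    for i in range(h):
        for j in range(w):
            v = f[i][j]
            if v <= threshold:
                continue
            neighbours = [f[a][b]
                          for a in range(max(0, i - 1), min(h, i + 2))
                          for b in range(max(0, j - 1), min(w, j + 2))
                          if (a, b) != (i, j)]
            if all(v >= n for n in neighbours):
                spots.append((i, j))
    return spots

def synthetic_image(size, centers, sigma=1.5, amplitude=100.0):
    """Hypothetical test image: Gaussian spots on a zero background."""
    return [[sum(amplitude * math.exp(-((i - ci) ** 2 + (j - cj) ** 2)
                                      / (2 * sigma ** 2))
                 for ci, cj in centers)
             for j in range(size)]
            for i in range(size)]

print(detect_spots(synthetic_image(31, [(8, 8), (22, 22)])))
```

      Because the threshold factor is a single argument, calling detect_spots with one fixed value for every frame of a sequence corresponds to the consistency choice described above; the trade-off is that a frame with a different intensity distribution can lose its weakest foci.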

      5) In Figure 3, the authors combined MSD from G1 and G2 in one group. Has any published data suggested that chromatin dynamics are the same in G1 and G2?

      To clarify this, we now present the G1 and G2 mobility measurements separately in supplementary figure S2 and have updated the figures and text accordingly.

      6) In Figure 3B, cytoplasmic CY3-dUTP foci are found in the G1/G2 and S images. Are these CY3-dUTP aggregates? If so, are they also found in the nucleus? What is the mobility of the cytoplasmic CY3-dUTP foci?

      These are Cy3-dUTP aggregates, and they are not found in the nucleus. They were excluded from the analysis using a nuclear mask based on the PCNA signal. This information has been added to the figure 3B legend.

      7) In Figure 4, how is colocalization defined? 1.8 um is approximately the size of a chromosome territory, which is much larger than 0.5 Mb. Two foci that are 1.8 um apart should not be considered in the same chromosome.

      We agree that “colocalized” implies overlapping signals. We have therefore relabeled this analysis in the figures and text as a center-to-center distance (proximity) analysis.
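      For illustration, a center-to-center proximity measurement can be sketched as the distance from each focus centroid in one channel to the nearest focus centroid in the other channel, converted to micrometers. This is a hypothetical sketch: the function name, the nearest-neighbour pairing, and the pixel size are assumptions, not details taken from the manuscript.

```python
import math

def nearest_distances_um(foci_a, foci_b, pixel_size_um):
    """For each (x, y) centroid in foci_a (pixel units), return the
    distance in micrometers to the closest centroid in foci_b."""
    if not foci_b:
        raise ValueError("foci_b must contain at least one centroid")
    return [min(math.dist(a, b) for b in foci_b) * pixel_size_um
            for a in foci_a]

# Hypothetical centroids and a hypothetical 0.1 um pixel size.
d = nearest_distances_um([(0, 0), (10, 10)], [(3, 4), (10, 12)], 0.1)
print(d)  # [0.5, 0.2]
```

      Reporting a distance rather than a binary colocalized/not-colocalized call keeps the result interpretable regardless of which proximity cut-off a reader considers meaningful.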

      Minor comments:

      1) Figure 3D should be presented by a box and whisker plot. The histogram does not show an actual distribution of the data.

      The histogram in figure 3D showed the average mean square displacement values for the different cell cycle stages, which are the same data as those shown in the table. We have therefore removed the histogram and retained the table in figure 3C.

      2) Please explain Figure 3C error bars in the figure legend. Are they SD?

      The error bars of the MSD curves (highlighted in bright color around the curves) in figure 3C show the standard error of the mean (SEM) representing the deviations between the MSD curves for an image sequence. We clarified this in the legend of Figure 3C.
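      The following sketch, assuming 2D trajectories sampled at a constant frame interval and MSD curves of equal length, illustrates how per-trajectory MSD curves and their mean ± SEM can be computed: MSD(τ) is the average squared displacement over all pairs of time points separated by lag τ, and the SEM at each lag is the sample standard deviation across curves divided by the square root of the number of curves. The function names are illustrative and are not taken from the authors' pipeline.

```python
import math
from statistics import mean, stdev

def msd_curve(track, max_lag):
    """MSD for lags 1..max_lag of one trajectory [(x, y), ...]."""
    curve = []
    for lag in range(1, max_lag + 1):
        squared = [(track[t + lag][0] - track[t][0]) ** 2 +
                   (track[t + lag][1] - track[t][1]) ** 2
                   for t in range(len(track) - lag)]
        curve.append(mean(squared))
    return curve

def mean_and_sem(curves):
    """Per-lag (mean, SEM) across MSD curves; needs at least two curves."""
    n = len(curves)
    return [(mean(values), stdev(values) / math.sqrt(n))
            for values in zip(*curves)]

# A constant-velocity track: the MSD grows with the square of the lag.
print(msd_curve([(0, 0), (1, 0), (2, 0), (3, 0)], 3))
print(mean_and_sem([[1.0, 4.0], [3.0, 8.0]]))
```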

      3) In Figure 5C, some western blotting results seem to be assembled from replicate experiments. Comparing signals from one experiment with the same background is suggested.

      We ensured that the cropped western blot panels shown together come from the same replicate experiment, and we added this information to the respective figure legends.

    1. Author Response

      Reviewer #1 (Public Review):

      Like the "preceding" co-submitted paper, this is again a very strong and interesting paper in which the authors address a question that is raised by the finding in their co-submitted paper - how does one factor induce two different fates. The authors provide an extremely satisfying answer - only one subset of the cells neighbors a source of signaling cells that trigger that subset to adopt a specific fate. The signal here is Delta and the read-out is Notch, whose intracellular domain, in conjunction with, presumably, SuH cooperates with Bsh to distinguish L4 from L5 fate (L5 is not neighbored by signal-providing cells). Like the back-to-back paper, the data is rigorous, well-presented and presents important conclusions. There's a wealth of data on the different functions of Notch (with and without Bsh). All very satisfying.

      Thanks!

      I have again one suggestion that the authors may want to consider discussing. I'm wondering whether the open chromatin that the authors convincingly measure is the CAUSE or the CONSEQUENCE of Bsh being able to activate L4 target genes. What I mean by this is that currently the authors seem to be focused on a somewhat sequential model where Notch signaling opens chromatin and this then enables Bsh to activate a specific set of target genes. But isn't it equally possible that the combined activity of Bsh/Notch(intra)/SuH opens chromatin? That's not a semantic/minor difference, it's a fundamentally different mechanism, I would think. This mechanism also solves the conundrum of specificity: how does Notch know which genes to "open" up? It would seem more intuitive to me to think that it's working together with Bsh to open up chromatin, with chromatin accessibility then being a "mere" secondary consequence. If I'm not overlooking something fundamental here, there is actually also a way to distinguish between these models: test chromatin accessibility in a Bsh mutant. If the authors' model is true, chromatin accessibility should be unchanged.

      I again finish by commending the authors for this terrific piece of work.

      Thanks! Whether Notch signaling regulates the chromatin landscape independently of a primary HDTF is a crucial question. We will include this discussion in the text and pursue it in our next project. We think Notch signaling may regulate chromatin accessibility independently of a primary HDTF, based on the following observation: in the larval ventral nerve cord, all motor neurons are NotchON and all sensory neurons are NotchOFF, and NotchON neurons share similar functional properties despite expressing distinct HDTFs, possibly because of a common chromatin landscape regulated by Notch signaling.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors explore how Notch activity acts together with Bsh homeodomain transcription factors to establish L4 and L5 fates in the lamina of the visual system of Drosophila. They propose a model in which differential Notch activity generates different chromatin landscapes in presumptive L4 and L5, allowing the differential binding of the primary homeodomain TF Bsh (as described in the co-submitted paper), which in turn activates downstream genes specific to either neuronal type. The requirement of Notch for L4 vs. L5 fate is well supported, and complete transformation from one cell type into the other is observed when altering Notch activity. However, the role of Notch in creating differential chromatin landscapes is not directly demonstrated. It is only based on correlation, but it remains a plausible and intriguing hypothesis.

      Thanks for the positive feedback!

      Strengths:

      The authors are successful in characterizing the role of Notch to distinguish between L4 and L5 cell fates. They show that the Notch pathway is active in L4 but not in L5. They identify L1, the neuron adjacent to L4, as expressing the Delta ligand, and therefore as the potential source of Notch activation in L4. Moreover, the manuscript shows molecular and morphological/connectivity transformations from one cell type into the other when Notch activity is manipulated.

      Thanks!

      Using DamID, the authors characterize the chromatin landscape of L4 and L5 neurons. They show that Bsh occupies distinct loci in each cell type. This supports their model that Bsh acts as a primary selector gene in L4/L5 that activates different target genes in L4 vs L5 based on the differential availability of open chromatin loci.

      Thanks!

      Overall, the manuscript presents an interesting example of how Notch activity cooperates with TF expression to generate diverging cell fates. Together with the accompanying paper, it helps thoroughly describe how lamina cell types L4 and L5 are specified and provides an interesting hypothesis for the role of Notch and Bsh in increasing neuronal diversity in the lamina during evolution.

      Thanks for the positive feedback on both manuscripts.

      Weaknesses:

      Differential Notch activity in L4 and L5:

      ● The manuscript focuses its attention on describing Notch activity in L4 vs L5 neurons. However, from the data presented, it is very likely that the pool of progenitors (LPCs) is already subdivided into at least two types of progenitors that will give rise to L4 and L5, respectively. Evidence to support this is the activity of E(spl)-mɣ-GFP and the Dl puncta observed in the LPC region. The Discussion should therefore address the possibility that Notch-induced differences between L4 and L5 precede the L1-expressed Dl that acts on newborn L4/L5 neurons. The differences between L4 and L5 fates might thus be established earlier than discussed in the paper. The authors should acknowledge this possibility and discuss it in their model.

      We agree. Historically, LPCs have been thought to be homogeneous; our data suggest otherwise. We now emphasize this in the Discussion as requested. We are also investigating this question using single-cell RNA-seq of LPCs to look for molecular heterogeneity. Thanks for the great comment!

      ● The authors claim that Notch activation is caused by L1-expressed Delta. However, they use an LPC driver to knock down Dl. Dl-KD should be performed exclusively in L1, and the fate of L4 should be assessed.

      Dl is transiently expressed in newborn L1 neurons. To knock down Dl in L1, we need to express Dl-RNAi before Dl protein is expressed in newborn L1; the only known Gal4 line expressed that early is the LPC-Gal4 that we used. There is no L1-Gal4 line expressed early enough to eliminate L1 expression of Dl.

      ● To test whether L4 neurons are derived from NotchON LPCs, I suggest performing MARCM clones in early pupa with an E(spl)-mɣ-GFP reporter.

      We agree! Whether L4 neurons are derived from NotchON LPCs is a great question. However, MARCM clones in early pupae with an E(spl)-mɣ-GFP reporter will not work, because the E(spl)-mɣ-GFP reporter is expressed only in LPCs and not in lamina neurons. We now mention this in the Discussion.

      ● The expression of different Notch targets in LPCs and L4 neurons may be further explored. I suggest using different Notch-activity reporters (i.e., E(spl)-GFP reporters) to further characterize these differences. What causes the switch in Notch target expression from LPCs to L4 neurons should be a topic of discussion.

      Thanks! Why Notch induces E(spl)-mɣ in LPCs but Hey in newborn neurons is a great question. However, it is not the question we are tackling in this paper, and it will be a great direction to pursue in the future. We will add this to our Discussion.

      Notch role in establishing L4 vs L5 fates:

      ● The authors describe that 27G05-Gal4 causes a partial Notch Gain of Function caused by its genomic location between Notch target genes. However, this is not further elaborated. The use of this driver is especially problematic when performing Notch KD, as many of the resulting neurons express Ap, and therefore have some features of L4 neurons. Therefore, Pdm3+/Ap+ cells should always be counted as intermediate L4/L5 fate (i.e., Fig3 E-J, Fig3-Sup2), irrespective of what the mechanistic explanation for Ap activation might be. It's not accurate to assume their L5 identity. In Fig4 intermediate-fate cells are correctly counted as such.

      Thanks for the comment! We will annotate Pdm3+/Ap+ cells as intermediate L4/L5 fate in the corresponding figures.

      ● Lines 170-173: The temporal requirement for Notch activity in L5-to-L4 transformation is not clearly delineated. In Fig4-figure supplement 1D-E, it is not stated if the shift to 29°C is performed as in Fig4-figure supplement 1A-C.

      Thank you for catching this. We will correct it in the text.

      ● Additionally, using the same approach, it would be interesting to explore the window of competence for Notch-induced L5-to-L4 transformation: at which point in L5 maturation can fate no longer be changed by Notch GoF?

      Our data show that Bsh with Notch signaling in newborn neurons specifies L4 fate, while Bsh without Notch signaling in newborn neurons specifies L5 fate. Therefore, we think the window of fate competence is the newborn-neuron stage. We will include the data to support this.

      L4-to-L3 conversion in the absence of Bsh

      ● Although interesting, the L4-to-L3 conversion in the absence of Bsh is never shown to be dependent on Notch activity. Importantly, L3 NotchON status is assumed based on their position next to Dl-expressing L1, but it is not empirically tested. Perhaps screening Notch target reporter expression in the lamina, as suggested above, could inform this issue.

      Our data show that, in the absence of Bsh, the L4-to-L3 conversion occurs in the presence of Notch activity, whereas the L5-to-L1 conversion occurs in the absence of Notch activity. Therefore, Notch activity is necessary for the L4-to-L3 conversion. Unfortunately, Hey is currently the only available Notch target reporter in newborn neurons. To tackle this challenge in the future, we will profile the genome-binding targets of endogenous Notch in newborn neurons. This will identify new genes that can serve as Notch signaling reporters in neurons for the field.

      ● Otherwise, the analysis of Bsh Loss of Function in L4 might be better suited to be included in the accompanying manuscript that specifically deals with the role of Bsh as a selector gene for L4 and L5.

      That is an interesting suggestion, but without knowing that Bsh + Notch = L4 identity the experiment would be hard to interpret. Note that we took advantage of Notch signaling to trace the cell fate in the absence of Bsh and found the L4-to-L3 conversion (see Figure 5G-K).

      Different chromatin landscape in L4 and L5 neurons

      ● A major concern is that, although L4 and L5 neurons are shown to present different chromatin landscapes (as expected for different neuronal types), it is not demonstrated that this is caused by Notch activity. The paper proves unambiguously that Notch activity, in concert with Bsh, causes the fate choice between L4 and L5. However, the claim that this is caused by Notch creating a differential chromatin landscape is based only on correlation (NotchON cells having a different profile than NotchOFF cells). Although the authors are careful not to claim that differential chromatin opening is caused directly by Notch, this is heavily suggested throughout the text and must be toned down, e.g., Line 294: "With Notch signaling, L4 neurons generate distinct open chromatin landscape" and Line 298: "Our findings propose a model that the unique combination of HDTF and open chromatin landscape (e.g. by Notch signaling)". These claims are not supported well enough, and alternative hypotheses should be provided in the discussion. An alternative hypothesis could be that LPCs are already specified towards L4 and L5 fates. In this context, different early Bsh targets in each cell type could play a pioneer role in generating a differential chromatin landscape.

      We agree and appreciate the comment, it is well justified. We have toned down our comments and clearly state that this is a correlation that needs to be tested for a causal relationship. Thank you for requesting it!

      ● The correlation between open chromatin and Bsh loci with Differentially Expressed genes is much higher for L4 than L5. It is not clear why this is the case, and should be discussed further by the authors.

      We agree, and we think that in L5 neurons the secondary HDTF Pdm3 also contributes to L5-specific gene transcription during the synaptogenesis window, in addition to Bsh. We will include this in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      In this very strong and interesting paper the authors present a convincing series of experiments that reveal a molecular mechanism of neuronal cell type diversification in the nervous system of Drosophila. The authors show that a homeodomain transcription factor, Bsh, fulfills several critical functions: repressing an alternative fate and inducing downstream homeodomain transcription factors with which Bsh may collaborate to induce L4 and L5 fates (the authors' accompanying paper reveals how Bsh can induce two distinct fates). The authors make elegant use of powerful genetic tools and an arsenal of satisfying cell identity markers.

      Thanks!

      I believe that this is an important study because it provides some fundamental insights into the conservation of neuronal diversification programs. It is very satisfying to see that similar organizational principles apply in different organisms to generate cell type diversity. The authors should also be commended for contextualizing their work very well, giving a broad, scholarly background to the problem of neuronal cell type diversification.

      Thanks!

      My one suggestion for the authors is to perhaps address in the Discussion (or experimentally address if they wish) how they reconcile that Bsh is on the one hand: (a) continuously expressed in L4/L5, (b) binding directly to a cohort of terminal effectors that are also continuously expressed but then, on the other hand, is not required for maintaining L4 fate. A few questions: Is Bsh only NOT required for maintaining Ap expression, or is it also NOT required for maintaining other terminal markers of L4? The former could be easily explained: Bsh simply kicks off Ap, Ap then autoregulates, but Bsh and Ap then continuously activate terminal effector genes. The second scenario would require a somewhat more complex mechanism: Bsh binding of targets (with Notch) may open chromatin, but once that's done, Bsh is no longer needed and Ap alone can continue to express genes. I feel that the authors should at least be discussing this. The postmitotic Bsh removal experiment, in which they only checked Ap and derepression of other markers, is a little unsatisfying without further discussion (or experiments, such as testing terminal L4 markers). I hasten to add that this comment does not take away from my overall appreciation for the depth and quality of the data and the importance of their conclusions.

      Great suggestions, we will discuss these two hypotheses as requested.

      Bsh initiates Ap expression in L4 neurons, which then maintain Ap expression independently of Bsh, likely through Ap autoregulation. During the synaptogenesis window, Ap expression becomes independent of Bsh expression, but Bsh and Ap are both still required to activate the synapse recognition molecule DIP-beta. Additionally, Bsh also shows putative binding to other L4 identity genes, e.g., those required for neurotransmitter choice and electrophysiological properties, suggesting that Bsh may initiate L4 identity genes as a suite. The mechanism of maintaining identity features (e.g., morphology, synaptic connectivity and functional properties) in the adult remains poorly understood. Whether the primary HDTF Bsh maintains the expression of L4 identity genes in the adult is a great question. To test this, in our next project, we will specifically knock out Bsh in L4 neurons of the adult fly and examine the effect on L4 morphology, connectivity and functional properties.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors explore the role of the Homeodomain Transcription Factor Bsh in the specification of Lamina neuronal types in the optic lobe of Drosophila. Using the framework of terminal selector genes and compelling data, they investigate whether the same factor that establishes early cell identity is responsible for the acquisition of terminal features of the neuron (i.e., cell connectivity and synaptogenesis).

      Thanks for the positive words!

      The authors convincingly describe the sequential expression and activity of Bsh, termed here the 'primary HDTF', and of Ap in L4 or Pdm3 in L5 as 'secondary HDTFs' during the specification of these two neurons. The study demonstrates that Bsh is required to activate both Ap and Pdm3, and therefore to generate the L4 and L5 fates. Moreover, the authors show that in the absence of Bsh, L4 and L5 fates are transformed into L1- or L3-like fates.

      Thanks!

      Finally, the authors used DamID and Bsh:DamID to profile the open chromatin signature and the Bsh binding sites in L4 neurons at the synaptogenesis stage. This allows the identification of putative Bsh target genes in L4, many of which were also found to be upregulated in L4 in a previous single-cell transcriptomic analysis. Among these genes, the paper focuses on Dip-β, a known regulator of L4 connectivity. They demonstrate that both Bsh and Ap are required for Dip-β, forming a feed-forward loop. Indeed, the loss of Bsh causes abnormal L4 synaptogenesis and therefore defects in several visual behaviors. The authors also propose the intriguing hypothesis that the expression of Bsh expanded the diversity of Lamina neurons from a 3 cell-type state to the current 5 cell-type state in the optic lobe.

      Thanks for the excellent summary of our findings!

      Strengths:

      Overall, this work presents a beautiful practical example of the framework of terminal selectors: Bsh acts hierarchically with Ap or Pdm3 to establish the L4 or L5 cell fates and, at least in L4, participates in the expression of terminal features of the neuron (i.e., synaptogenesis through Dip-β regulation).

      Thanks!

      The hierarchical interactions among Bsh and the activation of Ap and Pdm3 expression in L4 and L5, respectively, are well established experimentally. Using different genetic drivers, the authors show a window of competence during L4 neuron specification during which Bsh activates Ap expression. Later, as the neuron matures, Ap becomes independent of Bsh. This allows the authors to propose a coherent and well-supported model in which Bsh acts as a 'primary' selector that activates the expression of L4-specific (Ap) and L5-specific (Pdm3) 'secondary' selector genes, that together establish neuronal fate.

      Thanks again!

      Importantly, the authors describe a striking cell fate change when Bsh is knocked down from L4/L5 progenitor cells. In such cases, L1 and L3 neurons are generated at the expense of L4 and L5. The paper demonstrates that Bsh in L4/L5 represses Zfh1, which in turn acts as the primary selector for L1/L3 fates. These results point to a model where the acquisition of Bsh during evolution might have provided the grounds for the generation of new cell types, L4 and L5, expanding lamina neuronal diversity for more refined visual behaviors in flies. This is an intriguing and novel hypothesis that should be tested from an evo-devo standpoint, for instance by identifying a species in which L4 and L5 do not exist and/or Bsh is not expressed in L neurons.

      Thanks for the appreciation of our findings!

      To gain insight into how Bsh regulates neuronal fate and terminal features, the authors have profiled the open chromatin landscape and Bsh binding sites in L4 neurons at mid-pupation using the DamID technique. The paper describes a number of genes that have Bsh binding peaks in their regulatory regions and that are differentially expressed in L4 neurons, based on available scRNAseq data. Although the manuscript does not explore this candidate list in depth, many of these genes belong to classes that might explain terminal features of L4 neurons, such as neurotransmitter identity, neuropeptides or cytoskeletal regulators. Interestingly, one of these upregulated genes with a Bsh peak is Dip-β, an immunoglobulin superfamily protein that previous work from the authors' lab has shown to be relevant for establishing proper L4 connectivity. This work proves that Bsh and Ap work in a feed-forward loop to regulate Dip-β expression, and therefore to establish normal L4 synapses. Furthermore, Bsh loss of function in L4 impairs visual behaviors.

      Thanks for the excellent summary of our findings.

      Weaknesses:

      ● The last paragraph of the introduction is written using rhetorical questions and does not read well. I suggest rewriting it in a more conventional direct style to improve readability.

      We agree, and will update the text as suggested.

      ● A significant concern is the way in which information is conveyed in the Figures. Throughout the paper, understanding of the experimental results is hindered by the lack of information in the Figure headers. Specifically, the genetic driver used for each panel should be adequately noted, together with the age of the brain and the experimental condition. For example, R27G05-Gal4 drives early expression in LPCs and L4/L5, while the 31C06-AD, 34G07-DBD Split-Gal4 combination drives expression in older L4 neurons, and using one or the other to drive Bsh-KD produces dramatically different effects on Ap expression. Indicating the driver used in each panel will facilitate the reader's grasp of the experimental results.

      We agree, and will update the figure annotation.

      ● Bsh role in L4/L5 cell fate:

      o It is not clear whether Tll+/Bsh+ LPCs are the precursors of L4/L5. Morphologically, these cells sit very close to L5, but are much more distant from L4.

      Our current data show that L4 and L5 neurons are generated by different LPCs. However, we currently don’t have tools to demonstrate which subset of LPCs generates which lamina neuron type. We are working on a follow-up manuscript on LPC heterogeneity, but those experiments have only just started.

      o Somatic CRISPR knockout of Bsh seems to have a weaker phenotype than the knockdown using RNAi. However, in several experiments down the line, the authors use CRISPR-KO rather than RNAi to knock down Bsh activity: it should be explained why the authors made this decision. Alternatively, a null mutant could be used to consolidate the loss of function phenotype, although this is not strictly necessary given that the RNAi is highly efficient and almost completely abolishes Bsh protein.

      We chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. In contrast, we could not knock down Bsh in L4 neurons using L4-split-Gal4 and Bsh-RNAi, because L4-split-Gal4 expression itself depends on Bsh. We will include this explanation in the text.

      o Line 102: Rephrase "R27G05-Gal4 is expressed in all LPCs and turned off in lamina neurons" to "is turned off as lamina neurons mature", as it is kept on for a significant amount of time after the neurons have already been specified.

      Thanks; we will make that change.

      o Line 121: "(a) that all known lamina neuron markers become independent of Bsh regulation in neurons" is not an accurate statement, as the markers tested were not shown to be dependent on Bsh in the first place.

      Good point. We will rephrase it as “that all known lamina neuron markers are independent of Bsh regulation in neurons”.

      o Lines 129-134: Make explicit that the LPC-Gal4 was used in this experiment. This is especially important here, as these results are opposite to the Bsh Loss of Function in L4 neurons described in the previous section. This will help clarify the window of competence in which Bsh establishes L4/L5 neuronal identities through ap/pdm3 expression.

      Thanks! We will include Gal4 information in the text for every manipulation.

      ● DamID and Bsh binding profile:

      ○ Figure 5 - figure supplement 1C-E: The genotype of the Control in (C) has to be described within the panel. As it is, it can be confused with a wild type brain, when it is in fact a Bsh-KO mutant.

      Great point! Thank you for catching this and we will update it.

      ○ It is not clear how L4-specific Differentially Expressed Genes were found. Are these genes DEGs between lamina neuron types, or are they upregulated genes with respect to all neuronal clusters? If the latter is the case, it could explain the discrepancy between scRNAseq DEGs and Bsh peaks in L4 neurons.

      We did not use “L4-specific Differentially Expressed Genes”. Instead, we used all genes that are significantly transcribed in L4 neurons (line 209-210).

      ● Dip-β regulation:

      ○ Line 234: It is not clear why CRISPR KO is used in this case, when Bsh-RNAi presents a stronger phenotype.

      As explained above, we chose CRISPR-KO (L4-specific Gal4, uas-Cas9, and uas-Bsh-sgRNAs) because it effectively removed Bsh expression from the majority of L4 neurons. By contrast, we failed to knock down Bsh in L4 neurons using L4-split Gal4 and Bsh-RNAi, because L4-split Gal4 expression itself depends on Bsh. We will include this explanation in the text.

      ○ Figure 6N-R shows results using LPC-Gal4. It is not clear why this driver was used, as it makes a less accurate comparison with the other panels in the figure, which use L4-Split-Gal4. This discrepancy should be acknowledged and explained, or the experiment repeated with L4-Split-Gal4>Ap-RNAi.

      We believe the reviewer means Figure 6J-M, which shows results using LPC-Gal4. We first tried L4-Split-Gal4>Ap-RNAi, but it failed to knock down Ap because L4-Split-Gal4 expression depends on Ap. We will add this to the text.

      ○ Line 271: It is also possible that L4 activity is dispensable for motion detection and only L5 is required.

      Thanks! Work from Tuthill et al., 2013 showed that L5 is not required for any motion detection. We will include this citation in the text.

      ● Discussion: It is necessary to de-emphasize the relevance of HDTFs, or at least acknowledge that other, non-homeodomain TFs, can act as selector genes to determine neuronal identity. By restricting the discussion to HDTFs, it is not mentioned that other classes of TFs could follow the same Primary-Secondary selector activation logic.

      That is a great point, thank you! We will include this in the discussion.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe the synthesis and testing of the anti-cancer activity of a new molecule, CK21, in pancreatic cancer mouse models. This part of the study is very strong, showing regression of pancreatic tumors at non-toxic concentrations, which is very hard to achieve for practically incurable pancreatic cancer. The authors synthesized CK21 as an analog of a known, highly toxic inhibitor of RNA synthesis. However, they made very little attempt to determine whether the anti-cancer mechanism of CK21 is similar to that of this known transcription inhibitor. One cannot simply compare gene expression profiles between untreated and CK21-treated cells, given that CK21 may inhibit the expression of all genes. The effect of CK21 on general transcription needs to be tested first; based on these data, absolute changes in gene expression could then be considered to reveal the mechanism of CK21 activity.

      We share the reviewer's concern about toxicity; we therefore designed the transcriptomic analysis of human organoid-cultured cells for early time points (3, 6, 9 and 12 h) and a CK21 concentration of 50 nM, to ensure that the cells were ~100% viable at the time of harvest. At these time points, many genes were upregulated, and IPA defined them as enriched for cell death (apoptosis and necrosis), senescence and cell cycle arrest (Fig. 5). This led us to hypothesize that the direct effect of CK21 on tumor cells is the induction of apoptosis, but via multiple pathways.

      Reviewer #3 (Public Review):

      This manuscript describes CK21, a modified version of Triptolide, a natural compound with anticancer activities, designed to improve its bioavailability. The authors tested the compound in two human pancreatic cancer cell lines, in vitro and in vivo. The authors also used two human organoid lines derived from pancreatic cancer, and mouse KC and KPC cell lines. In all models, CK21 treatment induces dose-dependent cytotoxicity. In vivo, CK21 causes tumor regression. The authors perform gene expression analysis and show that treated organoids have generally lower transcription, consistent with cytotoxicity, and a reduction in NF-κB pathway activation.

      Key experiments that would strengthen the current manuscript include normal cell lines and organoids, which presumably show no cytotoxic effect. If that is the case, the authors would have the opportunity to compare responses and determine whether a tumor-specific mechanism can be defined.

      Our in vivo studies suggest that CK21 is more specific to tumors, as mice treated with CK21 at ≤3 mg/kg were 100% viable and gained weight comparably to the no-treatment group (Fig. 2d). Furthermore, in vitro studies with primary fibroblasts indicate that comparable significant toxicity after 72 h of culture was observed at 500 nM CK21 (Fig. S2). In contrast, CK21 induced significant toxicity in AsPC1 and Panc-1 cells at 50 nM (Fig. 1f).

      The authors observe that few gene changes, apart from an overall lowering of transcription, occur upon treatment with CK21. They suggest that the drug acts through inhibition of the NF-κB pathway and an increase in reactive oxygen species (ROS). However, no experiments test whether either or both of these findings explain the cytotoxic effect (rescue experiments would be particularly valuable).

      We performed a rescue study using an ROS inhibitor (acetylcysteine) but observed no significant effect (data not shown). We speculate that ROS and/or NF-κB might function synergistically; additionally, it is possible that other mechanisms might be involved in the anti-tumor effects of CK21.

      In the last figure, the authors test whether CK21 is immunosuppressive by testing immunity against a mismatched tumor cell line (using KPC tumors, mixed strain, in mixed-strain mice). Immunity against HLA-mismatched cells is a very strong immune reaction, and mild immune suppression might be missed, which diminishes the value of these findings.

      KPC-960 tumor cells were derived from KPC (C57BL/6 background); therefore, KPC-960 tumors were HLA matched with host C57BL/6 mice. We were surprised to observe spontaneous rejection of the KPC-960 tumor line, since this contrasts with Torres et al. 2013. We speculate that this could be due to the increased number of passages resulting in antigenic drift, which may result in the accumulation of mutations that induce spontaneous rejection.

      We agree that there might be mild immunosuppression that we did not detect; we have included this caveat in the discussion. KC-6141 tumor cells used as CTL targets were from KC mice (mixed background – B6.129).

    1. Author Response

      Reviewer #1 (Public Review):

      Zhao et al. investigated the molecular nature of the binding site for the carbohydrate moieties of the UDP-sugars known to activate the P2Y14 receptor. To do so, they built a molecular model of hP2Y14, docked the corresponding agonists, and performed MD simulations on the resulting complexes. The modeling was used to identify the key molecular interactions with a cluster of charged residues on the extracellular side of the TM region of the receptor, which they show are conserved within the P2Y receptors. The binding site of the UDP moiety, not surprisingly, overlapped with the analogous ADP binding site experimentally observed for the P2Y12 receptor; consequently, the region that recognizes the sugars could be anticipated. Nevertheless, the detailed modeling and simulation work shows the consistency of this hypothesis and quantifies the particular interactions involved, pinpointing the specific residues that are candidates for sugar recognition.

      This is followed by the functional characterization of single-point mutations of these residues and their effect on the efficacy of the different UDP-sugars. Here the results tend to correlate with the molecular models; however, some of the data have very low statistical significance, so the interpretation and conclusions drawn from them should be taken with caution. This applies in particular to the role of the identified residues in binding the different sugars, which in some cases should be taken as a suggestion rather than a proof. Nevertheless, the general conclusions (the identification of the sugar-binding region, its conservation among P2Y receptors, and the role of some specific residues in sugar recognition) seem convincing, and the data are conveniently presented.

      Finally, the design of ADP-sugars that activate the P2Y12 receptor, based on the transferability of the observations with the UDP-sugars for the P2Y14 receptor, is a first indication that such recognition is possible and should occur in an analogous binding region. However, the low potencies of the ADP-sugars, in the micromolar range, are far from that of ADP itself, and the relevance of this mechanism remains to be proven. The difference between P2Y12 and P2Y14, with the latter showing much higher potencies for UDP-sugar derivatives than P2Y12 shows for the corresponding ADP-sugars, remains an interesting question not explored in this manuscript.

      Thank you for your valuable comments. We have revised our interpretation of the data with relatively low statistical significance, and the conclusions drawn from these data are now presented as suggestions. In this work, to investigate whether sugar nucleotides can also activate human P2Y12, we tested three ADP-sugars on human P2Y12. Discovering highly potent P2Y12 agonists requires screening a large number of compounds, and it is possible that other ADP-sugars are highly potent P2Y12 agonists. However, ADP-sugars are technically challenging to synthesize, and currently we can only obtain ADP-Glc, ADP-GlcA and ADP-Man. Once other ADP-sugars become available to us, we will test them in future work in an attempt to discover highly potent agonists, which would be useful chemical tools to uncover the physiological relevance of this mechanism for P2Y12. To explore the nature of the binding sites of P2Y12 and P2Y14, we performed additional mutagenesis experiments and added the relevant data to the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript employs multiple approaches, including molecular docking, molecular dynamic simulations, and functional experiments to uncover a distinct uridine diphosphate-sugar-binding site on P2Y14 - a key drug target for inflammation and immune responses. Overall, the manuscript is clearly written, and the experimental techniques are well-documented. However, it may benefit from further analysis, particularly in terms of validating the binding pose.

      Thank you for your comments. We used MMPBSA to analyze the per-residue ligand-binding energy from the MD trajectories. To further characterize the ligand-binding pose, we calculated the percentage occurrence of hydrogen bonds between the ligand and the carbohydrate-binding site (K277, E278, R253 and K77). We also calculated the ligand RMSF and RMSD to show the stability of the ligand-binding pose and the convergence of the simulations. These data have been included in the revised manuscript.
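      As a rough illustration of the three stability measures mentioned above, the sketch below computes per-frame ligand RMSD, per-atom ligand RMSF and a distance-only hydrogen-bond occupancy with NumPy. It is not the analysis pipeline used in the manuscript: it assumes the coordinates have already been extracted into arrays of shape (n_frames, n_atoms, 3) and superposed on the receptor, and the function names and the 3.5 Å donor-acceptor cutoff are illustrative choices (typical H-bond criteria also include an angle cutoff).

```python
import numpy as np


def ligand_rmsd(traj, ref):
    """Per-frame RMSD of ligand atoms against a reference pose.

    traj: (n_frames, n_atoms, 3) coordinates, assumed already superposed on the receptor.
    ref:  (n_atoms, 3) reference coordinates, e.g. the docked pose.
    """
    diff = traj - ref[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def ligand_rmsf(traj):
    """Per-atom RMSF about the time-averaged position of each atom."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


def hbond_occupancy(donor_xyz, acceptor_xyz, cutoff=3.5):
    """Percentage of frames in which a donor-acceptor pair lies within `cutoff` (angstrom).

    Distance-only criterion (no angle check).
    donor_xyz, acceptor_xyz: (n_frames, 3) positions of the two heavy atoms.
    """
    d = np.linalg.norm(donor_xyz - acceptor_xyz, axis=1)
    return 100.0 * np.mean(d <= cutoff)
```

      A low, plateaued RMSD and small RMSF values would indicate a stable pose; the occupancy function would be applied once per ligand-residue pair (for example, to each of K277, E278, R253 and K77).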

    1. Author Response

      Reviewer #3 (Public Review):

      Finding a selective inhibitor that precisely inhibits on-target activities and avoids side effects is a major challenge in drug discovery and therapeutics. The authors proposed an alternative method that combines multiple inhibitors to maximize on-target inhibition and minimize off-target inhibition. Focusing on a kinase-inhibitor interaction dataset, the authors developed a quantitative way to measure the selectivity of mixtures of inhibitors using the Jensen-Shannon distance metric. The method appears technically sound.
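      For readers unfamiliar with the metric, the sketch below computes a Jensen-Shannon distance between a normalized inhibition profile and an idealized profile that inhibits only the intended target(s). This is a deliberate simplification under stated assumptions: the MMS framework described in the manuscript also involves a penalty distribution and mixture concentrations, which are not reproduced here, and `selectivity_distance` is a hypothetical helper. SciPy's `scipy.spatial.distance.jensenshannon` (with `base=2`) computes the same distance.

```python
import numpy as np


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so the value lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    q = np.asarray(q, dtype=float)
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing to the divergence
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    js_divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(js_divergence, 0.0)))  # guard against tiny negative rounding


def selectivity_distance(inhibition, on_target):
    """Distance between an inhibition profile and an ideal profile hitting only `on_target` indices.

    0 means perfectly selective; larger values mean more off-target inhibition.
    """
    ideal = np.zeros(len(inhibition))
    ideal[list(on_target)] = 1.0
    return jensen_shannon_distance(inhibition, ideal)
```

      Note that, because both profiles are normalized, this distance reflects how inhibition is distributed across kinases, not its absolute magnitude.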

      From their computation and assays, the multi-compound-multitarget scoring (MMS) method framework was validated to be able to select a combination of inhibitors that is more selective than a single highly selective inhibitor for one kinase target, or for multiple targets. The MMS method is a promising solution to reduce off-target effects and could be applicable to other inhibitor-target interactions. My suggestion is that a comparative analysis of MMS with other similar methods can be conducted to highlight the advantage of MMS over others.

      We thank the reviewer for this excellent summary and their suggestions. We agree that comparing new methods to prior ones is an important step in benchmarking new approaches and methods. However, to our knowledge, no other method exists for calculating selective combinations of kinase inhibitors. We compare our JSD selectivity scoring metric to other representative target-specific and non target-specific selectivity metrics (Figure 2 Figure Supplement 2).

      The paper is not well organized and not easily readable. First, the figure captions are too long; some of this text could be moved to the Methods or Results sections. Second, the concept of the "penalty distribution" or "penalty prior" is vital to understanding the MMS method; thus, at least a brief definition and introduction, as well as the rationale for using it, should be placed in the main text rather than in the supporting methods. Third, the Methods section could be divided into several subsections with clear organization and connections. Fourth, what is the difference between "a less selective inhibitor profile" and "an even less selective inhibitor profile" in Figure 3? Overall, the details of the paper are difficult to understand in the current version. I suggest rewriting the paper in a more concise and logical style.

      We appreciate these suggestions and have significantly edited and revised our manuscript in order to facilitate clear communication. Specifically:

      1) We have added an additional description of the penalty distribution to the description of the MMS method in the main Results section of the manuscript as opposed to solely in the Materials and Methods section.

      2) We have provided a high-level, concise summary of the MMS method in the Results section to help orient the reader. This description follows the same order (1 to 5) as the associated Figure 2; we hope this communicates the method more clearly.

      3) We have moved descriptive figure captions to the Methods section and, in general, substantially reduced the size of the figure captions.

      4) We have subdivided the Materials and Methods section as suggested.

      5) We now describe in the main text how the simulated profiles were generated by smoothing the PKIS2-645-like profile under two restraints: non-zero activity for LS inhibitors, and similar on-target probability for PKIS2-645-like, RS, and LS inhibitors to facilitate direct comparisons. We provide a new figure that quantifies the selectivity of these simulated inhibitors and their similarity to real compounds (Figure 3 Figure Supplement 1).

      6) We have removed content from the Introduction and Results sections that was less important to communicate to a general audience, in order to make the manuscript more concise. We have also removed or condensed extraneous supplemental figures that were not required to communicate the central results and findings (e.g., the supplemental figures for Figure 3 and Figure 4 from the prior submission).

    1. Author Response

      Joint Public Review

      (1) The developed model considers the interaction of multiple signaling networks that are essential for morphogenesis and homeostasis in the intestinal tissue, as well as other elements that had been proposed as relevant in the literature. Nevertheless, the details of how these interactions are modeled couldn't be evaluated in the current revision as the model was not shared with the reviewers and it is not available yet online, nor specified in any detail in the current manuscript. Additionally, how quantitative information from Wnt and BMP signaling pathways is incorporated in a quantitative way in the model is not clear.

      Model files are provided with this reply. These are ‘.jl’ files for use with Julia. The model (the files provided with this reply) will be freely publicly available through BioModels upon acceptance of this manuscript for publication.

      The model includes abstracted values to reproduce the Wnt and BMP signalling gradients and their effect on cell proliferation and differentiation, generating the three-dimensional spatial distribution of cells in the crypt. To further clarify how quantitative information from the Wnt and BMP signalling pathways is implemented in the model, we have added the following paragraph to Appendix Section 8 (Cell fate: proliferation, differentiation, arrest, apoptosis):

      "…During this migration, the Wnt content in absorptive progenitors is halved at each division and progressively decreases away from Wnt sources, while BMP signals increase towards the villus. In our model, differentiation into enterocytes occurs when progenitors encounter a BMP signal level higher than their Wnt signal content. For instance, in the ileal crypt in homeostasis this occurs at approximately cell position 16 from the crypt base, where progenitors migrating from the stem cell niche reach a reduced Wnt signal content of about 8 a.u. The BMP signalling level, in turn, has a maximum value of 64 at approximately cell position 23 from the crypt base, where BMP signals are generated by mature enterocytes. These BMP signals diffuse towards the crypt base and therefore decrease exponentially, reaching about 8 a.u. at approximately position 16, which enables differentiation into enterocytes. Epithelial injuries that decrease the number of enterocytes reduce BMP signal production and its diffusion range, which enlarges the proliferation compartment because cells encounter the BMP level required for differentiation only at higher positions in the crypt."
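      The differentiation rule in the quoted paragraph can be sketched in a few lines. This sketch is not the authors' Julia model. It assumes an exponential BMP profile fitted only to the two values given in the text (64 at position 23 and about 8 at position 16, i.e. an 8-fold drop over 7 positions), and it treats the progenitor's Wnt content as a fixed number rather than halving it at each division. Injury is represented only as a lower BMP maximum with an unchanged decay length, although the text notes that injury also shortens the diffusion range. All names are hypothetical.

```python
BMP_MAX = 64.0       # peak BMP level at the enterocyte source (value from the text)
BMP_SOURCE_POS = 23  # cell position of the BMP peak (value from the text)
FOLD_PER_7 = 8.0     # assumed: 8-fold decay over 7 positions (64 -> 8 between positions 23 and 16)


def bmp_level(pos, bmp_max=BMP_MAX):
    """BMP signal at a crypt position at or below the source, decaying exponentially toward the base."""
    return bmp_max * FOLD_PER_7 ** (-(BMP_SOURCE_POS - pos) / 7)


def differentiation_position(wnt_content, bmp_max=BMP_MAX, start=1):
    """First position, moving up from the base, where BMP >= the progenitor's Wnt content.

    Returns None if the threshold is never reached below the BMP source.
    """
    for pos in range(start, BMP_SOURCE_POS + 1):
        if bmp_level(pos, bmp_max) >= wnt_content:
            return pos
    return None
```

      Under these assumptions, a progenitor carrying 8 a.u. of Wnt differentiates at position 16, and halving the BMP maximum to 32 pushes differentiation up to position 19, qualitatively matching the enlarged proliferative compartment described above.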

      (2) Some conclusions by the authors are not properly justified in the text, as "Paneth cells are the main driver behind the differential mechanical environment in the niche", "Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche", the specific effect of p27 in contrast with Wee1 phosphorylation over the cell cycle length, and "their recovery [absorptive progenitors] started before the end of the treatment, driven by a negative feedback loop from mature enterocytes to their progenitors".

      We have reworded these statements as described below.

      The paragraph “Paneth cells are the main driver behind the differential mechanical environment in the niche, where cells with longer cycles accumulate more Wnt and Notch signals. In agreement with experimental reports {Pin, 2015 #719}, in our model Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the region” has been modified and now reads as follows “In agreement with experimental reports {Pin, 2015 #719}, Paneth cells are assumed to be stiffer and larger than other epithelial cells, requiring higher forces to be displaced and generating high intercellular pressure in the niche. Due to this increased mechanical pressure, cells in the niche have longer division cycles and can accumulate more Wnt and Notch signals.”

      The sentence “Wnt-mediated feedback loop prevents the uncontrolled expansion of the niche” has been deleted from paragraph, that now reads “To generate a niche of stable size, we implemented a negative Wnt-mediated feedback loop that resembles the reported stem cell production of RNF43/ZNRF3 ligands to increase the turnover of Wnt receptors in nearby cells {Hao, 2012 #2086;Koo, 2012 #2089;Clevers, 2013 #538;Clevers, 2013 #2098}. Similarly, in our model, a number of stem cells in excess of the homeostatic value reduces cell tethering of Wnt ligands and hence inhibits Paneth and stem cell generation (Figures 1A-B).”

      Regarding the specific effect of p27 in contrast with Wee1 phosphorylation on the cell cycle length, we have simplified the text in the main manuscript, which now reads: “Using the model of Csikasz-Nagy et al. {Csikasz-Nagy, 2006 #1870}, we modulated the duration of G1 through the production rate of the p27 protein. The p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2, which induces DNA replication and the beginning of S-phase {Morgan, 2007 #2073}. We therefore hypothesized that rapidly cycling absorptive progenitors located in regions of low mechanical pressure outside the stem cell niche have low levels of p27, which bring forward the start of S-phase to shorten G1 (Figure 2D). In support of this hypothesis, it has been demonstrated that p27 inhibition has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074} (see the Appendix for a full description).”

      In Appendix Section 2, we provide an extended explanation of how the kinetic parameters governing p27 and Wee1 are used to shorten the cell cycle, mainly by decreasing G1 while keeping the length of S phase constant:

      "Regarding G1 phase, the p27 protein has been reported to regulate the duration of G1 by preventing the activation of Cyclin E-Cdk2 which induces DNA replication and defines the beginning of S-phase {Morgan, 2007 #2073}. We hypothesized that fast cycling cells have low levels of p27 which result in earlier DNA replication, bringing forward the start of S-phase and shortening the length of G1. In support of this hypothesis, it has been experimentally demonstrated that inhibiting p27 has no effect on the proliferation of absorptive progenitors {Zheng, 2008 #2074}. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, the duration of G1 can be modulated through the parameter V_si, which is the basal production rate of p21/p27 (in the Csikasz-Nagy model, the p21 and p27 proteins are represented by a single variable, here we refer to that model quantity as p21/p27).

      Additionally, the end of S-phase is associated with the decrease of Wee1 to basal levels due to Cdc14 mediated phosphorylation of Wee1. In the Csikasz-Nagy model {Csikasz-Nagy, 2006 #1870}, this reaction is described by a Goldbeter-Koshland function, which includes the parameter KA_Wee1p to regulate the level of Cdc14 required for the phosphorylation of Wee1.

      Therefore, we modified these two parameters, V_si and KA_Wee1p, to ensure that variations in cycle duration mostly affect G1 while the length of S phase remains constant. We assumed that the value of each parameter scales linearly with the duration of the division cycle, t_cycle, between a lower and an upper bound, which prevent aberrant behaviour of the cell cycle model in the dynamically changing conditions of the crypt."
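      The linear, bounded scaling of V_si and KA_Wee1p with cycle duration can be written as a clamped linear interpolation. In the sketch below, `scaled_parameter` is a hypothetical helper, and the bounds (t_lo, t_hi) and the corresponding parameter values (p_lo, p_hi) are placeholders. The actual values are those given in the Appendix and are not reproduced here.

```python
def scaled_parameter(t_cycle, t_lo, t_hi, p_lo, p_hi):
    """Map cycle duration linearly onto a kinetic parameter, clamped outside [t_lo, t_hi].

    p_lo is the parameter value at t_lo and p_hi the value at t_hi; durations outside
    the range are held at the nearest bound to avoid aberrant cell-cycle behaviour.
    """
    if t_hi <= t_lo:
        raise ValueError("t_hi must be greater than t_lo")
    t = min(max(t_cycle, t_lo), t_hi)
    frac = (t - t_lo) / (t_hi - t_lo)
    return p_lo + frac * (p_hi - p_lo)
```

      The same function would be called once for V_si and once for KA_Wee1p, each with its own pair of parameter bounds.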

      The paragraph related to “their recovery started before the end of the treatment…” sentence has been amended in the text and now reads “Simulated proliferative absorptive progenitors were indirectly affected by stem cell ablation and their decrease was followed by a reduction in mature enterocytes. The progenitors recovered soon after treatment interruption to later reach values above baseline when responding to the negative feedback signalling from mature enterocytes (Figure 3A).”

      (3) Only the results of the "main" model are shown, with no information about its sensitivity to parameter values, and how their conclusions depend on specific decisions on the model. For example, the authors said that "an optimal crypt cell composition is achieved when BMP and Wnt differentiation thresholds result in progenitors dividing approximately four times before differentiating into enterocytes", but the results of alternative scenarios are not shown.

      To address this comment, we have included a new section in the Appendix, called “What-if Analysis”, and new figures (Figure S4-S8) with simulations of alternative scenarios affecting the main signalling pathways that govern crypt composition, in particular, we simulated stronger and weaker Wnt, BMP, Notch and ZNRF3/RNF43 signalling.

      We attach the new section here:

      "10) What-if Analysis

      We investigated the effect on the simulated crypt of increasing and decreasing the strength of the main signalling pathways, Wnt, BMP and ZNRF3/RNF43 signalling, and modifying the Notch thresholds. For each alternative parameterisation, except when decreasing ZNRF3/RNF43 signalling, the simulation was run for 30 days to ensure stability was reached with the new parameter set and the final 10 days were included in the analysis. When decreasing ZNRF3/RNF43 signalling, we simulated 60 days to demonstrate the expansion of the niche and analysed the final 10 days. The reference parameter set used as baseline was the ileal mouse crypt parameter set reported in Appendix Table 1. In all cases, we only consider modifications of one signalling mechanism at a time.

      To study alternative Wnt signalling scenarios, we used the WntRange parameter (Appendix Table 1), to double and halve the spreading area of Wnt signals emitted by Paneth cells while we maintained the original WntRange value for Wnt-emitting mesenchymal cells at the bottom of the crypt (Appendix Section 7.1) (Figures S4A-S4F). When WntRange was doubled, we observed increased number of stem and Paneth cells in a noticeably enlarged niche (Figures S4C-S4D), with cells choosing the stem cell fate instead of differentiating into absorptive progenitors. On the other hand, decreasing Wnt signalling, by halving WntRange in Paneth cells but maintaining its homeostatic value in mesenchymal cells, resulted in no apparent changes in the niche cell composition (Figures S4E-S4F) which resembled published experimental results of persisting functional stem cells after Paneth cell ablation {Durand, 2012 #434}.

      The ZNRF3/RNF43-mediated negative feedback mechanism regulates the size of the niche by modulating Wnt signalling. We simulated increasing and decreasing the strength of ZNRF3/RNF43 signalling by doubling and halving, respectively, the parameter Z described in Appendix Section 7.2 (Figures S5A-S5F). Following the increase in ZNRF3/RNF43 signalling intensity, we observed a decrease in the number of stem and Paneth cells, together with relatively minor changes in the transit-amplifying region (Figures S5C-S5D). On the other hand, when ZNRF3/RNF43 signalling was decreased, the niche expanded, resulting in a crypt dominated by Paneth and stem cells (Figures S5E-S5F), which replicates reported experimental phenotypes {Koo, 2012 #2089}.

      To modify Notch signalling, we increased and decreased by 1 A.U. the Notch threshold required for lateral inhibition (Figures S6A-S6F). This Notch signalling threshold determines the number of contacting Notch-secreting cells (secretory lineage) to inhibit the differentiation of stem cells into the secretory lineage. Thus, increasing this Notch threshold enhances the production of secretory cells leading to the increase of Paneth, goblet and enteroendocrine cells (Figure S6C-S6D). Alternatively, decreasing the Notch threshold enhances differentiation into the absorptive lineage, reducing the number of Paneth and secretory cells (Figure S6E-S6F).

      We modified the range of diffusion of BMP signals by doubling and halving the parameter A (Figures S7A-S7F), which denotes the amount of BMP signal diffusing towards the base of the crypt (Appendix Section 7.4). When we increased the BMP signalling range, enterocytes differentiated at lower crypt positions, effectively reducing the transit-amplifying zone (Figure S7A, Figure S7B). Decreasing BMP signalling strength by halving A increased the number of proliferative absorptive progenitors, which reached higher positions in the crypt (Figure S7C-S7D). The niche was largely unaffected in both cases (Figure S7E-S7F)."
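      The one-at-a-time design of the What-if Analysis can be summarized as a small scenario table. In this sketch, `WntRange`, `Z` and `A` are the parameter names used in the text. `NotchThreshold` is a hypothetical key for the Notch lateral-inhibition threshold, and the baseline values passed in are placeholders for Appendix Table 1. The sketch only enumerates the scenarios: doubling or halving, ±1 A.U. for Notch, 60 simulated days for decreased ZNRF3/RNF43 and 30 days otherwise, and the final 10 days analysed. It does not run the model, and it does not capture the fact that WntRange was changed only for Paneth cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    parameter: str
    value: float             # perturbed value of `parameter`
    sim_days: int            # total simulated days
    analysed_days: int = 10  # final days included in the analysis


def what_if_scenarios(baseline):
    """Enumerate one-at-a-time perturbations relative to a baseline parameter dict."""
    scenarios = []
    for param in ("WntRange", "Z", "A"):
        for label, factor in (("double", 2.0), ("halve", 0.5)):
            # Halving Z (weaker ZNRF3/RNF43 feedback) is run longer to show niche expansion.
            days = 60 if (param == "Z" and factor < 1) else 30
            scenarios.append(Scenario(f"{label} {param}", param, baseline[param] * factor, days))
    for label, delta in (("raise", 1.0), ("lower", -1.0)):
        scenarios.append(Scenario(f"{label} Notch threshold", "NotchThreshold",
                                  baseline["NotchThreshold"] + delta, 30))
    return scenarios
```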

      (4) Regarding the construction of the model, the authors used "counts of Ki-67 positive cells recorded by position" while the original data reported "overall cell counts per crypt and villus". Some explanation about how this conversion was made, why it is valid, as well as any potential problems, is needed. Additionally, the model is based on experiments done by others in mouse models; the similarity to the response in human intestinal crypts is not discussed.

      Ki-67 immunostaining data during 5-FU treatment was derived from the same experiments. The overall cell counts per crypt and villus are published in {Jardi, 2022 #2416}. For this manuscript, we reanalysed the intestinal samples to estimate counts of cell types by position in the crypt.

      We have clarified the text, which now reads: …“The samples from this latter study {Jardi, 2022 #2416} were analysed again to count Ki-67 positive cells at each position along the longitudinal crypt axis, for 30-50 individual hemicrypt units per tissue section per mouse, as previously described {Williams, 2016 #2165}.”
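      A minimal sketch of per-position scoring follows. Each hemicrypt is recorded as a sequence of 0/1 Ki-67 scores ordered from the crypt base, and the labelling index at each position is the fraction of scored hemicrypts that are positive there. This illustrates the general approach only. The function name and data layout are assumptions, and details of the published protocol {Williams, 2016 #2165} are not reproduced, for example how positions are assigned or how values are averaged per mouse.

```python
import numpy as np


def labelling_index_by_position(hemicrypts, n_positions):
    """Fraction of Ki-67-positive cells at each crypt position (index 0 = crypt base).

    hemicrypts: list of per-hemicrypt sequences of 0/1 scores ordered from the base.
    Shorter hemicrypts do not contribute at positions they lack; positions with no
    scored cells are returned as NaN.
    """
    positive = np.zeros(n_positions)
    scored = np.zeros(n_positions)
    for hc in hemicrypts:
        hc = np.asarray(hc[:n_positions], dtype=float)
        positive[: len(hc)] += hc
        scored[: len(hc)] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(scored > 0, positive / scored, np.nan)
```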

      We agree that understanding how results derived from animal models translate into a human or clinical context is highly relevant. The mouse crypt is a model of choice for studying epithelial biology and exhibits remarkable similarities with the human crypt. Our team is focussed on developing translational modelling strategies, and we have a version of the model that describes a human crypt. That model assumes mostly conserved crypt biology and structure across species, and it includes changes in parameter values needed to compensate for reported differences in morphometrics and cell cycle duration. Given the relevance and extent of this translational work, we chose to focus entirely on the mouse crypt in this manuscript. We think the translational modelling strategy to explore the quantitative translation between human and mouse and/or other species or settings merits a full report.

      (5) The authors imply that their mathematical model of the intestinal crypt is an improvement over those already published but there is no direct comparison or review of the literature to substantiate this claim.

      An extended literature review including more details of previous ABMs to enable a direct comparison with our model is now included in the manuscript and reads as follows:

      “Several agent-based models (ABMs) have been proposed to describe the complexity and dynamic nature of the intestinal crypt. Early models were used as in silico platforms to study the dynamics and cellular organisation of the crypt. For instance, one of the pioneering ABMs was used to study the distribution and organisation of labelling and mitotic indices {Meineke, 2001 #326}. This model comprises a fixed ring of Paneth cells beneath a row of stem cells, which divide asymmetrically to produce a stem cell and a transit-amplifying cell that terminally differentiates after a fixed number of divisions. Some subsequent models are lattice-free, recapitulate neutral drift of equipotent stem cells and describe proliferation and cell fate regulated by a fixed Wnt signalling spatial gradient, which is defined by the distance from the crypt base, with proliferating cells progressing through discrete phases of the cell cycle and showing variable duration of the G1 phase {Pitt-Francis, 2009 #129}. Further model refinements can be seen in the model of Buske et al (2011), with stochastic cell growth and division time {Buske, 2011 #1}, Wnt levels defined by the fixed local curvature of the crypt and lateral inhibition driven by Notch signalling. Here, we present a lattice-free agent-based model that describes the spatiotemporal dynamics of single cells in the small intestinal crypt driven by the interaction of surface tethered Wnt signals, cell-cell Notch signalling, BMP diffusive signals, RNF43/ZNRF3-mediated feedback mechanisms and the cycle protein network responding to the crypt mechanical environment. We show that our computational model enables the simulation of the ablation and recovery of the stem cell niche as well as of how drug-induced molecular perturbations trigger a cascade of disruptive events spanning from the cell cycle to single cell arrest and/or apoptosis, altered cell migration and turnover and ultimately loss of epithelial integrity.”

      (6) The authors claim that the simulated data and the available mouse data match up. Nevertheless, the experimental data and the model output still appear to differ both quantitatively and qualitatively (as presented in Figures 2E, F, and 5C, D). This casts doubt on how well the model can actually reproduce the experimental data. In conclusion, the model would benefit from further refinement, particularly if the goal is to use it to predict the dynamics of oncogenic drug candidates.

      To address this comment, we have made several adjustments: we refined the counting algorithm that determines cell position, and we improved the Ki67 and BrdU staining simulations by modifying the simulated staining criteria and adding an estimate of the experimental error to the simulated responses. These changes are described in a new appendix section, “ABM simulation of Ki-67 and BrdU Staining”.

      With these changes, we think we have achieved more satisfactory agreement between observed and predicted results, and we have updated all figures with the simulated Ki67 and BrdU staining results.
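      As a rough illustration of what simulating a staining readout with experimental error can involve, the sketch below computes a labelling index per cell position along the crypt axis and then perturbs it with Gaussian noise. It is a hypothetical sketch, not the procedure in the appendix: the staining criteria (Ki-67 in any non-G0 phase, BrdU in S phase during the pulse), the cell dictionary fields `position` and `phase`, and the noise level `sigma` are all assumptions.

```python
import random
from collections import defaultdict


# Hypothetical staining criteria, for illustration only.
def ki67_positive(cell: dict) -> bool:
    """Ki-67 marks cycling cells, i.e. any phase other than G0."""
    return cell["phase"] != "G0"


def brdu_positive(cell: dict) -> bool:
    """BrdU marks cells that were in S phase during the labelling pulse."""
    return cell["phase"] == "S"


def labelling_index_by_position(cells: list[dict], is_positive) -> dict[int, float]:
    """Fraction of marker-positive cells at each cell position along the crypt axis."""
    totals: dict[int, int] = defaultdict(int)
    positives: dict[int, int] = defaultdict(int)
    for cell in cells:
        totals[cell["position"]] += 1
        if is_positive(cell):
            positives[cell["position"]] += 1
    return {pos: positives[pos] / totals[pos] for pos in sorted(totals)}


def add_measurement_error(index: dict[int, float], sigma: float,
                          rng: random.Random) -> dict[int, float]:
    """Add Gaussian noise (s.d. `sigma`) to each index, clipped to the valid range [0, 1]."""
    return {pos: min(1.0, max(0.0, value + rng.gauss(0.0, sigma)))
            for pos, value in index.items()}
```

      For example, two cells at position 1 (one in S phase, one in G0) and one G1 cell at position 2 give a Ki-67 index of {1: 0.5, 2: 1.0} and a BrdU index of {1: 0.5, 2: 0.0}; passing a seeded `random.Random` to `add_measurement_error` keeps the noisy profiles reproducible across runs.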

    1. Author Response

      We are grateful to the editors and the reviewers for the thorough evaluation of our manuscript and their feedback, as it allows us to provide additional clarification of our findings and improve the manuscript.

      In their evaluation, the reviewers raised a key conceptual point related to the inhibitory mechanism, which appeared to be insufficiently explained in the manuscript and led to a misconception regarding its physiological relevance. They also appear to have missed experimental data related to the concentrations of Aβ used and their relevance for Alzheimer’s disease (AD). We believe that our studies, although performed in vitro in model systems, provide a novel conceptual framework and shed light on unexplored mechanisms underlying AD.

      We discuss these points below in a provisional response to their comments.

      Reviewer #1 (Public Review):

      Summary:

      Human Abeta42 inhibits gamma-secretase activity in biochemical assays.

      Strengths:

      Determination of the inhibitory concentration of human Abeta42 on gamma-secretase activity in biochemical assays.

      Weaknesses:

      Human Abeta42 may reach micromolar concentrations in endosomes.

      This is correct.

      If so, production of Abeta42 would be attenuated, which would then lead to less Abeta deposition in the brain. The authors’ finding is interesting but does not fit the physiological condition in the brain.

      We thank the reviewer for raising this key conceptual point, as this gives us the opportunity to clarify it for the future readers.

      The characterized inhibitory mechanism is more complex than the reviewer’s interpretation suggests, and a number of factors must be considered. Indeed, our data show that Aβ42, upon intracellular concentration, inhibits γ-secretase activity, resulting in increased levels of γ-secretase substrates (C-terminal fragments, CTFs). It is important, however, to highlight that this inhibition is competitive in nature, implying that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the substrates. The model that we put forward is that cellular uptake and intracellular concentration of Aβ42 facilitate γ-secretase inhibition, which results in the accumulation of APP-CTFs (and of γ-secretase substrates in general). However, as Aβ42 levels fall, the increased concentration of substrates shifts the equilibrium towards their processing and Aβ production. As the Aβ42 concentration rises again, the equilibrium shifts back towards inhibition, and so on. This inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signalling (arising from increased CTF levels or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of oscillating systems in the brain, such as NOTCH signalling, which is implicated in memory formation (2), and potentially others (e.g. those involving cadherins, p75 or neuregulins).
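      The key property of competitive inhibition invoked here, that a rise in substrate concentration relieves the inhibition, can be illustrated with the textbook Michaelis-Menten rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). The sketch below treats Aβ42 as the inhibitor I and γ-secretase substrates (e.g. APP-CTFs) as S; the values of `vmax`, `km` and `ki` are arbitrary units chosen for illustration, not measured constants for γ-secretase.

```python
def rate(s: float, i: float = 0.0, vmax: float = 1.0, km: float = 1.0, ki: float = 1.0) -> float:
    """Michaelis-Menten rate in the presence of a competitive inhibitor at concentration i."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def fractional_activity(s: float, i: float, km: float = 1.0, ki: float = 1.0) -> float:
    """Activity with inhibitor relative to the uninhibited rate at the same substrate level."""
    return rate(s, i, km=km, ki=ki) / rate(s, 0.0, km=km, ki=ki)


if __name__ == "__main__":
    # Same inhibitor level, rising substrate: inhibition is progressively overcome.
    for s in (1.0, 10.0, 100.0):
        print(f"[S] = {s:>5}: relative activity {fractional_activity(s, i=1.0):.3f}")
```

      With [I] = Ki and [S] = Km, the enzyme runs at two-thirds of its uninhibited rate, whereas at [S] = 100 Km it recovers to about 99%. This is the sense in which substrate accumulation during a period of inhibition shifts the equilibrium back towards processing, producing the partial, reversible pulses of inhibition described above.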

      It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor (semagacestat) have been proposed to have contributed to the cognitive alterations observed in semagacestat-treated patients in the failed Phase 3 IDENTITY clinical trial (2, 3), and that semagacestat, like Aβ42, acts as a high-affinity competitor of substrates (Koch et al, 2023). We will include this clarification in the discussion of the revised manuscript and create an additional figure presenting the proposed mechanism.

      It is not clear whether the FRET-based assay in living cells really reflects gamma-secretase activity.

      The specificity of this assay is supported by the γ-secretase inhibitor treatment included as a positive control (Figure 3). In addition, the published literature supports that this assay faithfully reports γ-secretase activity in a cellular context (4-7).

      Processing of APP-CTF in living cells is not limited to cleavage by gamma-secretase.

      This is correct, and we have therefore analysed the contribution of other APP-CTF degradation pathways by performing a cycloheximide-based stability assay in the presence of a γ-secretase inhibitor. Quantitative analysis of the levels of both APP-CTFs and full-length APP (APP-FL) over the 5 h time course revealed no significant differences between Aβ42-treated cells and controls. As expected, Bafilomycin A1 treatment markedly prolonged the half-life of both proteins (Figure 7B & C). The lack of a significant effect of Aβ42 on the half-life of APP-CTFs under conditions of γ-secretase inhibition is consistent with the proposed inhibitory mechanism. Finally, we note that the inhibition will affect not only APP-CTFs but also the processing of γ-secretase substrates in general.
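      For readers less familiar with chase assays, the sketch below shows one common way to turn a cycloheximide time course into a half-life: assume first-order decay, fit ln(level) against time by least squares, and convert the decay constant k into t½ = ln 2 / k. It is a generic illustration, not the quantification pipeline used for Figure 7B & C, and the function name `half_life` is hypothetical.

```python
import math


def half_life(times: list[float], levels: list[float]) -> float:
    """Half-life from a least-squares fit of ln(level) on time, assuming first-order decay."""
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(y) for y in levels]  # band intensities must be > 0
    mean_t = sum(times) / len(times)
    mean_y = sum(logs) / len(logs)
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # first-order decay constant (slope of ln(level) is -k)
    if k <= 0:
        raise ValueError("levels do not decay over the time course")
    return math.log(2) / k
```

      Comparing such estimates between Aβ42-treated and control cells (and against a condition such as Bafilomycin A1 that slows lysosomal degradation) is one way to ask whether a treatment changes turnover through γ-secretase-independent routes.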

      Reviewer #2 (Public Review):

      Summary:

      In the current study, the authors tested the hypothesis that Aβ42 toxicity arises from its proven affinity for γ-secretases. Specifically, increases in Aβ42, particularly in the endolysosomal compartment, promote the establishment of a product feedback inhibitory mechanism on γ-secretases and thereby impair downstream signaling events. They showed that human Aβ42 peptides, but neither murine Aβ42 nor human Aβ17-42 (p3), inhibit γ-secretases and trigger the accumulation of unprocessed substrates in neurons, including the CTFs of APP, p75 and pan-cadherin. Moreover, Aβ42 dysregulated cellular homeostasis by inducing p75-dependent neuronal death. γ-secretases process many other membrane proteins, including NOTCH, ERB-B2 receptor tyrosine kinase 4 (ERBB4), N-cadherin (NCAD) and p75 neurotrophin receptor (p75-NTR), and thereby feed into a broad range of downstream signaling pathways, including those critical for neuronal structure and function. Hence, the authors propose a selective role for the Aβ42 peptide and raise the intriguing possibility that compromised γ-secretase activity against the CTFs of APP and/or other neuronal substrates contributes to the pathogenesis of AD. Overall, the data are not very convincing in support of the main claim.

      Strengths:

      Different in vitro and cellular approaches are employed to test the hypothesis.

      Weaknesses:

      The Aβ42 concentrations used in the assays are too high, far beyond physiological or pathological levels. These artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in most of the experiments we used low μM concentrations of Aβ42. However, we note that we also performed experiments in which conditioned medium collected from neurons expressing human APP.Swe was used as a source of Aβ. In these experiments, the total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 4G). Treatment with this conditioned medium led to increased APP-CTF levels, supporting the view that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      We would like to underline that Aβ is estimated to be present in the brain at concentrations ranging from fM to mM, depending on the pool (soluble, aggregated, fibrillar, etc.) that is considered (8, 9). However, it is the local rather than the global concentration of Aβ that is critical for disease pathogenesis. In this regard, it has been proposed that, as AD progresses, Aβ42 slowly accumulates in the endo-lysosomal system, where it reaches the μM concentrations required for aggregation and seeding (1, 10, 11). Our findings are consistent with the analysis showing that extracellular soluble Aβ42 peptide at low nM concentrations is taken up by cortical neurons and neuroblastoma (SH-SY5Y) cells and concentrated in the endo-lysosomal system, where effective peptide concentrations reach ~2.5 μM (1). Hence, slow vesicular peptide accumulation and/or a degradation imbalance (1, 11, 12) could increase the effective concentration of Aβ42 by several orders of magnitude over the years to decades of AD pathogenesis. We note that our experimental settings, using low μM concentrations of extracellular Aβ42 over a 24 h treatment, were designed to accelerate this ‘peptide concentration’ process in vitro. As discussed in our report, a high μM Aβ peptide concentration in the endo-lysosomal system not only leads to aggregation but also facilitates γ-secretase inhibition. Of note, we are currently developing protocols and will undertake follow-up studies to quantitatively define the Aβ concentration in synaptosomes and endosomes in AD brain, as well as in in vitro systems (i.e. cells treated with Aβ preparations obtained from AD brains).
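      The size of the concentration step implied here can be checked with simple arithmetic. The sketch below pairs the 0.5-1 nM range measured in our conditioned-medium experiments with the ~2.5 μM endo-lysosomal estimate of Hu et al. (1); combining these two numbers is an illustrative assumption, since they come from different experimental systems, and the function name `enrichment` is hypothetical.

```python
import math


def enrichment(extracellular_nM: float, compartment_uM: float) -> tuple[float, float]:
    """Fold enrichment of a compartment over the medium, and the same figure in orders of magnitude."""
    fold = compartment_uM * 1_000.0 / extracellular_nM  # convert μM to nM before dividing
    return fold, math.log10(fold)


if __name__ == "__main__":
    for c_out in (0.5, 1.0):
        fold, orders = enrichment(c_out, 2.5)
        print(f"{c_out} nM -> 2.5 μM: {fold:,.0f}-fold (~{orders:.1f} orders of magnitude)")
```

      Going from 1 nM or 0.5 nM in the medium to ~2.5 μM in the compartment corresponds to a 2,500- to 5,000-fold enrichment, i.e. roughly 3.4 to 3.7 orders of magnitude.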

      Finally, we note that analyses of the brains of individuals affected by AD have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (13-15), and Ferrer-Raventós et al. recently revealed a correlation between APP-CTF and Aβ levels at the synapse (13).

      To conclude, as clarified above, the Aβ peptide concentrations and the conditions tested are consistent with AD pathophysiology, and the data presented in our report collectively provide evidence in support of an Aβ42-mediated inhibitory effect on γ-secretase.

      References:

      1. X. Hu et al., Amyloid seeds formed by cellular uptake, concentration, and aggregation of the amyloid-beta peptide. Proc Natl Acad Sci U S A 106, 20324-20329 (2009).
      2. B. De Strooper, Lessons from a failed γ-secretase Alzheimer trial. Cell 159, 721-726 (2014).
      3. R. S. Doody et al., A phase 3 trial of semagacestat for treatment of Alzheimer's disease. N Engl J Med 369, 341-350 (2013).
      4. M. C. Houser et al., A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel) 20, (2020).
      5. M. C. Q. Houser et al., Limited Substrate Specificity of PS/γ-Secretase Is Supported by Novel Multiplexed FRET Analysis in Live Cells. Biosensors (Basel) 11, (2021).
      6. M. Maesako et al., Visualization of PS/γ-Secretase Activity in Living Cells. iScience 23, 101139 (2020).
      7. M. Maesako, M. C. Q. Houser, Y. Turchyna, M. S. Wolfe, O. Berezovska, Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci 42, 145-154 (2022).
      8. B. R. Roberts et al., Biochemically-defined pools of amyloid-β in sporadic Alzheimer's disease: correlation with amyloid PET. Brain 140, 1486-1498 (2017).
      9. J. A. Raskatov, What Is the "Relevant" Amyloid β42 Concentration? Chembiochem 20, 1725-1726 (2019).
      10. M. P. Schützmann et al., Endo-lysosomal Aβ concentration and pH trigger formation of Aβ oligomers that potently induce Tau missorting. Nat Commun 12, 4634 (2021).
      11. E. Wesén, G. D. M. Jeffries, M. Matson Dzebo, E. K. Esbjörner, Endocytic uptake of monomeric amyloid-β peptides is clathrin- and dynamin-independent and results in selective accumulation of Aβ(1-42) compared to Aβ(1-40). Sci Rep 7, 2021 (2017).
      12. M. F. Knauer, B. Soreghan, D. Burdick, J. Kosmoski, C. G. Glabe, Intracellular accumulation and resistance to degradation of the Alzheimer amyloid A4/beta protein. Proc Natl Acad Sci U S A 89, 7437-7441 (1992).
      13. P. Ferrer-Raventós et al., Amyloid precursor protein Neuropathol Appl Neurobiol 49, e12879 (2023).
      14. M. Pera et al., Distinct patterns of APP processing in the CNS in autosomal-dominant and sporadic Alzheimer disease. Acta Neuropathol 125, 201-213 (2013).
      15. L. Vaillant-Beuchot et al., Accumulation of amyloid precursor protein C-terminal fragments triggers mitochondrial structure, function, and mitophagy defects in Alzheimer's disease models and human brains. Acta Neuropathol 141, 39-65 (2021).